This reference surfaces the full docstring content for the primary modules inside the project so that contributors can review the documentation without diving into the source. Every entry below mirrors the corresponding NumPy-style docstring: short summary, parameter descriptions, return values, and example snippets.
## parquet_converter.cli

**`parse_args(args: Optional[List[str]] = None) -> argparse.Namespace`**

Parses command-line arguments (falling back to `sys.argv` when `None`). Returns an `argparse.Namespace` with `input_path`, `output_dir`, `config`, `verbose`, `save_config`, `mode`, and `report_dir`.

```python
namespace = parse_args(["input.csv", "--mode", "convert"])
assert namespace.input_path == "input.csv"
```
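The flag surface described above can be approximated with a self-contained `argparse` sketch. Only the attribute names come from the summary; the option spellings, defaults, and `choices` here are assumptions, and the real parser in `parquet_converter.cli` may differ:

```python
import argparse

def parse_args(args=None):
    # Hypothetical stand-in for parquet_converter.cli.parse_args; defaults
    # and choices are illustrative assumptions, not the package's behavior.
    parser = argparse.ArgumentParser(prog="parquet-converter")
    parser.add_argument("input_path", help="File or directory to convert")
    parser.add_argument("--output-dir", default=None)
    parser.add_argument("--config", default=None)
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--save-config", default=None)
    parser.add_argument("--mode", choices=["convert", "analyze"], default="convert")
    parser.add_argument("--report-dir", default=None)
    # argparse itself falls back to sys.argv[1:] when args is None
    return parser.parse_args(args)

namespace = parse_args(["input.csv", "--mode", "convert"])
assert namespace.input_path == "input.csv"
assert namespace.mode == "convert"
```

Note how `argparse` converts dashed option names (`--report-dir`) into underscored namespace attributes (`report_dir`), matching the attribute list documented above.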
**`main(args: Optional[List[str]] = None) -> int`**

Entry point for the command-line tool; parses `args` (falling back to `sys.argv`) and returns a process exit code.

```python
main(["/path/to/data", "--mode", "analyze"])
```

## parquet_converter.converter

**`convert_file(input_path, output_dir, config) -> ConversionStats`**

Converts a single file using the resolved configuration (a `load_config` output). Returns a `ConversionStats` capturing rows processed, errors, and per-column stats.

**`convert_directory(input_dir, output_dir, config) -> List[ConversionStats]`**

Converts every supported file (`.csv`, `.txt`) in a directory, delegates each to `convert_file`, and aggregates the statistics.

Internal helpers:

- `_convert_with_polars(input_path, output_dir, config)`
- `_convert_with_pandas(input_path, output_dir, config)`
- `_resolve_file_options(input_path, config)`
- `_build_polars_csv_kwargs(options)` — builds the `polars.scan_csv` keyword arguments (separator, encoding, headers, etc.).
- `_normalize_polars_encoding(value)` — maps common encoding names (e.g. `utf-8`) to the Polars vocabulary (`utf8`, `utf8-lossy`).
- `_analyze_sample_with_polars(input_path, options, sample_rows)`
- `_stream_polars_conversion(input_path, output_path, options, schema, compression, chunk_size)` — returns `(success, total_rows, elapsed_seconds)`.
- `_collect_polars_column_stats(output_path, column_limit)` — collects per-column statistics, capped at `column_limit`.
- `_verify_conversion(output_path, source_path, verify_rows)`

## parquet_converter.analyzer

- `scan_parquet_files(input_dir, recursive=True) -> List[Path]` — finds `.parquet` files inside a directory tree.
- `get_file_size(file_path) -> str` — human-readable size via `humanize.naturalsize`.
- `get_file_modification_time(file_path) -> str` — formats the file's `mtime` as `YYYY-MM-DD HH:MM:SS`.
- `calculate_summary_stats(df)`
- `calculate_null_counts(df)`
- `get_unique_values_info(df)`
- `analyze_parquet_file(file_path)`
- `format_analysis_report(analyses, width=150)`
- `analyze_directory(input_dir, output_dir=None)` — writes `parquet_analysis_report.txt`.

## parquet_converter.config

`CSVOptions`, `TXTOptions`, `DateTimeFormats`, and `Config` are Pydantic models with attributes for delimiters, encodings, headers, NA tokens, datetime patterns, logging, engine selection, sample sizes, chunk sizes, and analyzer directories.

- `load_config(config_path=None) -> Config` — reads an optional config file, applies environment-variable overrides (`LOG_LEVEL`, `COMPRESSION_CODEC`, `CONVERTER_ENGINE`, etc.), and returns a validated `Config` object.
- `save_config(config, output_path)`
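The layering described for `load_config` — defaults, then file-provided values, then environment overrides — can be sketched with a stdlib-only stand-in. The three fields and their defaults here are assumptions chosen for illustration; the real model is a Pydantic class with many more attributes:

```python
import os
from dataclasses import dataclass

# Hypothetical reduction of the Config model to three of the documented
# fields; defaults are illustrative, not the package's actual values.
@dataclass
class Config:
    log_level: str = "INFO"
    compression_codec: str = "snappy"
    converter_engine: str = "polars"

# Environment variable name -> Config field name
_ENV_OVERRIDES = {
    "LOG_LEVEL": "log_level",
    "COMPRESSION_CODEC": "compression_codec",
    "CONVERTER_ENGINE": "converter_engine",
}

def load_config(file_values=None, environ=None):
    """Merge file-provided values, then environment overrides, onto defaults."""
    environ = os.environ if environ is None else environ
    merged = dict(file_values or {})
    for env_name, field in _ENV_OVERRIDES.items():
        if env_name in environ:
            merged[field] = environ[env_name]  # env wins over the file
    return Config(**merged)

cfg = load_config({"converter_engine": "pandas"}, environ={"LOG_LEVEL": "DEBUG"})
assert cfg.log_level == "DEBUG" and cfg.converter_engine == "pandas"
```

Passing `environ` explicitly keeps the sketch testable; the documented function presumably reads `os.environ` directly.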
- `validate_config(config_dict)`

## parquet_converter.logging

- `JSONEncoder` — serializes `pathlib.Path` objects when building reports.
- `setup_logging(level="INFO", log_file=None, verbose=False)` — configures the `parquet_converter` loggers.
- `format_stats_table(stats_list)` — uses `tabulate` to render per-file conversion outcomes (Success/Failed).
- `save_conversion_report(stats_list, output_dir, config)` — writes `conversion_report.json` containing a timestamp, a configuration snapshot, summary counts, and per-file stats.
- `log_conversion_summary(stats_list)`

## parquet_converter.parser

- `parse_file(input_path, config)` — parses a file according to its suffix, raises `ValueError` on unsupported suffixes, and runs dtype inference unless disabled.
- `parse_csv(input_path, options)` — wraps `pandas.read_csv`, honoring delimiter, encoding, header, column names, dtypes, NA tokens, and skip rows.
- `parse_txt(input_path, options)` — like `parse_csv` but defaults to `\t` delimiters with whitespace fallbacks.
- `infer_dtypes(df, config)`

## parquet_converter.stats

`ConversionStats` records conversion outcomes (`input_path`, `output_path`, `success`, `error_count`, `warning_count`, `rows`, `columns`) and provides the helpers `add_column_stats`, `to_dict`, and `from_dict`.

For any functions not explicitly listed above (e.g., additional helpers introduced later), please ensure the module docstrings follow the NumPy template and update this document with their summaries, parameters, returns, and doctest-style examples. This guarantees parity between inline documentation and the human-readable API guide.

***
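As a concrete template for new entries, a NumPy-style docstring with a doctest-style example might look like the following. The function itself is illustrative only and is not part of the package:

```python
import doctest

def count_rows(values):
    """Count non-null rows in a column sample.

    Parameters
    ----------
    values : list
        Raw cell values; ``None`` entries are treated as nulls.

    Returns
    -------
    int
        Number of non-null values.

    Examples
    --------
    >>> count_rows([1, None, 3])
    2
    """
    return sum(1 for v in values if v is not None)

# Running the module's doctests keeps the inline example honest.
assert doctest.testmod().failed == 0
```

Keeping the `Examples` section executable via `doctest` is one way to enforce the parity between inline documentation and this guide that the paragraph above asks for.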