parquet-converter

Parquet Converter API Reference

This reference collects the docstring content for the project's primary modules so that contributors can review the documentation without reading the source. Each entry below mirrors the corresponding NumPy-style docstring: a short summary, parameter descriptions, return values, and example snippets.


Module: parquet_converter.cli

parse_args(args: Optional[List[str]] = None) -> argparse.Namespace

main(args: Optional[List[str]] = None) -> int
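
Both entry points accept an optional argument list that defaults to None. With argparse, passing None makes the parser read sys.argv[1:], while an explicit list lets tests call the functions directly. The sketch below illustrates only that calling convention. The positional argument and the --verbose flag are hypothetical placeholders, not the project's actual options, and the exit code of 0 on success is an assumed convention.

```python
import argparse
from typing import List, Optional


def parse_args(args: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments (illustrative stand-in, not the real parser)."""
    parser = argparse.ArgumentParser(prog="parquet-converter")
    # Hypothetical arguments, present only to demonstrate the signature.
    parser.add_argument("input_path")
    parser.add_argument("--verbose", action="store_true")
    # argparse falls back to sys.argv[1:] when args is None.
    return parser.parse_args(args)


def main(args: Optional[List[str]] = None) -> int:
    """Return a process exit code; 0 signals success (assumed convention)."""
    namespace = parse_args(args)
    if namespace.verbose:
        print(f"Would convert {namespace.input_path}")
    return 0


# Calling main with an explicit list avoids touching sys.argv.
exit_code = main(["data.csv", "--verbose"])
```

Returning an int from main rather than calling sys.exit inside it keeps the function testable; a console-script wrapper can pass the result to sys.exit.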


Module: parquet_converter.converter

convert_file(input_path, output_dir, config) -> ConversionStats

convert_directory(input_dir, output_dir, config) -> List[ConversionStats]

_convert_with_polars(input_path, output_dir, config)

_convert_with_pandas(input_path, output_dir, config)

_resolve_file_options(input_path, config)

_build_polars_csv_kwargs(options)

_normalize_polars_encoding(value)

_analyze_sample_with_polars(input_path, options, sample_rows)

_stream_polars_conversion(input_path, output_path, options, schema, compression, chunk_size)

_collect_polars_column_stats(output_path, column_limit)

_verify_conversion(output_path, source_path, verify_rows)


Module: parquet_converter.analyzer

scan_parquet_files(input_dir, recursive=True) -> List[Path]

get_file_size(file_path) -> str

get_file_modification_time(file_path) -> str

calculate_summary_stats(df)

calculate_null_counts(df)

get_unique_values_info(df)

analyze_parquet_file(file_path)

format_analysis_report(analyses, width=150)

analyze_directory(input_dir, output_dir=None)
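
The first two analyzer signatures can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes that scan_parquet_files matches the .parquet extension and returns sorted paths, and that get_file_size reports binary units with one decimal place. The format_size helper is hypothetical.

```python
from pathlib import Path
from typing import List, Union


def scan_parquet_files(input_dir: Union[str, Path], recursive: bool = True) -> List[Path]:
    """Return the .parquet files under input_dir, sorted for stable output."""
    root = Path(input_dir)
    pattern = "*.parquet"
    # rglob descends into subdirectories; glob stays at the top level.
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def format_size(num_bytes: int) -> str:
    """Format a byte count with binary units (hypothetical helper)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def get_file_size(file_path: Union[str, Path]) -> str:
    """Return the size of file_path on disk as a human-readable string."""
    return format_size(Path(file_path).stat().st_size)
```

Returning a string from get_file_size matches the documented -> str annotation and suits report output; code that needs to compare sizes would work with the raw byte count instead.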


Module: parquet_converter.config

load_config(config_path=None) -> Config

save_config(config, output_path)

validate_config(config_dict)


Module: parquet_converter.logging

JSONEncoder

setup_logging(level="INFO", log_file=None, verbose=False)

format_stats_table(stats_list)

save_conversion_report(stats_list, output_dir, config)

log_conversion_summary(stats_list)
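
A custom encoder in a logging module usually exists because conversion reports contain values that the json module cannot serialize by default. The sketch below shows the general technique of subclassing json.JSONEncoder. Which types the project's JSONEncoder actually handles is not stated here, so the Path and datetime branches are assumptions chosen for illustration.

```python
import json
from datetime import date, datetime
from pathlib import Path


class JSONEncoder(json.JSONEncoder):
    """Serialize a few non-JSON types (illustrative stand-in)."""

    def default(self, obj):
        # Assumed conversions: timestamps become ISO 8601 strings.
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        # Assumed conversion: filesystem paths become plain strings.
        if isinstance(obj, Path):
            return str(obj)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


record = {"file": Path("input.csv"), "finished": datetime(2024, 1, 2, 3, 4, 5)}
print(json.dumps(record, cls=JSONEncoder))
```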


Module: parquet_converter.parser

parse_file(input_path, config)

parse_csv(input_path, options)

parse_txt(input_path, options)

infer_dtypes(df, config)


Module: parquet_converter.stats

ConversionStats


For any function not listed above (e.g., helpers introduced later), ensure its docstring follows the NumPy template and update this document with its summary, parameters, return values, and doctest-style examples. This keeps the inline documentation and this API guide in sync.
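
For contributors unfamiliar with the NumPy template, the sketch below shows its sections on a small function. The compression_ratio helper is hypothetical and is not part of the project; it exists only to demonstrate the layout of the summary, Parameters, Returns, Raises, and Examples sections.

```python
def compression_ratio(source_bytes: int, parquet_bytes: int) -> float:
    """Compute how many times smaller the Parquet output is than the source.

    Parameters
    ----------
    source_bytes : int
        Size of the original file in bytes.
    parquet_bytes : int
        Size of the converted Parquet file in bytes. Must be positive.

    Returns
    -------
    float
        ``source_bytes / parquet_bytes``, rounded to two decimal places.

    Raises
    ------
    ValueError
        If ``parquet_bytes`` is not positive.

    Examples
    --------
    >>> compression_ratio(1000, 250)
    4.0
    >>> compression_ratio(10, 3)
    3.33
    """
    if parquet_bytes <= 0:
        raise ValueError("parquet_bytes must be positive")
    return round(source_bytes / parquet_bytes, 2)
```

Because the Examples section uses doctest syntax, running python -m doctest on the containing file checks that the documented outputs still match the code.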