Automated parser
The wins
- 🚀 Automated parsing of the FAA National Flight Data Digest (NFDD) with advanced NLP & LLMs!
- 🤖 Achieved >94% accuracy on complex aviation data, far surpassing the 75% production threshold.
- ⏱️ Saved 18 hours of manual labor per day.
- 📦 Ready for production: validated, robust, and trusted by Boeing for operational use.
- 🧠 Engineered a parallelized, prompt-optimized LLM pipeline for scalable, accurate parsing.
- 📝 Comprehensive post-processing, validation, and cross-checks ensure data integrity and reliability.
- 🔬 Recommended next steps: rule-based validation, domain fine-tuning, and more annotated data for even higher accuracy.
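The parallelized pipeline mentioned above could look roughly like the sketch below. It is illustrative only: `parse_section` is a hypothetical stand-in for the real LLM call, and `parse_sections_parallel` and `max_workers` are assumed names, not taken from the project code.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_section(section_text: str) -> dict:
    """Hypothetical stand-in for the LLM call: returns a tiny structured record."""
    lines = [ln.strip() for ln in section_text.splitlines() if ln.strip()]
    return {"header": lines[0] if lines else "", "line_count": len(lines)}


def parse_sections_parallel(sections: list[str], max_workers: int = 4) -> list[dict]:
    """Parse sections concurrently; results come back in input order."""
    # Threads suit this workload because an LLM request is I/O-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves the order of the input iterable.
        return list(executor.map(parse_section, sections))
```

Because `executor.map` keeps results aligned with the inputs, each parsed record can be written back to the output file for its own section without extra bookkeeping.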
Project Root
- README.md: Project overview and usage instructions for the Boeing NFDD Parsing Pipeline.
- requirements.txt: Lists all Python dependencies needed to run the pipeline.
src/
- section_splitter.py: Splits cleaned text files into logical NFDD sections for further processing.
- read_whole_pdf.py: Main entry point; orchestrates PDF-to-text conversion and section splitting.
- process_unified_LLM.py: Runs LLM-based parsing on each section, producing structured JSON outputs.
- pdf_miner_grouper.py: Extracts tabular text from PDFs using pdfminer, preserving layout.
- llm_open_pipeline.py: Handles LLM-based section processing and output file management.
- 2099_cycle_dates.xlsx: Excel file containing cycle dates for reference (not code).
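As a rough illustration of what section splitting involves, here is a minimal sketch that cuts cleaned text at known header lines. The header names in `SECTION_HEADERS` and the function `split_sections` are assumptions for this example; the actual logic in section_splitter.py may differ.

```python
import re

# Hypothetical header names; the real NFDD section titles may differ.
SECTION_HEADERS = ["AIRPORTS", "NAVAIDS", "FIXES"]

# Match a header only when it stands alone on a line.
HEADER_RE = re.compile(
    r"^(" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r")[ \t]*$",
    re.MULTILINE,
)


def split_sections(text: str) -> dict[str, str]:
    """Return {header: body} for each header line found in the text."""
    matches = list(HEADER_RE.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs until the next header or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections
```

Text before the first header is dropped in this sketch, and a repeated header overwrites the earlier entry; a production splitter would need an explicit policy for both cases.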
special_utilities/
- run_tests.py: Runs all test evaluations and prints a summary of results.
- test_evaluator.py: Compares parsed outputs to golden data and computes accuracy.
- extract_headers.py: Extracts and logs all unique headers from PDFs for troubleshooting.
- download_pdfs.py: Downloads all FAA NFDD PDFs using Playwright and BeautifulSoup.
- count_files.py: Counts files and section headers in a folder for quick analysis.
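One plausible way to compare parsed output against golden data is field-level accuracy, sketched below. The metric definition and the names `field_accuracy` and `meets_threshold` are assumptions for illustration; only the 75% production threshold comes from this page.

```python
PRODUCTION_THRESHOLD = 0.75  # production threshold stated on this page


def field_accuracy(parsed: list[dict], golden: list[dict]) -> float:
    """Fraction of golden fields whose parsed value matches exactly."""
    total = 0
    correct = 0
    for i, gold_record in enumerate(golden):
        # A record missing from the parsed output counts as all-wrong.
        pred_record = parsed[i] if i < len(parsed) else {}
        for key, expected in gold_record.items():
            total += 1
            if pred_record.get(key) == expected:
                correct += 1
    return correct / total if total else 0.0


def meets_threshold(accuracy: float) -> bool:
    """True when the accuracy reaches the production threshold."""
    return accuracy >= PRODUCTION_THRESHOLD
```

Exact matching is the strictest choice; an evaluator for real aviation data would likely normalize whitespace, case, and units before comparing.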
helper/
- utils.py: A collection of utility functions for parsing, normalization, and file handling.
- stringify_json_object.py: Cleans and pretty-prints stringified JSON files in a folder.
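"Stringified" JSON usually means a JSON document that was encoded as a string one or more times. A minimal sketch of undoing that and pretty-printing the result follows; the function names `unstringify` and `pretty` are hypothetical, and the real helper may work differently.

```python
import json


def unstringify(value):
    """Decode repeatedly while the value is still a string holding JSON."""
    while isinstance(value, str):
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError:
            break  # plain text, not JSON: leave it as-is
        value = decoded
    return value


def pretty(raw: str) -> str:
    """Return the decoded JSON as an indented, human-readable string."""
    return json.dumps(unstringify(raw), indent=2, ensure_ascii=False)
```

Note that this sketch only unwraps the top-level value; strings nested inside the decoded object are left untouched.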
processed_files_in_json/
- nfdd-2025-04-23-77_SECTION_02_NAVAIDS.json: Example output; structured JSON for the NAVAIDS section, as parsed by the pipeline.
Note: This code is for viewing only and is password protected for privacy.