SOS Services
engineering · Vimal Bahuguna

How we built FDR ingestion — and what shadow mode caught

We have a working Python parser, and we had a C++ one that looked finished but had been sitting on the shelf. Here's how we wired them in parallel and what the comparison surfaced.

Flight data recorder ingestion has two paths in our codebase. The Python path uses a honeywell_dlu package and ships in production. The C++ path is a 7,000-line CMake project that’s been sitting in src/ for months, never running anywhere. Both claim to parse the same Honeywell DLU files. We wanted to know if they actually agree.

This post is the story of wiring them in parallel — what we call shadow mode — and what the first real test surfaced.

The two parsers

Production today: a Python worker imports honeywell_dlu, calls parse_arinc717_dlu_subframes(bytes), and hands the result to the API. On the 18.8 MB sample file we use for smoke tests, it returns 501 frames in 16.6 seconds. Works. Ships.

The C++ project, fdr_ingest, exists in the same repo. CMake. C++17. nlohmann/json, libcurl, libuuid. Two separate parsers inside it — Arinc717Parser for raw ARINC 717 streams and DluParser for the Honeywell DLU wrapper format. Tests in tests/. Looks production-ready on the surface.

It didn’t compile.

Six bugs blocked the build, none of them deep, all of them the kind of thing you only catch when you actually run CMake:

  1. pkg_check_modules(UUID REQUIRED ossp-uuid), but main.cpp calls uuid_generate_random from util-linux's libuuid, not the OSSP API. Mismatched library: the build would have linked against the wrong uuid implementation.
  2. The pinned SHA256 for nlohmann/json v3.11.3 no longer matched the downloaded archive. GitHub republished the release tarball at some point, so the hash check failed before anything compiled.
  3. Six call sites used FDR_LOG_WARNING, but the macro is named FDR_LOG_WARN. A pre-existing typo that no build had ever exercised.
  4. SupabaseClient::GetStats() was defined inline in the header AND out-of-line in the .cpp. The toolchain rightly objected to the duplicate definition.
  5. engineing_value in arinc717_parser.cpp (missing an “er”). Silent until referenced.
  6. main.cpp unconditionally calls DatabaseManager constructors, but DatabaseManager is only compiled with FDR_ENABLE_PQXX=ON, and libpqxx-6.4 doesn’t compile cleanly with gcc 12. Wrapped the usage in #ifdef FDR_HAS_PQXX.
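Item 2 is worth a closer look, because it fires before the compiler ever runs. CMake performs this check itself when a download is pinned with URL_HASH SHA256=<digest> (presumably how this project pins nlohmann/json; the CMakeLists isn't shown). The Python below is only an illustration of the same fail-fast logic, not part of the build, and verify_pinned_sha256 and HashMismatch are hypothetical names.

```python
import hashlib
from pathlib import Path


class HashMismatch(Exception):
    """Raised when a downloaded archive does not match its pinned digest."""


def verify_pinned_sha256(archive: Path, pinned_hex: str, chunk_size: int = 1 << 16) -> None:
    # Stream the file so a large tarball never has to fit in memory at once.
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    expected = pinned_hex.strip().lower()  # pins are sometimes pasted in upper case
    if actual != expected:
        # Fail before anything is extracted or compiled, as the CMake pin did.
        raise HashMismatch(f"{archive.name}: expected {expected}, got {actual}")
```

A republished tarball has different bytes and therefore a different digest, so the old pin fails even though the version number is unchanged; the fix is to re-pin after checking the new archive, not to drop the check.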

After the fixes: a 469 KB fdr_ingest binary that runs. Finding all six took roughly an hour.

Wiring them in parallel

Production runs Celery workers. The plan was: when a conversions.process task succeeds via the Python parser, fire-and-forget a sibling cpp.shadow_parse task with the same input bytes. The C++ writes to its own cpp_flights / cpp_fdr_samples tables in Supabase (prefixed so they can’t collide with the production schema). Discrepancies become visible by comparing the two.
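A minimal sketch of the fire-and-forget dispatch, assuming the Python task calls it only after its own parse has succeeded. dispatch_shadow, shadow_enabled and upload_key are hypothetical names; send_task is injected so the sketch runs without a broker, and in a Celery worker it would be celery_app.send_task. It also assumes the shadow task receives a storage key rather than the raw bytes, so both parsers read the same stored object. The gate reads CPP_SHADOW_PARSE_ENABLED, the flag described under "So what shipped".

```python
import logging
import os
from typing import Any, Callable, Mapping

log = logging.getLogger(__name__)

SHADOW_TASK = "cpp.shadow_parse"
FLAG = "CPP_SHADOW_PARSE_ENABLED"


def shadow_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    # Anything other than an explicit truthy value leaves shadow mode off.
    return environ.get(FLAG, "false").strip().lower() in {"1", "true", "yes", "on"}


def dispatch_shadow(
    upload_key: str,
    send_task: Callable[..., Any],
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Queue the C++ shadow parse after a successful Python parse.

    Returns True if a task was queued. Never raises: the production
    result must not depend on the shadow path.
    """
    if not shadow_enabled(environ):
        return False
    try:
        # Fire and forget: the returned AsyncResult is ignored on purpose.
        send_task(SHADOW_TASK, args=[upload_key])
        return True
    except Exception:  # broker outage, serialization error, etc.
        log.exception("shadow dispatch failed for %s", upload_key)
        return False
```

The try/except is the important design choice: a failure on the shadow path is logged and dropped, so it can never turn a successful production conversion into a failed one.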

This needed more changes than expected:

  • The upload pipeline. Files were being saved to /tmp/fdr_uploads in the API container and then read by the worker, except the API and worker are separate Coolify apps with separate filesystems. The worker was getting empty bytes for every upload and silently parsing nothing, on both the Python and C++ paths. The bug was pre-existing; we only found it because we actually checked what the worker received. We rewired the upload service to write to a Supabase Storage bucket (fdr-uploads) and the worker to fetch via a fetch_upload_bytes() helper.
  • The C++ binary's config model. It reads SUPABASE_URL, SUPABASE_KEY, etc. from a .env file, not from OS environment variables, so the Python wrapper now writes a temporary env file alongside the input file and passes --env <path>.
  • Table isolation. Added a --input-format dlu flag to select the DLU parser and a SUPABASE_TABLE_PREFIX setting so the binary writes to cpp_* tables. A migration creates those tables; the shadow path doesn't touch a single row of production data.
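The three changes above meet in the shadow task's wrapper. What follows is a sketch under stated assumptions, not the actual code: the real fetch_upload_bytes() isn't shown, so this version assumes a supabase-py client where storage.from_(bucket).download(path) returns bytes; run_shadow_parse, build_argv and write_env_file are hypothetical names; passing the input file as a positional argument is a guess about fdr_ingest's CLI; and the prefix value cpp_ is inferred from the cpp_flights / cpp_fdr_samples table names.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Mapping

BUCKET = "fdr-uploads"


class EmptyUpload(Exception):
    """Raised instead of silently parsing zero bytes."""


def fetch_upload_bytes(storage, key: str) -> bytes:
    # Assumption: supabase-py style storage.from_(bucket).download(path) -> bytes.
    data = storage.from_(BUCKET).download(key)
    if not data:
        # The /tmp bug parsed empty input without complaint; fail loudly instead.
        raise EmptyUpload(f"{BUCKET}/{key} returned no bytes")
    return data


def write_env_file(path: Path, settings: Mapping[str, str]) -> None:
    # The binary reads config from a .env file, not os.environ.
    # No quoting is applied, so reject values that would break a KEY=value line.
    for key, value in settings.items():
        if "\n" in value or "\r" in value:
            raise ValueError(f"{key}: newline in .env value")
    path.write_text("".join(f"{k}={v}\n" for k, v in settings.items()))


def build_argv(binary: str, env_path: Path, input_path: Path) -> list[str]:
    # Positional input path is an assumption about fdr_ingest's CLI.
    return [binary, "--env", str(env_path), "--input-format", "dlu", str(input_path)]


def run_shadow_parse(
    binary: str, data: bytes, settings: Mapping[str, str], timeout: float = 600.0
) -> subprocess.CompletedProcess:
    # Always force the cpp_ prefix so the shadow run cannot hit production tables.
    merged = {**settings, "SUPABASE_TABLE_PREFIX": "cpp_"}
    with tempfile.TemporaryDirectory() as tmp:  # created readable only by this user
        workdir = Path(tmp)
        input_path = workdir / "input.dlu"
        env_path = workdir / ".env"
        input_path.write_bytes(data)
        write_env_file(env_path, merged)
        return subprocess.run(
            build_argv(binary, env_path, input_path),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # shadow failures are inspected, not raised
        )
```

The empty-bytes check turns the silent failure mode described above into a loud one, and TemporaryDirectory keeps the .env file (which carries SUPABASE_KEY) in a private directory that is deleted once the binary exits.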

What shadow mode caught

On the same 18.8 MB 09042026_1558.dlu file:

Parser                 Output                                          Time
Python honeywell_dlu   501 frames decoded                              16.6 s
C++ DluParser          0 parameter records, 324,370 status records     80 ms

The C++ DluParser has expected_record_type=0x000C baked in. On this file, that value matches zero records. 324,370 records get classified as “status” instead. The Python implementation apparently uses a different filter or a different framing model — we haven’t yet checked which side is correct.
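Because the two parsers report different units (frames versus records), raw counts can't be compared directly. A minimal sketch of a comparison that flags only unit-independent signals, such as one side producing no parameter data at all; ParseSummary and find_discrepancies are hypothetical names, and the counts are assumed to come from whatever the production and cpp_* tables record per run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseSummary:
    parser: str
    parameter_records: int  # decoded frames (Python) or parameter records (C++)
    status_records: int = 0


def find_discrepancies(primary: ParseSummary, shadow: ParseSummary) -> list[str]:
    issues: list[str] = []
    # One side decoding data while the other decodes nothing is a red flag
    # regardless of whether the counts are frames or records.
    for a, b in ((primary, shadow), (shadow, primary)):
        if a.parameter_records > 0 and b.parameter_records == 0:
            issues.append(
                f"{b.parser}: 0 parameter records vs {a.parameter_records} from {a.parser}"
            )
    # Everything landing in the status bucket suggests a record-type filter
    # that matches nothing in this file.
    if shadow.parameter_records == 0 and shadow.status_records > 0:
        issues.append(f"{shadow.parser}: all {shadow.status_records} records classified as status")
    return issues
```

Fed the numbers from the table above, it flags the C++ side twice: no parameter records against 501 frames, and every record classified as status.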

This is exactly what shadow mode is designed to catch. Without it, you'd have to either swap the C++ binary into production blindly (and watch your samples table fill with the wrong data type) or keep the C++ on the shelf forever. Shadow mode surfaces disagreements early, without putting production data at risk.

So what shipped

Production runs the Python parser. The C++ binary is built into the worker image and ready, but the dispatch is gated behind CPP_SHADOW_PARSE_ENABLED, which is currently false: set it to true, restart the worker, and shadow mode is on again. The wiring is real; we'll use it once the DLU format question is answered.

The whole exercise was about 15 commits. The headline finding wasn’t “C++ is faster than Python” — it was that we have two implementations that disagree on a fundamental field, and we’d never have known otherwise.

That, more than the C++ binary, is what we actually wanted to build.