Luis Sanchez
Genpro · 2025

Replacing Cleo: an Internal EDI + ETL + API Integration Service

Designed and built the internal EDI/ETL/API integration system that replaced Cleo at Genpro. The replacement handles EDI parsing, ETL pipelines, and REST integrations from one in-house codebase and lands all trading-partner data directly in our BigQuery warehouse, where the pricing platform, data governance layer, and the rest of the analytics stack can use it natively. The $80K/year Cleo subscription was decommissioned.

Stack: Python · BigQuery · custom EDI parser · REST/API connectors
Scope: All Genpro trading-partner integrations
Replaces: Cleo middleware ($80K/yr eliminated)
Outcome: Trading-partner data lives in BigQuery, accessible internally
$80K: Annual licensing eliminated. Cleo subscription decommissioned after cut-over.
EDI + ETL + API: All channels in one system, from ingestion through warehouse landing.
BigQuery: Where data lands by default. Same warehouse the rest of analytics already uses.

Constraints that shaped the build

Cut over without dropping any trading-partner connection
Match Cleo's reliability and ack-handling behavior on every transaction
Land data in the warehouse schemas the rest of the org already uses
Stay maintainable by a small in-house team

Symptom

$80K a year for a black box, and a new TMS that needed to talk through it.

Cleo was the EDI middleware Genpro used to exchange data with trading partners: purchase orders, invoices, shipment statuses, the usual freight-industry traffic. It worked, but it worked behind a vendor wall. Every schema change took a vendor ticket. Trading-partner data was effectively trapped inside Cleo and had to be manually re-exported to land anywhere we could query it.

The forcing function was a new TMS onboarding. The development required to fit the new TMS through Cleo's platform exposed the limits of the existing workflow, so we built the integration ourselves instead of paying Cleo to extend their model. Once that was live, backfilling the rest of the partner connections in-house followed naturally.

Cleo's pitch, and why people use it: dozens of messy B2B applications and partner systems that need normalizing into one feed.
Diagnosis

Modern tooling can do this in-house at a fraction of the cost.

What Cleo was actually doing (parsing EDI messages, normalizing schemas, routing data to downstream systems) is well-defined and increasingly straightforward with modern data tooling. What the $80K/year was buying us was vendor lock-in, plus the integration overhead of keeping Cleo wired into the rest of our stack.

Bringing it in-house unlocked something Cleo never could: every trading-partner record would land directly in BigQuery, alongside the operational and pricing data the rest of the analytics work depended on.

Cleo's integration cloud: how Cleo positions itself as the connective tissue between data providers, customers, suppliers, and internal apps. We replaced this entire surface with an in-house service.
Hypothesis

One internal service that handles EDI, ETL, and API in the same codebase.

A single integration service, owned in-house, that ingests trading-partner traffic in any format the partner uses (EDI, file-based ETL, REST APIs), normalizes everything into our warehouse schemas, and makes the data immediately queryable to internal systems.

Trading partners shouldn't notice the cut-over. They keep sending data the same way; we change what's on our side.

Implementation

Cut over partner-by-partner, never broke a connection.

I mapped every Cleo connection first: protocol, schema, downstream consumer, ack expectations. Each was a contract we had to preserve exactly.
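
To make the idea of a per-connection contract concrete, here is a minimal sketch of what one inventory entry could look like. The class name, field names, and sample values are illustrative assumptions, not the actual inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionContract:
    """One row of a hypothetical connection inventory."""
    partner: str              # trading partner (placeholder names below)
    protocol: str             # how the partner sends data, e.g. "EDI", "file", "REST"
    schema: str               # message or payload shape the partner uses
    downstream_consumer: str  # internal system that reads the data
    expects_ack: bool         # whether the partner expects an acknowledgment back


# Placeholder entries for illustration only.
inventory = [
    ConnectionContract("ExamplePartnerA", "EDI", "X12 850", "tms", True),
    ConnectionContract("ExamplePartnerB", "REST", "JSON shipment status", "pricing", False),
]

# Connections whose ack behavior has to be reproduced before cut-over.
needs_ack = [c.partner for c in inventory if c.expects_ack]
print(needs_ack)
```

Keeping the entries immutable (`frozen=True`) fits the idea that each connection is a contract to preserve, not something to adjust during the migration.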

The replacement was built in Python on top of BigQuery as the warehouse: an EDI parser handling the common transaction sets (810, 850, 856, etc.), an ETL pipeline framework for file-based feeds, and a REST connector layer for API-driven partners. Schema normalization happened once on ingestion. Monitoring and KPI reporting got built into the service from day one, so the team could see exactly what Cleo had been doing for us, but now in our own dashboards instead of theirs.
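
For a sense of what the EDI parsing step involves, the sketch below splits a raw X12 message into segments and elements and reads the transaction set ID from the ST segment. It is a simplified illustration, not the production parser: the function names and sample message are made up, and it assumes the common `*` element separator and `~` segment terminator rather than reading the delimiters from the ISA envelope as a real parser would.

```python
from __future__ import annotations


def parse_segments(raw: str, element_sep: str = "*", segment_term: str = "~") -> list[list[str]]:
    """Split a raw X12 string into segments, each a list of elements."""
    segments = []
    for chunk in raw.split(segment_term):
        chunk = chunk.strip()  # tolerate newlines between segments
        if chunk:
            segments.append(chunk.split(element_sep))
    return segments


def transaction_set_id(segments: list[list[str]]) -> str | None:
    """Return the ID from the first ST segment (e.g. "850"), or None if absent."""
    for seg in segments:
        if seg[0] == "ST" and len(seg) > 1:
            return seg[1]
    return None


# Made-up purchase order fragment (envelope segments omitted for brevity).
SAMPLE = "ST*850*0001~BEG*00*SA*PO12345**20250101~PO1*1*10*EA*9.99**VP*WIDGET~SE*4*0001~"

segments = parse_segments(SAMPLE)
print(transaction_set_id(segments))  # which transaction set this message carries
print(segments[1][3])                # purchase order number from the BEG segment
```

Once a message is broken into segments like this, normalization becomes a mapping from segment positions to warehouse columns, which is why it can happen once at ingestion.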

Then we migrated connection by connection, validating data parity against Cleo at each step before cutting traffic over. After every partner was on the new system, we cancelled the Cleo subscription.
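
A parity check of this kind can be sketched as a keyed comparison between what Cleo produced and what the new service produced for the same traffic. The function name, the `doc_id` key, and the sample rows are assumptions for illustration; the real validation may have compared different fields or run inside the warehouse.

```python
def parity_report(cleo_rows, new_rows, key="doc_id"):
    """Compare two lists of dict records and report where they differ."""
    cleo = {row[key]: row for row in cleo_rows}
    new = {row[key]: row for row in new_rows}
    return {
        # Present in Cleo's output but not in the new service's output.
        "missing": sorted(cleo.keys() - new.keys()),
        # Present only in the new service's output.
        "extra": sorted(new.keys() - cleo.keys()),
        # Present in both, but with different field values.
        "mismatched": sorted(k for k in cleo.keys() & new.keys() if cleo[k] != new[k]),
    }


# Made-up records for illustration.
cleo_rows = [
    {"doc_id": "A1", "amount": 100},
    {"doc_id": "A2", "amount": 250},
    {"doc_id": "A3", "amount": 75},
]
new_rows = [
    {"doc_id": "A1", "amount": 100},
    {"doc_id": "A2", "amount": 255},
    {"doc_id": "A4", "amount": 10},
]

report = parity_report(cleo_rows, new_rows)
print(report)
```

A connection would only be a candidate for cut-over once all three lists come back empty for a representative window of traffic.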

Cleo's KPI dashboard: the kind of reporting we'd been paying for, that the in-house service now produces itself. Errors, throughput, partner-level health, all visible to the team that runs the integrations.
Results

$80K saved, data unlocked, foundation for everything that came next.

The Cleo line item is gone. Every byte of trading-partner data now lands inside Genpro's BigQuery warehouse alongside the rest of our operational data.

The bigger win was downstream. The pricing intelligence platform and the data governance layer both rely on this integration. They wouldn't exist if trading-partner data still lived inside a vendor system we couldn't query directly.

Want the longer version?

I'm happy to walk through the architecture, the trade-offs we considered but didn't ship, and what I'd do differently next time. Drop me a line.