Genpro · 2025 — Present

ML Pricing Intelligence Platform

I led the third-party engineering team that built Genpro's ML pricing and lane analytics platform. A React frontend over a Python ML stack (scikit-learn, XGBoost) and a BigQuery warehouse, it takes the messy industry-standard DAT peer-rate feed plus our own historical loads and turns them into clean, model-backed lane prices brokers can use in the moment. The same model powers daily GTM market targeting across the desk.

Stack: React · Python (scikit-learn, XGBoost) · BigQuery
Scale: Thousands of lanes priced daily
Inputs: DAT peer rates · Genpro historical loads · external market signals
Outcome: Powers daily GTM market-targeting + lane pricing decisions

1000s

Lanes priced daily

Across the active U.S. truckload network, off cleaned DAT + internal signals

Real-time

Quote-time latency

Brokers get a model-backed price before they pick up the phone

Daily

GTM cadence

Sales and capacity teams target lanes off the same model output as pricing

Constraints that shaped the build

DAT is industry-standard but noisy: peer-reported, sparse on thin lanes, biased on volatile ones
Pricing must be defensible to a broker, not just accurate. Every rec needs an explainability trail
Genpro's downside must be protected: a bad print should never get quoted to a customer
Same model has to serve sales targeting and quote pricing without divergence
Outputs need to reconcile against finance reporting at the month-end view

Symptom

Brokers were quoting off a feed everyone knows is messy.

DAT is the leading rate feed in trucking. Everyone in the industry uses it. The problem is that DAT is peer-reported, which means it's noisy on its own: thin coverage on uncommon lanes, lagging signal on volatile ones, and rate bands wide enough to drive a truck through.

Genpro brokers were quoting off DAT directly, padded with intuition and recent memory. Rates came out inconsistent across the desk, slow on RFPs, and structurally exposed to whichever way the peer noise was leaning that week.

DAT trendlines: the industry-standard peer rate feed every brokerage in trucking quotes off. Useful, but visibly noisy, especially on thinner lanes.

Diagnosis

The feed isn't broken. The relationship with the feed is.

DAT is fine as an input. It is not fine as a quote. The leverage point wasn't replacing DAT. It was wrapping it with a model that knew where DAT was reliable, where it wasn't, and what to do in either case.

Underneath that, we had years of Genpro historical loads in BigQuery and external market signals nobody had wired together yet. Combined with DAT, that was enough to clean the noise, fill the sparse lanes, and produce a single defensible price per lane.

Hypothesis

One model, three surfaces, downside-protected by construction.

If we trained a pricing model on Genpro's historical loads and market signals together with DAT (treating DAT as one input among many, not the answer), we could (1) clean DAT's noise on the lanes it covered, (2) infer prices on the lanes it didn't, and (3) flag the lanes where our confidence was too low to quote at all.
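To make the three cases concrete, here is a minimal sketch of the decision shape, not the production model. It assumes a hand-written blend in place of the trained model, and every name and threshold in it (LaneSignals, price_lane, full_trust_reports, max_rel_spread) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LaneSignals:
    """Hypothetical per-lane inputs; field names are illustrative only."""
    dat_rate: Optional[float]  # DAT peer rate in $/mile, None if DAT has no coverage
    dat_reports: int           # number of peer reports behind the DAT rate
    model_rate: float          # estimate from internal loads + market signals
    model_spread: float        # uncertainty in $/mile (e.g. an interval half-width)


def price_lane(s: LaneSignals, full_trust_reports: int = 20,
               max_rel_spread: float = 0.15) -> Optional[float]:
    """Return a quotable rate, or None when the lane should be flagged instead."""
    # (3) Too uncertain to quote: flag rather than guess.
    if s.model_spread / s.model_rate > max_rel_spread:
        return None
    # (2) No DAT coverage: fall back to the inferred rate.
    if s.dat_rate is None or s.dat_reports == 0:
        return round(s.model_rate, 2)
    # (1) DAT covered: weight DAT by how many reports back it, so thin
    # lanes lean on internal signal and well-reported lanes lean on DAT.
    w = min(s.dat_reports / full_trust_reports, 1.0)
    return round(w * s.dat_rate + (1 - w) * s.model_rate, 2)
```

The point of the sketch is the return type: every lane yields either a single rate or an explicit "do not quote", so downstream surfaces never have to guess which regime a lane was in.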

That same model then powers three surfaces from one source of truth: the quoting UI brokers use, the GTM targeting view sales uses, and the market view capacity uses. Three teams, one set of numbers.

Implementation

Led the build end-to-end with a third-party engineering team.

I owned the spec, the data model, and the broker workflow integration. The third-party engineering team I led owned the React frontend and the Python ML service implementation. We standardized on scikit-learn and XGBoost for the core pricing models, BigQuery as the system of record, and a thin API layer so retraining never blocks quote-time serving.
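One way a thin layer can keep retraining off the quote path is to serve from a reference that is only swapped once a new model is fully trained. The sketch below illustrates that idea under assumptions: ModelHolder and its methods are hypothetical names, and the "model" is any callable rather than the platform's actual estimators.

```python
import threading
from typing import Callable, Dict, Tuple

# A "model" here is any callable from lane features to a rate; in practice it
# could wrap the predict method of a fitted scikit-learn or XGBoost estimator.
Model = Callable[[Dict[str, float]], float]


class ModelHolder:
    """Hypothetical serving-side holder: quote requests never wait on training."""

    def __init__(self, model: Model, version: str) -> None:
        self._lock = threading.Lock()
        self._current: Tuple[Model, str] = (model, version)

    def predict(self, features: Dict[str, float]) -> Tuple[float, str]:
        # Read model and version as one tuple so a swap cannot split them.
        model, version = self._current
        return model(features), version

    def swap(self, model: Model, version: str) -> None:
        # Called only after the new model is trained and validated elsewhere.
        # The lock serializes concurrent swaps; reads do not take it.
        with self._lock:
            self._current = (model, version)
```

Returning the version alongside the rate is a deliberate choice in this sketch: it lets a quote be traced back to the exact model that produced it.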

On the data side, the work was building the lane network: mapping every active and adjacent lane, attaching DAT, internal load history, and market signals to each, and producing a continuous price surface. Lanes that DAT covered well, lanes it covered sparsely, and lanes it didn't cover at all each came out with the same shape of answer.

The hardest part wasn't the model. It was making the outputs trustworthy. Every recommended price gets an explainability trail a broker can push back on, every low-confidence lane is flagged before it gets quoted (downside protection), and every aggregate reconciles against finance's monthly numbers.
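As an illustration of what an explainability trail can look like, here is a minimal sketch that assumes the price decomposes into a base rate plus signed per-signal contributions. That additive form is an assumption, as are the PriceTrail name, the signal names, and the example lane; tree-model libraries can emit contributions of this kind, but the source does not say which method the platform uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PriceTrail:
    """Hypothetical record attached to every recommended price."""
    lane: str
    base_rate: float  # starting point in $/mile, e.g. a network average
    contributions: Dict[str, float] = field(default_factory=dict)

    @property
    def price(self) -> float:
        # The price is defined as base plus contributions, so the trail
        # always adds up to the number the broker sees.
        return round(self.base_rate + sum(self.contributions.values()), 2)

    def explain(self) -> List[str]:
        # Largest movers first, signed, so a broker can challenge any one line.
        ranked = sorted(self.contributions.items(), key=lambda kv: -abs(kv[1]))
        lines = [f"{self.lane}: ${self.price:.2f}/mi (base ${self.base_rate:.2f})"]
        lines += [f"  {name}: {delta:+.2f}" for name, delta in ranked]
        return lines
```

Because the trail and the price come from the same record, a broker pushing back on one line is pushing back on a specific, named input rather than on the model as a whole.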

Network mapping for pricing: model output projected across the active and adjacent lane network. Where DAT is reliable, where it isn't, and where we have enough internal signal to price anyway.

The CCM tool in action: the in-product UI brokers and pricing analysts use to interact with the model, configure lane rules, and review recommendations.

Results

From spreadsheet pricing to a system the GTM org runs on.

The platform is the default surface for lane pricing at Genpro and is wired directly into daily GTM market-targeting decisions. Sales, capacity, and pricing are working off the same numbers for the first time, and the broker desk is no longer exposed to whichever way the DAT peer noise leaned that week.

Just as important, the underlying warehouse, models, and serving layer are now the substrate the next set of internal ML projects sit on. Not a one-off.