Balancing machine learning automation with human intervention for operations teams managing €60 million a year.
Lead Product Designer, 2024
Stuart uses incentives to balance courier supply with customer demand. When there aren’t enough couriers in an area, it increases payment rates to attract more. Area Multipliers are the main tool to manage this. They control €60 million a year in payments across Europe.
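The core mechanic can be sketched in a few lines. This is an illustrative model of how an area multiplier scales courier pay, not Stuart's actual pricing code; the function name and figures are assumptions.

```python
# Hypothetical sketch: an area multiplier scales the base payment rate.
# Names and numbers are illustrative, not Stuart's real model.
def apply_multiplier(base_rate_eur: float, multiplier: float) -> float:
    """Courier payment per delivery in an area, given its multiplier."""
    return round(base_rate_eur * multiplier, 2)

# A 1.3x multiplier on a 5.00 EUR base rate pays 6.50 EUR per delivery,
# making the area more attractive to couriers when demand spikes.
print(apply_multiplier(5.00, 1.3))
```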
In 2022, the tooling broke. A script running from someone’s laptop failed. Orders were late and Stuart breached key SLAs, putting major enterprise client relationships at risk.
When we looked into it, the problems went deeper than one broken script. Each country had built its own solutions, and no one fully understood the mess that had grown organically.
Operations wanted manual control because models couldn’t always handle their complexity. Engineering wanted scalability because manual processes kept breaking. The company overall wanted more automation.
As Lead Product Designer, I was brought in to help work out what to do.
Before designing anything, I needed to understand the whole system. It hadn’t been mapped, and no one held a complete picture.
I spent time with the Head of Global Operations documenting what existed, how tools connected, who used them, what problems they solved, and where things broke.
We found 13 separate solutions doing similar work. Some were proper applications. Others were spreadsheets with scripts. Each had been built to solve a local problem.
At the same time, an argument kept coming up in most conversations:
We need to control this manually. Models can’t handle our complexity.
We need to automate everything.
As long as it was framed as a choice between one or the other, nothing moved fast enough.
I proposed something different. Instead of asking whether we should automate or keep manual control, let’s ask: how do we design for both working together?
We used these 4 ideas to make decisions throughout the project.
The ALMO model generates weekly rate suggestions using historical demand patterns, weather forecasts, local events, and budget constraints.
I designed the calendar to show these suggestions as a starting point. The suggestions are accepted by default, but operations teams can adjust them or replace them entirely.
The model does the volume work. People handle the exceptions and the things the model can’t see.
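The "accepted by default, adjustable by people" pattern can be expressed as a small data model. This is a sketch under assumed field names, not the real ALMO schema:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of the accepted-by-default pattern.
# Field names are assumptions, not the actual ALMO data model.
@dataclass
class RateSuggestion:
    area: str
    slot: str
    suggested: float                    # model-generated multiplier
    override: Optional[float] = None    # set only when operations intervenes

    @property
    def effective(self) -> float:
        # The model's value applies unless a human has replaced it.
        return self.override if self.override is not None else self.suggested

s = RateSuggestion("central-london", "sat-evening", 1.2)
assert s.effective == 1.2   # accepted by default, no action needed
s.override = 1.5            # operations adjusts for something the model can't see
assert s.effective == 1.5
```

The design choice is that inaction means acceptance: the model handles the volume, and a human touch is recorded explicitly as an override rather than silently replacing the suggestion.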
Teams kept asking for one thing: show us what changes will cost before we publish them.
Overspend hurts margins. Underspend means late deliveries and breached SLAs. They needed to balance both risks before committing.
I designed real-time spend forecasting. As you edit rates, you see total spend updating, with a comparison to the budget target for that area.
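The underlying arithmetic is simple: projected spend is the rate per order times the expected order volume, summed across slots, compared against the budget target. A minimal sketch with placeholder volumes and rates:

```python
# Minimal sketch of the spend forecast: as rates are edited, recompute
# projected spend and compare it to the area's budget target.
# All figures are illustrative placeholders.
def forecast_spend(rates, expected_orders):
    """Projected spend = sum over slots of (rate per order * expected orders)."""
    return sum(rates[slot] * expected_orders[slot] for slot in rates)

rates = {"sat-lunch": 6.0, "sat-evening": 7.5}      # EUR per order
expected = {"sat-lunch": 400, "sat-evening": 600}   # forecast order volume

spend = forecast_spend(rates, expected)   # 400*6.0 + 600*7.5 = 6900.0
budget = 7000.0
print(f"{spend:.0f} of {budget:.0f} EUR ({spend / budget:.0%} of target)")
```

Recomputing this on every edit is what lets teams see overspend and underspend risk before committing.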
Operations teams manage hundreds of delivery areas each week. They needed to work in bulk to be efficient, but they also needed to adjust individual areas when something unusual was happening.
The interface supports both workflows.
The technical constraint was that each change triggers API calls. With 100 areas across 7 days and multiple time slots, that’s potentially thousands of calls.
I worked with Engineering to batch requests and design the interface around a bulk-first-then-refine workflow. We showed processing time so people could see what was happening.
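The batching approach can be sketched as follows. The batch size and the endpoint are assumptions for illustration; the point is that one request carries many changes instead of one change per call:

```python
from itertools import islice

# Sketch of batching rate changes instead of one API call per change.
# Batch size of 50 is an illustrative assumption, not Stuart's actual limit.
def chunks(items, size):
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def publish(changes, batch_size=50):
    calls = 0
    for batch in chunks(changes, batch_size):
        # publish_batch(batch)  # hypothetical endpoint: one request per batch
        calls += 1
    return calls

# 100 areas x 7 days x 3 slots = 2,100 individual changes...
changes = [(f"area-{a}", day, slot)
           for a in range(100) for day in range(7) for slot in range(3)]
print(publish(changes))  # ...collapses to 42 batched requests
```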
Once the platform was working, I explored having a staging view and letting operations teams describe changes in natural language.
Boost Central London by 0.3 on Saturday evening.
The AI would generate a staging view where operations could review and refine the change before publishing.
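The intended flow was natural language in, structured staged change out. A real version would use a language model; the regex below is a stand-in to keep the sketch self-contained, and every name in it is hypothetical:

```python
import re

# Hypothetical sketch: turn a natural-language instruction into a
# structured, reviewable staging change. An LLM would do the parsing
# in practice; a regex stands in here for a self-contained example.
PATTERN = re.compile(r"Boost (?P<area>.+) by (?P<delta>[\d.]+) on (?P<when>.+)")

def parse_command(text: str) -> dict:
    m = PATTERN.match(text)
    if not m:
        raise ValueError(f"could not parse: {text!r}")
    return {
        "area": m.group("area"),
        "delta": float(m.group("delta")),
        "when": m.group("when").rstrip("."),
        "status": "staged",  # nothing publishes until a human reviews it
    }

change = parse_command("Boost Central London by 0.3 on Saturday evening.")
assert change == {"area": "Central London", "delta": 0.3,
                  "when": "Saturday evening", "status": "staged"}
```

Keeping the output staged rather than live preserves the same principle as the rest of the platform: automation proposes, people confirm.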
In 2024, it was too early: the effort we estimated then was far larger than it would be today, given recent advances in AI models.
The system we shipped let the model do the volume work and people handle the exceptions. It solved the problem, and the team could move on to other issues.