The Accuracy of the Incumbent

Both the company I started and the one I work for now have been trying to disrupt. That is to say, there’s an incumbent process or system that exists and we’re trying to do better. In both cases doing better means higher accuracy and delivering answers earlier. First it was in healthcare and diagnostics; now it’s in logistics, predicting arrivals and departures. Again in both cases, we’ve got the data to show that we are better. As it turns out, having the data is a very small part of product-market fit when your industry has a long history with an incumbent.

Within both healthcare and logistics, there’s often a lack of day-to-day measures of quality. At a national level there are statistics, and some vendors share their aggregate outcomes, but in-the-moment measures of data quality are notoriously hard to validate. If two people tell you your container of goods will arrive on different days, how do you know then and there who to trust? These problems become doubly difficult when you provide predictions well in advance. I can fairly reliably tell you if your ship is delayed when it’s a day away, but thirty days out is another story. The same goes for diagnosis. If you’re unwell right now, that’s a much easier thing to spot than knowing whether you’ll be unwell in a year. Neither is trivial, but time blurs all lines.

When I’ve worked with customers in both cases I’ve noticed that the word of the incumbent tends to be gospel. Sticking to logistics for the moment, we’ve worked with many customers who use us as a management platform while working with other providers to do the day-to-day logistics work. In those instances, our predictions on ETAs are compared against what their freight forwarders are sending. We know, and on the whole the industry knows, that there is a lot of bad data out there when it comes to ETAs. Most customers know their dates are unreliable, that’s why they’re looking to work with us in the first place. That doesn’t negate the bias towards the data they’ve been receiving, often for many years.

Customers will compare the dates we provide to the dates their forwarder provides. If we differ, the base assumption is that we are the ones who are wrong. It’s a reasonable assumption. We’re the ones coming in with something new, so the burden of proof is on us. Our job, usually during trials, is to find a way to objectively measure, and then communicate, that what we’re providing is in fact a better measure than their incumbent. It typically involves a head-to-head comparison of data, usually at multiple time points, to show how our predictions and the incumbent’s predictions vary over time. In a lot of cases this can be a big ask.
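A minimal sketch of what such a head-to-head comparison can look like. All the numbers here are hypothetical: in a real trial you would use each party's logged ETA predictions and the shipment's actual arrival date, snapshotted at several horizons before arrival.

```python
from statistics import mean

# Hypothetical absolute ETA errors (in days) for the same shipments,
# recorded at several prediction horizons before actual arrival.
errors = {
    # horizon_days: (our_errors, incumbent_errors)
    30: ([2.1, 3.0, 1.4, 2.6], [5.2, 4.8, 6.1, 3.9]),
    14: ([1.2, 1.8, 0.9, 1.5], [3.1, 2.7, 4.0, 2.2]),
    1:  ([0.3, 0.5, 0.2, 0.4], [0.6, 0.4, 0.9, 0.5]),
}

# Report mean absolute error side by side at each horizon, farthest first.
for horizon, (ours, theirs) in sorted(errors.items(), reverse=True):
    print(f"{horizon:>2}d out: ours {mean(ours):.2f}d vs incumbent {mean(theirs):.2f}d")
```

The point of slicing by horizon is exactly the "thirty days out is another story" problem: a provider can look fine a day before arrival and still be badly wrong a month out.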

When you’re working with larger enterprise customers, a trial or pilot of some description is fairly typical. In these cases you can set expectations up front about the need to run these head-to-head comparisons. Don’t get me wrong, we present our internal data, but that only provides so much confidence. For smaller customers this becomes trickier.

At Maxwell Plus we were working with the smallest customer of all: an individual consumer. In that case we were trying to provide early diagnosis. There wasn’t a pilot window. Each individual customer was a study with one participant, and we couldn’t lean on a ‘just wait and see’ mentality. We were dealing with something deeply personal and potentially life-changing. We saw the same behaviours. Patients often saw any delta between our recommendation and their previous experience as inherently wrong. We worked to soothe that worry by leaning into the incumbent relationships, something I’ve seen work well again at Explorate working in logistics.

Leaning into the incumbent in both these scenarios meant pairing technology with people. At Maxwell Plus we employed doctors who helped manage the relationships with patients and provide a second pair of eyes in a familiar form. At Explorate we provide software as well as freight forwarding services. The services part is optional for customers but there’s a trust that comes from customers knowing we’re in the trenches day to day and know the ‘old ways’ even while trying to push for something better.

Having people in the mix works well but it can be difficult to scale, especially if that expert needs to be closely involved in all interactions. The goal of building a technology company is to scale these kinds of interactions in a way that is impossible at a one-on-one level. Over time, we needed to work out other ways to remove the comparison against a long-standing incumbent.

One way is to change the conversation. Sometimes the difference between 80% accuracy and 90% accuracy isn’t enough to convince someone to switch. Sure, in the long run that 10% can be a significant change, but in the moment the difference doesn’t feel like much. In these cases it’s better to focus your messaging elsewhere.

At Maxwell Plus the message we focused on was continuity. Many of our customers had data scattered across medical records at many different practices. We offered a place to centralise that data and keep a focused eye on you over time. We did one thing and we did it well, never claiming to be more than we were. We didn't lean on accuracy as a key metric in the early days as we needed to build consumer trust. As time went on and we had a larger in-market clinical outcomes dataset we were able to reintroduce a marketing position around accuracy. We had strong evidence that we were accurate, and could provide an accurate pathway early. That was only possible if you had the continuity of data.

At Explorate we position accurate tracking as table stakes. We know we need robust and accurate predictions but we’re not here to suggest that knowing the exact minute a ship berths at a dock is going to change your supply chain. Instead we focus on all the things enabled by centralisation of data. Many savvy customers will work with multiple providers to minimise risk and to shop the market for competitive freight services pricing. What they save in cost often results in scattered data. Providing an accurate third party store of data has its advantages. Once the data is together, supply chain managers can stop thinking shipment-by-shipment and start planning more strategically. This message has resonated well with customers and provides different avenues for discussion beyond accuracy.

While shifting the message can help cement your value, it shouldn’t be used as a distraction. At the end of the day you need to be accurate and need to convince your customers to trust you. There is a largely unsolved design problem here. For many measures, presenting things as a single number is bad statistical practice. Any prediction has its uncertainties.

Communicating statistics is a long-standing, difficult problem, whether it’s election results, the chance of rain, the probability of cancer, or when a shipment is likely to arrive. The further out you try to predict, the less certain you become. The confidence interval is a traditional way to measure this uncertainty. A 95% confidence interval says: based on everything I know, and the statistical assumptions I’ve applied, I’d be willing to say that the final value will be in this range with 95% certainty.
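As a rough sketch of that idea, under an assumed normal model: take a point ETA, look at the spread of past prediction errors, and report mean ± 1.96 standard deviations as an approximate 95% interval. The error values below are hypothetical.

```python
import statistics

# Hypothetical past prediction errors (actual minus predicted, in days).
past_errors = [-1.0, 0.5, 2.0, -0.5, 1.5, 0.0, 3.0, -2.0, 1.0, 0.5]

mu = statistics.mean(past_errors)     # average bias of past predictions
sigma = statistics.stdev(past_errors) # sample standard deviation

point_eta = 30.0  # point prediction: "your ship arrives in 30 days"

# Approximate 95% interval under a normal assumption: mean ± 1.96 sigma.
low = point_eta + mu - 1.96 * sigma
high = point_eta + mu + 1.96 * sigma
print(f"point: day {point_eta:.0f}, 95% interval: day {low:.1f} to day {high:.1f}")
```

Notice how the interval widens as sigma grows, which is exactly what happens to predictions made further in advance.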

It’s nice to be mathematically correct when possible, but for many products a range is hard to work with. While it is technically correct, many users will bias towards a single number over a range because it is easier to use. It takes a combination of expertise and patience to work with intervals. If you tell me my shipment will arrive between Monday and Friday, I need to make contingencies for five different days. Sure, if I expected it last week then knowing it’ll arrive soon can still be useful, but I’m not going to book a last-mile delivery on every day of the week just in case.

The other difficulty with intervals is that they’re very rarely uniform. That is to say, across a ‘somewhere between Monday and Friday’ prediction, there may not be an equal one-in-five chance on each of the days. Wednesday might be far more likely than the others. Communicating this is another layer of hard-to-navigate complexity.
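To make the non-uniformity concrete, here is a hypothetical spread of arrival probability across that Monday-to-Friday window. The quoted interval is the same either way, but the mass is far from uniform.

```python
# Hypothetical per-day arrival probabilities across a Monday-Friday window.
# "One in five per day" would be misleading: Wednesday carries half the mass.
arrival_probs = {"Mon": 0.05, "Tue": 0.20, "Wed": 0.50, "Thu": 0.20, "Fri": 0.05}

most_likely = max(arrival_probs, key=arrival_probs.get)
by_wednesday = arrival_probs["Mon"] + arrival_probs["Tue"] + arrival_probs["Wed"]
print(f"most likely day: {most_likely}, chance of arrival by Wednesday: {by_wednesday:.0%}")
```

A single "most likely day" plus a cumulative "arrived by" figure is one way to compress a distribution like this into something a non-statistician can act on.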

When building products like this, that means making the technical details opt-in, if you include them at all. More sophisticated users, or those well trained in your product, may grow comfortable enough that they want to know the best and worst case as well as the mean. I wouldn’t, however, suggest bombarding people with that on day one.

Our job as people building products is to take complex ideas like non-uniform probability intervals and find a way to make them simple and easy to understand. Chances are you’re not building software for math majors and you need to find a way to simplify. Otherwise you’ll lose to the incumbent just because the incumbent is simpler.

We saw this a lot in the early phases of building Maxwell Plus. Our bias towards being as transparently accurate as possible meant that our early product had ranges of probabilities and talked about things like confidence intervals. While technically speaking this was more correct than the incumbent’s simple yes or no, our users didn’t see it that way. They came to us as an alternative to their existing provider and wanted to know that we were there to abstract away the technical aspects and give them clear direction.

Any disruption, big or small, takes a certain amount of trust. Trust takes time to build. You can draw down a trust debt with things like hype or over-zealous marketing, but in the long term you need to pay that debt back. If all companies had the same time and the same budget, in theory the most accurate would end up winning. In theory. In reality, how you compare to an incumbent, or even your competitor, comes down to how you communicate just as much as it comes down to a head-to-head comparison of accuracy.

In a reversal of that old quote from Henry Ford, you need to think long and hard about whether you should be in the market promoting yourself as a faster horse. Accuracy claims can be addictive if you’re living and breathing that competition every day. For your customers, however, there’s a whole lot more at play than percentage points.