How To Improve Predictive Model Performance On Connected Vehicle Data: Define The Problem Better

Modeling on connected vehicle data is hard. 

  • The data involves complex, high-dimensional, time series readings
  • Off-the-shelf machine learning approaches don’t work well (more on why here)


But the most common reason that analytics teams struggle to deploy models at scale isn’t actually technical in nature. It’s an unclear definition of success.

When the business problem isn’t clearly defined – and isn’t mapped to a concrete definition of success from a modeling standpoint – solid models sit on the shelf, rather than driving decisions in the field. 

Here’s why this happens, and how to fix it. 

Why do so few modeling efforts start with a clear definition of success?

For years, the automotive industry has focused on building a unified data foundation for connected vehicle analytics – from data collection and warehousing to transformation and data governance. Even when the immediate use case for this data is still speculative (e.g., the potential to generate new revenue streams from connected vehicle data), OEMs tend to view data management as a core strategic initiative.

This investment reflects a major cultural and mindset shift. Business leaders, who have long relied on domain expertise for decision-making, are now being challenged to “become data-driven.” 

But moving to a data-driven mindset requires thinking about problems and solutions in entirely new ways. It’s not always easy to translate business objectives into quantifiable outcomes. And when business and analytics teams don’t speak the same language, they miss the opportunity to collaborate on the right types of problems – and agree on meaningful and measurable definitions of success. This disconnect wastes time and costs money.

For example, an analytics team at an automotive manufacturer spent months developing a high-precision predictive model to forecast vehicle risk for specific failure modes. But the model never got deployed in the field because it never crossed internal performance thresholds. 

Unfortunately, these performance metrics were entirely technical and didn’t reflect real-world business considerations. The business didn’t care about statistical performance – they cared about their ability to successfully minimize warranty cost, while taking as few vehicles as possible off the road. But the analytics and business teams weren’t aligned on the project’s definition of success. 

Business leaders need to give their analytics counterparts context on their real-world success metrics and constraints, and analytics leaders need to build trust and buy-in from their business counterparts on the modeling approach. 

Defining the problem better can help bridge this gap.

Defining the business problem crisply

Here’s an example of a framework that can tie a business challenge to a concrete analytical approach and success criteria: 

  • What specific business outcome (incremental revenue, cost savings, measurable risk reduction) are we targeting?
  • Via what mechanism? (Specifically, how do we anticipate those results will be achieved?)
  • Over what time horizon?
  • What is our “next best option”? (I.e., assuming we didn’t have a model, what would we be doing – and what results would that likely deliver?)


An example of a vague business problem would be “use connected vehicle data to drive predictive maintenance.” Framed in this way, it would be impossible for an analytics team to set up falsifiable performance metrics. (How good is good enough?) This is the kind of problem statement that encourages entirely technical, and often arbitrary, success criteria.   

A better version of that business problem would be: “reduce vehicle downtime by an average of 10% over the next year by moving from one-size-fits-all to customized maintenance schedules for each vehicle on the road.” Framed in this way, the problem statement gives concrete guidance on a) the expected business result, b) the specific mechanism (i.e., the analytical strategy for solving the problem), c) the time horizon, and d) the default scenario it will be measured against. 

Tying the problem statement to analytics success metrics

Once teams have defined the business problem, they can translate business goals into concrete evaluation criteria for model performance. 

Let’s say there’s a failure mode that is estimated to affect 5% of vehicles within a given population of 200,000. The estimated cost associated with each failure is $900 (including warranty expense along with the predicted future cost in brand impact and lower repeat-purchase rates among affected vehicle owners). 

So if nothing is done, expected warranty costs are $9 million (200,000 * 5% * 900).

In contrast, the cost of preemptively servicing a vehicle is $400. Clearly, preemptively servicing the entire population is cost-prohibitive ($80 million), so it makes sense to try to tailor service recommendations to vehicles that are actually at risk of the failure mode.
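As a sanity check, here’s a minimal sketch of that baseline arithmetic in Python, using the figures from the example above:

    # Baseline cost arithmetic for the example above.
    population = 200_000      # vehicles in the affected population
    failure_rate = 0.05       # estimated share of vehicles that will hit the failure mode
    cost_per_failure = 900    # $ per failure (warranty expense + downstream brand impact)
    cost_per_service = 400    # $ per preemptive service

    cost_do_nothing = population * failure_rate * cost_per_failure   # $9,000,000
    cost_service_all = population * cost_per_service                 # $80,000,000

    print(f"Do nothing:       ${cost_do_nothing:,.0f}")
    print(f"Service everyone: ${cost_service_all:,.0f}")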

The analytics team builds a few candidate models to predict vehicle risk for the failure mode in question:

Naive success criteria would favor either model 1 (highest precision) or model 3 (highest F-score and recall). 

But neither of these technical thresholds actually takes into account the business objective of minimizing cost. To understand why, let’s dissect the cost associated with the different field actions that might be taken on the basis of the model.   

  • Cost of a true positive (i.e., servicing a vehicle that was going to fail): $400 per vehicle
  • Cost of a false positive (i.e., servicing a vehicle that wasn’t going to fail): $400 per vehicle
  • Cost of a true negative (i.e., ignoring a vehicle that wasn’t going to fail): $0 per vehicle
  • Cost of a false negative (i.e., not servicing a vehicle that fails and incurring warranty cost): $900 per vehicle


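To make that concrete, here’s a minimal sketch (in Python) of cost-based model selection. The per-outcome costs are the ones listed above; the confusion-matrix counts are hypothetical placeholders chosen to mirror the rankings described here (Model 1 highest precision, Model 3 highest recall and F-score), not the actual figures behind these models.

    # A sketch of cost-weighted model comparison. Per-outcome costs come from the
    # list above; the confusion-matrix counts are illustrative placeholders only.
    COST = {"tp": 400, "fp": 400, "tn": 0, "fn": 900}  # $ per vehicle

    def field_action_cost(tp, fp, tn, fn):
        """Total expected cost of acting on a model's predictions."""
        return (tp * COST["tp"] + fp * COST["fp"]
                + tn * COST["tn"] + fn * COST["fn"])

    # Hypothetical results over 200,000 vehicles, 10,000 of which will actually fail
    candidates = {
        "Model 1 (highest precision)":        dict(tp=3_000, fp=150,   tn=189_850, fn=7_000),
        "Model 2 (balanced)":                 dict(tp=8_000, fp=3_500, tn=186_500, fn=2_000),
        "Model 3 (highest recall / F-score)": dict(tp=9_800, fp=6_200, tn=183_800, fn=200),
    }

    for name, cm in candidates.items():
        print(f"{name}: ${field_action_cost(**cm):,.0f}")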
With the cost in mind, the strongest model would actually be Model 2 – the model that strikes the most effective balance between identifying most affected vehicles, while reducing unnecessary customer impact:

The cost of doing nothing

One area where business and analytics leaders need to be tightly aligned is around the cost of doing nothing. What would be the “status quo” outcome if nothing changed from the world of today?

Continuing the previous example, assume that the modeling team has a 0.7 F-score threshold for releasing models to production. The team estimates that it will take one additional month to fine-tune a model that will hit that threshold. 

Based on the current rate of warranty claim filings, the team is likely to accumulate an additional 1,000 warranty claims over the next month – totaling $900,000. 

The cost of inaction is high. Once the cost of delay is factored in, a new model would actually have to perform significantly better (with an F-score of roughly 0.8) in order to make up for the cost of lost time.
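A back-of-the-envelope version of that delay calculation might look like the sketch below (the figures come from the example above; the break-even framing is illustrative):

    # Cost of delaying deployment by one month (figures from the example above).
    claims_per_month = 1_000    # warranty claims expected to accrue during the delay
    cost_per_claim = 900        # $ per claim
    delay_months = 1

    cost_of_delay = claims_per_month * cost_per_claim * delay_months   # $900,000
    print(f"Cost of waiting {delay_months} month(s): ${cost_of_delay:,.0f}")

    # A model shipped a month later must save at least this much more than the
    # model available today just to break even on the lost time.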

Analytics leaders can collaborate with business counterparts to define a status quo baseline that contextualizes model performance and quantifies the cost of doing nothing. 

The bottom line

There are significant challenges associated with predictive modeling on high-dimensional time series data. But more often than not, unsuccessful modeling efforts stem from an unclear definition of the problem – resulting in modeling efforts that are disconnected from actual business needs. 

The lessons for analytics teams are clear. When defining success metrics for predictive modeling projects:

  • Partner with the business to build a shared definition of success tying business outcomes to technical success criteria. This requires educating the business on what a “model-driven mindset” looks like – and listening to fully incorporate real-world business objectives and constraints. 
  • No success criteria exist in a vacuum. Work with the business to define the baseline (“next best alternative”) and cost of inaction as you formulate benchmarks for technical model performance.


Viaduct works with the world’s leading automotive OEMs, helping them clearly define and efficiently solve predictive modeling challenges on their connected vehicle data. Reach out to learn more about how we help.
