Why off-the-shelf LLMs don’t work for time series data

The automotive industry is bullish on generative AI. 

93% of leaders in the automotive industry believe that generative AI is a game-changer – and 75% plan to adopt it this year. Use cases span the entire vehicle lifecycle, from design through personalized driving experiences and customer engagement.

But how exactly does generative AI work? And how well does it do at core tasks around understanding and predicting vehicle health?

Generative AI is underpinned by large language models, or LLMs. While they excel in many areas, particularly at using linguistic context to summarize texts and generate their own, LLMs fall short when it comes to analyzing large-scale time series data like telematics.

In this blog post we’ll highlight key challenges around LLM performance on time series data, and what we recommend instead for reducing vehicle downtime. 

What LLMs are made for  

LLM adoption has skyrocketed, driven by the power and versatility of models that can make inferences based on linguistic context – as well as summarize existing texts and generate new ones that are (often, if not quite always) coherent and contextually relevant.

But the qualities that make large language models good at processing and generating language are not exactly the same qualities that are needed to process vast amounts of multidimensional time series data.

“LLMs are designed to understand and generate text like a human,” writes IBM in its guide to LLMs.

And how do they do that?

Relying on textual context, the layers of neural networks that make up an LLM are trained to predict the next word in a sequence by assigning probability scores to candidate “tokens” – words or word fragments that have been broken down into smaller sequences of characters. The training process involves ingesting massive amounts of text, which allows the LLM to learn grammar, semantics and conceptual relationships.
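To make that concrete, here is a minimal, purely illustrative sketch in Python of the core idea: the model assigns a probability to each candidate next token given the preceding context. The vocabulary, context and scores below are invented; a real LLM learns its probabilities from billions of training examples.

```python
# Purely illustrative sketch: a toy "model" that turns arbitrary scores (logits)
# into next-token probabilities with a softmax. A real LLM learns these scores
# from massive text corpora; the vocabulary and numbers here are invented.
import numpy as np

vocab = ["the", "engine", "is", "over", "heating", "fine", "."]

def softmax(logits):
    exp = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return exp / exp.sum()

# Pretend the model has read the context "the engine is" and produced one
# score per vocabulary entry for the next token.
context = "the engine is"
logits = np.array([0.1, 0.3, 0.2, 2.1, 0.5, 1.8, 0.4])  # hypothetical scores

probs = softmax(logits)
for token, p in sorted(zip(vocab, probs), key=lambda pair: -pair[1])[:3]:
    print(f"P(next = {token!r} | {context!r}) = {p:.2f}")
```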

This has already made a significant impact on many industries, and we are only beginning to figure out how much deeper the impact will be in the coming years.

With that said, large language models are not a ready-made solution for every problem – not even every problem that we already know requires artificial intelligence or machine learning to solve.

The main challenges in repurposing LLMs for time series analysis are that they are simply not built to accurately represent continuous data, they perform poorly at time series anomaly detection and forecasting, and they require major adaptations to even approach usable performance.

Let’s take those issues one at a time.

Challenge 1: LLMs Aren’t Built to Represent Continuous and Sequential Data

One core challenge with using LLMs for time series analysis is that they were designed around language, where information is broken into discrete units or "tokens" (like words or parts of words). Time series data, however, is inherently continuous – meaning it involves data points recorded in sequence over time without breaks. This fundamental difference creates problems when LLMs are used to analyze time series data, as they lack the mechanisms to naturally represent and track the continuous flow and nuances of this type of data.

LLMs typically handle text by analyzing patterns across these discrete tokens, which relate to one another only within a limited context window. In time series data, continuity is essential because each data point is closely tied to the ones around it, representing a fluid change over time (e.g., variations in engine temperature or battery charge over milliseconds). To compensate for this gap, researchers have tried techniques like “reprogramming,” where models are modified to work with continuous data formats. However, even with reprogramming methods like TIME-LLM, these models struggle to maintain the precise temporal relationships required for reliable analysis. Specialized time series models, by contrast, are built to capture these relationships as part of their architecture.

Furthermore, when LLMs attempt to process time series data, they often rely on encoding methods developed for text, such as tokenization. But encoding continuous numerical data like temperature fluctuations or voltage readings into tokens is challenging because LLMs are forced to approximate values, losing critical timing and continuity details. Specialized tokenization methods have been explored, but they generally fall short of preserving the sequence’s integrity – particularly over longer time spans, where even minor inaccuracies can accumulate and compromise forecasting accuracy.
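As a rough illustration of what gets lost, the sketch below quantizes a synthetic engine-temperature trace into a small discrete “vocabulary” of bins, roughly what a naive text-style tokenization implies, and measures the error introduced by the round trip. The signal, units and bin count are assumptions chosen for illustration.

```python
# Simplified sketch: quantizing a continuous sensor trace into a small discrete
# "vocabulary" of bins and measuring the detail lost in the round trip.
# The signal, units and bin count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
temperature = 90 + 2 * np.sin(0.8 * t) + 0.1 * rng.standard_normal(t.size)  # synthetic engine temp (°C)

n_tokens = 16  # tiny discrete vocabulary
edges = np.linspace(temperature.min(), temperature.max(), n_tokens + 1)
token_ids = np.clip(np.digitize(temperature, edges) - 1, 0, n_tokens - 1)

# "Detokenize" by mapping each token back to the centre of its bin.
centres = (edges[:-1] + edges[1:]) / 2
reconstructed = centres[token_ids]

error = np.abs(temperature - reconstructed)
print(f"mean quantization error: {error.mean():.3f} °C")
print(f"max quantization error:  {error.max():.3f} °C")
```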

Challenge 2: LLMs Struggle with Anomaly Detection and Forecasting

Time series analysis in the automotive context requires both anomaly detection and forecasting to be effective. For example, anomaly detection might involve spotting unusual temperature spikes that could indicate a potential issue, while forecasting could predict when a component is likely to fail based on patterns in sensor data. Both tasks demand models that can capture complex, nuanced trends and detect subtle deviations. LLMs, however, aren’t optimized for these types of time-dependent patterns.

In anomaly detection, LLMs fall short because they aren’t naturally suited to recognize the small, continuous shifts that often indicate early warning signs in time series data. For instance, a slight but consistent increase in engine vibration might signify an emerging mechanical issue, but LLMs, which focus on token-based relationships in text, tend to overlook such subtle variations. While adding visual input representations (e.g., converting time series data into images) can help LLMs perform slightly better in anomaly detection, this approach adds complexity and still doesn’t provide the level of accuracy seen with models specifically built for time series.
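For contrast, even a classic statistical change detector such as CUSUM is designed around exactly this kind of small, consistent shift. The sketch below flags a gradual upward drift in a synthetic vibration signal; it is a simple baseline shown for illustration, not a description of any particular product's detection logic.

```python
# Illustrative sketch: a classic CUSUM change detector flagging a slight but
# consistent upward drift in a synthetic vibration signal. A simple statistical
# baseline for the kind of shift purpose-built time series models watch for.
import numpy as np

rng = np.random.default_rng(1)
baseline = rng.normal(1.0, 0.05, 500)                              # healthy vibration level (g)
drifting = rng.normal(1.0, 0.05, 500) + np.linspace(0, 0.15, 500)  # slow upward drift
signal = np.concatenate([baseline, drifting])

mean, std = baseline.mean(), baseline.std()
k = 0.5 * std   # slack: ignore deviations smaller than half a standard deviation
h = 8.0 * std   # alarm threshold

cusum, alarm_at = 0.0, None
for i, x in enumerate(signal):
    cusum = max(0.0, cusum + (x - mean - k))  # accumulate only sustained upward deviation
    if cusum > h:
        alarm_at = i
        break

print(f"drift begins at sample 500; CUSUM alarm at sample {alarm_at}")
```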

Forecasting with LLMs introduces additional complications. When applied to time series forecasting, LLMs often require special prompting and encoding methods, like Prompt-as-Prefix, where added context or instructions help guide the model. However, these techniques are workarounds rather than solutions. Even with specialized prompting, LLM forecasts tend to be inconsistent, especially over longer horizons. They lack the architecture to interpret the deep, long-term dependencies that specialized models like Long Short-Term Memory (LSTM) networks and time series-optimized Transformers handle natively, making LLMs less effective for forecasting applications in automotive use cases.
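To show what “handled natively” looks like in practice, here is a minimal PyTorch sketch of an LSTM trained to predict the next reading in a univariate sensor sequence. The synthetic signal, window length and hyperparameters are illustrative assumptions, not a production forecasting setup.

```python
# Minimal sketch of a specialized forecaster: an LSTM trained to predict the
# next reading in a univariate sensor sequence. Shapes and hyperparameters are
# illustrative, not tuned for real telematics data.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic "sensor" signal and sliding windows of 30 past readings -> next reading.
t = torch.linspace(0, 40, 2000)
series = torch.sin(t) + 0.05 * torch.randn_like(t)
window = 30
X = torch.stack([series[i : i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.unsqueeze(-1)  # (batch, seq_len, features=1)

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)            # out: (batch, seq_len, hidden)
        return self.head(out[:, -1, :])  # predict from the last time step

model = LSTMForecaster()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(20):                  # full-batch training, illustration only
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    optimizer.step()

print(f"final training MSE: {loss.item():.4f}")
```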

Challenge 3: Significant Model Adaptations Are Required for Time Series Analysis

Using LLMs for time series data often requires substantial adaptation efforts. One common approach is to transform time series data into visual or spatial representations, as LLMs can interpret image data more intuitively than raw sequences. This adaptation can improve the model’s ability to detect basic patterns or trends, but it’s not a natural fit, especially for large-scale applications like analyzing continuous telematics data from thousands of vehicles. This type of adaptation adds complexity and still falls short of the accuracy needed for real-world scenarios.
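One common form of this adaptation is to render the series as a 2-D image, for example as a recurrence plot that marks which pairs of time steps have similar values. The sketch below shows that transformation on a synthetic signal; it illustrates the general idea of a visual representation, not the specific method used by any particular framework.

```python
# Illustrative sketch of a "visual representation": a recurrence plot that
# turns a 1-D series into a 2-D image by marking which pairs of time steps
# have similar values.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 6 * np.pi, 200)
series = np.sin(t) + 0.1 * rng.standard_normal(t.size)

# Pairwise distances between all time steps, thresholded into a binary image.
distances = np.abs(series[:, None] - series[None, :])
recurrence_image = (distances < 0.2).astype(np.uint8)

print(recurrence_image.shape)  # (200, 200): an image-like array a vision-capable model could ingest
```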

For example, TIME-LLM, an adaptation framework, demonstrated that with enough reprogramming and structured prompts, LLMs can process time series data more effectively. But even then, they face performance issues at scale. Automotive telematics data is highly multidimensional, encompassing hundreds of simultaneous data streams – such as speed, fuel usage, temperature, and brake pressure – captured in real time. LLMs are not structured to handle such high-dimensional data efficiently, as their attention mechanisms are not optimized for the long-range dependencies typical of time series data.

Moreover, LLMs’ core architecture, which is based on Transformers, includes an “attention mechanism” designed to focus on relevant parts of a sentence for language tasks. However, this mechanism struggles to accurately capture the kind of long-term dependencies that are common in time series data. Time series models are built with these dependencies in mind, which allows them to track trends over extended periods. LLMs, on the other hand, can lose track of long-term dependencies due to their attention limitations, which limits their performance on large or complex time series datasets without additional support.
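A back-of-the-envelope calculation suggests why raw high-frequency telematics is a poor fit for standard self-attention, whose cost grows quadratically with sequence length. The sampling rate and numeric precision below are assumptions chosen for illustration.

```python
# Back-of-the-envelope sketch: standard self-attention compares every time step
# with every other, so its memory cost grows quadratically with sequence length.
# The sampling rate and fp16 precision below are illustrative assumptions.
samples_per_second = 10                      # assumed telematics sampling rate
seq_len = samples_per_second * 24 * 60 * 60  # time steps in one day of data

attention_entries = seq_len ** 2             # one score per pair of time steps
fp16_bytes = attention_entries * 2           # roughly 2 bytes per score in fp16

print(f"time steps per day: {seq_len:,}")
print(f"attention scores per layer and head: {attention_entries:,}")
print(f"≈ {fp16_bytes / 1e12:.1f} TB just to hold one attention matrix")
```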

Time Series Data Requires a Purpose-Built Solution

For automotive manufacturers, analyzing connected vehicle data demands tools that can capture complex, continuous signals to detect anomalies and predict failures. Off-the-shelf LLMs aren’t designed for this task – especially when it comes to understanding subtle, time-based relationships within vehicle sensor data.

A dedicated AI engine, built specifically for time series analysis, is essential to spot issues sooner and accurately trace them back to their root causes. This approach enables faster, more precise corrective actions, ultimately reducing costly failures and downtime.

Detecting Issues Earlier

In many cases, automotive manufacturers need to sift through multiple service events or warranty claims before realizing that a problem is recurring. This delay can be costly, as some issues only become visible after they’ve already affected vehicle reliability.

A specialized AI engine trained on telematics and vehicle data can catch early warning signals in sensor data, even when repair events look different from one vehicle to the next. It can quickly filter out noise, focusing on patterns that point to real issues. By assigning anomaly ratings, the engine helps teams prioritize critical signals, cutting down the data that needs investigation by over 85%. This lets manufacturers start their investigations with the signals most likely to lead them to the root cause.
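As a purely hypothetical illustration of that triage step, the sketch below ranks signals by an anomaly rating and keeps only those above a review threshold; the signal names, scores and threshold are invented and do not reflect any specific engine's output.

```python
# Hypothetical illustration of triage by anomaly rating: keep only signals above
# a review threshold and rank them for investigation. Names, scores and the
# threshold are invented.
signals = [
    {"signal": "coolant_temp_spike",       "anomaly_rating": 0.97},
    {"signal": "infotainment_reboot",      "anomaly_rating": 0.12},
    {"signal": "brake_pressure_drift",     "anomaly_rating": 0.88},
    {"signal": "door_ajar_sensor_chatter", "anomaly_rating": 0.05},
]

REVIEW_THRESHOLD = 0.8
to_investigate = sorted(
    (s for s in signals if s["anomaly_rating"] >= REVIEW_THRESHOLD),
    key=lambda s: -s["anomaly_rating"],
)
for s in to_investigate:
    print(f"{s['signal']}: {s['anomaly_rating']:.2f}")
```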

Finding the Root Cause and Accelerating Fixes

To resolve issues effectively, manufacturers need to isolate defective vehicles precisely, identifying only those that are affected without over-recalling. This challenge grows exponentially with the high volume of vehicle configurations, production variables, and sensor data in play.

A purpose-built AI engine can segment the vehicle population based on shared manufacturing characteristics or design features, revealing patterns in abnormal behavior across groups. By grouping vehicles showing similar patterns – like sudden sensor spikes or unusual temperature readings – the engine guides teams toward targeted investigations. If anomalies are specific to engines from a particular supplier, for example, the investigation can start there; if they are isolated to certain production plants, the issue may lie in the manufacturing process itself.
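A simplified pandas sketch of that segmentation idea is shown below: group vehicles by build attributes and compare anomaly rates across the groups. The columns and values are hypothetical and chosen only to illustrate the pattern.

```python
# Illustrative segmentation sketch: group vehicles by build attributes and
# compare anomaly rates across groups. Columns and values are hypothetical.
import pandas as pd

vehicles = pd.DataFrame({
    "vin":             ["V1", "V2", "V3", "V4", "V5", "V6"],
    "engine_supplier": ["A",  "A",  "B",  "A",  "B",  "B"],
    "plant":           ["P1", "P2", "P1", "P1", "P2", "P2"],
    "anomalous":       [True, True, False, True, False, False],
})

# Share of anomalous vehicles per engine supplier and per plant.
print(vehicles.groupby("engine_supplier")["anomalous"].mean())  # supplier A stands out
print(vehicles.groupby("plant")["anomalous"].mean())
```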

This segmentation turns problem identification into actionable steps, speeding up the path to a fix and helping reduce quality costs.

The Bottom Line

While LLMs have proven capabilities, they’re not built for the demands of multidimensional time series analysis in automotive applications. For manufacturers looking to leverage connected vehicle data to predict failures and address issues proactively, a dedicated AI engine – like Viaduct’s patented Temporal Structural Inference (TSI) Engine – offers a more reliable solution.

Much like a search engine that indexes and categorizes web content, Viaduct’s TSI Engine organizes vehicle data into a detailed, queryable map of telematics signals, operational data, and historical failures. This framework is designed specifically for high-dimensional time series data, identifying rare events, like part quality issues, that standard models might miss. By structuring this data into a network of correlations and relationships, the TSI Engine delivers critical insights, helping manufacturers trace the origins of an issue and focus on the signals that matter most.

For a deeper look at how Viaduct’s TSI Engine outperforms general-purpose models in automotive time series analysis, read our TSI Engine white paper.
