Why generic LLMs miss the mark in manufacturing service

Kasper Roed · 7 min · 2026-03-14

Generic LLMs (Copilot, ChatGPT, the assistant baked into your CRM) are remarkable general-purpose tools, but they hit a wall the moment a technician asks them about a real machine. The failure isn’t in the model. It’s in everything around the model that nobody built.

The failure modes are domain-specific

Parts SKU confusion is the first thing that breaks. A bearing carries a different part number across three model years and two regional catalogues, and the generic model will confidently pick one — usually the one most represented in its training data, which is rarely the one in front of the technician. There is no clever prompt that fixes this; the model doesn’t know what it doesn’t know.
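To make the SKU problem concrete, here is a minimal sketch of a cross-reference lookup that refuses to guess. Everything in it is hypothetical and for illustration only: the `CROSS_REF` table, the part numbers, the region codes and the `lookup_part_number` helper are assumptions, not a real catalogue or API.

```python
from typing import Optional

# Hypothetical cross-reference table (illustrative values only):
# (canonical part, model year, region) -> catalogue part number.
CROSS_REF = {
    ("main-bearing", 2021, "EU"): "BRG-1001",
    ("main-bearing", 2022, "EU"): "BRG-1001-A",
    ("main-bearing", 2023, "EU"): "BRG-2040",
    ("main-bearing", 2023, "NA"): "N-BRG-2040",
}


def lookup_part_number(
    part: str,
    model_year: Optional[int] = None,
    region: Optional[str] = None,
) -> Optional[str]:
    """Return a part number only if the given context pins down exactly one.

    Returns None when the context is ambiguous or nothing matches, so the
    caller can ask the technician for the missing detail instead of guessing.
    """
    matches = {
        number
        for (p, year, reg), number in CROSS_REF.items()
        if p == part
        and (model_year is None or year == model_year)
        and (region is None or reg == region)
    }
    return matches.pop() if len(matches) == 1 else None
```

The design choice worth noting is the `None` return: with only the part name, the lookup declines to answer, which is the behaviour a generic model does not have.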

Then come the service bulletins. OEMs issue them, supersede them, withdraw them. Generic models trained on a snapshot don’t model supersession at all and will happily quote a bulletin that was pulled eighteen months ago. Then there’s the multilingual question: a Polish technician asks in Polish, the manual is in German, the answer comes back in English. And then EU compliance — data residency, audit logs, retention — which the generic procurement conversation has barely begun to acknowledge.
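Supersession in particular is a data-modelling problem before it is a language problem. A minimal sketch, assuming a hypothetical `Bulletin` record with a `status` field and a `superseded_by` pointer (these field names, the status values and the bulletin IDs are illustrative assumptions, not any OEM's actual format):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Bulletin:
    bulletin_id: str
    status: str  # assumed values: "active", "superseded", "withdrawn"
    superseded_by: Optional[str] = None


# Illustrative data only.
BULLETINS: Dict[str, Bulletin] = {
    b.bulletin_id: b
    for b in [
        Bulletin("SB-001", "superseded", "SB-002"),
        Bulletin("SB-002", "superseded", "SB-003"),
        Bulletin("SB-003", "active"),
        Bulletin("SB-010", "withdrawn"),
    ]
}


def resolve_current(
    bulletin_id: str, bulletins: Dict[str, Bulletin]
) -> Optional[Bulletin]:
    """Follow the supersession chain to the bulletin that is in force now.

    Returns None if the chain ends in a withdrawn or unknown bulletin,
    so a withdrawn bulletin is never quoted to a technician.
    """
    seen = set()
    current = bulletins.get(bulletin_id)
    while current is not None:
        if current.bulletin_id in seen:
            raise ValueError(f"supersession cycle at {current.bulletin_id}")
        seen.add(current.bulletin_id)
        if current.status == "active":
            return current
        if current.status == "withdrawn" or current.superseded_by is None:
            return None
        current = bulletins.get(current.superseded_by)
    return None
```

Asking for `SB-001` here resolves to `SB-003`, and asking for `SB-010` returns nothing, which is the distinction a model trained on a snapshot cannot make.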

A concrete example: “E-12”

Across one customer’s product lines, “E-12” means three different things. On the older diesel platform it’s a fault code for low oil pressure. On the newer electric platform it’s a battery management warning. In the parts catalogue it’s the SKU prefix for a family of harnesses. Ask a generic LLM “what does E-12 mean on our machines?” and you’ll get a confident, plausible, wrong answer.

A vertical model with the corpus tagged by product family, model year and document type doesn’t have to guess. It returns the right answer with the right citation because the retrieval narrows on metadata before the language model ever sees the candidates. That’s not a smarter model; that’s a system designed for the problem.
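The "narrow on metadata first" idea can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product's retrieval layer: the `Doc` record, the tag values and the `retrieve` function are hypothetical, model year is left out for brevity, and the substring match stands in for whatever ranking a real system would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    product_family: str  # assumed tags: "diesel", "electric", or "all"
    doc_type: str        # assumed tags: "fault_code", "parts_catalogue"
    text: str


# The three meanings of "E-12" from the example above, as tagged documents.
CORPUS = [
    Doc("fc-diesel-e12", "diesel", "fault_code", "E-12: low oil pressure"),
    Doc("fc-electric-e12", "electric", "fault_code",
        "E-12: battery management warning"),
    Doc("cat-e12", "all", "parts_catalogue",
        "E-12: SKU prefix for a family of harnesses"),
]


def retrieve(
    query: str, corpus: List[Doc], *, product_family: str, doc_type: str
) -> List[Doc]:
    """Filter on metadata first, then match the query text.

    Only documents that survive the metadata filter are ever shown to
    the language model, so it has nothing to guess between.
    """
    candidates = [
        d for d in corpus
        if d.product_family in (product_family, "all")
        and d.doc_type == doc_type
    ]
    return [d for d in candidates if query.lower() in d.text.lower()]
```

With `product_family="diesel"` and `doc_type="fault_code"`, the only candidate left for "E-12" is the low oil pressure entry, and its `doc_id` is available to cite.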

“Bring your own data” isn’t enough

The standard answer from the generic vendors is: upload your PDFs, we’ll RAG it. That misses what the work actually requires. The schema — parts, machines, bulletins, work orders, customers — isn’t in the PDFs; it has to be modelled. The citation pattern that earns technician trust isn’t a default; it has to be built. Parts-aware reasoning across cross-references and supersessions isn’t a prompt; it’s a retrieval layer. And the voice channel — the one technicians actually use because their hands are occupied — isn’t a feature you bolt on after.

A general-purpose model dropped into an industrial service workflow is a brilliant graduate on day one of the job: fluent, confident, and wrong about specifics in ways that nobody senior has time to keep correcting. The fix isn’t a better graduate. It’s the apprenticeship — the schema, the corpus, the citations, the channel — and that’s what has to be built, not configured.

Where this leaves industrial buyers

The companies getting real value out of AI in service operations in 2026 are not the ones who picked the biggest generic model. They’re the ones who picked a system designed around their corpus, their parts, their bulletins, their languages and their compliance posture, and accepted that the model is one component of five rather than the whole product.

That’s the bet Opero is built on, and it’s why we don’t compete on benchmarks — we compete on whether the technician opens the app the second time. Industrial service is a vertical problem. Treat it like one.
