What the TV show "The Office" can teach us about AI consumption
Caveman prompting and why fewer words can mean lower AI costs

Once in a while, a technical idea comes along that is equal parts silly, intriguing, and surprisingly useful. Caveman prompting is one of those ideas.
If you remember the episode of "The Office" where Kevin Malone decides he can be more efficient by talking like a caveman, you already understand the premise. He removes grammar, sentence structure, and emotional nuance, yet everyone still understands him.
Caveman prompting is the same. You strip your prompts down to their most basic form. Fewer words. No politeness. No emotional filler. Just intent.
Think less “Could you please help me understand how tokens work in large language models?” Think more “Explain token. Simple.”
It sounds ridiculous. And yet, after experimenting with it across multiple large language models (LLMs), I have found that it not only works, it also opens up a very real conversation about how much unnecessary cost we introduce into our AI implementations simply by maintaining human pleasantries, and what we can do about it.
What are AI tokens?
Tokens are the unit of measurement behind consumption-based AI pricing. Roughly speaking, a token is a fraction of a word. When you use an LLM, you typically pay based on how many tokens you send in and how many come back out.
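To make that concrete, here is a minimal sketch of how prompt length translates into token counts. It uses the commonly cited "roughly four characters per token" rule of thumb for English text; real tokenizers (such as OpenAI's tiktoken library) give exact counts, so treat these numbers as estimates only.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))

# The two prompts from the example above
verbose = "Could you please help me understand how tokens work in large language models?"
caveman = "Explain token. Simple."

v = estimate_tokens(verbose)
c = estimate_tokens(caveman)

print(f"verbose prompt: ~{v} tokens")
print(f"caveman prompt: ~{c} tokens")
print(f"estimated input-token savings: {1 - c / v:.0%}")
```

Even with a crude estimate like this, the caveman version comes in at a fraction of the input tokens, and input tokens are exactly what consumption-based pricing meters.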
The challenge with AI is that many organizations do not realize how quickly small inefficiencies add up. Things like verbose prompts and overly detailed system instructions cost almost nothing individually. At scale, they matter.
When AI consumption hits the bottom line
Just four months into 2026, Uber reportedly blew through its entire AI budget. The transportation tech company pointed to surging use of AI coding tools as the primary culprit.
Uber had encouraged high AI adoption among its employees, and the AI was working exactly as intended, but it was working at a scale and cost that caught people off guard.
This is important because a large, sophisticated technology company publicly acknowledged that AI usage can get expensive very quickly when it is not optimized.
The end of the black box AI spend
For the last few years, AI spend has often been treated as a bit of a black box. Teams focused on speed to market and proving value, not on optimizing cost. That approach made sense early on.
Now AI is operational. It is embedded in development workflows, customer experience platforms, and day-to-day processes. Usage is ramping up fast, sometimes faster than anyone expects.
Organizations are hitting the point where they cannot just ask, “Does the AI work?” They have to ask, “Are we using AI efficiently and within budget?”
This is not just a technical issue. It is also extremely relevant to customer experience.
In advanced CX environments, AI runs constantly. Virtual assistants, self-service flows, agent-assist tools, and analytics pipelines can all generate token usage.
Saving fractions of a cent per interaction may sound insignificant, but multiply that by thousands or millions of interactions and it becomes a real line item.
How teams can optimize AI usage
This is usually the point in the conversation where someone says, “Okay, I get it. AI can get expensive fast. Now what?”
It’s time to revisit caveman prompting, not because writing like a caveman is the answer, but because it forces you to see where the waste is hiding. Once you notice that, the path to optimization gets a lot clearer.
Here is where teams tend to see the biggest opportunities:
- Rightsize model choices. Prototyping with powerful LLMs makes sense early on. The issue is leaving them in place forever. In customer experience environments, many interactions are narrow and predictable, and cheaper models often perform just as well.
- Look beyond human language. Not all token usage comes from prompts and responses. How systems exchange data matters too. Traditional formats include a lot of overhead AI does not need. Cleaning up those exchanges can save real money.
- Make usage visible. You cannot optimize what you cannot see. Tracking token usage by application, model, and use case turns vague concern into clear action. That is when teams can point to exactly where costs are coming from and why.
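The second point, overhead in how systems exchange data, is easy to see with a small sketch. The record and field names below are hypothetical; the comparison simply contrasts a verbose, pretty-printed JSON payload with a compact encoding of the same data.

```python
import json

# Hypothetical CX interaction record with verbose, human-friendly keys.
record = {
    "customer_identifier": "C-1042",
    "interaction_channel": "voice",
    "customer_sentiment_score": 0.82,
}

# Pretty-printed JSON: readable, but full of whitespace and long key names.
verbose_payload = json.dumps(record, indent=2)

# Compact encoding: short keys, no extra whitespace, same information.
compact_payload = json.dumps(
    {"id": "C-1042", "ch": "voice", "sent": 0.82},
    separators=(",", ":"),
)

print(f"verbose: {len(verbose_payload)} chars")
print(f"compact: {len(compact_payload)} chars")
```

When payloads like this are stuffed into prompts or passed between AI-driven systems millions of times, every character of formatting overhead is consumption you are paying for without getting anything back.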
Caveman prompting is the lens, not the solution
Kevin Malone wasn’t trying to optimize AI costs. He was just trying to say less and still be understood. That’s the takeaway.
When the meaning stays the same but the words disappear, it is worth asking why those extra words were there in the first place. And with AI, extra words or overly powerful models often translate directly into cost.
Caveman prompting works best as a lens for noticing opportunities, not as an end goal. It helps teams notice where AI is doing more work, and consuming more tokens, than it needs to.
Kevin wanted to save words. What he accidentally showed us is a way to look at AI more intentionally.
Want to avoid the hidden pitfalls of scaling AI?
Watch our on-demand webinar, "AI Adoption Roadmap: Navigating the Tech Traps Nobody Is Talking About," for practical guidance on adopting AI with fewer surprises.

Ryan leads the AWS solution architecture team at TTEC Digital. His team is focused on supporting customers and sales opportunities from small and medium businesses to global enterprises.