Hybrid AI Emerges To Tame LLMs – And Not A Moment Too Soon

Instacart, HP, Salesforce and Twilio are addressing the Achilles heel of genAI.

The great potential of LLMs is significantly compromised by their Achilles heel: a deadly reliability problem. Predictive AI can address this problem – and that represents the next killer app for predictive AI. Enterprises such as Instacart, HP, Salesforce and Twilio are now adopting this inevitable, crucial pivot. Here’s the late-breaking news on this movement.

Even as agentic AI hype overpromises, it serves to illuminate a prevailing goal, the enterprise’s inherent desire to press the limits of LLM-based autonomy. Companies want to deploy AI systems that perform roles rather than only tasks. They want machine “agency,” not only tools. (Of course they do; automation is the point of any machine.)

But AI hype has everyone tied up in knots. It cultivates internal tension: On one hand, you suffer from intense fear-of-missing-out (FOMO). On the other hand, you don’t want to fall for infeasible or ludicrous claims.

It’s easy as hell to think up an unrealistic goal for AI. And it’s almost as easy to work up a prototype, an impressive demo that just doesn’t scale. LLMs seem so humanlike that people envision computers replacing all customer service agents, summarizing or answering questions about a collection of thousands of documents, taking on the wholesale role of a data scientist or even making a company’s executive decisions.

Even systems meant to achieve more modest “agentic” goals quickly become too unreliable to deploy at scale. For example, in a recent study conducted by AI startup Mercor, the very best of several competing LLM-based systems, Gemini 3 Flash, succeeded at only 24% of a testbed of tasks, such as: “Reply back to me with the P/E ratio for KVUE, rounded to two decimal points. Use the implied share price in the Discounted Cash Flow model and diluted EPS from the annual financials dated 12/23/2025.”

Good News And Even Better News

Good news: Hybrid AI can realize a good portion of AI’s often-audacious promise of autonomy. Predictive AI serves as a reliability layer that hands over to a human the cases and interactions most likely to go off the rails. This is exactly what predictive AI has been used for over decades: predicting a negative outcome in order to target human inspection and, potentially, intervention. This paradigm represents best practices for enterprise risk management in general. Machine learning models calculate per-case risk scores that indicate which transactions are most likely fraudulent, which bridges are most at risk of collapse and which restaurants are most likely in violation of health codes. An enterprise genAI project is but one more complex system that needs this kind of predictive management.
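To make the reliability-layer idea concrete, here is a minimal sketch in Python of routing cases by risk score. It is an illustration only, not any company’s implementation: the Case class, the route_cases function, the example scores and the 0.3 threshold are all assumptions, and the risk scores are presumed to come from a separately trained predictive model.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    risk_score: float  # hypothetical model output: estimated chance the LLM's handling goes wrong


def route_cases(cases, review_threshold=0.3):
    """Split cases into those left to the LLM and those handed to a human.

    review_threshold is an assumed cutoff; in practice it would be tuned
    against review capacity and the cost of an undetected failure.
    """
    automated, human_review = [], []
    for case in cases:
        if case.risk_score >= review_threshold:
            human_review.append(case)
        else:
            automated.append(case)
    # Reviewers see the riskiest cases first.
    human_review.sort(key=lambda c: c.risk_score, reverse=True)
    return automated, human_review


# Made-up scores for illustration.
cases = [Case("a", 0.05), Case("b", 0.72), Case("c", 0.31), Case("d", 0.12)]
automated, human_review = route_cases(cases)
```

With these made-up scores, cases “a” and “d” proceed automatically while “b” and “c” are queued for a person, riskiest first. Raising the threshold buys more autonomy at the price of more unreviewed risk.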

When applied to LLM-based systems, this move represents a pivotal admission that we sometimes need a (more expensive) human in the loop. This is an apt compromise on full autonomy. By adopting a sober, realistic view, rather than the widespread, FOMO-driven, euphoric view of supreme autonomy, we get our cake and can even eat a majority of it, too.

Even better news: It’s happening. I’ve been writing about this particular hybrid predictive/generative AI approach for more than a year – it seems inevitable as a viable solution to an impending crisis for genAI – and now it’s emerging indeed. Enterprises are actively going hybrid out of sheer necessity.

Twilio Predictively Tames Its Customer Service System

Communications leader Twilio has launched a conversational AI assistant that continually evolves. This system performs both customer support and sales roles, assisting the user by responding to questions and by proactively guiding throughout the customer lifecycle as the user increases their adoption of Twilio solutions.

The system’s guardrails detect potential missteps and place a hold when it may be about to, for example, go too far off topic, provide an incorrect price or make a promise it isn’t equipped to keep, such as, “I’ll check with my legal team.” Humans then review those more risky cases, either clearing the interaction to be “unpaused” or intervening as needed.

Healthcare Providers Predictively Tame Their Insurance Claims System

As health providers leverage genAI to complete insurance claims, great potential efficiencies must be balanced against the risk of submitting claims that are completed incorrectly or are otherwise unlikely to be approved. Enter predictive scoring: Each time an LLM-based system completes a claim, a predictive model estimates the risk of the claim being denied before it is submitted, so that a human can review and potentially modify the high-risk claims. NextGen Healthcare’s Senior Data Scientist Oliver Shetler presented on this approach earlier this month at the conference I founded, Machine Learning Week’s HYBRID AI 2026.

Instacart Delivers Out-Of-Stock Replacements With Predictive AI

Instacart, which lets you order home delivery from most any grocery chain, faces an unavoidable challenge inherent to its domain: The person shopping on your behalf will sometimes find that the product you selected is out of stock. Predictive AI is well suited to this problem. By predicting which replacement item is most likely to satisfy the customer, Instacart can offer the customer a viable option – and, if confidence is high enough, it can proactively swap that item in, knowing that the chance the customer will reject the delivery is low.
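The decision logic described here can be sketched in a few lines of Python. This is a hedged illustration rather than Instacart’s actual system: the choose_replacement function, the action labels, the 0.9 threshold and the acceptance probabilities are all hypothetical, and the probabilities are assumed to come from a predictive model.

```python
def choose_replacement(candidates, auto_swap_threshold=0.9):
    """Pick a replacement for an out-of-stock item.

    candidates maps each candidate item to a predicted probability that
    the customer accepts it (assumed to come from a predictive model).
    Returns (item, action), where action is one of the hypothetical labels
    "auto_swap", "ask_customer" or "no_replacement".
    """
    if not candidates:
        return None, "no_replacement"
    best_item = max(candidates, key=candidates.get)
    if candidates[best_item] >= auto_swap_threshold:
        # Confidence is high enough to swap the item in proactively.
        return best_item, "auto_swap"
    # Otherwise offer the best option and let the customer decide.
    return best_item, "ask_customer"


confident = choose_replacement({"oat milk": 0.62, "2% milk": 0.94})
uncertain = choose_replacement({"oat milk": 0.62, "soy milk": 0.40})
```

Here the first call swaps in “2% milk” automatically, while the second only suggests “oat milk” and leaves the choice to the customer.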

Salesforce And HP Tame GenAI Systems With Predictive AI

Salesforce and HP also implement this hybrid approach to tame LLM-based systems. One Salesforce project pursues this from a security perspective – not only for predicting when a system will fail in its objective, but more specifically when it will incur a security breach. Salesforce Staff Data Scientist Millie Huang also presented at HYBRID AI 2026, on “When AI Agents Go Rogue: Unmasking Risky Enterprise AI Behavior With Unsupervised Learning.”

At the same event, HP Principal Engineer Samaresh Kumar Singh presented “Hybrid AI In Production At HP: A Reliability Layer That Marries Predictive & GenAI (With Humans-In-The-Loop).”

LLM Reliability Is Predictive AI’s Next Killer App

It’s become a cliche that genAI prototypes are eye-popping and tantalizing – yet too unreliable to launch. To back off unrealistic “autonomous agent euphoria” and succeed in realizing genAI’s bold, often overzealous promise of autonomy – or at least a great deal of it – we’ve got to go hybrid by incorporating predictive AI as a standard practice. As this approach emerges as an increasingly common part of genAI-based solutions, it will represent the next “killer app” for predictive AI.

And yet, as much as genAI’s unreliability demands predictive AI, predictive AI projects are all too easy to mess up. Predictive AI has a notoriously poor track record. Outside Big Tech and a handful of other leading companies, most initiatives fail to deploy, never realizing value. Why? Data professionals aren’t equipped to sell deployment to the business. The technical performance metrics they typically report on do not align with business goals – and mean nothing to decision makers. But by moving to planning around business metrics, an as-yet uncommon but critical best practice, your predictive AI project will succeed, whether for taming your LLM-based system or any other predictive AI use case. By doing so, you can get your cake and even eat a majority of it, too.
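As a rough illustration of planning around a business metric rather than a technical one, the Python sketch below compares review thresholds by total dollar cost instead of model accuracy. Everything in it is an assumption made for the example: the total_cost function, the $5 review cost, the $100 failure cost, the six validation cases and the simplification that a human review always catches a failure.

```python
def total_cost(scored_cases, threshold, review_cost=5.0, failure_cost=100.0):
    """Dollar cost of operating at a given review threshold.

    scored_cases is a list of (risk_score, failed) pairs from a labeled
    validation set. Assumes a human review always catches a failure, so a
    reviewed case costs only review_cost.
    """
    cost = 0.0
    for risk_score, failed in scored_cases:
        if risk_score >= threshold:
            cost += review_cost   # case is sent to a human
        elif failed:
            cost += failure_cost  # failure slips through unreviewed
    return cost


# Made-up validation data: (risk score, whether the LLM actually failed).
validation = [(0.9, True), (0.8, False), (0.6, True),
              (0.4, False), (0.2, False), (0.1, False)]

# Review everything (0.0), review the riskiest half (0.5), review nothing (1.01).
costs = {t: total_cost(validation, t) for t in (0.0, 0.5, 1.01)}
best_threshold = min(costs, key=costs.get)
```

On this toy data, reviewing everything costs $30, reviewing nothing costs $200 and reviewing only cases scored 0.5 or higher costs $15. A comparison framed this way is one a decision maker can act on.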
