LLM-Orchestrated Upsell

LLM-Orchestration in action, with tool calls and reasoning

 

Context & Goal

We began with a simple idea: send an upsell text to customers with next-day appointments and convert a small percentage into add-on services. The early system was intentionally lightweight—good enough to test demand, but not built for speed or scale. My goal was to prove lift, then systematically engineer conversion, coverage, and operational fit.

V1 (March): Human-in-the-Loop, Template-Driven

The first iteration sent a templated SMS to every customer with an upcoming appointment, filtered by a basic eligibility report. When customers replied, conversations were routed into a simple Twilio-based console where customer care agents handled responses. We converted ~3% of total sends, but response times were slow (often 60+ minutes) and coverage was limited to business hours. The team also absorbed a new volume of non-upsell intents (reschedules, questions, and general service requests), which diluted focus.
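
For concreteness, here is a minimal sketch of what that V1 send path could have looked like on top of Twilio's Python SDK; the template wording, customer fields, and environment variable names are illustrative, not the production values.

    # Minimal V1-style send loop: one template, one filtered list, Twilio SMS.
    # Template wording and the customer dict fields are illustrative.
    import os
    from twilio.rest import Client

    client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

    TEMPLATE = (
        "Hi {first_name}, you have an appointment with us tomorrow. "
        "Reply YES to add {addon} to your visit."
    )

    def send_upsell_batch(eligible_customers: list[dict], from_number: str) -> None:
        """Send the templated SMS to every customer on the eligibility report."""
        for customer in eligible_customers:
            client.messages.create(
                from_=from_number,
                to=customer["phone"],
                body=TEMPLATE.format(**customer),
            )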

Key Bottlenecks

Three issues emerged quickly: latency, availability, and context. Latency hurt because customers decide within minutes, not hours. Availability hurt because replies arrived at all times, including evenings and weekends. And context hurt because agents needed to look up account details, available offers, and scheduling constraints before crafting a reply, which made responses inconsistent and slow.

 
 

V3 (September): Multi-Agent Orchestration

Next, we split the work into specialized agents to reduce prompt bloat and improve reasoning. A Customer Context Agent assembled a customer profile from internal systems. An Eligibility Agent selected the best add-on, considering history, location, and seasonality. A Message Creation Agent crafted unique outreach, and a Send-Gate Agent suppressed risky or duplicate sends, reducing send volume by ~15% while improving quality. A leaner SMS Interaction Agent handled the back-and-forth with tools. With this architecture, conversion rose to ~10% of total sends.
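
A structural sketch of that pipeline is below; the llm helper, prompts, and data shapes are placeholders for the real agent implementations and their tool calls, not the production code.

    # Structural sketch of the V3 pipeline: context -> eligibility -> create -> gate.
    # `llm` stands in for a model call; prompts and data shapes are illustrative.
    import json
    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Offer:
        customer_id: str
        addon: str
        message: str

    def run_pipeline(customer_id: str, crm_record: dict,
                     llm: Callable[[str, str], str]) -> Optional[Offer]:
        # Customer Context Agent: distill internal-system data into a profile.
        profile = llm("Summarize this customer for an upsell decision.",
                      json.dumps(crm_record))

        # Eligibility Agent: pick the best add-on given history, location, seasonality.
        addon = llm("Name the single best add-on for this customer, or NONE.",
                    profile).strip()
        if addon == "NONE":
            return None

        # Message Creation Agent: craft unique outreach for this customer.
        message = llm(f"Write a short, friendly SMS offering: {addon}", profile)

        # Send-Gate Agent: suppress risky or duplicate sends before anything goes out.
        if llm("Reply SEND or SUPPRESS for this outbound draft.", message).strip() != "SEND":
            return None

        return Offer(customer_id, addon, message)

Giving each stage its own narrow prompt is what kept individual prompts small and made failures easy to localize to one agent.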

Benchmarking & Regression Checks

We added benchmarking to compare new prompts and policies against a held-out set of historical conversations. This made it clear when a change introduced regressions (for instance, a cancellation-handling quirk we quickly fixed). The result was faster iteration with fewer surprises in production.
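
In spirit, the harness was a replay-and-diff loop over held-out conversations. The sketch below assumes a single scalar judge score per conversation, which is an illustrative simplification.

    # Replay held-out conversations through a candidate policy and flag any
    # scenario where it scores worse than the incumbent. Shapes are illustrative.
    from typing import Callable

    def find_regressions(candidate: Callable, baseline: Callable,
                         held_out: list[dict],
                         judge: Callable[[str], float]) -> list[tuple]:
        regressions = []
        for convo in held_out:
            old_score = judge(baseline(convo))   # incumbent prompt/policy
            new_score = judge(candidate(convo))  # proposed change
            if new_score < old_score:
                regressions.append((convo["id"], old_score, new_score))
        return regressions

    # Deploy rule: any regression on a guarded scenario (e.g., cancellation
    # handling) blocks the rollout until the prompt change is fixed.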

V4 (October): Variables, Reuse, and Scale

We broke large prompts into smaller, named variables and constraints, making the system easier to tune and safer to deploy. Critically, we reused the Eligibility and Message Creation agents inside our technician app and customer care app. That allowed field and phone teams to offer the same personalized upsells during live interactions, turning the orchestration engine into a shared capability across channels. By this stage, the program was generating roughly $13,000/day in incremental ARR.
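
To illustrate the decomposition idea (with invented variable names and wording, not our production prompts): small named constraints compose into a per-channel system prompt, which is what let SMS, the technician app, and the customer care app share the same core logic.

    # Compose a system prompt from small named pieces instead of one monolith.
    # Variable names and wording are illustrative, not the production prompts.
    BRAND_VOICE = "Friendly and concise. One question per message. Never pushy."
    COMPLIANCE = "Never quote final pricing; never promise exact arrival times."
    OFFER_RULES = "Offer exactly one add-on, matched to the scheduled service."

    CHANNEL_NOTES = {
        "sms": "Keep replies under 320 characters.",
        "tech_app": "Write talking points a technician can use on-site.",
        "care_app": "Write a suggested reply for a phone agent.",
    }

    def build_system_prompt(channel: str) -> str:
        """Reuse the same core constraints across every customer-facing channel."""
        return "\n\n".join([BRAND_VOICE, COMPLIANCE, OFFER_RULES, CHANNEL_NOTES[channel]])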

 

Split system prompts into manageable, reusable chunks

 

Outcomes

Conversion improved from ~3% → ~5% → ~10% while message volume stayed controlled and CSAT remained healthy. Median response time dropped from ~60 minutes to seconds, and coverage expanded to 24/7 across time zones. The orchestration pattern (context → eligibility → create → gate → converse) became a reusable asset adopted by adjacent teams.

What I Did

I defined the architecture, guardrails, and success metrics; wrote and refactored the core prompts; and built the evaluation and rollback pathways. I partnered with engineering on tool integrations and with operations on escalation rules and training. I also led the change-management work: communicating wins, capturing feedback loops, and aligning incentives so the system stuck.

Why It Worked

This worked because we treated “speed + fit” as first-class goals. We separated concerns across agents to improve reasoning, applied a strict send-gate to protect customers, and reused the same decision logic wherever humans talk to customers. The system balanced autonomy with safe handoffs, which made it both scalable and trustworthy.