© 2026 Komal Vardhan Lolugu

2025 · Agentic AI · Live · Lead Engineer

Visit Agent

A conversational AI backend supporting voice interactions for structured activity logging. Used daily by field teams to log customer visits hands-free via a live audio session.

20 min · Avg. logging time (down from 1–2 hrs)
6+ · Structured fields extracted per session
100% · Hands-free, no keyboard required
Daily · Used in active production
§ 01

The Problem

Field teams were spending 1–2 hours after every customer visit manually entering activity logs — context decayed fast, data was incomplete, and reps resented the overhead. The organisation had no reliable visit-level data to train on or forecast from. A task that should take minutes was eating hours every day.

§ 02

The Solution

Built a voice-first agent that lets a rep speak naturally after a visit. The system captures the audio session, extracts 6+ structured fields — customer, outcome, summary, action items, blockers, and a knowledge graph of relationships — and commits everything in under 20 minutes. No keyboard, no forms. Every session is fully traced in Langfuse with LLM-as-a-Judge evaluators for coverage and hallucination, so the team can see exactly what was captured and why.

§ 02b

How it works

01
Rep speaks after visit

The rep starts a session and speaks naturally — what happened, who they met, what was agreed. No forms, no typing.

02
Live audio session

Audio streams in real time over a secure WebSocket. The model listens, responds, and asks follow-up questions to fill any gaps.

03
Background extraction

Once the session ends, workers automatically extract a structured summary, action items, blockers, and a knowledge graph of customer relationships from the transcript.
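
To make the extraction step concrete, here is a minimal sketch of parsing a model's JSON output into typed action items and rejecting anything malformed. It uses standard-library dataclasses as a stand-in for the Pydantic response formats mentioned in this write-up, so it runs without extra dependencies. The name ActionItemList comes from the write-up; the `description` and `owner` fields and the `parse_action_items` helper are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ActionItem:
    description: str              # hypothetical field
    owner: Optional[str] = None   # hypothetical field


@dataclass
class ActionItemList:
    items: List[ActionItem]


def parse_action_items(raw: str) -> ActionItemList:
    """Validate raw model output and return typed action items.

    Raises ValueError on any malformed input, so a background worker
    can treat the failure as a signal to retry the job.
    """
    try:
        payload: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict) or not isinstance(payload.get("items"), list):
        raise ValueError("expected an object with an 'items' list")

    items: List[ActionItem] = []
    for index, entry in enumerate(payload["items"]):
        if not isinstance(entry, dict):
            raise ValueError(f"items[{index}] is not an object")
        description = entry.get("description")
        if not isinstance(description, str) or not description.strip():
            raise ValueError(f"items[{index}].description must be a non-empty string")
        owner = entry.get("owner")
        if owner is not None and not isinstance(owner, str):
            raise ValueError(f"items[{index}].owner must be a string or null")
        items.append(ActionItem(description=description.strip(), owner=owner))
    return ActionItemList(items=items)
```

Failing loudly on a bad payload, rather than storing a half-parsed result, is what lets an asynchronous worker retry the extraction without the rep ever seeing the error.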

04
Traced and evaluated

Every session is logged in Langfuse with LLM-as-a-Judge evaluators scoring coverage and hallucination. The team has full visibility into what was captured and why.

§ 03

What I Learnt

  • 01

    One-shot session tickets with a short TTL for WebSocket auth prevent replay attacks without needing a full OAuth flow — a clean pattern for any real-time backend.

  • 02

    Separating live audio from post-call extraction was the right call: the audio path stays low-latency and never waits on extraction, while the extraction jobs run asynchronously, can be retried, and are isolated from the voice experience.

  • 03

    Langfuse prompt management in production means zero deploys to iterate on the system prompt — fetch the live version at session start and changes are immediate.

  • 04

    Structured extraction with Pydantic response formats is significantly more reliable than raw prompting for typed fields — ActionItemList and summary responses parse cleanly even on messy transcripts.
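
The one-shot ticket idea from point 01 can be sketched in a few lines. This is an illustration under assumptions, not the production code: an in-memory dictionary stands in for Redis, and the `TicketStore` class, its method names, and the 30-second default TTL are all hypothetical.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class TicketStore:
    """In-memory stand-in for a Redis-backed one-shot ticket store (assumption)."""

    def __init__(
        self,
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._tickets: Dict[str, Tuple[str, float]] = {}

    def issue(self, user_id: str) -> str:
        """Create an unguessable ticket bound to a user, valid for ttl_seconds."""
        ticket = secrets.token_urlsafe(32)
        self._tickets[ticket] = (user_id, self._clock() + self._ttl)
        return ticket

    def redeem(self, ticket: str) -> Optional[str]:
        """Return the user id if the ticket is valid, else None.

        pop() removes the ticket on every attempt, so a captured ticket
        cannot be replayed after its first use.
        """
        entry = self._tickets.pop(ticket, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self._clock() >= expires_at:
            return None
        return user_id
```

In this sketch an authenticated HTTP endpoint would call `issue`, and the WebSocket handshake would call `redeem` before accepting the connection. With Redis, the same shape could plausibly be built from a `SET` with an expiry at issue time and an atomic `GETDEL` at redeem time, though that mapping is an assumption rather than a description of the deployed system.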

§ 04

Technologies Used

FastAPI

Async API server and WebSocket proxy to Azure OpenAI Realtime

Azure OpenAI Realtime

Browser-to-model audio: STT, TTS, and VAD handled natively by the model

LangGraph

Typed state machine for transcript persistence and conversation state

Redis + RQ

One-shot session tickets, distributed locks, and post-call background job queue

Langfuse

Per-turn traces, session-level evals, and prompt version management

Qdrant

Vector store for customer knowledge graph embeddings

MongoDB

Primary persistence: voice logs, sessions, customers, transcripts

OpenFGA

Fine-grained authorisation: ownership and role-based access per voice log

Python

Core agent, tool-call implementation, and structured extraction workers
