The Polymath Tool for All Your Audio and Document Needs



This content originally appeared on DEV Community and was authored by Karneeshkar V

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

I built a Command-Line Interface (CLI) tool designed to help users manage their medical and legal conversations more effectively.

This tool can transcribe audio files or calls with your doctor or financial advisor, then organize and retrieve relevant insights to assist in decision-making.

The idea stems from a personal pain point. During important medical or legal discussions, I often found it difficult to:

  • Ask detailed follow-up questions
  • Recall key points accurately
  • Understand complex terminology on the spot

AssemblyAI's accurate transcription, especially for domain-specific (medical/legal) vocabulary, is what brought the project to life.

All the CLI commands and flags can be found in the README.md.
To set it up, you will need the following in your .env file:

ASSEMBLY_AI_API_KEY=""
OPENAI_API_KEY=""
QDRANT_URL=""

Make sure Qdrant is running on your local machine.
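As a rough illustration of the setup step above, here is a minimal sketch of a startup check that confirms the three settings are present before the CLI does any work. The variable names come from the .env listing; the `load_settings` function and `REQUIRED_KEYS` constant are hypothetical names used only for this example and are not taken from the repository.

```python
import os
from typing import Mapping

# Variable names taken from the .env listing above.
REQUIRED_KEYS = ("ASSEMBLY_AI_API_KEY", "OPENAI_API_KEY", "QDRANT_URL")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise if any is missing or empty.

    Hypothetical helper for illustration; not part of the project.
    """
    settings = {key: env.get(key, "").strip() for key in REQUIRED_KEYS}
    missing = [key for key, value in settings.items() if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings
```

Note that `os.environ` only sees values that are already exported; a .env file usually has to be loaded first, for example with a package such as python-dotenv. For a local Qdrant instance, `QDRANT_URL` would typically be `http://localhost:6333`, Qdrant's default HTTP port, though your setup may differ.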

Demo

Using AssemblyAI for transcription and feeding the transcript into the RAG pipeline

Using memory from a past call with a doctor

GitHub Repository

https://github.com/KarneeshkarV/-AssemblyAI-Domain-Expert-Voice-Agent

Technical Implementation & AssemblyAI Integration

  • Built using the Agno agent framework
  • Each domain-specific agent (medical or legal) is powered by a team of sub-agents
    • One for RAG (retrieval)
    • One for memory/context management
    • One for web search and knowledge lookups
    • And so on …
  • I used OpenAI models in the primary implementation due to cost-effectiveness, though I found Claude models to perform better at tool use during testing
  • Made some audio optimizations to use TTS credits more efficiently
  • Core transcription powered by AssemblyAI, enabling robust handling of domain-specific vocabulary
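The team-of-sub-agents layout described above can be pictured with a small, framework-free sketch. This is not Agno's API and not the author's code: `SubAgent`, `DomainAgent`, and the placeholder handlers are hypothetical stand-ins written in plain Python, and the "ask every member" behavior is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubAgent:
    """Hypothetical stand-in for one team member (RAG, memory, web search)."""
    name: str
    handle: Callable[[str], str]  # takes a query, returns this member's contribution


@dataclass
class DomainAgent:
    """Hypothetical stand-in for a medical or legal agent backed by a team."""
    domain: str
    members: list[SubAgent]

    def run(self, query: str) -> dict[str, str]:
        # Assumption: every member is asked and the answers are collected by name.
        return {member.name: member.handle(query) for member in self.members}


# Placeholder handlers; real ones would query a vector store,
# a memory store, and a search API respectively.
medical = DomainAgent(
    domain="medical",
    members=[
        SubAgent("rag", lambda q: f"[transcript passages for: {q}]"),
        SubAgent("memory", lambda q: f"[past-call context for: {q}]"),
        SubAgent("web_search", lambda q: f"[web results for: {q}]"),
    ],
)
```

Calling `medical.run("dosage")` returns one entry per sub-agent. In an agent framework, a coordinating model would more likely decide which member to call for a given question instead of asking all of them.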

Future Work

I had plans to:

  • Make the whole data ingestion process easier and more user-friendly
  • Integrate SIP Sorcery for capturing and analyzing VoIP call streams
  • Add another specialized agent focused on legal document processing

However, due to time constraints, they remain on my to-do list!

I am all ears for suggestions on how to improve this project.

