flow-run: LLM Orchestration, Prompt Testing & Cost Monitoring



This content originally appeared on DEV Community and was authored by Vitalii Honchar

Introduction

Over the past couple of years, I’ve been observing a trending phenomenon on X (Twitter): “build in public.” Developers building products share screenshots, code snippets, and progress updates from their projects, posting them with the hashtag #buildinpublic.

While this trend is fascinating, the projects being showcased are typically closed source and proprietary. I believe that #buildinpublic should be truly public, with projects being open sourced from day one.

That’s why I’m excited to announce my new open source project flow-run, which I’ll build completely in public and document every step of the journey. The source code will be available on GitHub from the very first day of development.

The idea for this project was inspired by my previous product ai-svc, developed for AI Founder, which I described here: Building ai-svc: A Reliable Foundation for AI Founder

Context

AI Engineering is an emerging trend, much like building in public, and we’re seeing an explosion of AI-native applications being released. While I enjoy building AI-native applications myself, they all share common challenges:

  • LLM providers lack reliability (I previously published an article exploring this issue)
  • LLM prompt development differs fundamentally from classical programming, yet modern approaches tightly couple prompts with application code
  • AI application frameworks are limited to specific languages such as Python or TypeScript, restricting which languages teams can choose to build their applications efficiently

Drawing on my infrastructure experience, I see LLM integration as an infrastructure component rather than an application component. Prompts are remarkably similar to SQL queries: we have a dedicated engine (the LLM) that executes prompts, and we send requests to it. So why should we restructure our applications to embed prompts directly in application code when this is fundamentally an infrastructure concern?
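
To make the analogy concrete, compare the two call shapes side by side. The sketch below is illustrative only: the prompt-engine endpoint and payload are hypothetical placeholders I invented for this post, not a real flow-run API.

```python
import sqlite3

import requests  # pip install requests

# SQL: the application sends a query to a dedicated engine (the database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
rows = conn.execute("SELECT name FROM users").fetchall()

# Prompts: conceptually the same shape. The application sends a prompt
# to a dedicated engine (the LLM) and reads back the result.
# The endpoint and payload here are hypothetical, for illustration only.
response = requests.post(
    "https://flow-run.example.com/v1/prompts/execute",
    json={
        "prompt": "summarize-article",
        "version": "1.2.0",
        "inputs": {"article_url": "https://example.com/post"},
    },
)
result = response.json()
```

In both cases the application owns only the request and the result; the statement being executed lives with, and is versioned by, the engine side of the boundary.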

The same principle applies to prompt versioning. While prompts resemble SQL, they’re far more complex. It’s insufficient to run only integration and load tests to verify functionality. With prompts, we need evaluation tests to ensure new versions perform better than previous ones. Embedding prompts within application code makes these tests unnecessarily complex.
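
Here is a minimal sketch of what such an evaluation test could look like. Everything in it is a hypothetical stand-in: `execute_prompt` would call the execution engine, and the toy similarity scorer stands in for a real task-specific metric or an LLM-as-judge.

```python
from difflib import SequenceMatcher

# Hypothetical labeled dataset for the prompt under test.
EVAL_CASES = [
    {"inputs": {"text": "A long article about database indexing..."},
     "expected": "a short summary that mentions database indexing"},
    # ...more labeled cases
]

def execute_prompt(name: str, version: str, inputs: dict) -> str:
    # Stand-in: a real implementation would call the execution engine.
    return f"a short summary of: {inputs['text'][:40]}"

def score(output: str, expected: str) -> float:
    # Toy scorer: plain string similarity in [0, 1]. Real evaluations
    # would use task-specific metrics or an LLM judge.
    return SequenceMatcher(None, output, expected).ratio()

def evaluate(version: str) -> float:
    results = [
        score(execute_prompt("summarize-article", version, case["inputs"]),
              case["expected"])
        for case in EVAL_CASES
    ]
    return sum(results) / len(results)

# The candidate version must not regress against production.
assert evaluate("1.3.0-rc1") >= evaluate("1.2.0")
```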

Through reflection on these challenges, a product idea crystallized:

  • What if we treated prompts as code, similar to Infrastructure as Code tools like Terraform? (see the sketch after this list)
  • What if prompt execution was completely decoupled from application execution, with applications simply calling an execution engine?
  • What if prompt developers could focus exclusively on prompt development, evaluation, and deployment?
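
As a taste of where this could go, here is one hypothetical shape for a prompt-as-code definition: the prompt lives in its own version-controlled file, not in application code. The format below is invented for illustration; flow-run's actual definition language is yet to be designed.

```python
import yaml  # pip install pyyaml

# Hypothetical prompt-as-code definition. In practice this would live in
# its own file under Git, next to other prompt definitions.
PROMPT_DEFINITION = """
name: summarize-article
version: 1.2.0
model: gpt-4o          # swappable, like a Terraform provider
inputs:
  - article_text
template: |
  Summarize the following article in three bullet points:
  {{article_text}}
"""

spec = yaml.safe_load(PROMPT_DEFINITION)
print(spec["name"], spec["version"])
```

Like a Terraform module, such a file could be linted, diffed, reviewed, and deployed independently of any application release.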

User Stories

Let’s define the scope and requirements for this project. I’ll use user stories to capture user needs and derive project requirements from them.

User Personas

I’m building flow-run for two distinct user personas:

User Persona 1: Prompt Developer

  • Role: Develop, debug, evaluate, and deploy prompts
  • Background: Former software engineer with knowledge of building software products and using development tools
  • Primary Pain Point: No unified approach to prompt development; constantly writing Python scripts for quick tests and implementing workarounds for prompt evaluation

User Persona 2: Application Developer

  • Role: Develop, debug, and deploy application servers
  • Background: Software engineer with expertise in building software products and using development tools
  • Primary Pain Point: Lacks time or specialized knowledge for prompt development; focuses primarily on business logic implementation; existing LLM integrations are unreliable

Prompt Developer User Stories

US-1-1: Develop Prompts

As a Prompt Developer, I want to define my prompts without traditional programming languages like Python, while maintaining the benefits of source code versioning (Git) and syntax highlighting (IDE).

US-1-2: Test Prompts

As a Prompt Developer, I want to test my prompts immediately after development without writing custom Python code or waiting for CI builds to execute.

US-1-3: Evaluate Prompt Versions

As a Prompt Developer, I want to evaluate newly developed prompts against their production versions to ensure the new version performs better than the previous one.

US-1-4: Deploy Prompts

As a Prompt Developer, I want to deploy my prompts to dev and prod environments easily and reliably.

US-1-5: Automated Prompt Testing

As a Prompt Developer, I want to automate my prompt testing and run tests in CI after each Git push.

US-1-6: Automated Prompt Deployment

As a Prompt Developer, I want to automate prompt deployments through CD after each Git push to the main branch.

US-1-7: Prompt Workflows

As a Prompt Developer, I want to build workflows with my prompts where each step executes sequentially.
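
For example, a content workflow might chain three prompts, with each step's output feeding the next. The `run_prompt` helper below is a hypothetical stand-in for the execution engine:

```python
def run_prompt(name: str, inputs: dict) -> str:
    # Stand-in: in flow-run, the execution engine would run this step.
    return f"<output of {name}>"

def writing_workflow(topic: str) -> str:
    # Each step executes sequentially; later steps consume earlier outputs.
    outline = run_prompt("draft-outline", {"topic": topic})
    draft = run_prompt("write-draft", {"outline": outline})
    return run_prompt("edit-draft", {"draft": draft})

print(writing_workflow("prompt engineering"))
```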

US-1-8: Prompt Agents

As a Prompt Developer, I want to build agents with my prompts and incorporate them into workflows as described in US-1-7.

US-1-9: Observability & Cost Management

As a Prompt Developer, I want to monitor prompt execution and track LLM costs.

US-1-10: Easy LLM Swap

As a Prompt Developer, I want to switch LLM providers easily without extensive code changes.
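
One way to get there is to have prompts target an abstract model interface, with the concrete provider selected by configuration. The sketch below illustrates the idea; the class names and wiring are mine, not flow-run's design:

```python
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIModel:
    def complete(self, prompt: str) -> str:
        return "openai response"  # would call the OpenAI API here

class AnthropicModel:
    def complete(self, prompt: str) -> str:
        return "anthropic response"  # would call the Anthropic API here

PROVIDERS: dict[str, ChatModel] = {
    "openai": OpenAIModel(),
    "anthropic": AnthropicModel(),
}

def execute(prompt: str, provider: str = "openai") -> str:
    # Swapping providers is a one-line config change, not a code change.
    return PROVIDERS[provider].complete(prompt)
```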

Application Developer User Stories

US-2-1: Execute AI Flow

As an Application Developer, I want to reliably execute AI Flows defined by Prompt Developers in the flow-run service to add AI integration to my application.

US-2-2: Get AI Flow Results

As an Application Developer, I want to retrieve AI Flow results from the flow-run service when they’re ready.
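
Together, US-2-1 and US-2-2 suggest a submit-then-poll integration pattern. The sketch below shows what that could look like from the application side; the flow-run HTTP endpoints are hypothetical placeholders:

```python
import time

import requests  # pip install requests

BASE_URL = "https://flow-run.example.com/v1"  # placeholder

def start_flow(flow: str, inputs: dict) -> str:
    # Fire-and-forget: submit the flow and return immediately with an id.
    resp = requests.post(f"{BASE_URL}/flows/{flow}/executions", json=inputs)
    resp.raise_for_status()
    return resp.json()["execution_id"]

def wait_for_result(execution_id: str, poll_seconds: float = 2.0) -> dict:
    # Poll until the engine reports a terminal state.
    while True:
        resp = requests.get(f"{BASE_URL}/executions/{execution_id}")
        resp.raise_for_status()
        body = resp.json()
        if body["status"] in ("succeeded", "failed"):
            return body
        time.sleep(poll_seconds)

execution_id = start_flow("summarize-article",
                          {"article_url": "https://example.com/post"})
print(wait_for_result(execution_id))
```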

Project Requirements

Based on the defined user stories, I can establish the following project requirements:

  • Reliable execution of AI flows using fire-and-forget semantics with guaranteed execution
  • Infrastructure-as-Code approach for prompt development and deployment
  • CLI tool for running, evaluating, and deploying prompts
  • CI/CD support for prompt testing and deployment
  • Support for common AI flow abstractions: tasks, workflows, and agents
  • Observability and cost reporting for AI flow executions
  • Multi-LLM provider support within the execution engine

Project Roadmap

| Stage | User Story | Description |
| --- | --- | --- |
| v1 | US-1-1: Develop Prompts | Enable prompt developers to create prompts |
| v1 | US-1-7: Prompt Workflows | Implement workflow support in flow-run |
| v1 | US-1-4: Deploy Prompts | Enable prompt deployment capabilities |
| v1 | US-1-10: Easy LLM Swap | Support LLM provider switching from day one |
| v1 | US-2-1: Execute AI Flow | Enable application developers to execute developed prompts |
| v1 | US-2-2: Get AI Flow Results | Enable retrieval of execution results |
| v2 | US-1-2: Test Prompts | Improve prompt quality in flow-run |
| v2 | US-1-3: Evaluate Prompt Versions | Enable prompt evolution without quality degradation |
| v2 | US-1-9: Observability & Cost Management | Add monitoring capabilities and cost tracking |
| v2 | US-1-5: Automated Prompt Testing | Enable CI integration for testing |
| v2 | US-1-6: Automated Prompt Deployment | Enable CD integration for deployments |
| v3 | US-1-8: Prompt Agents | Implement support for prompt agents |

Roadmap Explanation:

  • v1 stage delivers a minimum viable product supporting basic prompt development and workflow execution. This enables applications to begin integrating with flow-run without waiting for full feature completion
  • v2 stage introduces improvements in prompt testing and observability capabilities
  • v3 stage implements AI Agent support, which represents a complex feature requiring dedicated development focus

Conclusions

Thank you for reading this announcement! I’m thrilled to launch this truly public project that I’ve been contemplating for the past 2-3 years. In upcoming articles, I’ll cover the system design and share regular progress updates. All source code will be available on GitHub throughout the development journey.

The idea for this project was inspired by my previous product ai-svc, developed for AI Founder, which I described here: Building ai-svc: A Reliable Foundation for AI Founder

