This content originally appeared on DEV Community and was authored by Kamal Rawat
AI feels like magic until you get your first bill.
When teams discuss whether to rent a general-purpose LLM (like GPT, Gemini, or Claude) or build their own smaller domain-specific model, the conversation often gets stuck on price tags and technical complexity. But there’s another critical detail that many articles gloss over: general LLMs don’t magically know your company’s data. If you want them to answer real product or order questions, you have to wire them into your systems.
This post takes a clear look at both paths, using the same example throughout (a retail chatbot answering "Where's my order?") to highlight the tradeoffs.
Option A: Renting General-Purpose LLMs
At first glance, this feels like the easy button. You call GPT or Gemini’s API, pass in a customer question, and get a natural-language answer. But here’s the reality:
They don’t know your data out of the box
- GPT has no access to your product catalog, your order database, or your policies.
- If a customer asks “Where’s my order?” and you just pass that raw text to GPT, it will respond generically:
“You can usually track your order on the company’s website.”
Clearly, that’s not useful.
How companies make it work
To bridge the gap, teams layer in one (or both) of these approaches:
1. RAG (Retrieval-Augmented Generation)
- At runtime, your backend retrieves the needed info (e.g., from your order system).
Example flow:
- User: “Where’s my order #12345?”
- Backend queries DB → Order #12345: in transit, delivery tomorrow.
- This context is inserted into the GPT prompt:
Customer asked: "Where’s my order #12345?" Order system response: "In transit, delivery expected tomorrow." Respond politely.
- GPT outputs: “Your order #12345 is on the way and should arrive tomorrow.”
GPT didn’t “know” your data. You injected it just-in-time.
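The flow above can be sketched in a few lines. This is a minimal illustration, assuming a stubbed `lookup_order` helper in place of your real order database; the assembled prompt is what you would send to the LLM API.

```python
# Sketch of just-in-time context injection (RAG).
# `lookup_order` is a hypothetical stand-in for a real DB/API query.
def lookup_order(order_id: str) -> str:
    orders = {"12345": "In transit, delivery expected tomorrow."}
    return orders.get(order_id, "Order not found.")

def build_prompt(question: str, order_id: str) -> str:
    context = lookup_order(order_id)
    # The retrieved context is inserted into the prompt at runtime;
    # the LLM never touches your systems directly.
    return (
        f'Customer asked: "{question}"\n'
        f'Order system response: "{context}"\n'
        "Respond politely using only the order system response."
    )

prompt = build_prompt("Where's my order #12345?", "12345")
print(prompt)
```

The key point: the model only ever sees what your backend chooses to put in the prompt.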
2. Fine-tuning / Custom Training
- You can fine-tune GPT on your company’s FAQs, chat transcripts, and policies.
- This ensures consistent tone and brand voice.
- But: fine-tuning still doesn’t give live access to customer data—you still need APIs or RAG for dynamic info.
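Fine-tuning data is usually prepared as chat-style examples in JSONL. A sketch of converting FAQ pairs, using the common chat-message convention (the exact schema varies by provider, and "AcmeRetail" is a made-up name):

```python
import json

# Sketch: turning FAQ pairs into chat-style fine-tuning examples.
# The JSONL schema here is illustrative; check your provider's docs.
faqs = [
    ("What is your return window?", "You can return items within 30 days."),
    ("Do you ship internationally?", "Yes, we ship to over 40 countries."),
]

lines = []
for question, answer in faqs:
    example = {
        "messages": [
            {"role": "system", "content": "You are AcmeRetail's support assistant."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }
    lines.append(json.dumps(example))

jsonl = "\n".join(lines)  # write this out as a .jsonl file for the tuning job
print(jsonl.splitlines()[0])
```

Note what's in this file: static knowledge and tone. Nothing here tells the model where order #12345 is today, which is why RAG is still needed for live data.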
Let’s do the math:
Say your chatbot processes 2 million tokens per day (1.2M input, 0.8M output), at illustrative rates of $75 per 1M input tokens and $150 per 1M output tokens.
Input: 1.2M × $75 / 1M = $90/day
Output: 0.8M × $150 / 1M = $120/day
Total = $210/day ≈ $6,300/month
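The same back-of-the-envelope math as code, so you can plug in your own volumes (the rates are the illustrative ones above, not any vendor's actual price list):

```python
# Reproducing the token-cost arithmetic above.
INPUT_RATE = 75.0    # $ per 1M input tokens (illustrative)
OUTPUT_RATE = 150.0  # $ per 1M output tokens (illustrative)

def daily_cost(input_tokens_m: float, output_tokens_m: float) -> float:
    """API spend per day, given daily token volumes in millions."""
    return input_tokens_m * INPUT_RATE + output_tokens_m * OUTPUT_RATE

per_day = daily_cost(1.2, 0.8)
per_month = per_day * 30
print(per_day, per_month)  # 210.0 6300.0
```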
Benefits
- No infra to manage.
- Constantly updated model quality.
- Fastest path to a working chatbot.
Option B: Building Your Own Domain Model
This is the opposite extreme: you train a small foundation model (say 7B parameters) on your own data + domain knowledge.
Why it’s attractive
- You own the weights → no per-call API fees.
- You can bake in domain knowledge deeply.
- Potentially cheaper long-term if usage is massive.
What it takes
1. Data preparation
- Collecting, cleaning, and labeling product info, chat history, policies.
- Cost can hit hundreds of thousands if annotation is manual.
2. Training infra
- A 7B parameter model needs multiple A100/H100 GPUs running for weeks.
- Infra costs can run into millions depending on training scale.
3. Inference Infrastructure
- Once trained, you still need GPU servers to host it.
- Each customer query requires an inference, which adds to your power consumption and can increase latency.
4. Maintenance
- You’re now responsible for updates, bias fixes, safety, scaling.
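To make the training-infra point concrete, here is a rough GPU-hour sketch. Every figure is an assumption for illustration (cluster size, rental rate, wall-clock time), not a measured benchmark; real projects also run many experimental training runs, which is how totals climb toward the millions.

```python
# Rough single-run training-cost sketch for a ~7B model.
# All figures are assumptions, not benchmarks or quotes.
GPUS = 256           # H100-class GPUs (assumed cluster size)
HOURLY_RATE = 5.0    # $ per GPU-hour (assumed cloud rate)
TRAINING_DAYS = 30   # assumed wall-clock time for one run

gpu_hours = GPUS * 24 * TRAINING_DAYS
compute_cost = gpu_hours * HOURLY_RATE
print(gpu_hours, compute_cost)  # 184320 921600.0
```

And that is one run. Multiply by the number of failed and experimental runs before you get a model you'd put in front of customers.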
Benefits
- Total control.
- No API vendor lock-in.
- Can fine-tune deeply for efficiency.
Costs
- Initial build: high (millions).
- Ongoing hosting: significant.
- Only makes ROI sense at very high scale.
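One way to frame "very high scale" is a break-even sketch: how many tokens per month before self-hosting beats per-token API pricing? The blended API rate below follows the earlier example ($210 for 2M tokens); the self-hosting figure is a placeholder assumption, not a quote.

```python
# Break-even sketch: monthly token volume where self-hosting
# costs the same as per-token API pricing. Figures are assumed.
API_COST_PER_1M_TOKENS = 105.0  # blended rate: $210 / 2M tokens
SELF_HOST_MONTHLY = 50_000.0    # GPUs + ops for a 7B model (assumed)

def break_even_tokens_m() -> float:
    """Monthly token volume (in millions) at which the two costs match."""
    return SELF_HOST_MONTHLY / API_COST_PER_1M_TOKENS

print(round(break_even_tokens_m(), 1))  # 476.2
```

Under these assumptions, you'd need roughly 476M tokens a month (about 8× the earlier example's volume) before self-hosting starts to pay off, and that ignores the upfront build cost entirely.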
Comparing the Two Approaches
| Factor | Renting GPT/Gemini | Building Own Domain Model |
|---|---|---|
| Access to your data | Needs RAG/fine-tuning integration | Fully embedded during training, but still needs APIs for live data |
| Cost model | Pay per token | Upfront infra plus ongoing GPU costs |
| Time to deploy | Days to weeks | Months to years |
| Control | Limited | Full |
| Best for | Startups, mid-size orgs | Hyperscale, regulated industries |
The Key Takeaway
If you need a chatbot to answer “Where’s my order?”, GPT won’t magically know. You either:
- Inject the live order data (RAG),
- Or train/fine-tune it on your policies.
That’s why many companies start with Option A (renting): it’s pragmatic and fast. But if your volumes explode, costs spiral, or compliance requires self-hosting, Option B becomes worth considering.
Final Word
The debate isn’t really LLM vs. custom model. It’s about how you balance cost, control, and time-to-market. Smart teams often start with renting, layer in RAG/fine-tuning, and only move to building their own once the business case is undeniable.
That’s my breakdown. I’m curious: if you were building that retail chatbot, would you rent GPT forever or take the plunge on your own model?