Set Up Your Own Personal AI Frankenstack: Summarized Version



This content originally appeared on DEV Community and was authored by Jay

A heavily summarized, ramble-free version of the long-form article I posted previously.

You can read that one here:
https://dev.to/ghotet/set-up-your-own-personal-ai-frankenstack-diy-edition-309

Hey folks. I finally have a moment to sit down and lay out the blueprint for setting up your own AI stack, which I dubbed the “Frankenstack” (and the name seems to have stuck, haha).

This stack consists of:

  • LLM software
  • Stable Diffusion (image generation)
  • Text-to-speech (but not speech-to-text)
  • Web search for the LLM
  • All tied together through a unified front end

Just to clarify upfront: this isn’t a tutorial or step-by-step guide. I’m laying out the toolkit, with notes and caveats for each piece of software. For example, I’ll list my machine specs and the LLMs I run to give you realistic expectations. This stack is GPU/CPU hungry.

My Specs

  • Modified Alienware 15 R4 (circa 2018)
  • Nvidia GTX 1070 8GB (laptop GPU)
  • Nvidia RTX 3060 12GB (AGA external GPU dock)
  • Intel i7 (check model)
  • 32GB RAM
  • All drives are NVMe
  • Stack uses ~120GB including ~8 LLM/SD models

LLM

LM Studio was my choice:

  • Offers an in-depth front end with performance tuning and experimental features
  • Allows offloading KV cache for faster performance (quality may vary)
  • Lets you run multiple models simultaneously (if your system can handle it)
  • Easy download of models directly from Hugging Face

I recommend trying it before asking about alternatives like Ollama. I’ve used Ollama in CLI mode, but I wasn’t a fan personally.
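One thing worth knowing before you wire anything else up: LM Studio exposes an OpenAI-compatible server on localhost (port 1234 by default) once you start it from the Developer tab or its bundled `lms` CLI. A minimal sketch of talking to it directly (the model name is a placeholder for whatever you actually have loaded):

```shell
# Start LM Studio's headless server (assumption: the bundled `lms` CLI is on PATH):
lms server start

# Then query it like any OpenAI-compatible endpoint (default port 1234):
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": "Summarize what a KV cache does."}]
      }'
```

This is also how the front end talks to it, so it’s a quick way to sanity-check the connection before wiring up OpenWebUI.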

Models I use:

  • GPT-OSS 20B – My favorite for reasoning. Adjustable low/medium/high reasoning settings: low responds in ~2s, high in ~2min. It’s a mixture-of-experts model, so only ~3-4B parameters are active at a time, which keeps it lighter on resources than its size suggests. Trained for tool use.
  • Mythalion 13B – Creative writing, fast, decent chat, good for Stable Diffusion prompts. Not for code.
  • Deepseek-Coder (R1) – Strictly for complex scripts. Slowest model, but handles long code reliably.

Vision models:

  • I haven’t used these extensively; if you need vision, try a 7B model and test. Smaller models may be better for limited VRAM.
  • Parameter count isn’t always indicative of performance; adjust based on GPU capacity.

Stable Diffusion (Image Generation)

I use A1111:

  • Straightforward GUI with deep settings for LoRA training, img2img, VAE support
  • I mainly use it for cover art or character concepts
  • Default model: RevAnimated
  • ComfyUI is an alternative but more node-based; I didn’t use it
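A1111 can also serve images over a small REST API, which is handy if you want the LLM to hand prompts to it automatically. A sketch, assuming the default port 7860 (the prompt and sizes are just example values; the response carries base64-encoded PNGs in its `images` field):

```shell
# Launch the web UI with its API enabled:
./webui.sh --api

# Minimal txt2img request against the running instance:
curl -s http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "dark fantasy album cover", "steps": 20, "width": 512, "height": 512}'
```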

Text-to-Speech

Chatterbox – 100% recommend:

  • Local alternative to ElevenLabs
  • Streams in chunks for faster playback
  • Supports voice cloning (it’s built by Resemble AI): a 10-second clip is enough for a new voice
  • Swap default voice by editing the relevant script (check GitHub for details)
  • Other options (Tortoise, Coqui) were worse in my experience.

Web Search

SearXNG – acts like a meta-search engine:

  • Searches multiple engines at once (Google, DuckDuckGo, Brave, etc.)
  • AI can query several sources in one shot
  • I run it through Cloudflare Warp for privacy; Tor is optional
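For the LLM’s purposes, SearXNG just needs to answer plain HTTP with JSON. A sketch, assuming your instance listens on port 8080 and the `json` format is enabled under `search.formats` in `settings.yml` (it’s off by default):

```shell
# One query fans out to every engine you've enabled:
curl -s "http://localhost:8080/search?q=local+llm+benchmarks&format=json"
```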

Frontend

OpenWebUI – central control hub:

  • Configure multiple models, knowledge bases, tools
  • Evaluate LLM responses, run pipelines, execute code, manage databases
  • TTS autoplay option in user settings; speaker icon for manual playback
  • Offline mode available (set the OFFLINE_MODE environment variable to true)
  • Customize branding freely; commercial use over 50 users may require paid plan
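For reference, one way to run it without Docker is the pip package; the port and the offline flag below reflect my understanding of the current docs, so double-check against your version:

```shell
# Install and serve OpenWebUI (assumption: a recent Python 3.11 environment):
pip install open-webui

# OFFLINE_MODE stops it from reaching out to Hugging Face on startup:
OFFLINE_MODE=true open-webui serve --port 8080
```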

Custom prompts/personas:

  • Set base prompt in LM Studio
  • OpenWebUI admin panel allows high-priority prompts
  • Per-user prompts can be layered on top

Linux Launcher Script

  • I created an aistart alias to launch all components sequentially for proper resource allocation
  • LM Studio doesn’t auto-load the last model yet
  • Debug launcher opens multiple terminals for monitoring
  • Important: GPU assignment isn’t always respected automatically; check NVIDIA settings
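The launcher itself is nothing fancy. A hypothetical sketch of the shape of mine (every path, port, and sleep here is a placeholder for your own setup):

```shell
#!/usr/bin/env bash
# aistart.sh: bring the Frankenstack up in order, one component at a time.
set -e

# Pin GPU-hungry jobs explicitly; app-level GPU settings aren't always respected.
export CUDA_VISIBLE_DEVICES=0          # check `nvidia-smi -L` for your indices

lms server start                       # LM Studio's headless server
(cd ~/stable-diffusion-webui && ./webui.sh --api) &   # A1111 in the background
sleep 10                               # let the heavy processes claim VRAM first
open-webui serve --port 8080 &         # unified front end last

echo "Frankenstack up: http://localhost:8080"
```

The alias is then just `alias aistart='~/bin/aistart.sh'` in your shell rc file.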

Why Not Docker?

  • Docker caused localhost address issues on Linux
  • Added dependencies can break the stack; simpler is better
  • Windows may not have this issue

Connecting to the Web

  • Requires domain and Cloudflare tunnel
  • Tunnel forwards traffic to OpenWebUI on your local machine
  • Lets you access the stack anywhere, including mobile
  • ChatGPT or documentation can guide setup quickly
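For the curious, the rough shape of the tunnel setup looks like this (the tunnel name and hostname are placeholders, and you need a domain already managed by Cloudflare):

```shell
# Authenticate, create a named tunnel, and point a DNS record at it:
cloudflared tunnel login
cloudflared tunnel create frankenstack
cloudflared tunnel route dns frankenstack ai.example.com

# ~/.cloudflared/config.yml then maps the hostname to OpenWebUI locally:
#   tunnel: frankenstack
#   credentials-file: ~/.cloudflared/<tunnel-id>.json
#   ingress:
#     - hostname: ai.example.com
#       service: http://localhost:8080
#     - service: http_status:404

cloudflared tunnel run frankenstack
```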

Final Thoughts

  • DO NOT expect this to run perfectly on first try
  • Troubleshooting is part of the fun, and rewarding in its own right
  • Experiment, iterate, optimize
  • A full tutorial for both OSes may come later

Best of luck, have fun, and remember: the pain of troubleshooting makes the success sweeter.

// Ghotet

