Set Up Your Own Personal AI Frankenstack: Summarized Version



This content originally appeared on DEV Community and was authored by Jay

A heavily summarized, ramble-free version of the long-form article I posted previously.

You can read that one here:
https://dev.to/ghotet/set-up-your-own-personal-ai-frankenstack-diy-edition-309

Hey folks. I finally have a moment to sit down and lay out the blueprint for setting up your own AI stack, which I dubbed the “Frankenstack” (and the name seems to have stuck, haha).

This stack consists of:

  • LLM software
  • Stable Diffusion (image generation)
  • Text-to-speech (but not speech-to-text)
  • Web search for the LLM
  • All tied together through a unified front end

Just to clarify upfront: this isn’t a tutorial or step-by-step guide. I’m laying out the toolkit, with notes and caveats for each piece of software. For example, I’ll list my machine specs and the LLMs I run to give you realistic expectations. This stack is GPU/CPU hungry.

My Specs

  • Modified Alienware 15 R4 (circa 2018)
  • Nvidia GTX 1070 8GB (laptop GPU)
  • Nvidia RTX 3060 12GB (AGA external GPU dock)
  • Intel i7 (check model)
  • 32GB RAM
  • All drives are NVMe
  • Stack uses ~120GB including ~8 LLM/SD models

LLM

LM Studio was my choice:

  • Offers an in-depth front end with performance tuning and experimental features
  • Allows offloading KV cache for faster performance (quality may vary)
  • Lets you run multiple models simultaneously (if your system can handle it)
  • Easy download of models directly from Hugging Face

I recommend trying it before asking about alternatives like Ollama. I’ve used Ollama in CLI mode, but I wasn’t a fan personally.
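One thing worth knowing before you wire anything else up: LM Studio exposes an OpenAI-compatible server on localhost (port 1234 by default) once you start it from the Developer tab or its bundled `lms` CLI. A minimal sketch of talking to it directly (the model name is a placeholder for whatever you actually have loaded):

```shell
# Start LM Studio's headless server (assumption: the bundled `lms` CLI is on PATH):
lms server start

# Then query it like any OpenAI-compatible endpoint (default port 1234):
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": "Summarize what a KV cache does."}]
      }'
```

This is also how the front end talks to it, so it’s a quick way to sanity-check the connection before wiring up OpenWebUI.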

Models I use:

  • GPT-OSS 20B – My favorite for reasoning. Adjustable low/medium/high reasoning settings: low responds in ~2s, high in ~2min. It’s a mixture-of-experts model, so only ~3-4B parameters are active at a time, which keeps it lighter on resources than its size suggests. Trained for tool use.
  • Mythalion 13B – Creative writing, fast, decent chat, good for Stable Diffusion prompts. Not for code.
  • Deepseek-Coder (R1) – Strictly for complex scripts. Slowest model, but handles long code reliably.

Vision models:

  • I haven’t used these extensively; if you need vision, try a 7B model and test. Smaller models may be better for limited VRAM.
  • Parameter count isn’t always indicative of performance; adjust based on GPU capacity.

Stable Diffusion (Image Generation)

I use A1111:

  • Straightforward GUI with deep settings for LoRA training, img2img, VAE support
  • I mainly use it for cover art or character concepts
  • Default model: RevAnimated
  • ComfyUI is an alternative but more node-based; I didn’t use it
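A1111 can also serve images over a small REST API, which is handy if you want the LLM to hand prompts to it automatically. A sketch, assuming the default port 7860 (the prompt and sizes are just example values; the response carries base64-encoded PNGs in its `images` field):

```shell
# Launch the web UI with its API enabled:
./webui.sh --api

# Minimal txt2img request against the running instance:
curl -s http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "dark fantasy album cover", "steps": 20, "width": 512, "height": 512}'
```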

Text-to-Speech

Chatterbox – 100% recommend:

  • Local alternative to ElevenLabs
  • Streams in chunks for faster playback
  • Supports voice cloning (it’s built by Resemble AI): a 10-second clip is enough for a new voice
  • Swap default voice by editing the relevant script (check GitHub for details)
  • Other options (Tortoise, Coqui) were worse in my experience.

Web Search

SearXNG – acts like a meta-search engine:

  • Searches multiple engines at once (Google, DuckDuckGo, Brave, etc.)
  • AI can query several sources in one shot
  • I run it through Cloudflare Warp for privacy; Tor is optional
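For the LLM’s purposes, SearXNG just needs to answer plain HTTP with JSON. A sketch, assuming your instance listens on port 8080 and the `json` format is enabled under `search.formats` in `settings.yml` (it’s off by default):

```shell
# One query fans out to every engine you've enabled:
curl -s "http://localhost:8080/search?q=local+llm+benchmarks&format=json"
```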

Frontend

OpenWebUI – central control hub:

  • Configure multiple models, knowledge bases, tools
  • Evaluate LLM responses, run pipelines, execute code, manage databases
  • TTS autoplay option in user settings; speaker icon for manual playback
  • Offline mode available (set the OFFLINE_MODE environment variable to true)
  • Customize branding freely; commercial use over 50 users may require paid plan
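For reference, one way to run it without Docker is the pip package; the port and the offline flag below reflect my understanding of the current docs, so double-check against your version:

```shell
# Install and serve OpenWebUI (assumption: a recent Python 3.11 environment):
pip install open-webui

# OFFLINE_MODE stops it from reaching out to Hugging Face on startup:
OFFLINE_MODE=true open-webui serve --port 8080
```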

Custom prompts/personas:

  • Set base prompt in LM Studio
  • OpenWebUI admin panel allows high-priority prompts
  • Per-user prompts can be layered on top

Linux Launcher Script

  • I created an aistart alias to launch all components sequentially for proper resource allocation
  • LM Studio doesn’t auto-load the last model yet
  • Debug launcher opens multiple terminals for monitoring
  • Important: GPU assignment isn’t always respected automatically; check NVIDIA settings
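The launcher itself is nothing fancy. A hypothetical sketch of the shape of mine (every path, port, and sleep here is a placeholder for your own setup):

```shell
#!/usr/bin/env bash
# aistart.sh: bring the Frankenstack up in order, one component at a time.
set -e

# Pin GPU-hungry jobs explicitly; app-level GPU settings aren't always respected.
export CUDA_VISIBLE_DEVICES=0          # check `nvidia-smi -L` for your indices

lms server start                       # LM Studio's headless server
(cd ~/stable-diffusion-webui && ./webui.sh --api) &   # A1111 in the background
sleep 10                               # let the heavy processes claim VRAM first
open-webui serve --port 8080 &         # unified front end last

echo "Frankenstack up: http://localhost:8080"
```

The alias is then just `alias aistart='~/bin/aistart.sh'` in your shell rc file.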

Why Not Docker?

  • Docker caused localhost address issues on Linux
  • Added dependencies can break the stack; simpler is better
  • Windows may not have this issue

Connecting to the Web

  • Requires domain and Cloudflare tunnel
  • Tunnel forwards traffic to OpenWebUI on your local machine
  • Lets you access the stack anywhere, including mobile
  • ChatGPT or documentation can guide setup quickly
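For the curious, the rough shape of the tunnel setup looks like this (the tunnel name and hostname are placeholders, and you need a domain already managed by Cloudflare):

```shell
# Authenticate, create a named tunnel, and point a DNS record at it:
cloudflared tunnel login
cloudflared tunnel create frankenstack
cloudflared tunnel route dns frankenstack ai.example.com

# ~/.cloudflared/config.yml then maps the hostname to OpenWebUI locally:
#   tunnel: frankenstack
#   credentials-file: ~/.cloudflared/<tunnel-id>.json
#   ingress:
#     - hostname: ai.example.com
#       service: http://localhost:8080
#     - service: http_status:404

cloudflared tunnel run frankenstack
```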

Final Thoughts

  • DO NOT expect this to run perfectly on first try
  • Troubleshooting is part of the fun, and rewarding in its own right
  • Experiment, iterate, optimize
  • A full tutorial for both OSes may come later

Best of luck, have fun, and remember: the pain of troubleshooting makes the success sweeter.

// Ghotet

