How I Made a Voice-Activated AI Assistant That Actually Controls My Smart Home



By Dr. Ameer Hamza Mahmood, originally published in Level Up Coding on Medium

A Deep Dive Into Python, Speech Recognition, and Home Automation

Let’s be honest: most “AI voice assistants” today are dumb.
They’ll play your Spotify playlist, sure — but try asking them to turn off the study lights and start your humidifier if the room humidity is below 40%, and they’ll choke.


So I built my own.

Using Python, SpeechRecognition, OpenAI GPT-4, and the Home Assistant API, I created an AI-powered home assistant that understands complex natural language commands and triggers real IoT devices — lights, fans, thermostats, or anything else in my home network.

It doesn’t just listen. It thinks.

⚙ The Tech Stack

  • Python 3.11+
  • SpeechRecognition — converts voice → text
  • PyAudio — microphone access
  • OpenAI GPT-4 API — for natural-language command parsing
  • Home Assistant REST API — triggers real devices
  • Requests + JSON — handles HTTP calls
  • Threading / Asyncio — enables real-time responses
  • (Optional) ElevenLabs or pyttsx3 — converts text → speech

🗣 Step 1: Capture the Voice Input

You can’t control your home if your assistant can’t hear you properly.

Let’s start by getting your voice as clean text.

import speech_recognition as sr

def listen_command():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("🎙 Listening for your command...")
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        command = recognizer.recognize_google(audio)
        print(f"🗣 You said: {command}")
        return command.lower()
    except sr.UnknownValueError:
        print("❌ Sorry, I didn’t catch that.")
        return ""

This snippet uses Google’s free speech recognition engine. For better privacy, you can swap it with Whisper (OpenAI’s model) for on-device processing.
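The SpeechRecognition library also ships an optional local Whisper backend, so the swap can be a one-line change inside listen_command(). A minimal sketch, assuming SpeechRecognition 3.9+ with the openai-whisper package installed:

# Drop-in replacement for recognize_google(audio); runs fully on-device
command = recognizer.recognize_whisper(audio, model="base")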

🤖 Step 2: Parse the Command With GPT-4

This is where the intelligence lives.

We’ll feed your raw voice command to GPT-4 and ask it to extract a structured action — something like:

{"device": "bedroom_light", "action": "turn_off"}

Here’s the parsing logic:

import openai
import json

openai.api_key = "YOUR_OPENAI_KEY"

def parse_command_with_gpt(command):
    prompt = f"""
You are a smart home parser. Convert this command into structured JSON.
Example: "Turn off the bedroom light" → {{"device": "bedroom_light", "action": "turn_off"}}
Command: "{command}"
"""
    # Note: this uses the pre-1.0 openai SDK interface (openai.ChatCompletion).
    response = openai.ChatCompletion.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "You translate natural language into device actions."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,  # low temperature keeps the JSON output deterministic
    )
    try:
        parsed = json.loads(response.choices[0].message["content"])
        return parsed
    except Exception as e:
        print("⚠ Parsing failed:", e)
        return None

The magic here is semantic understanding — GPT isn’t just keyword-matching. It knows what you mean.
You can say “make it cozy in the living room” and it can infer that this means dimming the lights and lowering the AC.

💡 Step 3: Connect to Home Assistant API

Now that we know the intent, let’s trigger real devices.

import requests

HOME_ASSISTANT_URL = "http://192.168.1.2:8123/api/services/"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def trigger_device(device, action):
    headers = {
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    }
    if device == "bedroom_light":
        endpoint = "light/turn_off" if action == "turn_off" else "light/turn_on"
        data = {"entity_id": "light.bedroom_light"}
    elif device == "humidifier":
        endpoint = "switch/turn_on"
        data = {"entity_id": "switch.humidifier"}
    else:
        print("Unknown device:", device)
        return
    response = requests.post(HOME_ASSISTANT_URL + endpoint, headers=headers, json=data)
    print("✅ Device response:", response.status_code)

This snippet talks directly to your Home Assistant server, which controls your smart devices via MQTT, Zigbee, or Wi-Fi.
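If you're not sure which entity IDs your server exposes, Home Assistant's REST API lists every entity and its current state at /api/states. A small sketch, reusing the TOKEN from above:

def list_entities():
    # GET /api/states returns a JSON array with one object per entity
    headers = {"Authorization": f"Bearer {TOKEN}"}
    response = requests.get("http://192.168.1.2:8123/api/states", headers=headers)
    for state in response.json():
        print(state["entity_id"], "->", state["state"])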

🔄 Step 4: Connect Everything Together

Let’s put the voice → intent → action pipeline into a single loop.

def main():
    print("🤖 Smart Assistant Ready!")
    while True:
        command = listen_command()
        if not command:
            continue
        if "stop" in command or "exit" in command:
            print("👋 Exiting assistant.")
            break
        parsed = parse_command_with_gpt(command)
        if parsed:
            trigger_device(parsed["device"], parsed["action"])
        else:
            print("❌ Couldn't understand the command.")

if __name__ == "__main__":
    main()

🔊 Step 5: Add Voice Feedback (Optional)

You can make your AI assistant talk back using pyttsx3 or ElevenLabs.

import pyttsx3

engine = pyttsx3.init()

def speak(text):
    engine.say(text)
    engine.runAndWait()

Integrate this into trigger_device() to say things like:

“Turning off bedroom light.”
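One way to wire that in, as a sketch: call speak() right after the HTTP request in trigger_device() succeeds (the exact phrasing below is just an example):

# Inside trigger_device(), after the requests.post call:
if response.status_code == 200:
    speak(f"Done: {action.replace('_', ' ')} {device.replace('_', ' ')}.")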

🧠 Step 6: Add Context Awareness

You can store state between commands to make the system conversational.

For example:

  • “Turn on the light.” → (It remembers you were last in the bedroom.)
  • “Make it cooler.” → (Knows to lower the thermostat.)

Add a simple context dictionary:

context = {"last_room": "bedroom"}

def update_context(parsed):
    # "bedroom_light" -> remember "bedroom" as the most recent room
    if "device" in parsed and "room" in parsed["device"]:
        context["last_room"] = parsed["device"].split("_")[0]

Then modify the GPT-4 prompt to include this context before interpreting the command.
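For example, the prompt in parse_command_with_gpt() from Step 2 could be prefixed with the stored state. A sketch (the wording of the context hint is an assumption; tune it to taste):

def parse_command_with_gpt(command):
    prompt = f"""
You are a smart home parser. Convert this command into structured JSON.
Context: the user was last in the {context['last_room']}.
If the command names no room, assume that room.
Command: "{command}"
"""
    # ... same ChatCompletion call as in Step 2 ...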

🌙 Step 7: Automate Conditional Logic

Want automation beyond commands?
You can schedule actions or trigger them from sensor readings and weather APIs.

Example: Turn on bedroom AC when humidity > 70%.

import schedule
import time

def monitor_humidity():
    humidity = get_sensor_value("sensor.bedroom_humidity")  # helper sketched below
    if humidity > 70:
        trigger_device("bedroom_ac", "turn_on")  # needs a matching branch in trigger_device()
        speak("Turning on the AC to reduce humidity.")

schedule.every(5).minutes.do(monitor_humidity)
while True:
    schedule.run_pending()
    time.sleep(1)
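get_sensor_value() isn't defined above. Here's a minimal sketch that reads an entity's current state from Home Assistant's /api/states/&lt;entity_id&gt; endpoint, reusing the TOKEN from Step 3:

def get_sensor_value(entity_id):
    # GET /api/states/<entity_id> returns the entity's current state as JSON
    headers = {"Authorization": f"Bearer {TOKEN}"}
    response = requests.get(f"http://192.168.1.2:8123/api/states/{entity_id}", headers=headers)
    return float(response.json()["state"])  # e.g. "72.5" -> 72.5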

⚙ Step 8: Add Multi-Command Parsing

Let GPT handle multiple instructions in one go:

“Turn off the living room lights and start the air purifier.”

def parse_multi_command(command):
    prompt = f"""
Break this into multiple structured actions:
'{command}'
Example output:
[
  {{"device": "living_room_light", "action": "turn_off"}},
  {{"device": "air_purifier", "action": "turn_on"}}
]
"""
    response = openai.ChatCompletion.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return json.loads(response.choices[0].message["content"])
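Back in the main loop, the returned list can then be executed action by action:

for step in parse_multi_command(command):
    trigger_device(step["device"], step["action"])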

🏡 Step 9: Real-World Demo

You: “Hey PyHome, I’m going to bed.”
Assistant: “Okay. Turning off lights, locking doors, and setting alarm.”
(All devices trigger instantly via Home Assistant API.)

It’s almost eerie how natural it feels once you fine-tune the intent parsing and responses.

💬 Step 10: Make It Truly Conversational

Integrate a GPT-based chat layer that remembers context across sessions — your preferences, lighting moods, or routines.

Example prompt:

“It’s movie night.”

Your assistant responds:

“Setting living room lights to dim red, turning on the projector, and pausing notifications.”

This is where AI assistants start feeling personal — not generic.
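A minimal sketch of such a chat layer: keep the running conversation as a list of messages and persist it between runs. The file name and system prompt here are assumptions, not part of the original project:

import json
import os

HISTORY_FILE = "assistant_history.json"  # hypothetical location for persisted context

def load_history():
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE) as f:
            return json.load(f)
    return [{"role": "system", "content": "You are a smart home assistant. Remember the user's preferences, moods, and routines."}]

def chat(user_text, history):
    # assumes openai is imported and configured as in Step 2
    history.append({"role": "user", "content": user_text})
    response = openai.ChatCompletion.create(model="gpt-4-turbo", messages=history)
    reply = response.choices[0].message["content"]
    history.append({"role": "assistant", "content": reply})
    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f)  # context survives across sessions
    return reply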

⚡ Bonus: Use Whisper for Better Accuracy

Replace Google Speech Recognition with OpenAI Whisper:

import whisper

model = whisper.load_model("base")

def listen_whisper():
    result = model.transcribe("input_audio.wav")
    return result["text"]

It’s multilingual and far more accurate in noisy environments.
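Whisper transcribes audio files rather than a live stream, so one way to keep the hands-free flow from Step 1 is to dump the captured microphone audio to a WAV file first. A sketch, reusing the recognizer setup from Step 1:

def listen_whisper_live():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    with open("input_audio.wav", "wb") as f:
        f.write(audio.get_wav_data())  # SpeechRecognition can export captured audio as WAV
    return model.transcribe("input_audio.wav")["text"]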

🧩 Advanced Ideas

  • Add emotion detection using voice tone → adapt responses
  • Use face recognition (OpenCV) to identify users
  • Integrate calendar + weather APIs for proactive reminders
  • Add Streamlit dashboard for live status of all devices
  • Create a local cache for GPT responses to reduce latency (a minimal sketch follows this list)
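On that last idea: repeated commands (“turn off the bedroom light”) always parse to the same JSON, so a simple in-memory dictionary can skip the GPT round-trip entirely. A minimal sketch:

gpt_cache = {}

def parse_command_cached(command):
    key = command.strip().lower()
    if key in gpt_cache:
        return gpt_cache[key]
    parsed = parse_command_with_gpt(command)
    if parsed is not None:  # don't cache failures
        gpt_cache[key] = parsed
    return parsed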

💡 Key Takeaways

  • Python is powerful enough to control your home.
  • GPT-4 gives human-level understanding to your voice commands.
  • Home Assistant API ties everything together with actual hardware.
  • And yes — you can do all this without cloud dependence or expensive Alexa devices.

🧠 Pro Tip

“If you can describe it, you can automate it.”

The next decade of AI assistants won’t be about pre-set routines — they’ll reason through your requests like humans do.

This project is your first real step toward that.

