This content originally appeared on DEV Community and was authored by Harish Kotra (he/him)
Gaia nodes provide streaming capabilities similar to OpenAI’s APIs. By default, when you request a completion from a Gaia node, the entire completion is generated before being sent back in a single response.
If you’re generating long completions, waiting for the response can take many seconds. To get responses sooner, you can ‘stream’ the completion as it’s being generated. This allows you to start printing or processing the beginning of the completion before the full completion is finished.
To stream completions, set stream=True when calling the chat completions endpoint. This returns an object that streams the response back as data-only server-sent events. Extract each chunk's text from the delta field rather than the message field.
Prerequisites
import time
from openai import OpenAI
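The examples below also assume an OpenAI client pointed at your Gaia node's OpenAI-compatible endpoint. A minimal setup might look like the following; the base URL and API key are placeholders, so substitute your own node's values:

# point the OpenAI client at your Gaia node's OpenAI-compatible endpoint
# (the URL below is a placeholder; replace it with your node's address)
client = OpenAI(
    base_url="https://YOUR_NODE_ID.gaia.domains/v1",
    api_key="gaia"  # placeholder; use a real key if your node requires one
)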
1. What a typical chat completion response looks like
With a typical ChatCompletions API call, the response is first computed and then returned all at once.
# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100
response = client.chat.completions.create(
    model='llama',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0
)
The reply can be extracted with response.choices[0].message, and the text of the reply with response.choices[0].message.content.
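For example, you can print how long the full response took and then the reply text:

# calculate and print how long the full (non-streaming) response took
print(f"Full response received {time.time() - start_time:.2f} seconds after request")

# extract and print the reply text
print(response.choices[0].message.content)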
2. How to stream a chat completion
With a streaming API call, the response is sent back incrementally in chunks via an event stream. In Python, you can iterate over these events with a for loop.
response = client.chat.completions.create(
    model='llama',
    messages=[
        {'role': 'user', 'content': "What's 1+1? Answer in one word."}
    ],
    temperature=0,
    stream=True  # this time, we set stream=True
)

for chunk in response:
    print(chunk)
    print(chunk.choices[0].delta.content)
    print("****************")
As you can see above, streaming responses have a delta field rather than a message field. The delta can contain:
- A role token (e.g., {"role": "assistant"})
- A content token (e.g., {"content": "text"})
- Nothing, when the stream is over (a minimal guard for this case is sketched below)
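In practice, that means checking for None before using a chunk's content. A minimal sketch, assuming stream is the object returned by a fresh stream=True call (the stream above has already been consumed by the loop):

# print only the text pieces, skipping the role chunk and the empty final chunk
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)
print()  # newline once the stream is finished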
3. How much time is saved by streaming a chat completion
Let’s look at how quickly we receive content with streaming:
# record the time before the request is sent
start_time = time.time()

# send a ChatCompletion request to count to 100, with streaming enabled
response = client.chat.completions.create(
    model='llama',
    messages=[
        {'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}
    ],
    temperature=0,
    stream=True
)

collected_chunks = []    # store each raw chunk
collected_messages = []  # store each delta's content

for chunk in response:
    chunk_time = time.time() - start_time  # time elapsed since the request was sent
    collected_chunks.append(chunk)
    chunk_message = chunk.choices[0].delta.content  # may be None for the role chunk and the final chunk
    collected_messages.append(chunk_message)
    print(f"Message received {chunk_time:.2f} seconds after request: {chunk_message}")

# drop the None entries, join the pieces, and report the total time
collected_messages = [m for m in collected_messages if m is not None]
full_reply = ''.join(collected_messages)
print(f"Full response received {time.time() - start_time:.2f} seconds after request")
print(f"Full reply: {full_reply}")
With streaming:
- First token arrives quickly (often <0.5s)
- Subsequent tokens arrive every ~0.01-0.02s
- User sees partial responses immediately
Without streaming:
- Must wait for full response (often several seconds)
- No intermediate feedback
Choose streaming when you want to:
- Show partial results immediately
- Provide responsive user experience
- Handle long responses gracefully (see the helper sketch after this list)
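If you stream in several places, it can help to wrap the pattern in a small helper that yields only the text pieces as they arrive. A sketch; the function name and defaults are illustrative, not part of the Gaia or OpenAI APIs:

def stream_reply(client, messages, model='llama', **kwargs):
    """Yield the text pieces of a streaming chat completion as they arrive."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
        **kwargs,
    )
    for chunk in response:
        content = chunk.choices[0].delta.content
        if content is not None:
            yield content

# usage: print the reply as it streams in
for piece in stream_reply(client, [{'role': 'user', 'content': 'Count to 10.'}]):
    print(piece, end='', flush=True)
print()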
Credits
Inspired by this example.