# Streaming
Streaming allows you to receive responses token by token as they’re generated, enabling real-time output in your applications.
## Basic Streaming
```python
from lunar import Lunar

client = Lunar()

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```typescript
import { Lunar } from "lunar";

const client = new Lunar();

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Write a short story about a robot." }],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```
## Stream Response Structure
Each chunk is a `ChatCompletionChunk` object:

| Field | Type | Description |
|---|---|---|
| `id` | `str` | Completion identifier |
| `object` | `str` | Always `"chat.completion.chunk"` |
| `created` | `int` | Unix timestamp |
| `model` | `str` | Model used |
| `choices` | `list` | List of streaming choices |
### Choice Delta
```python
chunk.choices[0].index          # Position in the choices list
chunk.choices[0].delta.role     # Role (only in the first chunk)
chunk.choices[0].delta.content  # Content fragment
chunk.choices[0].finish_reason  # None until the final chunk, then "stop"
```
```typescript
chunk.choices[0].index          // Position in the choices list
chunk.choices[0].delta.role     // Role (only in the first chunk)
chunk.choices[0].delta.content  // Content fragment
chunk.choices[0].finish_reason  // null until the final chunk, then "stop"
```
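To see how these fields fit together, here is a minimal sketch that assembles a full message from mock chunks. The plain dicts below stand in for `ChatCompletionChunk` objects (the real SDK uses attribute access); they follow the field layout described above:

```python
# Mock chunks standing in for ChatCompletionChunk objects.
# The first chunk carries the role; the last carries finish_reason.
chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": ", world!"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

role = None
parts = []
finish_reason = None

for chunk in chunks:
    choice = chunk["choices"][0]
    delta = choice["delta"]
    if "role" in delta:            # present only in the first chunk
        role = delta["role"]
    if delta.get("content"):       # content fragment; may be missing or empty
        parts.append(delta["content"])
    if choice["finish_reason"]:    # set only on the final chunk
        finish_reason = choice["finish_reason"]

message = "".join(parts)           # "Hello, world!"
```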
## Collecting Full Response
```python
full_response = ""

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        full_response += content
        print(content, end="", flush=True)

print()  # New line
print(f"Complete: {full_response}")
```
```typescript
let fullResponse = "";

for await (const chunk of stream) {
  const content = chunk.choices[0].delta.content;
  if (content) {
    fullResponse += content;
    process.stdout.write(content);
  }
}

console.log(); // New line
console.log(`Complete: ${fullResponse}`);
```
## Async Streaming (Python)
```python
import asyncio

from lunar import AsyncLunar

async def stream_response():
    async with AsyncLunar() as client:
        stream = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Tell me a joke."}],
            stream=True,
        )
        async for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_response())
```
The TypeScript SDK is async by default — use `for await...of` to iterate over the stream, as shown in the examples above.
## Aborting a Stream (TypeScript)
The TypeScript SDK returns a `Stream` object with an `abort()` method:
```typescript
const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Write a long essay." }],
  stream: true,
});

// Abort the stream after 5 seconds
setTimeout(() => stream.abort(), 5000);

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```
## Detecting Stream End
```python
for chunk in stream:
    # The final chunk carries finish_reason instead of content
    if chunk.choices[0].finish_reason:
        print(f"\nStream ended: {chunk.choices[0].finish_reason}")
        break
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```
```typescript
for await (const chunk of stream) {
  // The final chunk carries finish_reason instead of content
  if (chunk.choices[0].finish_reason) {
    console.log(`\nStream ended: ${chunk.choices[0].finish_reason}`);
    break;
  }
  const content = chunk.choices[0].delta.content;
  if (content) {
    process.stdout.write(content);
  }
}
```
## With Fallbacks
Streaming works with fallbacks. If the primary model fails, the SDK automatically tries the next model:
```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    fallbacks=["mistral-small-latest", "gpt-4o"],
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```typescript
const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
  fallbacks: ["mistral-small-latest", "gpt-4o"],
});

for await (const chunk of stream) {
  if (chunk.choices[0].delta.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```
## Streaming with Error Handling
```python
from lunar import Lunar, RateLimitError, ServerError

client = Lunar()

try:
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.retry_after}s")
except ServerError as e:
    print(f"Server error: {e}")
```
```typescript
import { Lunar, RateLimitError, ServerError } from "lunar";

const client = new Lunar();

try {
  const stream = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true,
  });
  for await (const chunk of stream) {
    const content = chunk.choices[0].delta.content;
    if (content) {
      process.stdout.write(content);
    }
  }
} catch (e) {
  if (e instanceof RateLimitError) {
    console.log(`Rate limited. Retry after: ${e.retryAfter}s`);
  } else if (e instanceof ServerError) {
    console.log(`Server error: ${e}`);
  } else {
    throw e; // re-throw errors we don't handle here
  }
}
```
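When a stream fails with a rate limit, a common pattern is to wait for `retry_after` seconds and retry the whole request (a stream cannot be resumed mid-way). The sketch below shows the idea; the `with_retry` helper, the stand-in `RateLimitError` class, and the retry limit are illustrative, not part of the SDK:

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's RateLimitError (exposes retry_after)."""
    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after

def with_retry(fn, max_retries=3, sleep=time.sleep):
    """Run fn(), sleeping for retry_after and retrying on RateLimitError."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            sleep(e.retry_after)

# Example: a call that fails twice with a 1-second retry hint, then succeeds.
attempts = {"n": 0}
waits = []

def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError(retry_after=1)
    return "ok"

result = with_retry(flaky_call, sleep=waits.append)  # records waits instead of sleeping
```

In real use, `fn` would create the stream and consume it to completion, so a mid-stream failure retries the whole generation.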
## When to Use Streaming
| Use Case | Stream? |
|---|---|
| Chat interfaces | Yes |
| Real-time output | Yes |
| Long-form generation | Yes |
| Batch processing | No |
| API integrations | Depends |
### Performance

- **TTFT (time to first token):** streaming delivers the first token sooner, so perceived response time is much lower.
- **Total latency:** roughly the same as a non-streaming request.
- **Memory:** streaming avoids buffering the entire response, which matters for long outputs.
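These trade-offs are easy to demonstrate with a simulated stream. The generator below stands in for a real streaming response, and the token count and delay are illustrative:

```python
import time

def fake_stream(n_tokens=5, delay=0.01):
    """Simulated streaming response: one token every `delay` seconds."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"token-{i} "

start = time.monotonic()
first_token_at = None

for token in fake_stream():
    if first_token_at is None:
        first_token_at = time.monotonic() - start  # TTFT

total = time.monotonic() - start  # total latency

# With streaming, the caller sees output after roughly one delay;
# without it, nothing is visible until `total` has elapsed.
```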