
Chat API

The Chat API streams AI-generated responses for the RitoSwap chat interface. It supports multiple model providers (OpenAI, Cloudflare Workers AI, and local LM Studio via an OpenAI-compatible API), optional JWT-based access control, and per-token usage quotas.

Overview

This endpoint accepts prior conversation messages and returns a streamed response that renders progressively in the UI. When enabled, JWT authentication associates usage with a specific token and activates quota controls that track total tokens consumed across a time window.

Key capabilities:

  • Provider selection via server configuration (OpenAI / Cloudflare / LM Studio)
  • Optional JWT gate (header, body, cookie, or query parameter)
  • Token-based quota with pre-check and streamed usage accounting
  • Robust error handling and structured server logging

Endpoint Details

Property         Value
URL              /api/chat
Method           POST
Content-Type     application/json
Authentication   JWT (configurable; default off in public config)
Response         Streamed text suitable for chat UI rendering

Request Format

The request body must include a messages array. The server accepts the standard Vercel AI message shape (CoreMessage); the simplest variant is a list of { role, content } objects with string content, as shown below.

// Minimal message type
type Role = 'system' | 'user' | 'assistant'

interface ChatMessage {
  role: Role
  content: string
}

interface ChatRequestBody {
  messages: ChatMessage[]
  // Optional JWT placements supported by the server:
  // Prefer the Authorization header. The fields below are convenience fallbacks.
  jwt?: string
  data?: { jwt?: string }
}

Authentication Options

When NEXT_PUBLIC_AI_CHAT_REQUIRES_JWT is true, include a valid JWT using one of the supported locations (a lookup sketch follows the list):

  • Header: Authorization: Bearer <token> (recommended)
  • Body: {"jwt": "<token>"} or {"data": {"jwt": "<token>"}}
  • Cookie: access_token or jwt
  • Query string: ?jwt=<token> (use sparingly; avoid logging/leaks)
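
The lookup order can be expressed compactly. Below is a minimal sketch, assuming a hypothetical resolveJwt helper and a pre-parsed request body; the actual route handler may differ:

// Sketch: resolve the JWT from the supported locations, in order of preference.
// resolveJwt is a hypothetical helper, not the route's actual code.
function resolveJwt(
  req: Request,
  body: { jwt?: string; data?: { jwt?: string } }
): string | null {
  // 1. Authorization header (recommended)
  const auth = req.headers.get('authorization')
  if (auth?.startsWith('Bearer ')) return auth.slice('Bearer '.length)

  // 2. Body fallbacks
  if (body.jwt) return body.jwt
  if (body.data?.jwt) return body.data.jwt

  // 3. Cookies: access_token or jwt
  const cookies = req.headers.get('cookie') ?? ''
  const match = cookies.match(/(?:^|;\s*)(?:access_token|jwt)=([^;]+)/)
  if (match) return decodeURIComponent(match[1])

  // 4. Query string (use sparingly)
  return new URL(req.url).searchParams.get('jwt')
}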

Example Requests

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $JWT" \
  -d '{
    "messages": [
      { "role": "user", "content": "Drop a degen rhyme about Ethereum." }
    ]
  }' \
  /api/chat
// Streaming with fetch (browser)
const res = await fetch('/api/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    // Include when JWT is required/enabled
    ...(jwt ? { Authorization: `Bearer ${jwt}` } : {}),
  },
  body: JSON.stringify({
    messages: [{ role: 'user', content: 'Rap about cross-chain swaps.' }],
  }),
});

const reader = res.body?.getReader();
const decoder = new TextDecoder();
let text = '';

while (reader) {
  const { value, done } = await reader.read();
  if (done) break;
  text += decoder.decode(value, { stream: true });
  // Render incrementally from `text` as it grows
}

Response Behavior

The endpoint returns a streaming response that accumulates into the assistant’s reply. The stream is designed to integrate cleanly with a chat UI that renders tokens as they arrive. When the stream finishes, any active quota accounting is finalized server-side.

Persona & Inline Visuals

The system prompt configures a rap persona named “RapBotRito” and supports inline visual tags rendered by the chat UI:

  • <chain-logo chainName="Ethereum" size="200" /> – display chain logos during relevant mentions (max 2 per response)
  • <key-icon bgColor="#000000" keyColor="#FF00FF" /> – display the user’s NFT key icon when contextually appropriate

Place tags inline where visuals should appear; the UI renderer interprets them into components. The parser is forgiving and also accepts name instead of chainName, and width instead of size (see the parsing sketch after the note below).

Note: The “chainName required” and “max 2 logos” constraints are system-prompt rules for content generation, not server-side validation.
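
To illustrate the forgiving attribute handling, here is a minimal regex-based parsing sketch for the chain-logo tag; the actual UI renderer may work differently:

// Sketch: normalize inline <chain-logo> tags into renderer props.
// Regex-based illustration only; not the renderer's actual implementation.
interface ChainLogoProps { chainName: string; size: number }

function parseChainLogo(tag: string): ChainLogoProps | null {
  // Accept chainName or name, size or width
  const name = tag.match(/(?:chainName|name)="([^"]+)"/)?.[1]
  const size = tag.match(/(?:size|width)="(\d+)"/)?.[1]
  if (!name) return null
  return { chainName: name, size: size ? Number(size) : 200 }
}

// parseChainLogo('<chain-logo name="Ethereum" width="180" />')
// => { chainName: 'Ethereum', size: 180 }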

Error Responses

{ "error": "Missing \"messages\" array" }

HTTP Status Code: 400 Bad Request

{ "error": "Unauthorized: missing JWT" }

HTTP Status Code: 401 Unauthorized

{ "error": "Unauthorized: invalid JWT" }

HTTP Status Code: 401 Unauthorized

{ "error": "Quota exceeded", "remaining": 0, "resetAt": "2025-09-03T07:00:00.000Z" }

HTTP Status Code: 429 Too Many Requests

{ "error": "OpenAI API request failed", "details": "..." }
{ "error": "Cloudflare Workers AI request failed", "details": "..." }
{ "error": "LM Studio connection failed. Ensure it's running at https://localhost:1234/v1", "details": "..." }

HTTP Status Code: 500 Internal Server Error

{ "error": "Chat request failed" }

HTTP Status Code: 500 Internal Server Error
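
When consuming the endpoint, it helps to branch on these statuses before reading the stream. A minimal client-side sketch (sendChat is a hypothetical helper; error shapes are as documented above):

// Sketch: handle the documented error statuses before streaming the body.
async function sendChat(messages: { role: string; content: string }[], jwt?: string) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...(jwt ? { Authorization: `Bearer ${jwt}` } : {}),
    },
    body: JSON.stringify({ messages }),
  })

  if (!res.ok) {
    const err = await res.json() // { error, ...extras } per the shapes above
    switch (res.status) {
      case 400: throw new Error(`Bad request: ${err.error}`)
      case 401: throw new Error(`Auth failed: ${err.error}`) // obtain/refresh a JWT
      case 429: throw new Error(`Quota exhausted until ${err.resetAt}`)
      default:  throw new Error(`Server error: ${err.error}`)
    }
  }

  return res.body // stream as shown in the fetch example above
}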

Quota & Usage Tracking

When quota is active, the server performs a pre-check before invoking the model and returns 429 if the window is exhausted. During streaming, a heuristic token counter tracks output; when the stream closes, the total estimated tokens are recorded against the authenticated token (a pre-check sketch follows the activation conditions below).

Configuration knobs:

Setting                    Description                         Default
AI_CHAT_QUOTA_ENABLED      Enable/disable quota accounting     true
AI_CHAT_QUOTA_TOKENS       Max tokens per window (estimated)   20000
AI_CHAT_QUOTA_WINDOW_SEC   Window size in seconds              86400

Activation condition:

quota.active === true only when all are true:

  • NEXT_PUBLIC_AI_CHAT_REQUIRES_JWT=true (public flag)
  • Redis is active in server config
  • AI_CHAT_QUOTA_ENABLED=true
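
A minimal sketch of the pre-check flow, assuming a Redis-backed counter and hypothetical key naming; the actual implementation may differ:

// Sketch: quota pre-check before invoking the model.
// The key naming and minimal Redis interface here are illustrative assumptions.
const QUOTA_TOKENS = Number(process.env.AI_CHAT_QUOTA_TOKENS ?? 20000)

interface RedisLike {
  get(key: string): Promise<string | null>
  ttl(key: string): Promise<number>
}

async function checkQuota(redis: RedisLike, tokenId: string) {
  const used = Number((await redis.get(`chat:quota:${tokenId}`)) ?? 0)
  const remaining = Math.max(0, QUOTA_TOKENS - used)
  if (remaining === 0) {
    const ttl = await redis.ttl(`chat:quota:${tokenId}`) // seconds until window reset
    const resetAt = new Date(Date.now() + Math.max(ttl, 0) * 1000).toISOString()
    return { ok: false as const, remaining: 0, resetAt } // -> 429 response body
  }
  return { ok: true as const, remaining }
}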

Provider Configuration

The system supports multiple LLM providers, configured via environment variables. The ProviderRegistry (dapp/app/lib/llm/providers/registry.ts) abstracts the specific client creation.

Provider    Env Var (AI_PROVIDER)   Description
OpenAI      openai                  Standard OpenAI API (GPT-4o, etc.). Requires OPENAI_API_KEY.
LM Studio   lmstudio                Local inference for testing. Requires AI_BASE_URL (e.g., http://localhost:1234/v1).
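
A minimal sketch of how provider selection might branch on AI_PROVIDER; the factory shape is an assumption, and the real abstraction lives in dapp/app/lib/llm/providers/registry.ts:

// Sketch: derive client configuration from AI_PROVIDER.
// Illustrative only; see registry.ts for the actual ProviderRegistry.
type ProviderId = 'openai' | 'lmstudio'

function createClientConfig(provider: ProviderId) {
  switch (provider) {
    case 'openai':
      // Hosted OpenAI API; the key is required.
      return { baseURL: 'https://api.openai.com/v1', apiKey: process.env.OPENAI_API_KEY }
    case 'lmstudio':
      // Local OpenAI-compatible server; the base URL comes from AI_BASE_URL.
      return { baseURL: process.env.AI_BASE_URL ?? 'http://localhost:1234/v1', apiKey: 'lm-studio' }
  }
}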

Stack & Implementation

The Chat API is built on a robust stack designed for control and observability:

  • Orchestration: LangChain (@langchain/core, @langchain/openai) manages the core chat loop, message history, and tool binding.
  • Streaming: A custom SSE handler (sse-stream.ts) replaces the standard AI SDK stream. This allows the server to “tee” the stream, sending tool output text back to the LLM while simultaneously streaming the full JSON tool outputs to the client (a teeing sketch follows this list).
  • Validation: Tools use Raw JSON Schema for precise definition of inputs, ensuring the LLM adheres strictly to the required format.
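
The teeing idea can be sketched with the Web Streams API. This is illustrative only, not the actual contents of sse-stream.ts:

// Sketch: tee a stream so two consumers can read it independently,
// and frame one branch as SSE for the client. Illustrative only.
function teeForClientAndLoop(source: ReadableStream<string>) {
  const [forLoop, forClient] = source.tee()
  const encoder = new TextEncoder()
  const sse = forClient.pipeThrough(
    new TransformStream<string, Uint8Array>({
      transform(chunk, controller) {
        // Each SSE frame is "data: <payload>\n\n"
        controller.enqueue(encoder.encode(`data: ${chunk}\n\n`))
      },
    })
  )
  return { forLoop, sse } // forLoop feeds the chat loop; sse goes to the client
}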

Additional limits:

Key                         Description                           Default
AI_CHAT_MAX_OUTPUT_TOKENS   Maximum tokens in a single response   2000
AI_CHAT_MAX_DURATION        Max server execution time (seconds)   30

Client Configuration

Public flags influence the client/UI behavior:

Key                                Description                                        Default
NEXT_PUBLIC_AI_CHAT_REQUIRES_JWT   If true, the client should attach a Bearer token   false
NEXT_PUBLIC_AI_CHAT_API_PATH       Path to the chat endpoint                          /api/chat
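
Putting the server and client knobs together, a sample environment sketch using only the variables documented on this page (values illustrative):

# Sample environment sketch (values illustrative; defaults per the tables above)
AI_PROVIDER=openai
OPENAI_API_KEY=sk-...
# AI_BASE_URL=http://localhost:1234/v1   # only for AI_PROVIDER=lmstudio
AI_CHAT_QUOTA_ENABLED=true
AI_CHAT_QUOTA_TOKENS=20000
AI_CHAT_QUOTA_WINDOW_SEC=86400
AI_CHAT_MAX_OUTPUT_TOKENS=2000
AI_CHAT_MAX_DURATION=30
NEXT_PUBLIC_AI_CHAT_REQUIRES_JWT=false
NEXT_PUBLIC_AI_CHAT_API_PATH=/api/chat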

Security Notes

  • Prefer the Authorization header for JWTs; avoid long-term tokens in cookies and avoid query strings where possible.
  • Streaming responses should be consumed over HTTPS to prevent token leakage.
  • Quota enforcement requires Redis and proper server configuration; ensure infrastructure is healthy in production.

RitoSwap Docs does not store, collect, or access any of your conversations. All saved prompts are stored locally in your browser only.