Architecture Overview
Understanding how OptimaGPT's components relate to each other will help you plan your installation and diagnose any issues.
How the components connect
Users & applications
         │ HTTPS
         ▼
OptimaGateway
 ├── Admin interface
 ├── OptimaChat (web UI)
 └── OpenAI-compatible API
         │ WebSocket (encrypted)
         ▼
OptimaNode
 └── AI Model (GGUF)
Requests from users and applications arrive at the Gateway over HTTPS. The Gateway authenticates the request, selects an available Node, and forwards the inference work to it over an encrypted WebSocket connection. The Node runs the model and returns the result to the Gateway, which delivers it back to the caller.
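The routing step can be pictured as a small selector over connected Nodes. This is an illustrative sketch only: the round-robin policy, the `NodePool` class, and the node records below are assumptions, not OptimaGPT's actual scheduling code.

```python
from itertools import cycle

class NodePool:
    """Illustrative round-robin selector over connected Nodes.

    Sketches the Gateway's "select an available Node" step; the real
    scheduling policy is an assumption, not documented in this section.
    """

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._order = cycle(self._nodes)

    def pick(self):
        # Skip Nodes whose WebSocket connection has dropped.
        for _ in range(len(self._nodes)):
            node = next(self._order)
            if node.get("connected", False):
                return node
        raise RuntimeError("no available Nodes")

pool = NodePool([
    {"name": "node-a", "connected": True},
    {"name": "node-b", "connected": False},
    {"name": "node-c", "connected": True},
])
# Disconnected Nodes are skipped; requests alternate across the rest.
print([pool.pick()["name"] for _ in range(4)])
```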
OptimaGateway
The Gateway is the front door to your OptimaGPT installation. It is the only component that needs to be reachable by your users and applications. It handles:
- Authentication — Users log in with a username and password. Applications authenticate with an API key or a JWT.
- Routing — Incoming inference requests are distributed across connected, available Nodes.
- The admin interface — All configuration lives here: users, API keys, node management, model settings, MCP servers, and diagnostics.
- OptimaChat — The user-facing chat interface runs inside the Gateway. Users access it by opening a browser and navigating to the Gateway's address.
The Gateway exposes an OpenAI-compatible API, meaning any application that supports the OpenAI API can be configured to use OptimaGPT without modification.
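Because the API follows the OpenAI specification, a standard chat-completions request works against the Gateway. The sketch below builds such a request with the Python standard library; the Gateway address, API key, and model name are placeholders, not values from a real installation.

```python
import json
import urllib.request

GATEWAY_URL = "https://gateway.example.internal"  # placeholder Gateway address
API_KEY = "sk-placeholder"                        # placeholder API key

def chat_request(prompt, model="my-local-model"):
    """Build a chat-completions request for the Gateway's
    OpenAI-compatible endpoint (the /v1/chat/completions path
    comes from the OpenAI API spec)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = chat_request("Hello")
# urllib.request.urlopen(req) would send it; omitted here because the
# Gateway address above is a placeholder.
print(req.full_url)
```

An application that already uses an OpenAI client library would instead point that client's base URL at the Gateway and supply its OptimaGPT API key.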
OptimaNode
The Node handles inference. It does not expose an external API — it communicates only with the Gateway. Each Node:
- Scans a designated local directory for GGUF model files and registers them with the Gateway.
- Launches and supervises a dedicated inference process for each enabled model.
- Accepts inference requests from the Gateway and returns results.
- Reports hardware metrics (CPU, GPU, RAM, VRAM) back to the Gateway in real time.
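The model-scanning step can be pictured as a walk over the model directory for `.gguf` files. This is a sketch of that step only; the function name, the returned record shape, and the registration detail are assumptions, and the actual protocol between Node and Gateway is not documented here.

```python
from pathlib import Path

def find_gguf_models(model_dir):
    """Return a record for every GGUF file under model_dir.

    Illustrative sketch of the Node's scan step; what the Node
    actually sends to the Gateway when registering is an assumption.
    """
    models = []
    for path in sorted(Path(model_dir).rglob("*.gguf")):
        models.append({"name": path.stem, "path": str(path)})
    return models

# A Node configured to scan, say, /opt/models would register each
# discovered file with the Gateway under its file stem.
```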
OptimaChat
OptimaChat is the user-facing chat interface. It runs inside the Gateway — no separate installation is needed. Users open a browser, navigate to the Gateway's address, and log in. The OptimaGPT desktop application provides the same interface in a native window, without needing a browser.
Deployment patterns
Single machine: Gateway and Node installed together on one server. The simplest setup, appropriate when a single machine has sufficient hardware for both roles.
Separate machines: Gateway on one server, one or more Nodes on other machines. Useful when your inference hardware (a machine with a GPU, for example) is separate from the server you want to expose to your network.
Multiple Nodes: One Gateway with several Nodes connected. Inference requests are distributed across Nodes automatically. Useful for handling more concurrent users, or for running different models on different machines.