Executors
An executor is the process a Node runs to serve a particular AI model. Each enabled model has its own dedicated executor. Executors are managed from the Executors panel in a Node's detail view in the Gateway.
Executor lifecycle
When you enable a model, the Node launches an executor process for it. The process loads the model into memory (or VRAM if a GPU is available) and starts listening for inference requests. This startup phase can take anywhere from a few seconds to more than a minute, depending on model size and hardware.
Once online, the executor handles requests from the Gateway until it is stopped or the Node shuts down. When the Node service restarts, all previously enabled executors are started automatically.
Executor states
| State | Meaning |
|---|---|
| Online | The model is loaded and accepting requests |
| Starting | The model process is launching and loading into memory |
| Offline | The executor is stopped — the model process is not running |
| Stopping | The executor is in the process of shutting down |
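The four states above form a simple lifecycle. As an illustration only, they can be sketched as a small state machine; the state names match the table, but the transition map is an assumption about typical behavior, not a description of the Node's internal logic:

```python
from enum import Enum

class ExecutorState(Enum):
    OFFLINE = "Offline"
    STARTING = "Starting"
    ONLINE = "Online"
    STOPPING = "Stopping"

# Illustrative transition map: which states an executor can plausibly move to next.
TRANSITIONS = {
    ExecutorState.OFFLINE: {ExecutorState.STARTING},                        # user starts it
    ExecutorState.STARTING: {ExecutorState.ONLINE, ExecutorState.OFFLINE},  # load succeeds or fails
    ExecutorState.ONLINE: {ExecutorState.STOPPING, ExecutorState.OFFLINE},  # stop request or crash
    ExecutorState.STOPPING: {ExecutorState.OFFLINE},                        # shutdown completes
}

def can_transition(src: ExecutorState, dst: ExecutorState) -> bool:
    """Check whether the sketch above allows moving from src to dst."""
    return dst in TRANSITIONS[src]
```

For example, an executor cannot jump straight from Offline to Online; it always passes through Starting while the model loads.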
Starting and stopping an executor
Use the start and stop controls in the Executors panel to enable or disable a model at any time. Stopping an executor unloads the model from memory, freeing up RAM and VRAM for other models. Starting it again reloads the model.
Crash recovery
If a model process crashes unexpectedly, the Node will restart it automatically. After 3 consecutive crashes, the Node stops attempting to restart the executor and marks it as offline. This prevents a faulty or misconfigured model from looping in a crash-restart cycle.
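The restart policy can be sketched as a supervision loop. This is a simplified illustration, not the Node's actual implementation: a real supervisor would also reset the crash counter after the process has stayed healthy for a while, which is what makes the limit apply to *consecutive* crashes.

```python
MAX_CONSECUTIVE_CRASHES = 3  # matches the limit described above

def supervise(run_model, max_crashes=MAX_CONSECUTIVE_CRASHES):
    """Run the model process, restarting it after a crash until it either
    exits cleanly or crashes max_crashes times in a row.
    run_model() is assumed to block and return the process exit code."""
    crashes = 0
    while crashes < max_crashes:
        exit_code = run_model()
        if exit_code == 0:   # clean shutdown (e.g. the user stopped the executor)
            return "offline"
        crashes += 1         # crash: count it and restart
    return "offline (crash limit reached)"
```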
If an executor reaches this state:
- Check that the model file is not corrupted
- Verify the machine has enough RAM or VRAM to load the model (see Hardware and GPU)
- Check the node log in the Gateway for error messages from the failed process
- Try adjusting parameters such as reducing the context window or changing the GPU backend
Editing executor parameters
Click the gear icon on an executor to open its parameter editor. Parameters are grouped into sections:

Context — Controls how much text the model can hold at once:
| Parameter | Description |
|---|---|
| Context Window | The number of tokens the model can hold in its working memory at once. OptimaGPT sets this automatically based on available hardware; increasing it uses more memory. Minimum 512 tokens. |
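To illustrate what the context window bounds, here is a minimal sketch of truncating a token history that has outgrown the window. The function name and the truncation strategy (keep the most recent tokens) are illustrative assumptions; the executor's actual handling of overlong input may differ:

```python
MIN_CONTEXT = 512  # the minimum shown in the table above

def fit_to_context(tokens: list[int], context_window: int) -> list[int]:
    """Keep only the most recent tokens that fit in the context window."""
    if context_window < MIN_CONTEXT:
        raise ValueError(f"context window must be at least {MIN_CONTEXT} tokens")
    return tokens[-context_window:]
```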
Decoding — Controls how the model generates text. These affect response quality and creativity:
| Parameter | Description |
|---|---|
| Temperature | Higher values make output more varied; lower values more deterministic. Range 0–2. |
| Top P | Nucleus sampling threshold. Lower values narrow the token selection pool. |
| Top K | Limits selection to the K most likely next tokens. |
| Min P | Filters out tokens below this probability relative to the top token. |
| Repetition Penalty | Discourages the model from repeating the same phrases. |
| Presence / Frequency Penalty | Further controls for reducing repetition. |
| Mirostat | An alternative sampling strategy. 0 = disabled. |
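To see how several of these parameters interact, here is a toy sampler covering Temperature, Top K, Top P, and Min P (it omits the repetition penalties and Mirostat). This is an illustrative sketch of the general sampling technique, not the executor's actual decoding code:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0,
                      top_k: int = 0, top_p: float = 1.0, min_p: float = 0.0,
                      rng=random.random) -> str:
    """Toy next-token sampler. logits maps candidate tokens to raw scores."""
    # Temperature: scale logits before softmax (lower = more deterministic).
    scaled = {t: l / max(temperature, 1e-6) for t, l in logits.items()}
    m = max(scaled.values())
    probs = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(probs.values())
    probs = {t: p / z for t, p in probs.items()}

    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:    # Top K: keep only the K most likely tokens
        ranked = ranked[:top_k]
    if min_p > 0:    # Min P: drop tokens far below the top token's probability
        ranked = [(t, p) for t, p in ranked if p >= min_p * ranked[0][1]]
    kept, cum = [], 0.0
    for t, p in ranked:  # Top P: smallest set whose mass reaches the threshold
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalise over the surviving tokens and draw one.
    total = sum(p for _, p in kept)
    r = rng() * total
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]
```

Lowering the temperature or setting Top K to 1 both push the sampler toward always picking the single most likely token, which is why these settings make output more deterministic.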
Hardware — Controls how the model uses the available hardware:
| Parameter | Description |
|---|---|
| GPU Backend | The compute backend used for GPU inference. See Hardware and GPU. |
| Tensor Split | Distributes model layers across multiple GPUs. See Hardware and GPU. |
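As a rough illustration of what a tensor split expresses, the sketch below assigns layer counts to GPUs proportionally to split ratios. The exact split semantics are backend-specific; this function and its rounding rule are assumptions for illustration only:

```python
def split_layers(n_layers: int, ratios: list[float]) -> list[int]:
    """Illustrative layer assignment for a tensor split such as [3, 1]:
    GPU 0 gets roughly 3/4 of the layers, GPU 1 roughly 1/4."""
    total = sum(ratios)
    counts = [int(n_layers * r / total) for r in ratios]
    counts[0] += n_layers - sum(counts)  # give any rounding remainder to GPU 0
    return counts
```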
After you change parameters, the executor shows a Restart to apply prompt. Restart it to load the new configuration.