Executors
An executor is the process a Node runs to serve a particular AI model. Each enabled model has its own dedicated executor. Executors are managed from the Executors panel in a Node's detail view in the Gateway.
Executor lifecycle
When you enable a model, the Node launches an executor process for it. The process loads the model into memory (or VRAM if a GPU is available) and starts listening for inference requests. This startup phase can take anywhere from a few seconds to more than a minute, depending on model size and hardware.
Once online, the executor handles requests from the Gateway until it is stopped or the Node shuts down. When the Node service restarts, all previously enabled executors are started automatically.
Executor states
| State | Meaning |
|---|---|
| Online | The model is loaded and accepting requests |
| Starting | The model process is launching and loading into memory |
| Offline | The executor is stopped — the model process is not running |
| Stopping | The executor is in the process of shutting down |
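The four states above form a simple lifecycle. As an illustration only, they can be sketched as a small state machine; the state names match the table, but the transition map is an assumption about typical behavior, not a description of the Node's internal logic:

```python
from enum import Enum

class ExecutorState(Enum):
    OFFLINE = "Offline"
    STARTING = "Starting"
    ONLINE = "Online"
    STOPPING = "Stopping"

# Illustrative transition map: which states an executor can plausibly move to next.
TRANSITIONS = {
    ExecutorState.OFFLINE: {ExecutorState.STARTING},                        # user starts it
    ExecutorState.STARTING: {ExecutorState.ONLINE, ExecutorState.OFFLINE},  # load succeeds or fails
    ExecutorState.ONLINE: {ExecutorState.STOPPING, ExecutorState.OFFLINE},  # stop request or crash
    ExecutorState.STOPPING: {ExecutorState.OFFLINE},                        # shutdown completes
}

def can_transition(src: ExecutorState, dst: ExecutorState) -> bool:
    """Check whether the sketch above allows moving from src to dst."""
    return dst in TRANSITIONS[src]
```

For example, an executor cannot jump straight from Offline to Online; it always passes through Starting while the model loads.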
Starting and stopping an executor
Use the start and stop controls in the Executors panel to enable or disable a model at any time. Stopping an executor unloads the model from memory, freeing up RAM and VRAM for other models. Starting it again reloads the model.
Crash recovery
If a model process crashes unexpectedly, the Node will restart it automatically. After 3 consecutive crashes, the Node stops attempting to restart the executor and marks it as offline. This prevents a faulty or misconfigured model from looping in a crash-restart cycle.
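The restart policy can be sketched as a supervision loop. This is a simplified illustration, not the Node's actual implementation: a real supervisor would also reset the crash counter after the process has stayed healthy for a while, which is what makes the limit apply to *consecutive* crashes.

```python
MAX_CONSECUTIVE_CRASHES = 3  # matches the limit described above

def supervise(run_model, max_crashes=MAX_CONSECUTIVE_CRASHES):
    """Run the model process, restarting it after a crash until it either
    exits cleanly or crashes max_crashes times in a row.
    run_model() is assumed to block and return the process exit code."""
    crashes = 0
    while crashes < max_crashes:
        exit_code = run_model()
        if exit_code == 0:   # clean shutdown (e.g. the user stopped the executor)
            return "offline"
        crashes += 1         # crash: count it and restart
    return "offline (crash limit reached)"
```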
If an executor reaches this state:
- Check that the model file is not corrupted
- Verify the machine has enough RAM or VRAM to load the model (see Hardware and GPU)
- Check the node log in the Gateway for error messages from the failed process
- Try adjusting parameters such as reducing the context window or changing the GPU backend
Editing executor parameters
Click the gear icon on an executor to open its parameter editor. Parameters are grouped into sections:

Context — Controls how much text the model can hold at once:
| Parameter | Description |
|---|---|
| Context Window | The number of tokens the model can hold in its working memory at once. OptimaGPT sets this automatically based on available hardware; increasing it uses more memory. Minimum 512 tokens. |
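To illustrate what the context window bounds, here is a minimal sketch of truncating a token history that has outgrown the window. The function name and the truncation strategy (keep the most recent tokens) are illustrative assumptions; the executor's actual handling of overlong input may differ:

```python
MIN_CONTEXT = 512  # the minimum shown in the table above

def fit_to_context(tokens: list[int], context_window: int) -> list[int]:
    """Keep only the most recent tokens that fit in the context window."""
    if context_window < MIN_CONTEXT:
        raise ValueError(f"context window must be at least {MIN_CONTEXT} tokens")
    return tokens[-context_window:]
```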
Decoding — Controls how the model generates text. These affect response quality and creativity:
| Parameter | Description |
|---|---|
| Temperature | Higher values make output more varied; lower values more deterministic. Range 0–2. |
| Top P | Nucleus sampling threshold. Lower values narrow the token selection pool. |
| Top K | Limits selection to the K most likely next tokens. |
| Min P | Filters out tokens below this probability relative to the top token. |
| Repetition Penalty | Discourages the model from repeating the same phrases. |
| Presence / Frequency Penalty | Further controls for reducing repetition. |
| Mirostat | An alternative sampling strategy. 0 = disabled. |
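To see how several of these parameters interact, here is a toy sampler covering Temperature, Top K, Top P, and Min P (it omits the repetition penalties and Mirostat). This is an illustrative sketch of the general sampling technique, not the executor's actual decoding code:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0,
                      top_k: int = 0, top_p: float = 1.0, min_p: float = 0.0,
                      rng=random.random) -> str:
    """Toy next-token sampler. logits maps candidate tokens to raw scores."""
    # Temperature: scale logits before softmax (lower = more deterministic).
    scaled = {t: l / max(temperature, 1e-6) for t, l in logits.items()}
    m = max(scaled.values())
    probs = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(probs.values())
    probs = {t: p / z for t, p in probs.items()}

    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:    # Top K: keep only the K most likely tokens
        ranked = ranked[:top_k]
    if min_p > 0:    # Min P: drop tokens far below the top token's probability
        ranked = [(t, p) for t, p in ranked if p >= min_p * ranked[0][1]]
    kept, cum = [], 0.0
    for t, p in ranked:  # Top P: smallest set whose mass reaches the threshold
        kept.append((t, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalise over the surviving tokens and draw one.
    total = sum(p for _, p in kept)
    r = rng() * total
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]
```

Lowering the temperature or setting Top K to 1 both push the sampler toward always picking the single most likely token, which is why these settings make output more deterministic.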
Hardware — Controls how the model uses the available hardware:
| Parameter | Description |
|---|---|
| GPU Backend | The compute backend used for GPU inference. See Hardware and GPU. |
| Tensor Split | Distributes model layers across multiple GPUs. See Hardware and GPU. |
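As a rough illustration of what a tensor split expresses, the sketch below assigns layer counts to GPUs proportionally to split ratios. The exact split semantics are backend-specific; this function and its rounding rule are assumptions for illustration only:

```python
def split_layers(n_layers: int, ratios: list[float]) -> list[int]:
    """Illustrative layer assignment for a tensor split such as [3, 1]:
    GPU 0 gets roughly 3/4 of the layers, GPU 1 roughly 1/4."""
    total = sum(ratios)
    counts = [int(n_layers * r / total) for r in ratios]
    counts[0] += n_layers - sum(counts)  # give any rounding remainder to GPU 0
    return counts
```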
After you change parameters, the executor shows a Restart to apply prompt. Restart it to load the new configuration.