For SupremeRAID™ AE deployments that run storage acceleration and CUDA application workloads on the same GPU server, NVIDIA Multi-Process Service (MPS) is the recommended GPU-sharing method. MPS allows SupremeRAID™ AE and CUDA applications, such as vLLM inference containers, to attach to a common managed CUDA context instead of running as unrelated GPU clients.
This deployment pattern makes shared-GPU operation explicit and predictable. SupremeRAID™ AE and the application workload use the same MPS control daemon, pipe directory, and log directory, so both services participate in the same GPU-sharing domain.
Reference: NVIDIA Multi-Process Service quick start
Configure NVIDIA MPS before starting SupremeRAID™ AE and the CUDA application workload. In this example, SupremeRAID™ AE and vLLM share the selected GPU resources through a host MPS control daemon.
Confirm MPS is included with the NVIDIA driver
The MPS control daemon (nvidia-cuda-mps-control) ships with the NVIDIA driver, so a standard driver installation already provides it. No separate MPS package is required.
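A quick way to confirm this on a host is to check that the relevant binaries resolve on PATH. The sketch below assumes a POSIX shell; check_mps_binaries is a hypothetical helper name, not part of any NVIDIA or SupremeRAID™ tooling.

```shell
# Hypothetical helper: report whether the MPS control daemon and nvidia-smi
# can be found on PATH. Returns 0 only if both are present.
check_mps_binaries() {
    missing=0
    for bin in nvidia-cuda-mps-control nvidia-smi; do
        if command -v "$bin" >/dev/null 2>&1; then
            echo "found: $bin"
        else
            echo "missing: $bin" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (on the GPU host):
#   check_mps_binaries || echo "install or repair the NVIDIA driver first"
```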
Stop all CUDA applications
Stop every running CUDA application, including the SupremeRAID™ AE service, before configuring MPS. This allows the MPS daemon to establish the shared GPU context before clients attach.
systemctl stop graid.service
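To check that no CUDA clients remain after stopping the services, nvidia-smi can list the compute processes it currently sees. The following is a minimal sketch; gpu_is_idle and list_cuda_clients are hypothetical helper names, and the NVIDIA_SMI override variable is an assumption added only to make the helper testable.

```shell
# Hypothetical helper: list compute processes currently using any GPU.
# NVIDIA_SMI is an assumed override; it defaults to nvidia-smi.
list_cuda_clients() {
    "${NVIDIA_SMI:-nvidia-smi}" \
        --query-compute-apps=pid,process_name \
        --format=csv,noheader
}

# Returns 0 when no compute processes are reported, 1 when some remain,
# and 2 when nvidia-smi itself fails.
gpu_is_idle() {
    clients=$(list_cuda_clients) || return 2
    if [ -n "$clients" ]; then
        echo "CUDA clients still running:" >&2
        echo "$clients" >&2
        return 1
    fi
    return 0
}

# Usage:
#   gpu_is_idle && echo "safe to configure MPS"
```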
(Multi-user mode) Set exclusive process mode
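If the deployment runs in multi-user mode, set the target GPUs to exclusive process mode. Without the -i option, the command below applies to every GPU in the system; add -i <GPU index> to limit it to specific GPUs: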
nvidia-smi -c EXCLUSIVE_PROCESS
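When only some GPUs are shared, the compute mode can be set per device and then read back. The sketch below is illustrative: set_exclusive_process is a hypothetical helper, the NVIDIA_SMI override is an assumption for testing, and the GPU indices in the usage line are taken from the vLLM example later in this section rather than from any required layout.

```shell
# Hypothetical helper: set EXCLUSIVE_PROCESS on the given GPU indices,
# then print index and compute mode for all GPUs. Typically needs root.
set_exclusive_process() {
    for idx in "$@"; do
        "${NVIDIA_SMI:-nvidia-smi}" -i "$idx" -c EXCLUSIVE_PROCESS || return 1
    done
    "${NVIDIA_SMI:-nvidia-smi}" \
        --query-gpu=index,compute_mode \
        --format=csv,noheader
}

# Usage (indices are an example only):
#   set_exclusive_process 4 5 6 7
```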
Start the MPS control daemon (controller)
Set the shared pipe and log directories, then start the daemon:
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
nvidia-cuda-mps-control -dM
All CUDA clients in this deployment, including SupremeRAID™ AE and vLLM, must use the same CUDA_MPS_PIPE_DIRECTORY.
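Before attaching clients, it can help to confirm that the daemon answers on the configured pipe directory. The sketch below makes two assumptions: pre-creating the directories is a precaution rather than a documented requirement, and a non-zero exit status from the control utility is treated as "daemon not reachable". get_server_list is an MPS control command; the helper names and the MPS_CONTROL override variable are hypothetical.

```shell
# Hypothetical helper: make sure the pipe and log directories exist.
# Falls back to the paths used in this example when the variables are unset.
prepare_mps_dirs() {
    mkdir -p "${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}" \
             "${CUDA_MPS_LOG_DIRECTORY:-/tmp/nvidia-log}"
}

# Hypothetical helper: send get_server_list to the control daemon and
# report success or failure through the exit status.
# MPS_CONTROL is an assumed override; it defaults to nvidia-cuda-mps-control.
mps_daemon_responds() {
    echo get_server_list | "${MPS_CONTROL:-nvidia-cuda-mps-control}" >/dev/null 2>&1
}

# Usage (with CUDA_MPS_PIPE_DIRECTORY exported as shown above):
#   prepare_mps_dirs
#   mps_daemon_responds && echo "MPS control daemon is reachable"
```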
Add MPS environment variables to the SupremeRAID™ AE systemd units
Add the MPS environment variables to the SupremeRAID™ AE systemd unit files so the service attaches to the same MPS daemon as the application workload.
Edit /usr/lib/systemd/system/graid.service and /usr/lib/systemd/system/graidcore@.service, adding the following lines under [Service]:
Environment="CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps"
Environment="CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log"
Reload systemd and start the SupremeRAID™ AE service, which was stopped earlier:
systemctl daemon-reload
systemctl start graid.service
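As an alternative to editing the packaged unit files in place, the same variables can be placed in systemd drop-in files under /etc/systemd/system/. The sketch below writes such a drop-in; write_mps_dropin is a hypothetical helper, and its optional second argument (a root prefix) is an assumption added so the function can be tried in a scratch directory first.

```shell
# Hypothetical helper: write an mps.conf drop-in for one systemd unit.
#   $1 = unit name, for example graid.service or graidcore@.service
#   $2 = optional root prefix (assumed, for dry runs); empty means "/"
write_mps_dropin() {
    unit=$1
    root=${2:-}
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/mps.conf" <<EOF
[Service]
Environment="CUDA_MPS_PIPE_DIRECTORY=${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}"
Environment="CUDA_MPS_LOG_DIRECTORY=${CUDA_MPS_LOG_DIRECTORY:-/tmp/nvidia-log}"
EOF
}

# Usage (as root), followed by a check of the effective environment:
#   write_mps_dropin graid.service
#   write_mps_dropin graidcore@.service
#   systemctl daemon-reload
#   systemctl show -p Environment graid.service
```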
Launch the application as an MPS client
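Export the matching MPS environment variables, then start the application. The following vLLM container command is an example only; replace it with the command used in your environment. Variables exported on the host are not passed into a container automatically, so for containerized applications keep both the bind mount of the pipe directory (-v /tmp/nvidia-mps:/tmp/nvidia-mps) and the matching -e CUDA_MPS_PIPE_DIRECTORY setting so the application attaches to the host MPS daemon.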
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
docker run -it --rm --network=host --gpus all --privileged --ipc=host \
--cap-add=SYS_PTRACE --ulimit memlock=-1 --ulimit stack=67108864 \
-v /mnt/graid/lmcache:/lmcache \
-v /mnt/beegfs/models:/workspace \
-e PYTHONHASHSEED=0 \
-e LMCACHE_CONFIG_FILE=/workspace/lmcache_config/local-fs.yaml \
-e CUDA_VISIBLE_DEVICES=4,5,6,7 \
-v /tmp/nvidia-mps:/tmp/nvidia-mps \
-e CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps \
vllm/vllm-openai:v0.18.1 \
--model /workspace/Qwen3-235B-A22B-Instruct-2507/ \
--served-model-name Qwen3-235B-A22B-Instruct-2507 \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9086 \
--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'
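Once the application is running, one way to see which processes the daemon considers attached is to query it from the host. The sketch below uses the MPS control commands get_server_list and get_client_list; list_mps_clients is a hypothetical helper, the MPS_CONTROL override is an assumption for testing, and how container process IDs appear in the output depends on the container's PID namespace settings.

```shell
# Hypothetical helper: for each MPS server reported by the control daemon,
# print the client list that the daemon returns for it.
# Run on the host with the same CUDA_MPS_PIPE_DIRECTORY as the daemon.
list_mps_clients() {
    ctl=${MPS_CONTROL:-nvidia-cuda-mps-control}
    for server_pid in $(echo get_server_list | "$ctl"); do
        echo "server $server_pid clients:"
        echo "get_client_list $server_pid" | "$ctl"
    done
}

# Usage:
#   export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
#   list_mps_clients
```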
Note
- nvidia-cuda-mps-control -dM starts the daemon. The pipe directory (/tmp/nvidia-mps) and log directory (/tmp/nvidia-log) must be identical across the controller, the SupremeRAID™ AE systemd units, and every application client.
- Add the systemd Environment= entries to both graid.service and graidcore@.service. Run systemctl daemon-reload after editing the units.
- Because the unit files live under /usr/lib/systemd/system/, package upgrades may overwrite direct edits. For an upgrade-safe deployment, use systemd drop-in overrides such as /etc/systemd/system/graid.service.d/mps.conf.
- CUDA_VISIBLE_DEVICES=4,5,6,7 scopes the vLLM container to a subset of GPUs. Account for the SupremeRAID™ AE GPU assignment when planning which devices are shared.
- To tear down the MPS deployment, stop the application and SupremeRAID™ AE service, run echo quit | nvidia-cuda-mps-control, and reset GPU compute mode with nvidia-smi -c DEFAULT if exclusive process mode was set.
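The teardown order can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported script: teardown_mps is a hypothetical name, the SYSTEMCTL, MPS_CONTROL, and NVIDIA_SMI override variables exist only to make the function testable, and stopping the application workload is left to the caller because that step depends on how the application was launched.

```shell
# Hypothetical teardown helper. Stop the application workload first.
# Pass "reset" as the first argument if EXCLUSIVE_PROCESS mode was set.
# Export the daemon's CUDA_MPS_PIPE_DIRECTORY before calling.
teardown_mps() {
    # Stop the SupremeRAID AE service so no CUDA client stays attached.
    "${SYSTEMCTL:-systemctl}" stop graid.service || return 1
    # Ask the MPS control daemon to shut down.
    echo quit | "${MPS_CONTROL:-nvidia-cuda-mps-control}" || return 1
    # Restore the default compute mode only when requested.
    if [ "${1:-}" = "reset" ]; then
        "${NVIDIA_SMI:-nvidia-smi}" -c DEFAULT || return 1
    fi
}

# Usage (as root):
#   export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
#   teardown_mps reset
```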