ChatGPT


I have 2 ASUS GX10s and a 0.4m 400G QSFP112 DAC cable. Please walk me through clustering the GX10s to run larger models.

I’ll lay this out as a practical build path: first the physical/network layer over the 400G DAC, then OS/NVIDIA checks, then the realistic software options for multi-node inference. I’ll also flag where “clustering” helps versus where current local-LLM tooling still won’t automatically split one model across two machines.

Yes — with 2 ASUS Ascent GX10s and the QSFP112 DAC, the sensible path is:

Use the 10G Ethernet port or Wi-Fi for normal internet/LAN access

Use the QSFP112 DAC only as the private high-speed cluster link

Verify the direct 200GbE/ConnectX link

Set up passwordless SSH

Then run a multi-node inference stack: vLLM/SGLang/TensorRT-LLM for serious model sharding, or llama.cpp RPC for simpler experiments

A single GX10 has 128GB unified memory and is advertised for models up to around 200B parameters; NVIDIA says two Spark/GX10-class systems can be linked over high-performance ConnectX networking for models up to roughly 405B parameters. ASUS’s GX10 support page gives an official “connect two GX10s” process using the QSFP cable and a netplan file. 

1. Name the two machines
On each GX10, set a clear hostname.

On the first GX10:

sudo hostnamectl set-hostname gx10a
On the second GX10:

sudo hostnamectl set-hostname gx10b
Reboot both, or log out and back in:

sudo reboot
After reboot:

hostname
2. Connect the cable
Plug the QSFP112 DAC directly between the ConnectX ports on the two GX10s.

Keep both machines also connected to the internet through their normal LAN/Wi-Fi/10G network, because the QSFP link is just for private node-to-node traffic.

3. Apply ASUS/NVIDIA’s official GX10 direct-link netplan
ASUS’s official instructions currently use this netplan file from NVIDIA’s DGX Spark playbooks and note that their example interface is enp1s0f1np1. 

Run this on both GX10s:

sudo wget -O /etc/netplan/40-cx7.yaml \
https://github.com/NVIDIA/dgx-spark-playbooks/raw/main/nvidia/connect-two-sparks/assets/cx7-netplan.yaml

sudo chmod 600 /etc/netplan/40-cx7.yaml
sudo netplan apply
Then check the high-speed interface:

ip addr
You are looking for a 169.254.x.x address on the ConnectX interface. ASUS’s example shows one node as 169.254.42.38 and the other as 169.254.213.227, but yours will likely be different. 

On each machine, also run:

ip link
Look for the interface that is UP and associated with the ConnectX/QSFP connection.
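
If you would rather script that check than read the output by eye, here is a minimal sketch. It assumes you feed it the JSON text that `ip -j addr` prints; the helper name link_local_ifaces and the trimmed SAMPLE data are my own illustrations, not output from your machines.

```python
import ipaddress
import json


def link_local_ifaces(ip_json: str) -> dict[str, list[str]]:
    """Map interface name -> IPv4 link-local (169.254.0.0/16) addresses.

    Expects the JSON text printed by `ip -j addr`.
    """
    found: dict[str, list[str]] = {}
    for iface in json.loads(ip_json):
        for info in iface.get("addr_info", []):
            if info.get("family") != "inet":
                continue  # skip IPv6 entries
            addr = info.get("local", "")
            if addr and ipaddress.ip_address(addr).is_link_local:
                found.setdefault(iface["ifname"], []).append(addr)
    return found


# Illustrative sample (trimmed); real output carries many more fields.
SAMPLE = json.dumps([
    {"ifname": "lo", "addr_info": [{"family": "inet", "local": "127.0.0.1"}]},
    {"ifname": "enp1s0f1np1",
     "addr_info": [{"family": "inet", "local": "169.254.42.38"},
                   {"family": "inet6", "local": "fe80::1"}]},
])

print(link_local_ifaces(SAMPLE))
```

On a real node you could pass in the stdout of subprocess.run(["ip", "-j", "addr"], capture_output=True, text=True) instead of SAMPLE.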

4. Confirm the two GX10s can see each other
Suppose you find:

gx10a QSFP IP: 169.254.42.38
gx10b QSFP IP: 169.254.213.227
From gx10a:

ping 169.254.213.227
From gx10b:

ping 169.254.42.38
If ping works, the direct cluster link is alive.

For convenience, add both private IPs to /etc/hosts on both machines:

sudo nano /etc/hosts
Add something like:

169.254.42.38     gx10a-cluster
169.254.213.227   gx10b-cluster
Then test:

ping gx10a-cluster
ping gx10b-cluster
5. Set up passwordless SSH
NVIDIA’s two-Spark playbook says passwordless SSH between nodes is part of the functional distributed setup. 

Use the same Linux username on both GX10s if possible.

On gx10a:

ssh-keygen -t ed25519
ssh-copy-id gx10b-cluster
On gx10b:

ssh-keygen -t ed25519
ssh-copy-id gx10a-cluster
Test both directions:

ssh gx10b-cluster hostname
ssh gx10a-cluster hostname
You want this to work without asking for a password.
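
To make "without asking for a password" something a script can check, one option is ssh's BatchMode option, which makes ssh fail rather than prompt. This is a sketch under that assumption; the function names are hypothetical, and the host name is the gx10b-cluster alias from step 4.

```python
import subprocess


def ssh_check_argv(host: str, timeout_s: int = 5) -> list[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if the key is rejected
        "-o", f"ConnectTimeout={timeout_s}",  # do not hang on an unreachable host
        host,
        "hostname",
    ]


def passwordless_ok(host: str) -> bool:
    """Return True if key-based login to `host` works with no prompt."""
    try:
        result = subprocess.run(ssh_check_argv(host), capture_output=True, text=True)
    except FileNotFoundError:
        return False  # no ssh client on PATH
    return result.returncode == 0


# Show the command that would be run; call passwordless_ok() on a real node.
print(" ".join(ssh_check_argv("gx10b-cluster")))
```

If the usernames differ between the two machines, pass user@host (for example the form used with ssh-copy-id) as the host argument.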

6. Verify NVIDIA/GPU/driver stack on both
On each GX10:

nvidia-smi
Also check the Python and CUDA toolkit versions:

python3 --version
nvcc --version || true
If nvidia-smi is missing or broken, stop there and fix the NVIDIA/DGX OS stack first.

7. Decide what kind of “clustering” you actually want
This is the important part.

Option A — easiest useful cluster: two independent model servers
This does not let one huge model span both GX10s, but it is often the most useful production setup.

Run one model server on each GX10, then put a simple load balancer or router in front of them. This is great for:

many users
agent workloads
batch jobs
multiple models
RAG pipelines
parallel experiments
For example, run Ollama, llama.cpp server, LM Studio server, vLLM, or SGLang independently on each GX10.
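
The "simple load balancer or router" can be very small. Here is a rough sketch of round-robin selection between the two servers; the backend URLs and the RoundRobin class are hypothetical, and a real router would also need health checks and request forwarding.

```python
from itertools import cycle

# Hypothetical backend URLs: one OpenAI-compatible server per GX10.
BACKENDS = ["http://gx10a-cluster:8000", "http://gx10b-cluster:8000"]


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._it = cycle(list(backends))

    def pick(self) -> str:
        return next(self._it)


router = RoundRobin(BACKENDS)
for _ in range(4):
    print(router.pick())
```

Round-robin is the simplest policy; if the two servers host different models, routing by the request's model field would be the more natural choice.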

Option B — true larger-model sharding across both GX10s
This is what you asked for: one model split across two machines.

For that, the strongest candidates are:

Stack	Best use
vLLM + Ray	Good general-purpose OpenAI-compatible serving with multi-node support
SGLang	High-performance serving, structured generation, multi-node tensor parallelism
TensorRT-LLM / Triton	NVIDIA-optimized, serious production path, more setup
llama.cpp RPC	Simpler experimental distributed inference, especially GGUF models
vLLM’s current docs describe Ray as the default runtime for multi-node inference and tensor_parallel_size as the way to shard across GPUs.
SGLang’s multi-node docs describe multi-node deployment for models that exceed a single node’s GPU memory and call out high-speed interconnects such as InfiniBand/RoCE/high-bandwidth Ethernet.
TensorRT-LLM also supports single-GPU, multi-GPU, and multi-node deployments.

For your first successful “real cluster” test, I would probably start with vLLM + Ray or llama.cpp RPC, then move to TensorRT-LLM later.

8. Install Ray + vLLM on both GX10s
On both machines, create a clean environment:

python3 -m venv ~/vllm-cluster
source ~/vllm-cluster/bin/activate
pip install --upgrade pip wheel setuptools
pip install vllm ray
Because the GX10 is ARM/Grace Blackwell/DGX OS territory, some packages may already be NVIDIA-optimized or container-based. If pip install vllm fails, the better path may be NVIDIA’s recommended container for Spark/GX10 rather than a plain pip install.

9. Start a Ray cluster over the QSFP link
On gx10a, using its QSFP/private IP:

ray start --head \
  --node-ip-address=169.254.42.38 \
  --port=6379
It will print a join command.

On gx10b, join using the first node’s private QSFP address:

ray start --address='169.254.42.38:6379' \
  --node-ip-address=169.254.213.227
Then on gx10a:

ray status
You should see both nodes.

10. Try a smaller distributed vLLM test first
Do not start with a 405B model. First prove that multi-node serving works with a smaller model.

Example shape:

source ~/vllm-cluster/bin/activate

vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 2 \
  --distributed-executor-backend ray \
  --host 0.0.0.0 \
  --port 8000
That command is the conceptual structure; the exact model name, quantization format, and flags may need adjustment depending on what models you have access to and what vLLM currently supports cleanly on the GX10.

Then test from another terminal:

curl http://localhost:8000/v1/models
And a completion/chat request:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [
      {"role": "user", "content": "Say hello from the two-node GX10 cluster."}
    ],
    "max_tokens": 80
  }'
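
If you prefer Python to curl for the same test, here is a standard-library sketch of the request. It assumes the server exposes the OpenAI-compatible /v1/chat/completions route used above; build_chat_request and chat are names I made up for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 80) -> urllib.request.Request:
    """Build (but do not send) a chat completion POST request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


req = build_chat_request(
    "http://localhost:8000",
    "meta-llama/Llama-3.1-70B-Instruct",
    "Say hello from the two-node GX10 cluster.",
)
print(req.get_method(), req.full_url)
```

Call chat(...) with the same arguments once the server is up; it raises urllib.error.URLError if nothing is listening yet.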
11. When that works, move up to larger models
For two GX10s, realistic target experiments are:

Llama 3.1 405B at roughly 4-bit quantization (FP4 or similar, depending on stack support); FP8 weights alone are about 405 GB, which exceeds the 256 GB of combined memory
DeepSeek/Qwen MoE models, if supported by the serving stack
Large Nemotron / NVIDIA-optimized models from NGC
The “up to 405B” claim depends heavily on model format, quantization, KV cache size, context length, serving engine, and batch size. Long context can eat a shocking amount of memory even if the weights fit.

Start with conservative settings:

--max-model-len 4096
--gpu-memory-utilization 0.85
Then increase context after it works.
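
To see why quantization and context length matter so much, here is a back-of-the-envelope sketch. It counts only raw weight bytes and a plain 16-bit KV cache, ignores runtime overhead and the memory the OS itself uses, and the Llama 3.1 405B shape (126 layers, 8 KV heads, head dimension 128) is my assumption from the published model config.

```python
GB = 1e9  # decimal gigabytes, matching how "128GB" is quoted


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Raw weight storage in GB for a given precision."""
    return n_params * bits_per_param / 8 / GB


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB for n_tokens cached tokens.

    The leading factor of 2 covers one key and one value tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens / GB


TOTAL_GB = 2 * 128  # two GX10s, before any OS or runtime overhead

for bits in (16, 8, 4):
    w = weight_gb(405e9, bits)
    print(f"405B at {bits}-bit: {w:.1f} GB of weights, under {TOTAL_GB} GB: {w < TOTAL_GB}")

# Assumed Llama 3.1 405B shape: 126 layers, 8 KV heads, head_dim 128.
for ctx in (4096, 131072):
    print(f"KV cache at {ctx} tokens: {kv_cache_gb(126, 8, 128, ctx):.1f} GB")
```

Treat the numbers as a lower bound: activation buffers, CUDA graphs, and the serving engine's own allocations all come on top.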

12. Alternative: llama.cpp RPC path
If you want a more “bare metal” test with GGUF models, llama.cpp has an RPC backend. Its RPC README says llama.cpp distributes model weights and KV cache across the available local and remote devices in proportion to their available memory by default.

Conceptually:

On gx10b, start the RPC server:

cd ~/llama.cpp
./build/bin/rpc-server -H 169.254.213.227 -p 50052
On gx10a, run the main server using the remote RPC backend:

cd ~/llama.cpp
./build/bin/llama-server \
  -m /models/your-large-model.gguf \
  --rpc 169.254.213.227:50052 \
  --host 0.0.0.0 \
  --port 8080
The exact flags can change as llama.cpp evolves, so treat this as the path to test, not a guaranteed copy/paste final command. llama.cpp’s multi-GPU docs also explain the difference between layer and tensor splitting: layer splitting is generally more compatible and less interconnect-hungry, while tensor splitting can improve token latency but is much more dependent on fast GPU interconnects. 
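
To make "in proportion to available memory" concrete, here is an illustrative sketch of a proportional layer split. This is not llama.cpp's actual code; it is one simple way (the largest-remainder method) to turn free-memory figures into whole-layer counts, and the memory numbers are made up.

```python
def split_layers(n_layers: int, free_mem_gb: list[float]) -> list[int]:
    """Assign whole layers to devices in proportion to free memory.

    Uses the largest-remainder method so the counts always sum to n_layers.
    """
    total = sum(free_mem_gb)
    if n_layers < 0 or total <= 0:
        raise ValueError("need n_layers >= 0 and some free memory")
    exact = [n_layers * m / total for m in free_mem_gb]
    counts = [int(x) for x in exact]  # floor of each proportional share
    leftover = n_layers - sum(counts)
    # Give the remaining layers to the devices with the largest fractional parts.
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical free memory: 110 GB on the local node, 100 GB on the RPC node.
print(split_layers(126, [110.0, 100.0]))
```

A layer split like this only sends activations between machines at the boundary between the two groups of layers, which is why it is the less interconnect-hungry option.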

13. Recommended first-day success checklist
I would do this in this order:

# both nodes
hostname
ip addr
nvidia-smi

# after QSFP netplan
ping gx10a-cluster
ping gx10b-cluster

# SSH
ssh gx10b-cluster hostname
ssh gx10a-cluster hostname

# Ray
ray status

# serving
curl http://gx10a-cluster:8000/v1/models
14. Practical expectation
The cable and hardware make this possible, but software support is still the tricky part. A lot of local tools — LM Studio, Ollama, Open WebUI, etc. — are excellent on one machine but do not automatically turn two GX10s into one giant shared-memory model host.

For true one-model-across-two-GX10s, focus on:

vLLM + Ray
SGLang multi-node
TensorRT-LLM/Triton
llama.cpp RPC
For most reliable everyday usefulness, also run both GX10s as separate inference servers behind Open WebUI or a small routing layer. That gives you immediate value while you work through the more delicate distributed model-sharding setup.





Pasted text(71).txt
Document
gb10a@gx10a:~$ ssh gx10b@gx10b-cluster hostname
gx10b

gx10b@gx10b:~$ ssh-copy-id -i ~/.ssh/id_ed25519.pub gb10a@gx10a-cluster
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/gx10b/.ssh/id_ed25519.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
gb10a@gx10a-cluster's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'gb10a@gx10a-cluster'"
and check to make sure that only the key(s) you wanted were added.

gx10b@gx10b:~$ ssh gb10a@gx10a-cluster hostname
gx10a

gb10a@gx10a:~$ ssh gx10b@gx10b-cluster "ip route get 169.254.10.1"
169.254.10.1 dev enp1s0f0np0 src 169.254.10.2 uid 1000 
    cache 

gx10b@gx10b:~$ ssh gb10a@gx10a-cluster "ip route get 169.254.10.2"
169.254.10.2 dev enp1s0f0np0 src 169.254.10.1 uid 1000 
    cache 



(ray-test) gx10b@gx10b:~$ source ~/ray-test/bin/activate
ray start --address='169.254.10.1:6379' --node-ip-address=169.254.10.2
Local node IP: 169.254.10.2

--------------------
Ray runtime started.
--------------------

To terminate the Ray runtime, run
  ray stop

(ray-test) gb10a@gx10a:~$ ray status
======== Autoscaler status: 2026-05-27 15:08:06.687999 ========
Node status
---------------------------------------------------------------
Active:
 1 node_d3b5024ec56577baf039205710339ea8d2dbc3ae8c7f4497992f3215
 1 node_f99375afb1ca83db4ac9423c69338200fba5c658abf28ff272ca4f1a
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Total Usage:
 0.0/40.0 CPU
 0.0/2.0 GPU
 0B/163.27GiB memory
 0B/69.97GiB object_store_memory

From request_resources:
 (none)
Pending Demands:
 (no resource demands)
(ray-test) gb10a@gx10a:~$ 

Fantastic! Both machines are now linked, and Ray is running across your two GX10s. The cluster shows both nodes and no issues. Now you’re ready to try distributed workloads! If you need help running a model or scaling tasks across both machines, I’m here. Otherwise, congrats on getting the cluster set up!
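
If you want a script to confirm the same thing, here is a small sketch that pulls the CPU and GPU totals out of `ray status` text. It assumes the "used/total NAME" layout shown in your paste, which may differ between Ray versions; cluster_totals is a name I made up.

```python
import re

# Matches lines such as " 0.0/2.0 GPU" from the Resources section of `ray status`.
_RESOURCE = re.compile(r"^\s*([\d.]+)/([\d.]+)\s+(CPU|GPU)\s*$", re.MULTILINE)


def cluster_totals(status_text: str) -> dict[str, float]:
    """Return the total CPU and GPU counts reported by `ray status`."""
    return {name: float(total)
            for _used, total, name in _RESOURCE.findall(status_text)}


# Excerpt of the Resources section from the output above.
SAMPLE = """\
Total Usage:
 0.0/40.0 CPU
 0.0/2.0 GPU
 0B/163.27GiB memory
 0B/69.97GiB object_store_memory
"""

print(cluster_totals(SAMPLE))
```

For this cluster the figure to look for is a GPU total of 2.0, one per GX10.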



Pasted text(72).txt
Document
(ray-test) gb10a@gx10a:~$ source ~/ray-test/bin/activate
(ray-test) gb10a@gx10a:~$ python - <<'PY'
import ray
import socket
import os

ray.init(address="auto")

@ray.remote(num_gpus=1)
def gpu_task(i):
    return {
        "task": i,
        "hostname": socket.gethostname(),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "ray_gpu_ids": ray.get_gpu_ids(),
    }

results = ray.get([gpu_task.remote(i) for i in range(2)])
for r in results:
    print(r)
PY
2026-05-27 18:40:58,271	INFO worker.py:1814 -- Connecting to existing Ray cluster at address: 169.254.10.1:6379...
2026-05-27 18:40:58,287	INFO worker.py:2012 -- Connected to Ray cluster.
/home/gb10a/ray-test/lib/python3.12/site-packages/ray/_private/worker.py:2051: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
{'task': 0, 'hostname': 'gx10a', 'CUDA_VISIBLE_DEVICES': '0', 'ray_gpu_ids': [0]}
{'task': 1, 'hostname': 'gx10b', 'CUDA_VISIBLE_DEVICES': '0', 'ray_gpu_ids': [0]}




Pasted text(73).txt
Document
(ray-test) gb10a@gx10a:~$ source ~/ray-test/bin/activate
vllm --version
python -c "import vllm; print(vllm.__version__)"
0.21.0
0.21.0
(ray-test) gb10a@gx10a:~$ 

In another terminal window:

gb10a@gx10a:~$ curl http://127.0.0.1:8000/v1/models
curl: (7) Failed to connect to 127.0.0.1 port 8000 after 0 ms: Couldn't connect to server
gb10a@gx10a:~$ curl http://127.0.0.1:8000/v1/models
curl: (7) Failed to connect to 127.0.0.1 port 8000 after 0 ms: Couldn't connect to server
gb10a@gx10a:~$ 


Pasted text(74).txt
Document


gb10a@gx10a:~$ curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
      {"role": "user", "content": "Write one paragraph explaining that this response is coming from a two-node GX10 Ray/vLLM cluster."}
    ],
    "max_tokens": 200,
    "temperature": 0.2
  }'
curl: (7) Failed to connect to 127.0.0.1 port 8000 after 0 ms: Couldn't connect to server


Pasted text(75).txt
Document
It still doesn't seem to have finished


Pasted text(76).txt
Document


Pasted text(77).txt
Document


Please continue

Great — this latest log is progress. The shared /opt/models/... path fix worked: both workers found and loaded the model files this time. The failure moved later, during/after CUDA graph profiling / KV cache initialization.

The key lines are:

Worker_TP1 ... Model loading took 7.16 GiB memory and 2.839555 seconds
Worker_TP0 ... Model loading took 7.16 GiB memory and 25.972658 seconds
world_size=2 ... TP rank 0 / TP rank 1
So the model is now loading on both GX10s. Then the remote Ray worker died during the cache/config phase:

RayWorkerProc rank=[1] died unexpectedly
RuntimeError: cancelled
The log also shows vLLM was doing CUDA graph memory profiling and warned about shared-memory broadcast timing while compilation/profiling was underway. 


Try a more conservative vLLM startup
Stop any leftover vLLM/Ray worker processes first.

On both machines:

source ~/ray-test/bin/activate
ray stop
pkill -f vllm || true
Then restart Ray.

On gx10a:

source ~/ray-test/bin/activate
ray start --head --node-ip-address=169.254.10.1 --port=6379
On gx10b:

source ~/ray-test/bin/activate
ray start --address='169.254.10.1:6379' --node-ip-address=169.254.10.2
On gx10a, verify:

ray status
Now try this safer server command on gx10a:

source ~/ray-test/bin/activate

VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0 \
vllm serve /opt/models/Qwen2.5-7B-Instruct \
  --served-model-name Qwen/Qwen2.5-7B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --distributed-executor-backend ray \
  --tensor-parallel-size 2 \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.80 \
  --enforce-eager
Why this version:

--enforce-eager disables CUDA graph execution. vLLM’s engine-argument docs describe enforce_eager as always using eager-mode PyTorch and disabling CUDA graph execution.

VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0 turns off the CUDA graph memory profiling that your log reports as enabled by default in vLLM 0.21.0; the same log message suggests this environment variable as the way to disable it.

--gpu-memory-utilization 0.80 leaves more headroom while we stabilize distributed startup.

If that starts, test it
In a second terminal on gx10a:

curl http://127.0.0.1:8000/v1/models
Then:

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
      {"role": "user", "content": "Write one paragraph explaining that this response is coming from a two-node GX10 Ray/vLLM cluster."}
    ],
    "max_tokens": 200,
    "temperature": 0.2
  }'
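
Because model loading can take a while, the curl calls fail with "Couldn't connect" until the server is listening. Here is a small sketch that waits for the port first; wait_for_port is a hypothetical helper, and the default timeout is an arbitrary choice.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 600.0,
                  interval_s: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling wait_for_port("127.0.0.1", 8000) before the curl tests avoids guessing when startup has finished. A successful connect only shows that something is listening, so still follow it with the /v1/models request.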
If it still fails, try pipeline parallel instead of tensor parallel
With one GPU per node, tensor_parallel_size=2 does work, but it forces tensor-parallel all-reduces across machines on every layer, and the warnings in your log say vLLM is spreading TP across nodes. With 200G networking that may be acceptable, but pipeline parallelism is a more natural layout for one GPU per node, because each node holds a contiguous block of layers and activations cross the link only at the stage boundary:

source ~/ray-test/bin/activate

VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0 \
vllm serve /opt/models/Qwen2.5-7B-Instruct \
  --served-model-name Qwen/Qwen2.5-7B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --distributed-executor-backend ray \
  --tensor-parallel-size 1 \
  --pipeline-parallel-size 2 \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.80 \
  --enforce-eager
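
To give a feel for the difference, here is a rough per-token traffic estimate. It assumes the common tensor-parallel scheme with two all-reduces per transformer layer, 16-bit activations, and a Qwen2.5-7B shape of 28 layers with hidden size 3584 (my assumption from the published config). It ignores batching and all-reduce algorithm details, so read it as an order-of-magnitude sketch only.

```python
def tp_bytes_per_token(n_layers: int, hidden: int, bytes_per_value: int = 2) -> int:
    """Rough payload all-reduced per token under tensor parallelism.

    Assumes two all-reduces per transformer layer (after attention and after
    the MLP), each over one hidden-size activation vector.
    """
    return 2 * n_layers * hidden * bytes_per_value


def pp_bytes_per_token(n_stages: int, hidden: int, bytes_per_value: int = 2) -> int:
    """Rough payload handed between pipeline stages per token."""
    return (n_stages - 1) * hidden * bytes_per_value


# Assumed Qwen2.5-7B shape: 28 layers, hidden size 3584.
tp = tp_bytes_per_token(28, 3584)
pp = pp_bytes_per_token(2, 3584)
print(f"TP=2: about {tp} bytes per token across {2 * 28} sync points")
print(f"PP=2: about {pp} bytes per token in 1 hand-off")
```

The byte counts are small either way on a 200G link; the bigger cost of cross-node TP is usually the number of synchronization points per token, each of which pays network latency.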
What I think is happening
You solved the first real blocker: the remote worker could not see /home/gb10a/models/...; /opt/models/... fixed that.

Now the model loads on both GPUs, then the remote worker dies during vLLM’s V1 startup/profiling/cache initialization. The quickest practical workaround is to remove CUDA graph/profiling complexity with --enforce-eager and reduced memory utilization. If that works, we can later remove one flag at a time to regain performance.


gb10a@gx10a:~$ curl http://127.0.0.1:8000/v1/models
{"object":"list","data":[{"id":"Qwen/Qwen2.5-7B-Instruct","object":"model","created":1779927940,"owned_by":"vllm","root":"/opt/models/Qwen2.5-7B-Instruct","parent":null,"max_model_len":4096,"permission":[{"id":"modelperm-8b062bed74594768","object":"model_permission","created":1779927940,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

gb10a@gx10a:~$ curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "messages": [
      {"role": "user", "content": "Write one paragraph explaining that this response is coming from a two-node GX10 Ray/vLLM cluster."}
    ],
    "max_tokens": 200,
    "temperature": 0.2
  }'

{"id":"chatcmpl-b68f1806856177f0","object":"chat.completion","created":1779927965,"prompt_routed_experts":null,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"This response is generated by a two-node GX10 Ray/vLLM cluster, which consists of two powerful GX10 nodes optimized for efficient distributed inference using the Ray framework and the vLLM library. This setup allows for parallel processing and significantly enhances the speed and scalability of generating text responses, ensuring both efficiency and performance in handling complex queries.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null,"routed_experts":null}],"service_tier":null,"system_fingerprint":"vllm-0.21.0-tp2-nohash","usage":{"prompt_tokens":51,"total_tokens":122,"completion_tokens":71,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"prompt_text":null,"kv_transfer_params":null}
gb10a@gx10a:~$

Screenshot (51).png
Screenshot (52).png
I'm now getting these connection error messages popping up, which weren't there before setting all this up


Pasted text(78).txt
Document
gb10a@gx10a:~$ nmcli device status
DEVICE          TYPE      STATE                   CONNECTION          
wlP9s9          wifi      connected               TMOBILE-793A        
enp1s0f0np0     ethernet  connected               netplan-enp1s0f0np0 
lo              loopback  connected (externally)  lo                  
docker0         bridge    connected (externally)  docker0             
enP2p1s0f0np0   ethernet  disconnected            --                  
p2p-dev-wlP9s9  wifi-p2p  disconnected            --                  
enP2p1s0f1np1   ethernet  unavailable             --                  
enP7s7          ethernet  unavailable             --                  
enp1s0f1np1     ethernet  unavailable             --   

NAME                 UUID                                  TYPE      DEVICE       AUTOCONNECT 
TMOBILE-793A         82340a3c-655d-436d-b226-451ca41f26bb  wifi      wlP9s9       yes         
netplan-enp1s0f0np0  05ec54af-9f53-3227-867e-c7e879b25c37  ethernet  enp1s0f0np0  yes         
lo                   c8b8418e-bc32-4319-a98e-9ded46ac125c  loopback  lo           no          
docker0              c2e774f1-5975-4545-b2a0-11bd131323a4  bridge    docker0      no          
OobeHotspot          ea258393-ed46-4721-89df-ddfcec79d083  wifi      --           no          
OobeHotspot          a93a979d-6206-4007-8def-36dba55e74b2  wifi      --           no          
OobeHotspot          8d5e070b-9add-40de-aa90-c56bb32c1381  wifi      --           no          
Wired connection 1   ddae69d0-8873-3a7b-811d-b1163cdf287e  ethernet  --           yes         
Wired connection 2   03b2a174-ba20-3a3e-ad59-669f4436d481  ethernet  --           yes         
Wired connection 3   abad5efa-e2b6-3964-ac3f-9d6e8705b43a  ethernet  --           yes         
Wired connection 4   5a9bbdaf-5b23-34fc-9313-709e5b02e0c3  ethernet  --           yes         
Wired connection 5   7d41b90e-0d5b-3c67-b465-fe987bfcc23f  ethernet  --           yes  


Perfect — the logs identify the exact culprit.

The popup is coming from:

Wired connection 4
device: enP2p1s0f0np0
reason: ip-config-unavailable
NetworkManager is repeatedly trying DHCP on enP2p1s0f0np0, timing out, and showing the desktop notification. Your actual working cluster link is not that interface. Your working cluster interface is:

enp1s0f0np0 -> netplan-enp1s0f0np0
The log repeatedly shows NetworkManager auto-activating Wired connection 4 on enP2p1s0f0np0, then failing DHCP. 


Run this on gx10a:

sudo nmcli connection modify "Wired connection 4" connection.autoconnect no
sudo nmcli connection down "Wired connection 4" || true
Then verify:

nmcli -f NAME,UUID,TYPE,DEVICE,AUTOCONNECT con show
nmcli device status
You want to see:

Wired connection 4   ...   ethernet   --   no
And you still want:

wlP9s9          wifi      connected     TMOBILE-793A
enp1s0f0np0     ethernet  connected     netplan-enp1s0f0np0
Then confirm the cluster link still works:

ping -I enp1s0f0np0 169.254.10.2
If you get the same popup on gx10b, run the same diagnostic there:

nmcli device status
nmcli -f NAME,UUID,TYPE,DEVICE,AUTOCONNECT con show
journalctl -u NetworkManager -b --since "30 minutes ago" | egrep -i "failed|activation|enp|enP|wlP|connection"
Then disable whichever unused Wired connection X is trying DHCP and failing.
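
If reading the tables by eye gets tedious, here is a sketch that lists candidate profiles from nmcli's terse output. It assumes the colon-separated format of `nmcli -t -f NAME,DEVICE,AUTOCONNECT con show` and profile names without colons; stale_wired_profiles and the SAMPLE text are illustrative, not captured from your machines.

```python
def stale_wired_profiles(terse_output: str) -> list[str]:
    """Names of 'Wired connection N' profiles that autoconnect but have no device.

    Expects output of: nmcli -t -f NAME,DEVICE,AUTOCONNECT con show
    (colon-separated; assumes profile names contain no colons).
    """
    stale = []
    for line in terse_output.splitlines():
        parts = line.split(":")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, device, autoconnect = parts
        unattached = device in ("", "--")
        if name.startswith("Wired connection") and unattached and autoconnect == "yes":
            stale.append(name)
    return stale


# Illustrative terse output, modeled on the tables above.
SAMPLE = """\
TMOBILE-793A:wlP9s9:yes
netplan-enp1s0f0np0:enp1s0f0np0:yes
Wired connection 3::yes
Wired connection 4::no
"""

print(stale_wired_profiles(SAMPLE))
```

The result is only a list of candidates. Check each name against the journalctl output before disabling it, since an unattached profile is not necessarily the one that is failing DHCP.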


gb10a@gx10a:~$ sudo nmcli connection modify "Wired connection 4" connection.autoconnect no
sudo nmcli connection down "Wired connection 4" || true
[sudo] password for gb10a: 
Error: 'Wired connection 4' is not an active connection.
Error: no active connection provided.

That is okay — the important first command probably worked.

This part:

Error: 'Wired connection 4' is not an active connection.
Error: no active connection provided.
just means Wired connection 4 was not active at that exact moment, so NetworkManager could not bring it “down.” That is harmless.

Now verify that autoconnect is disabled:

nmcli -f NAME,DEVICE,AUTOCONNECT con show | grep -E "Wired connection 4|netplan-enp1s0f0np0|TMOBILE"
You want to see something like:

TMOBILE-793A          wlP9s9          yes
netplan-enp1s0f0np0   enp1s0f0np0     yes
Wired connection 4    --              no
Then wait a minute or two and check whether NetworkManager is still trying it:

journalctl -u NetworkManager -b --since "2 minutes ago" | egrep -i "Wired connection 4|enP2p1s0f0np0|failed|activation"
If no new Wired connection 4 activation attempts appear, the popup should stop.

Also confirm the cluster link still works:

ping -I enp1s0f0np0 169.254.10.2
If the popup persists, run this stronger version:

sudo nmcli connection delete "Wired connection 4"
That only deletes the stale NetworkManager profile. It should not affect your working cluster link, which is:

netplan-enp1s0f0np0 on enp1s0f0np0

gb10a@gx10a:~$ nmcli -f NAME,DEVICE,AUTOCONNECT con show | grep -E "Wired connection 4|netplan-enp1s0f0np0|TMOBILE"
TMOBILE-793A         wlP9s9       yes         
netplan-enp1s0f0np0  enp1s0f0np0  yes         
Wired connection 4   --           no          
gb10a@gx10a:~$ 

Perfect — that fixed the offending autoconnect profile on gx10a.

You now have:

TMOBILE-793A          wlP9s9        yes
netplan-enp1s0f0np0   enp1s0f0np0   yes
Wired connection 4    --            no
That is exactly what we wanted.

Now check whether new errors stopped:

journalctl -u NetworkManager -b --since "2 minutes ago" | egrep -i "Wired connection 4|enP2p1s0f0np0|failed|activation"
If that returns no new activation failures, the popups should stop on gx10a.

Then confirm the 200G cluster link is still okay:

ping -I enp1s0f0np0 169.254.10.2
If the same popup appears on gx10b, repeat the same diagnostic there and disable the corresponding stale Wired connection X profile.


gb10a@gx10a:~$ ping -I enp1s0f0np0 169.254.10.2
PING 169.254.10.2 (169.254.10.2) from 169.254.10.1 enp1s0f0np0: 56(84) bytes of data.
64 bytes from 169.254.10.2: icmp_seq=1 ttl=64 time=0.216 ms
64 bytes from 169.254.10.2: icmp_seq=2 ttl=64 time=0.788 ms
64 bytes from 169.254.10.2: icmp_seq=3 ttl=64 time=0.861 ms
^C
--- 169.254.10.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2057ms
rtt min/avg/max/mdev = 0.216/0.621/0.861/0.288 ms

Please provide 1 concise copy-paste block for gx10b

Run this on gx10b:

nmcli device status
nmcli -f NAME,DEVICE,AUTOCONNECT con show

journalctl -u NetworkManager -b --since "30 minutes ago" | egrep -i "failed|activation|enp|enP|wlP|connection" | tail -80

sudo nmcli connection modify "Wired connection 4" connection.autoconnect no
sudo nmcli connection down "Wired connection 4" || true

nmcli -f NAME,DEVICE,AUTOCONNECT con show | grep -E "Wired connection 4|netplan-enp1s0f0np0|TMOBILE"

ping -I enp1s0f0np0 169.254.10.1
If the log shows a different failing profile name than "Wired connection 4", replace "Wired connection 4" with that exact name.


nmcli device status
nmcli -f NAME,DEVICE,AUTOCONNECT con show

journalctl -u NetworkManager -b --since "30 minutes ago" | egrep -i "failed|activation|enp|enP|wlP|connection" | tail -80

sudo nmcli connection modify "Wired connection 4" connection.autoconnect no
sudo nmcli connection down "Wired connection 4" || true

nmcli -f NAME,DEVICE,AUTOCONNECT con show | grep -E "Wired connection 4|netplan-enp1s0f0np0|TMOBILE"

ping -I enp1s0f0np0 169.254.10.1

That block is okay only if the failing profile on gx10b is also Wired connection 4.

A safer copy-paste block for gx10b is this:

nmcli device status
nmcli -f NAME,DEVICE,AUTOCONNECT con show

echo
echo "Recent failing NetworkManager activations:"
journalctl -u NetworkManager -b --since "30 minutes ago" | egrep -i "failed|activation|enp|enP|wlP|connection" | tail -80

echo
echo "Disabling likely stale wired autoconnect profiles, preserving Wi-Fi and netplan cluster link..."
for c in "Wired connection 1" "Wired connection 2" "Wired connection 3" "Wired connection 4" "Wired connection 5"; do
  if nmcli -t -f NAME con show | grep -Fxq "$c"; then
    sudo nmcli connection modify "$c" connection.autoconnect no
    sudo nmcli connection down "$c" || true
  fi
done

echo
echo "Current important connection autoconnect status:"
nmcli -f NAME,DEVICE,AUTOCONNECT con show | grep -E "Wired connection|netplan-enp1s0f0np0|TMOBILE|wlP9s9"

echo
echo "Testing cluster link back to gx10a:"
ping -I enp1s0f0np0 169.254.10.1
That turns off autoconnect on the auto-generated Wired connection 1 through 5 profiles and leaves the important ones alone:

TMOBILE / wlP9s9
netplan-enp1s0f0np0 / enp1s0f0np0


Pasted text(79).txt
Document

That looks good. On gx10b, the stale wired profiles are now disabled and the cluster link still works.

You now have the right state on gx10b:

TMOBILE-793A          wlP9s9        yes
netplan-enp1s0f0np0   enp1s0f0np0   yes
Wired connection 1    --            no
Wired connection 2    --            no
Wired connection 3    --            no
Wired connection 4    --            no
Wired connection 5    --            no
And the 200G link is still working:

169.254.10.2 -> 169.254.10.1
0% packet loss
The earlier errors are harmless because the down command was attempted on profiles that were not active at that moment. The key part is that connection.autoconnect is now no. The log also confirms NetworkManager successfully updated and deactivated Wired connection 4. 


The connection popups should stop now on both machines. If one appears again, check only the last few minutes:

journalctl -u NetworkManager -b --since "5 minutes ago" | egrep -i "failed|activation|Wired connection|enP2p1s0f0np0"