I am following RKE2's Quick Start guide and this Dell article to get one RKE2 server and one agent working together. I am using v1.24.9+rke2r2. The setup runs in VirtualBox with Ubuntu 20.04 guests. On both VMs, Adapter 1 is set to NAT and always shows up as enp0s3 with an IP of 10.0.2.15. Adapter 2 is set to Host-only Adapter and shows up as enp0s8, with an IP of 192.168.56.101 on the server (vm-01) and 192.168.56.102 on the agent (vm-02). vm-01 and vm-02 can ping each other with this setup. I got the server node to work just fine. On the server, I modified /etc/rancher/rke2/config.yaml as follows before restarting the server service:
tls-san:
- "192.168.56.101"
- "192.168.56.102"
server:
- "https://192.168.56.101:9345"
token:
- "K101795742c954c5c8f5d9aa21588a6e6990f29ccdb3e5412292f01ea4bb41f31ae::server:6bf9ab3e0a1e214d85335657578cac67"
On the agent node (vm-02), I set the /etc/rancher/rke2/config.yaml file as follows:
server:
- "https://192.168.56.101:9345"
token:
- "K101795742c954c5c8f5d9aa21588a6e6990f29ccdb3e5412292f01ea4bb41f31ae::server:6bf9ab3e0a1e214d85335657578cac67"
I then started the agent service. The first issue I noticed is that the kube-proxy-vm-02 pod never comes up on the initial start. I must restart the agent service for it to appear.
The second issue is that the extra rke2-coredns-rke2-coredns-XXX and rke2-canal-XXX pods that get scheduled on the agent never become ready. The coredns pod is always stuck in the Pending state. The canal pod ends up in the Init:CrashLoopBackOff state. I ran journalctl -u rke2-agent -f to check for errors, and this shows up:
Jan 18 11:49:48 vm-02 rke2[2346]: time="2023-01-18T11:49:48+07:00" level=info msg="Connecting to proxy" url="wss://10.0.2.15:9345/v1-rke2/connect"
Jan 18 11:49:48 vm-02 rke2[2346]: time="2023-01-18T11:49:48+07:00" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 10.0.2.15:9345: connect: connection refused"
Jan 18 11:49:48 vm-02 rke2[2346]: time="2023-01-18T11:49:48+07:00" level=error msg="Remotedialer proxy error" error="dial tcp 10.0.2.15:9345: connect: connection refused"
It seems to me that the agent service keeps trying to reach the server node at 10.0.2.15:9345, which is the NAT address that both VMs share on enp0s3. However, I clearly specified in the agent's config that the server is located at 192.168.56.101:9345. This looks like the reason for my problem. Could someone tell me what I should do to get past this and proceed further? Many thanks!
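One guess on my side, in case it helps narrow things down: since both VMs have the same 10.0.2.15 address on enp0s3, maybe each node registers itself with that NAT address instead of the host-only one. RKE2 has a node-ip option, so a sketch of what I am thinking of trying is below. I have not tested this, and I am only assuming that node-ip is the relevant setting here; the addresses are the enp0s8 ones from my setup.

```yaml
# Untested sketch: pin each node to its host-only (enp0s8) address.
# These are two separate files, one per VM, shown together for brevity.

# Added to /etc/rancher/rke2/config.yaml on vm-01 (server)
node-ip: "192.168.56.101"
---
# Added to /etc/rancher/rke2/config.yaml on vm-02 (agent)
node-ip: "192.168.56.102"
```

If this is the right direction, I would expect the agent log to show the proxy connection going to 192.168.56.101 instead of 10.0.2.15, but I do not know whether anything else needs to change as well.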