I am trying to build a Pod that runs a service which requires:
- cluster-internal services to be resolved and accessed by their FQDN (*.cluster.local),
- while also having an active OpenVPN connection to a remote cluster, with services from that remote cluster also resolved and accessed by their FQDN (*.cluster.remote).
Without the OpenVPN sidecar, the service container inside the Pod can resolve and access every service by its FQDN in the *.cluster.local namespace. Here is /etc/resolv.conf in that case:
nameserver 169.254.25.10
search default.cluster.local svc.cluster.local cluster.local
options ndots:5
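For example, lookups like the following succeed from the service container (a hypothetical Service name, and assuming nslookup is available in the image):
nslookup my-service.default.svc.cluster.local
nslookup my-service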
When the OpenVPN sidecar manages resolv.conf
The OpenVPN sidecar is started in the following way:
containers:
  {{- if .Values.vpn.enabled }}
  - name: vpn
    image: "ghcr.io/wfg/openvpn-client"
    imagePullPolicy: {{ .Values.image.pullPolicy | quote }}
    volumeMounts:
      - name: vpn-working-directory
        mountPath: /data/vpn
    env:
      - name: KILL_SWITCH
        value: "off"
      - name: VPN_CONFIG_FILE
        value: connection.conf
    securityContext:
      privileged: true
      capabilities:
        add:
          - "NET_ADMIN"
    resources:
      limits:
        cpu: 100m
        memory: 80Mi
      requests:
        cpu: 25m
        memory: 20Mi
  {{- end }}
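(For completeness, the Helm values referenced above are assumed to be along these lines; the exact values file is not the point here:)
vpn:
  enabled: true
image:
  pullPolicy: IfNotPresent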
and the OpenVPN client configuration contains the following lines:
script-security 2
up /etc/openvpn/up.sh
down /etc/openvpn/down.sh
Then the OpenVPN client overwrites resolv.conf so that it contains the following:
nameserver 192.168.255.1
options ndots:5
In this case, any service in *.cluster.remote is resolved, but nothing in *.cluster.local is. This is expected.
When the OpenVPN sidecar does not manage resolv.conf, but spec.dnsConfig is provided
For this scenario, I remove the following lines from the OpenVPN client configuration:
script-security 2
up /etc/openvpn/up.sh
down /etc/openvpn/down.sh
The spec.dnsConfig is provided as:
dnsConfig:
  nameservers:
    - 192.168.255.1
  searches:
    - cluster.remote
Then, resolv.conf will be the following:
nameserver 192.168.255.1
nameserver 169.254.25.10
search default.cluster.local svc.cluster.local cluster.local cluster.remote
options ndots:5
This works for *.cluster.remote, but not for anything in *.cluster.local, because the resolver queries the nameservers in order and only falls back to the second one when the first times out; a negative answer (NXDOMAIN) from 192.168.255.1 is final, so cluster-local names never reach 169.254.25.10. I have noticed that some folks work around this limitation with nameserver rotation (options rotate) and a timeout of 1 second, but that behavior looks far too erratic to me, and I would not consider it, not even as a workaround. Or maybe I am missing something. My first question would be: could rotate and timeout work in this case?
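For reference, if I were to try that, I assume it would be expressed through spec.dnsConfig.options roughly like this (a sketch only, not something I have validated):
dnsConfig:
  nameservers:
    - 192.168.255.1
  searches:
    - cluster.remote
  options:
    # rotate spreads queries across the listed nameservers instead of always trying the first one
    - name: rotate
    # fail over to the next nameserver after one second instead of the default five
    - name: timeout
      value: "1"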
My second question would be: is there any way to make both *.cluster.local and *.cluster.remote DNS resolution work reliably from the service container inside the Pod, without using something like dnsmasq?
My third question would be: if dnsmasq is required, how can I configure it, provision it, and overwrite resolv.conf, while also making sure that the Kubernetes-provided nameserver can be anything (169.254.25.10 in this case)?
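To make that last question concrete, what I have in mind is a split-DNS dnsmasq configuration along these lines (purely a sketch; the two nameserver addresses are the ones from this particular cluster and would have to be discovered rather than hard-coded):
# dnsmasq.conf (sketch)
no-resolv                             # do not take upstream servers from resolv.conf
server=/cluster.local/169.254.25.10   # forward cluster-local names to the Kubernetes nameserver
server=/cluster.remote/192.168.255.1  # forward remote-cluster names to the VPN-pushed nameserver
listen-address=127.0.0.1
bind-interfaces
with the Pod's resolv.conf then pointing at nameserver 127.0.0.1 while keeping the search domains and ndots:5.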
Best, Zoltán