Issue
After installing the Contrast Agent Operator, the operator's pod fails to start up with a "Connection reset by peer" error:
ERROR Program] Fatal error during application startup.|System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
---> System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.
---> System.Net.Sockets.SocketException (104): Connection reset by peer
Cause
The operator fails immediately after the following startup step, when it next attempts to communicate with the API server:
INFO ApplicationStartup] Registered mutation webhook "contrast.k8s.agentoperator.controllers.v1pod.podmutationwebhook" under "/v1/pods/podmutationwebhook/mutate"
The failure occurs in the leader election code of the KubeOps SDK (dotnet-operator-sdk v6.4.0): https://github.com/buehler/dotnet-operator-sdk/blob/v6.4.0/src/KubeOps/Operator/Leadership/LeaderElector.cs#L57
At this point the operator calls the control plane to list the deployments that match its own. It does this to set up leader election, so that only one pod acts as the leader when multiple operator pods are running. Leader election happens very early in the startup process, so when this request fails, the operator cannot start.
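To check whether a pod can make this kind of request at all, a roughly equivalent call can be issued by hand from a pod in the operator's namespace. The Python sketch below is a hypothetical diagnostic, not the SDK's actual code. It assumes the standard in-cluster service account files under /var/run/secrets/kubernetes.io/serviceaccount/ are mounted and that the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables are set, and it lists deployments in the pod's namespace over HTTPS. If the network path to the control plane is blocked, it is expected to fail during the TLS handshake, much like the operator.

```python
import json
import os
import ssl
import urllib.request

# Standard in-cluster service account mount.
# Assumption: automountServiceAccountToken is enabled for the pod.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_deployments_url(host: str, port: str, namespace: str) -> str:
    """Build the apps/v1 list-deployments URL for a namespace."""
    # IPv6 service hosts must be wrapped in brackets inside a URL.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"https://{host}:{port}/apis/apps/v1/namespaces/{namespace}/deployments"


def list_deployments() -> list:
    """List deployment names in this pod's namespace via the API server."""
    with open(os.path.join(SA_DIR, "token")) as f:
        token = f.read().strip()
    with open(os.path.join(SA_DIR, "namespace")) as f:
        namespace = f.read().strip()

    url = build_deployments_url(
        os.environ["KUBERNETES_SERVICE_HOST"],
        os.environ["KUBERNETES_SERVICE_PORT"],
        namespace,
    )
    # Verify the API server certificate against the cluster CA.
    ctx = ssl.create_default_context(cafile=os.path.join(SA_DIR, "ca.crt"))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    # A blocked path typically surfaces here as urllib.error.URLError
    # (for example wrapping ConnectionResetError) or as a timeout.
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        body = json.load(resp)
    return [item["metadata"]["name"] for item in body.get("items", [])]


if __name__ == "__main__":
    print(list_deployments())
```

An HTTP 403 (urllib.error.HTTPError) from this script would still show that the network path works and only RBAC permissions are missing; a connection reset points at the network instead.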
Resolution
This issue was observed in an AKS cluster configured with API Server VNet Integration (https://learn.microsoft.com/en-us/azure/aks/api-server-vnet-integration), where the API server (control plane) and the application nodes were in different VNets and a firewall controlled traffic into and out of those VNets. The firewall was not allowing traffic on port 443 from the application nodes to the control plane. Operators require bi-directional communication on port 443 between these VNets to work properly, so this traffic must be allowed through the firewall.
Although this issue was originally reported in AKS, it can occur in any Kubernetes cluster where network restrictions limit communication between the control plane and the application nodes.
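Before and after changing firewall rules, it can help to confirm whether port 443 on the control plane is reachable from an application node. The following Python sketch is a hypothetical reachability probe, not a Contrast or Kubernetes tool: it opens a TCP connection, performs a TLS handshake with certificate verification disabled (it tests reachability only, not trust), and classifies the outcome. The host argument is an assumption: use the API server address from your kubeconfig, or kubernetes.default.svc from inside a pod.

```python
import socket
import ssl
import sys
from typing import Optional


def classify(exc: Optional[BaseException]) -> str:
    """Map a connection outcome to a short diagnosis string."""
    if exc is None:
        return "ok: TCP connect and TLS handshake succeeded"
    if isinstance(exc, ConnectionResetError):
        return "reset: the peer or a middlebox (e.g. a firewall) reset the connection"
    if isinstance(exc, ConnectionRefusedError):
        return "refused: nothing is listening on that host and port"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout: packets are likely being dropped on the path"
    if isinstance(exc, ssl.SSLError):
        return f"tls: handshake failed ({exc})"
    return f"error: {exc!r}"


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt TCP + TLS to host:port and return a diagnosis."""
    # Verification is disabled on purpose: this only checks reachability.
    # Do not reuse this context for real traffic.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                pass  # Handshake completes inside wrap_socket.
    except Exception as exc:  # Report any failure instead of crashing.
        return classify(exc)
    return classify(None)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "kubernetes.default.svc"
    print(probe(target))
```

A "reset" or "timeout" result from an application node, combined with an "ok" result from inside the control plane's network, is consistent with the firewall behavior described above.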