Troubleshooting the Contrast Agent Operator

Objective

This article suggests steps for troubleshooting issues with the Contrast Agent Operator.

Application not Onboarded to the Contrast UI

If all of the documented steps for installing and configuring the Contrast Agent Operator have been followed but the application has not been onboarded, there are several things to check for.

The first thing to rule out is that the operator has been incorrectly installed or configured. The following checks should help with that:

Are all the necessary resources deployed and in the correct namespaces?

Run the following command:

kubectl get secrets,clusteragentconnections,agentinjectors --all-namespaces

and verify that the output includes the following three resources, which are required at a minimum for the operator to function correctly:

  • The Secret exists and is deployed to the contrast-agent-operator namespace.
  • The ClusterAgentConnection exists and is deployed to the contrast-agent-operator namespace.
  • The AgentInjector exists and is deployed to the same namespace as the application to be instrumented.
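
The checks above can be scripted. The following is a minimal sketch, not part of the operator tooling: check_required_resources is a hypothetical helper that reads the table printed by the kubectl command above on standard input. It assumes the default kubectl table layout, in which the first column is the namespace and the second is the name prefixed by its resource type.

```shell
# Hypothetical helper: reads the output of
#   kubectl get secrets,clusteragentconnections,agentinjectors --all-namespaces
# on stdin and reports whether each required resource type was seen.
# Assumes column 1 is NAMESPACE and column 2 is NAME in "<type>/<name>" form.
check_required_resources() {
  awk '
    $1 == "NAMESPACE" || NF < 2 { next }   # skip header rows and blank lines
    {
      kind = tolower($2)
      sub(/\/.*/, "", kind)                # keep the part before the slash
      sub(/\..*/, "", kind)                # drop any API group suffix
      if (kind == "secret" && $1 == "contrast-agent-operator") secret = 1
      if (kind == "clusteragentconnection" && $1 == "contrast-agent-operator") conn = 1
      if (kind == "agentinjector") injector = 1
    }
    END {
      print "Secret in contrast-agent-operator: " (secret ? "found" : "MISSING")
      print "ClusterAgentConnection in contrast-agent-operator: " (conn ? "found" : "MISSING")
      print "AgentInjector in any namespace: " (injector ? "found" : "MISSING")
      exit !(secret && conn && injector)   # non-zero exit if anything is missing
    }'
}
```

To use it, pipe the command output into the function: kubectl get secrets,clusteragentconnections,agentinjectors --all-namespaces | check_required_resources. The sketch only confirms that some Secret exists in the operator namespace, and it does not check that the AgentInjector is in the same namespace as your application, so review those points by hand.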

Is the application deployment correctly tagged?

The AgentInjector should contain a label definition under spec.selector.labels that indicates to the Agent Operator which deployments should be injected with the agent.

For example, this kubectl command displays the AgentInjector manifest:

kubectl get agentinjector webgoatdotnetcore -o yaml --namespace default

Note the label definition in the output. In this example, it tells the operator that any deployment labeled contrast-agent=dotnet-core should be injected. The label name and value are arbitrary and can be anything you choose. Glob patterns are also supported.

Next, verify that the application deployment specifies the same label. A deployment manifest can define labels in several places, but the one that matters here is metadata.labels. You can see the label in the deployment manifest:

kubectl get deployment webgoatdotnetcore -o yaml

Alternatively, the following kubectl command lists the labels of each deployment directly:

kubectl get deployments --show-labels
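
When there are many deployments, it can help to filter that listing for the label the AgentInjector selects on. The following is a minimal sketch under stated assumptions: deployments_with_label is a hypothetical helper, and it relies on LABELS being the last column of the --show-labels output as a comma-separated list of key=value pairs.

```shell
# Hypothetical helper: reads `kubectl get deployments --show-labels` output on
# stdin and prints the name of each deployment carrying the label given as $1.
# Exact matches only; glob patterns in the AgentInjector are not evaluated here.
deployments_with_label() {
  awk -v want="$1" '
    NR == 1 { next }                       # skip the header row
    {
      n = split($NF, labels, ",")          # LABELS is the last column
      for (i = 1; i <= n; i++)
        if (labels[i] == want) { print $1; break }
    }'
}
```

For example: kubectl get deployments --show-labels | deployments_with_label contrast-agent=dotnet-core. If the deployment you expect is not printed, its metadata.labels do not contain that exact key and value.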

In general, you can review any resource currently deployed to Kubernetes using this command structure:

kubectl get [resource type] [resource name] -o yaml --namespace [namespace]

For example:

kubectl get clusteragentconnection default-agent-connection -o yaml --namespace contrast-agent-operator

Logs and Metrics

The Agent Operator logs

The Agent Operator runs as a pod in the contrast-agent-operator namespace. You can view or tail logs from this pod to look for problems with agent injectors, configurations, and connections.

This kubectl command will display the logs for the deployment:

kubectl logs -f deployment/contrast-agent-operator --namespace contrast-agent-operator

The following example shows the operator starting up, checking the available pods for patching, and then successfully injecting a pod:

[2023-06-23 19:48:36.0774 INFO Program] Starting the Contrast Security Agent Operator 1.0.0.0.
[2023-06-23 19:48:37.1566 INFO OptionsLogger] Option 'install-source' was changed from 'unknown' (default) -> 'kustomize'.
[2023-06-23 19:48:37.9095 INFO ApplicationStartup] Registered mutation webhook "contrast.k8s.agentoperator.controllers.v1pod.podmutationwebhook" under "/v1/pods/podmutationwebhook/mutate".
[2023-06-23 19:48:38.2095 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/contrast-agent-operator/contrast-agent-operator-5545877df8-8kjg7' was reconciled.
[2023-06-23 19:48:38.2150 INFO MergingStateProvider] Merging state modified events until '06/23/2023 19:48:48 +00:00'.
[2023-06-23 19:48:38.2222 INFO MatchInjectorsHandler] Reactions are disabled, cluster state is settling or instance is not leading.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/contrast-agent-operator/contrast-agent-operator-6547f5c6d8-x2qxg' was reconciled.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/default/webgoatdotnetcore-6f745d4b6b-7ndsj' was reconciled.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/kube-system/aws-node-2cglk' was reconciled.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/kube-system/aws-node-s8zlp' was reconciled.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/kube-system/coredns-5c5677bc78-m7krb' was reconciled.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/kube-system/coredns-5c5677bc78-szqr7' was reconciled.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/kube-system/kube-proxy-79zl5' was reconciled.
[2023-06-23 19:48:38.2398 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/kube-system/kube-proxy-7zq7k' was reconciled.
[2023-06-23 19:48:38.2575 TRACE ClusterIdHandler] Internal cluster id was updated. (Generated: 2023-02-23T15:49:04.6346830+00:00)
[2023-06-23 19:48:38.2635 DEBUG BaseApplier`2:SecretResource] Resource 'SecretResource/contrast-agent-operator/contrast-cluster-id' was reconciled.
[2023-06-23 19:48:38.2651 DEBUG BaseApplier`2:ClusterAgentConnectionResource] Resource 'ClusterAgentConnectionResource/contrast-agent-operator/default-agent-connection' was reconciled.
.....
.....
[2023-06-23 19:50:26.4766 TRACE BaseSyncingHandler`3:ClusterAgentConnectionResource] Checking for cluster 'AgentConnectionSecret' eligible for generation across 1 templates in 3 namespaces.
[2023-06-23 19:50:26.4766 TRACE BaseSyncingHandler`3:ClusterAgentConnectionResource] Completed checking for entity generation after 5ms.
[2023-06-23 19:50:26.4766 TRACE BaseSyncingHandler`3:ClusterAgentConnectionResource] Checking for cluster 'AgentConnection' eligible for generation across 1 templates in 3 namespaces.
[2023-06-23 19:50:26.4766 TRACE BaseSyncingHandler`3:ClusterAgentConnectionResource] Completed checking for entity generation after 0ms.
[2023-06-23 19:50:26.5361 DEBUG PodMutationWebhook] Admission with method "CREATE".
[2023-06-23 19:50:26.5451 TRACE PodPatcher] Selected agent injector 'DotNetCore'.
[2023-06-23 19:50:26.5543 TRACE GlobMatcher] Compiling glob pattern '*'.
[2023-06-23 19:50:26.5596 INFO PodInjectionHandler] Patching pod from 'default/webgoatdotnetcore' using injector 'default/webgoatdotnetcore'.
[2023-06-23 19:50:26.6298 DEBUG PodMutationWebhook] AdmissionHook "contrast.k8s.agentoperator.controllers.v1pod.podmutationwebhook" did return "True" for "CREATE".
[2023-06-23 19:50:26.6585 DEBUG PodMutationWebhook] Admission with method "CREATE".
[2023-06-23 19:50:26.6585 TRACE PodPatcher] Selected agent injector 'DotNetCore'.
[2023-06-23 19:50:26.6585 INFO PodInjectionHandler] Patching pod from 'default/webgoatdotnetcore' using injector 'default/webgoatdotnetcore'.
[2023-06-23 19:50:26.6643 DEBUG PodMutationWebhook] AdmissionHook "contrast.k8s.agentoperator.controllers.v1pod.podmutationwebhook" did return "True" for "CREATE".
[2023-06-23 19:50:26.6800 DEBUG BaseApplier`2:PodResource] Resource 'PodResource/default/webgoatdotnetcore-569f49b79d-swzmn' was reconciled.
[2023-06-23 19:50:35.9086 DEBUG BaseApplier`2:PodResource] Resource 'default/webgoatdotnetcore-6f745d4b6b-7ndsj' of type 'PodResource' was deleted.
[2023-06-23 19:50:36.5137 TRACE MergingStateProvider] Flushing state modified, 2 events were merged.
[2023-06-23 19:50:36.5137 INFO MatchInjectorsHandler] Cluster state changed, re-calculating injection points (2 changes merged).
[2023-06-23 19:50:36.5137 TRACE MatchInjectorsHandler] Calculating changes needed for 'DeploymentResource/contrast-agent-operator/contrast-agent-operator'...
[2023-06-23 19:50:36.5137 TRACE MatchInjectorsHandler] Calculating changes needed for 'DeploymentResource/default/webgoatdotnetcore'...
[2023-06-23 19:50:36.5137 INFO PodTemplateStatusHandler] Pod 'default/webgoatdotnetcore-569f49b79d-swzmn' status was updated 'None' -> 'InjectionComplete'.
[2023-06-23 19:50:36.5477 TRACE ResourcePatcher] Preparing to patch status 'default/webgoatdotnetcore-569f49b79d-swzmn' ('Pod/v1') with '{"lastTransitionTime":"2023-06-23T19:50:36.516960\u002B00:00","message":"The pod is eligible for agent injection and is currently injected.","reason":"InjectionComplete","status":"True","type":"agents.contrastsecurity.com/injection-converged"}'.
[2023-06-23 19:50:36.6026 TRACE ResourcePatcher] Patch complete after 67ms.

If the operator logs indicate that the status is still InjectionPending, this article should help in tracking down the issue: Contrast-agent-operator stuck in InjectionPending.
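
In a long log, the status transitions are easy to miss. The following is a minimal sketch for pulling them out: injection_status is a hypothetical helper, and it assumes the PodTemplateStatusHandler message format seen in the sample log above, which may differ between operator versions.

```shell
# Hypothetical helper: reads operator log lines on stdin and prints
# "<namespace>/<pod> <new status>" for each status change reported by
# PodTemplateStatusHandler, using the message format from the sample log.
injection_status() {
  sed -n "s/.*PodTemplateStatusHandler] Pod '\([^']*\)' status was updated '[^']*' -> '\([^']*\)'.*/\1 \2/p"
}
```

For example: kubectl logs deployment/contrast-agent-operator --namespace contrast-agent-operator | injection_status. A pod whose last reported status is not InjectionComplete is a candidate for closer inspection.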

Enabling more verbose logging for the Agent Operator

The log snippet above shows output at the TRACE logging level. The default is INFO. See How to get logs from the Agent Operator for details on configuring more verbose operator logging.
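
Once verbose logging is enabled, the volume of TRACE and DEBUG lines can bury the messages you care about. The following is a minimal sketch for filtering on the client side: hide_levels is a hypothetical helper, and it assumes the level is the third whitespace-separated field of each line, as in the sample log above.

```shell
# Hypothetical helper: reads operator log lines on stdin and drops those whose
# level (third field, e.g. TRACE, DEBUG, INFO) is listed in the arguments.
# Lines with fewer than three fields are passed through unchanged.
hide_levels() {
  awk -v hide=" $* " '
    NF < 3 || index(hide, " " $3 " ") == 0 { print }'
}
```

For example: kubectl logs deployment/contrast-agent-operator --namespace contrast-agent-operator | hide_levels TRACE DEBUG. Filtering this way keeps the full log available at the source while you read a shorter view.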

Getting cluster event logs

Cluster events may provide additional insight into problems when injectors are not working.

kubectl get events
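
The event list can be long, so it may help to narrow it to warnings about the workload in question. The following is a minimal sketch: warning_events_for is a hypothetical helper, and it assumes the default kubectl get events columns (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE), so that the event type is the second field and the object is the fourth.

```shell
# Hypothetical helper: reads `kubectl get events` output on stdin and prints
# only Warning events whose OBJECT (e.g. pod/<name>) starts with the name
# given as $1.
warning_events_for() {
  awk -v name="$1" '
    NR > 1 && $2 == "Warning" && index($4, "/" name) > 0 { print }'
}
```

For example: kubectl get events --namespace default | warning_events_for webgoatdotnetcore.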

The application deployment logs

You can use the regular stdout output of the pods to check whether the agent was successfully injected.

For example, to show the logs for a given deployment:

kubectl logs -f deployment/webgoatdotnetcore

Alternatively, to view the logs by pod name, first fetch the name:

kubectl get pods

Then pass the name to kubectl logs:

kubectl logs pods/webgoatdotnetcore-569f49b79d-swzmn

An injected pod has two containers, one of which is the contrast-init container. You can view its logs as follows:

kubectl logs pods/webgoatdotnetcore-569f49b79d-swzmn -c contrast-init

Control plane logging

Control plane logs may be more difficult to obtain and require a cluster administrator. They cover the entire cluster: Kubernetes APIs, controllers, schedulers, and auditors.

If access is available, look for recent API server and controller manager logs.

Some things to look for:

  • Kubernetes is unable to contact the operator
    • The issue could be due to non-standard security policies or configurations.
  • Focus on webhook-related errors
  • If there are no webhook-related errors, the issue could, again, be related to security policies or configurations.

Agent Logs

If all of the above checks out and the agent appears to be injected successfully, but the application still does not show up in the Contrast UI, the next place to look is the agent logs themselves. Connect a terminal to the running pod, for example:

kubectl exec --stdin --tty [pod name] -- /bin/sh

The agent logs can be found in /contrast/data/logs.

Agent logs can instead be sent to stdout by setting the environment variable CONTRAST__AGENT__LOGGER__STDOUT=true on the pod. This can also be enabled globally by setting operator.enableAgentStdout: true in the Helm chart. Then run the following to view the logs:

kubectl logs pods/<pod_name> --namespace <namespace>
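
One way to set the variable for a single application is on the pod template of its deployment. The following is a minimal sketch, assuming the application is managed by a Deployment: enable_agent_stdout is a hypothetical helper that wraps the standard kubectl set env command. Changing the pod template triggers a rollout, so new pods start with the variable set.

```shell
# Hypothetical helper: sets CONTRAST__AGENT__LOGGER__STDOUT=true on the pod
# template of a deployment, which causes its pods to be recreated.
# $1 = deployment name, $2 = namespace
enable_agent_stdout() {
  kubectl set env "deployment/$1" --namespace "$2" CONTRAST__AGENT__LOGGER__STDOUT=true
}
```

For example: enable_agent_stdout webgoatdotnetcore default. If your deployments are managed from manifests or a chart, make the equivalent change there instead so that it is not overwritten on the next release.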

Performance Metrics

The Agent Operator generates metrics that can be accessed using an API endpoint as follows:

kubectl exec deployment/contrast-agent-operator --namespace contrast-agent-operator -- bash -c "curl -ks https://localhost:5001/api/v1/metrics | jq"

Output will look something like this:

  {
      "Injected.Java.PodsCount": 16,
      "Injected.NodeJs.PodsCount": 2,
      "Injected.PodsCount": 18,
      "Performance.AllocationRate": 1516244112,
      "Performance.CPUUsage": 3,
      "Performance.ExceptionCount": 2,
      "Performance.GCCommittedBytes": 325,
      "Performance.GCFragmentation": 57.8423222582457,
      "Performance.GCHeapSize": 150,
      "Performance.Gen0GCCount": 19,
      "Performance.Gen0Size": 24,
      "Performance.Gen1GCCount": 3,
      "Performance.Gen1Size": 732488,
      "Performance.Gen2GCCount": 1,
      "Performance.Gen2Size": 177961408,
      "Performance.ILBytesJitted": 1402364,
      "Performance.LOHSize": 60709424,
      "Performance.MonitorLockContentionCount": 12,
      "Performance.NumberofActiveTimers": 19,
      "Performance.NumberofAssembliesLoaded": 187,
      "Performance.NumberofMethodsJitted": 21170,
      "Performance.PercentTimeinGCsincelastGC": 0,
      "Performance.POHPinnedObjectHeapSize": 289688,
      "Performance.ThreadPoolCompletedWorkItemCount": 1566,
      "Performance.ThreadPoolQueueLength": 0,
      "Performance.ThreadPoolThreadCount": 6,
      "Performance.TimespentinJIT": 0,
      "Performance.WorkingSet": 523,
      "Resources.AgentConfigurationResource.NamespacesCount": 37,
      "Resources.AgentConfigurationResource.ResourcesCount": 74,
      "Resources.AgentConnectionResource.NamespacesCount": 37,
      "Resources.AgentConnectionResource.ResourcesCount": 37,
      "Resources.AgentInjectorResource.NamespacesCount": 37,
      "Resources.AgentInjectorResource.ResourcesCount": 148,
      "Resources.ClusterAgentConnectionResource.NamespacesCount": 1,
      "Resources.ClusterAgentConnectionResource.ResourcesCount": 1,
      "Resources.DaemonSetResource.NamespacesCount": 16,
      "Resources.DaemonSetResource.ResourcesCount": 24,
      "Resources.DeploymentConfigResource.NamespacesCount": 439,
      "Resources.DeploymentConfigResource.ResourcesCount": 4564,
      "Resources.DeploymentResource.NamespacesCount": 104,
      "Resources.DeploymentResource.ResourcesCount": 251,
      "Resources.Global.NamespacesCount": 543,
      "Resources.Global.ResourcesCount": 27020,
      "Resources.PodResource.NamespacesCount": 475,
      "Resources.PodResource.ResourcesCount": 2852,
      "Resources.SecretResource.NamespacesCount": 543,
      "Resources.SecretResource.ResourcesCount": 19086,
      "Resources.StatefulSetResource.NamespacesCount": 9,
      "Resources.StatefulSetResource.ResourcesCount": 30,
      "UptimeSeconds": 60427.7069415,
      "Process.WorkingSet64": 524357632,
      "Process.MinWorkingSet": 0,
      "Process.MaxWorkingSet": 2147483648,
      "Process.PeakWorkingSet64": 909082624,
      "Process.PrivateMemorySize64": 611205120,
      "Process.VirtualMemorySize64": 10074427392,
      "Process.PeakVirtualMemorySize64": 10326552576,
      "Process.PagedMemorySize64": 0,
      "Process.PeakPagedMemorySize64": 0,
      "Process.NonpagedSystemMemorySize64": 0,
      "Process.TotalProcessorTime": "00:56:53.1000000",
      "Process.UserProcessorTime": "00:48:32.6300000",
      "Process.PrivilegedProcessorTime": "00:08:20.4700000",
      "Process.Thread": 16,
      "Process.Modules": 155,
      "IsLeader": "True"
  }
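
When troubleshooting onboarding, the injection counters and the leader flag are usually the first values to check. The following is a minimal sketch for extracting them: injection_metrics is a hypothetical helper, and it assumes the metrics arrive pretty-printed with one key per line, as jq prints them in the command above.

```shell
# Hypothetical helper: reads the pretty-printed metrics JSON on stdin (one key
# per line) and prints the injection counters and the leader flag as
# "key: value" lines.
injection_metrics() {
  grep -E '"(Injected\.[A-Za-z.]*PodsCount|IsLeader)":' |
    sed -e 's/^[[:space:]]*//' -e 's/,$//' -e 's/"//g'
}
```

To use it, append | injection_metrics to the kubectl exec command above. If Injected.PodsCount is 0, the operator does not consider any pod injected, which points back to the resource and label checks earlier in this article.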

Also useful is the output of:

kubectl describe deployment/contrast-agent-operator --namespace contrast-agent-operator

This produces output similar to the following:

Name:                  contrast-agent-operator
Namespace:             contrast-agent-operator
CreationTimestamp:     Mon, 28 Oct 2024 15:55:54 -0400
Labels:                app.kubernetes.io/managed-by=Helm
                       app.kubernetes.io/name=operator
                       app.kubernetes.io/part-of=contrast-agent-operator
Annotations:           deployment.kubernetes.io/revision: 4
                       meta.helm.sh/release-name: contrast-agent-operator
                       meta.helm.sh/release-namespace: contrast-agent-operator
Selector:              app.kubernetes.io/name=operator,app.kubernetes.io/part-of=contrast-agent-operator
Replicas:              1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:          RollingUpdate
MinReadySeconds:       0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels:                app.kubernetes.io/name=operator
                       app.kubernetes.io/part-of=contrast-agent-operator
Service Account:       contrast-agent-operator-service-account
Containers:
  contrast-agent-operator:
    Image: contrast/agent-operator:1.5.4
    Port: 5001/TCP
    Host Port: 0/TCP
    Limits:
      cpu: 2
      memory: 512Mi
    Requests:
      cpu: 500m
      memory: 256Mi
    Liveness: http-get https://:5001/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness: http-get https://:5001/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      CONTRAST_DEFAULT_REGISTRY: contrast
      CONTRAST_SETTLE_DURATION: 10
      CONTRAST_EVENT_QUEUE_SIZE: 10000
      CONTRAST_EVENT_QUEUE_FULL_MODE: DropOldest
      CONTRAST_WEBHOOK_SECRET: contrast-web-hook-secret
      CONTRAST_WEBHOOK_CONFIGURATION: contrast-web-hook-configuration
      CONTRAST_ENABLE_EARLY_CHAINING: false
      CONTRAST_INSTALL_SOURCE: helm
      CONTRAST_INITCONTAINER_CPU_REQUEST: 100m
      CONTRAST_INITCONTAINER_CPU_LIMIT: 100m
      CONTRAST_INITCONTAINER_MEMORY_REQUEST: 64Mi
      CONTRAST_INITCONTAINER_MEMORY_LIMIT: 64Mi
      POD_NAMESPACE: (v1:metadata.namespace)
      CONTRAST_WEBHOOK_SERVICENAME: contrast-agent-operator
      CONTRAST_WEBHOOK_HOSTS: $(CONTRAST_WEBHOOK_SERVICENAME),$(CONTRAST_WEBHOOK_SERVICENAME).$(POD_NAMESPACE).svc,$(CONTRAST_WEBHOOK_SERVICENAME).$(POD_NAMESPACE).svc.cluster.local
    Mounts: <none>
  Volumes: <none>
Conditions:
  Type Status Reason
  ---- ------ ------
  Progressing True NewReplicaSetAvailable
  Available True MinimumReplicasAvailable
OldReplicaSets: contrast-agent-operator-85ffdb79b8 (0/0 replicas created)
NewReplicaSet: contrast-agent-operator-58f844bf6b (1/1 replicas created)
Events: <none>

These outputs can provide invaluable detail when troubleshooting resource issues.
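
For example, the working set reported by the metrics endpoint can be compared with the memory limit shown by kubectl describe. The following is a minimal sketch: memory_percent_of_limit is a hypothetical helper that takes Process.WorkingSet64 in bytes and the container memory limit in Mi, and prints the working set as a whole-number percentage of the limit.

```shell
# Hypothetical helper: prints the working set as a whole-number percentage of
# the container memory limit (fractions are truncated).
# $1 = Process.WorkingSet64 in bytes, $2 = memory limit in Mi
memory_percent_of_limit() {
  awk -v ws="$1" -v limit_mi="$2" \
    'BEGIN { printf "%d\n", ws * 100 / (limit_mi * 1048576) }'
}
```

Purely as an illustration, since the two sample outputs above may not come from the same cluster, memory_percent_of_limit 524357632 512 prints 97. A value that stays close to 100 suggests the operator is at risk of being terminated for exceeding its memory limit.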
