Troubleshooting with AI - How k8sgpt makes debugging Kubernetes clusters easier

About a year and a half ago, on March 28, 2023, one of the fastest-growing projects of the CNCF (Cloud Native Computing Foundation) was launched: k8sgpt. It adds another practical use case for AI in IT, offering an efficient way to detect and resolve errors and issues in a Kubernetes cluster.

Artificial intelligence is a hot topic that everyone is talking about, but practical applications are often limited. On the spur of the moment, a new open-source project emerged in the cloud-native world. Alex Jones and Thomas Schuetz saw the potential of Large Language Models (LLMs) and their increasing prevalence. The idea behind it: using AI to debug Kubernetes clusters.

In a cluster, problems are often hard to identify, and error messages are hard to understand. k8sgpt addresses exactly these challenges. The tool scans the cluster for errors and sends the error messages it finds to an AI backend. In response, users receive a clear description of each error along with possible steps to resolve it.

Two options for using k8sgpt

There are two ways to use k8sgpt. One option is to install it as a command-line tool on your own machine; the other is the k8sGPT Operator, described further below. Linux, Windows, and macOS are supported platforms, and simple installation options can be found in the official documentation. With the CLI, access to the cluster is granted via the existing kubeconfig. k8sgpt performs a pre-analysis of the data and then sends it to the AI backend. The result is an output in the shell that lists the various errors and provides possible solutions.

The option to use different AIs underscores the independence that the tool strives to achieve. All major commercial providers are covered. There are backends for OpenAI, Gemini, Azure OpenAI, AWS SageMaker, and Bedrock, as well as the option to use local LLMs via LocalAI and Ollama. This is especially interesting since many companies in the DACH region do not allow the use of commercial providers.
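As a rough illustration of how a backend is registered, the following Python sketch assembles a `k8sgpt auth add` command line for a local LocalAI endpoint. The helper name `build_auth_add_argv`, the model name `my-local-model`, and the URL are placeholders chosen for this example; the flags should be checked against `k8sgpt auth add --help` for the installed version.

```python
import shlex


def build_auth_add_argv(backend, model, base_url=None):
    """Assemble a `k8sgpt auth add` call as an argument list (hypothetical helper)."""
    argv = ["k8sgpt", "auth", "add", "--backend", backend, "--model", model]
    if base_url:
        # Self-hosted backends such as LocalAI are addressed by their endpoint URL.
        argv += ["--baseurl", base_url]
    return argv


# Placeholder model name and endpoint; replace with the values of your own setup.
local_backend = build_auth_add_argv(
    "localai", "my-local-model", "http://localhost:8080/v1"
)
print(shlex.join(local_backend))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.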

Data privacy takes precedence

k8sgpt is transparent about the data it processes and sends to the AI. Which data is collected depends on the analyzer used. The Pod analyzer (k8sgpt analyze --filter=Pod), for example, collects the container status message, pod name, pod namespace, and event message. Data is sent to the AI backend only when k8sgpt is run with the --explain flag, and what is sent is exactly the data k8sgpt itself collected.
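The difference between a purely local scan and one that involves the AI backend comes down to a single flag. The sketch below builds both variants from Python. The helper names `build_analyze_argv` and `run_analysis` and the namespace `default` are assumptions made for this example, not part of k8sgpt.

```python
import shlex
import shutil
import subprocess


def build_analyze_argv(filters=(), namespace=None, explain=False):
    """Assemble a `k8sgpt analyze` call as an argument list (hypothetical helper)."""
    argv = ["k8sgpt", "analyze"]
    for name in filters:
        argv += ["--filter", name]  # restrict the scan to one analyzer, e.g. Pod
    if namespace:
        argv += ["--namespace", namespace]
    if explain:
        argv.append("--explain")  # only with this flag is data sent to the AI backend
    return argv


def run_analysis(argv):
    """Run the command if the binary is on PATH; return stdout, otherwise None."""
    if shutil.which(argv[0]) is None:
        return None
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return completed.stdout


local_only = build_analyze_argv(filters=["Pod"], namespace="default")
with_ai = build_analyze_argv(filters=["Pod"], namespace="default", explain=True)
print(shlex.join(local_only))  # k8sgpt analyze --filter Pod --namespace default
print(shlex.join(with_ai))     # k8sgpt analyze --filter Pod --namespace default --explain
```

Calling `run_analysis(local_only)` first is a cheap way to review what k8sgpt found before deciding whether any of it should leave the cluster.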

With the --anonymize flag, data can be anonymized before it is sent to the AI backend. Anonymization covers, for example, the Deployment name and Deployment namespace. No logs are collected, and k8sgpt requests only the basic data its analyzers need from the API server.
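The general idea behind this kind of anonymization can be sketched in a few lines of Python: sensitive names are swapped for random placeholders before the text leaves the machine, and the placeholders are swapped back in the answer. This is an illustration of the technique only, not k8sgpt's actual implementation; the functions `mask` and `unmask` and the sample message are invented for this example.

```python
import re
import secrets


def mask(text, sensitive_values):
    """Replace each sensitive value with a random placeholder.

    Returns the masked text and a placeholder -> original mapping.
    """
    values = sorted({v for v in sensitive_values if v}, key=len, reverse=True)
    if not values:
        return text, {}
    placeholders = {}  # original value -> placeholder

    def replace(match):
        value = match.group(0)
        if value not in placeholders:
            placeholders[value] = "masked-" + secrets.token_hex(4)
        return placeholders[value]

    # One pass over the text, longest values first, so placeholders are never re-masked.
    pattern = re.compile("|".join(re.escape(v) for v in values))
    masked = pattern.sub(replace, text)
    return masked, {p: v for v, p in placeholders.items()}


def unmask(text, mapping):
    """Restore the original values in a text that contains placeholders."""
    if not mapping:
        return text
    pattern = re.compile("|".join(re.escape(p) for p in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


original = "Deployment shop/checkout has 1 of 3 replicas available"
masked, mapping = mask(original, ["shop", "checkout"])
print(masked)                   # names replaced by random placeholders
print(unmask(masked, mapping))  # original message restored
```

Because the mapping never leaves the local process, the AI backend only ever sees the placeholders.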

The safest way is, of course, to use a local LLM. There are many options available. The easiest is to run the LLM directly in your cluster, and the most convenient tools for this are LocalAI and Ollama, since both can be installed easily in Kubernetes clusters.

Automation with the k8sGPT Operator

The k8sGPT Operator offers a convenient way to use k8sgpt within a Kubernetes cluster and fully utilise the advantages of AI-supported troubleshooting.

It automates the execution of scans of multiple clusters at regular intervals. These scans then collect the data from the clusters and forward it to Prometheus so that the status of multiple clusters can be viewed centrally.

The operator setup itself can also be automated and implemented in line with the Infrastructure as Code model. The operator can be installed via ArgoCD, for example, with Helm, the package manager for Kubernetes, handling the installation itself.

After installation, the operator must be configured with a K8sGPT custom resource. This resource defines the settings for the scans, such as the namespace to be analysed, the AI backend to be used and the type of data to be collected. The operator offers various options for customisation. For example, users can configure the frequency of scans, the filters to be used and the behaviour during data collection.
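To give an impression of what such a resource looks like, the Python sketch below builds a minimal manifest and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The field names follow the example in the operator's README as far as we recall it and should be verified against the installed operator version. The function `k8sgpt_resource`, the resource and secret names, the model, and the version string are placeholders.

```python
import json


def k8sgpt_resource(name, namespace, backend, model,
                    secret_name, secret_key, version, filters=()):
    """Build a K8sGPT custom resource as a plain dict (hypothetical helper)."""
    spec = {
        "ai": {
            "enabled": True,
            "backend": backend,
            "model": model,
            # Reference to a Kubernetes Secret that holds the backend API key.
            "secret": {"name": secret_name, "key": secret_key},
        },
        "noCache": False,
        "version": version,  # k8sgpt version the operator should deploy
    }
    if filters:
        spec["filters"] = list(filters)  # limit the scans to these analyzers
    return {
        "apiVersion": "core.k8sgpt.ai/v1alpha1",
        "kind": "K8sGPT",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# All values below are placeholders for illustration.
manifest = k8sgpt_resource(
    name="k8sgpt-sample",
    namespace="k8sgpt-operator-system",
    backend="openai",
    model="gpt-3.5-turbo",
    secret_name="k8sgpt-sample-secret",
    secret_key="openai-api-key",
    version="v0.3.8",
    filters=["Pod"],
)
print(json.dumps(manifest, indent=2))
```

Generating the manifest from code like this fits the Infrastructure as Code approach: the same function can produce one resource per cluster or per backend, and the output can be committed to the Git repository that ArgoCD watches.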

Conclusion: The wide range of functions and options described make k8sgpt an extremely useful tool for debugging Kubernetes clusters.

Mario Fahlandt

Customer Delivery Architect