Anodot's Prometheus-Agent is deployed on your cluster and collects the cluster's metrics directly from Prometheus, or from a Prometheus-compatible interface such as Thanos, in a very lightweight manner: it consumes few resources and transfers little data.
The metrics are sent to Anodot, where they are processed so that you can conveniently view and browse the usage and costs of your cluster components in Anodot Cost.
How it works
- The agent runs as a single instance of an hourly scheduled CronJob (or as a Deployment) in the Kubernetes cluster; a schematic example of such a CronJob follows this list.
- It queries aggregated metrics of the last hour from Prometheus (or anything that supports PromQL, such as Thanos), based on a predefined list of queries.
- It packs the retrieved metrics into compressed files and sends them to Anodot's S3 bucket.
- Minimal status logs are also sent to Anodot's CloudWatch for troubleshooting and support.
- The detailed processing and analysis of the metrics continue on Anodot's side, outside of the client's cluster.
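For illustration, the hourly CronJob is conceptually similar to the following sketch. This is not the actual resource rendered by the Helm chart; the name, image, and fields here are hypothetical.

```yaml
# Schematic sketch only -- the real CronJob is rendered by the k8s-metrics-collector
# Helm chart, and its name, image, and fields may differ.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: k8s-metrics-collector        # hypothetical name
spec:
  schedule: "0 * * * *"              # hourly, as described above
  concurrencyPolicy: Forbid          # a single instance at a time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: collector
              image: "<agent-image>" # supplied by the chart
```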
Capabilities and footprint
The agent has been tested collecting metrics from clusters of up to 1000 nodes.
An example of the agent's small footprint when collecting one hour of metrics from a cluster of 400 nodes:
- Runtime: about 4 seconds (for one hour of data)
- Data transfer: about 1 MB
Another example, from an environment of 2500 nodes:
- Runtime: about 2 minutes (for one hour)
- Data transfer: about 27 MB
Setup
Prerequisites
- The agent's installation requires an access key from Anodot, which allows the agent to send metrics and logs to Anodot. The same access key will serve any agent in a cluster linked to the same Anodot Cost payer account.
- To get the access key, please contact support@anodot.com
- Linked accounts/Subscriptions should be connected and validated
- For other technical requirements, please refer to the installation section below.
Installation
The Prometheus-Agent is installed using the k8s-metrics-collector Helm chart from the Anodot Cost Helm repository:
https://github.com/pileus-cloud/charts/tree/main/helm-chart-sources/k8s-metrics-collector
In that chart's documentation you will find:
- The technical requirements, including the list of metrics that must be available for the agent to query.
- Installation commands.
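As a rough orientation, a values file for the chart typically points the agent at your PromQL endpoint and identifies your cluster and Anodot access key. The parameter names below are illustrative assumptions, not necessarily the chart's actual keys; follow the chart documentation above for the real parameters and installation commands.

```yaml
# Hypothetical values sketch -- parameter names are placeholders; refer to the
# k8s-metrics-collector chart documentation for the actual configuration keys.
prometheusUrl: http://prometheus-server.monitoring.svc.cluster.local:9090
anodotAccessKey: "<access key provided by Anodot support>"
clusterName: my-cluster
cloudProvider: aws
accountId: "123456789012"
```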
Limitations
- The visibility of Kubernetes data in Anodot Cost depends on the availability of billing data that covers the usage data. This means that once the agent is working properly, the next invoice covering that date must be fully processed before the data can be shown in the Anodot Cost console.
- For AWS accounts: Anodot also supports metrics collection by the AWS CloudWatch Agent; however, only one type of agent (CloudWatch-Agent or Prometheus-Agent) is supported under the same Anodot Cost payer account. In other words, you cannot have the CloudWatch-Agent in one cluster and the Prometheus-Agent in another cluster as the sources of usage metrics for Anodot Cost. If you already use the CloudWatch agent and would like to switch to the Prometheus-Agent, contact Anodot to plan the agent migration.
- A single agent instance collects data for a single Kubernetes cluster. If your environment stores metrics from multiple clusters, each agent has to be configured to filter out metrics from other sources (see the Questions & Answers section below), and multiple agent instances are required to collect data from multiple clusters.
Questions & Answers
- I have multiple clusters that I want to monitor. Do I need to install multiple agents?
Yes. One agent for one Kubernetes cluster.
The common setup is that a Prometheus server is hosted by the same cluster it monitors, and the agent is deployed in that cluster as well.
Other setups are also possible; note especially the case described below, where multiple clusters are monitored by the same Prometheus server.
- How many Anodot access keys do I need if I have multiple clusters?
One key per payer account. That means that even if you have multiple clusters owned by multiple linked accounts, as long as they are under the same payer account, the same access key can be used by all your deployed agents.
- I'm not using a regular Prometheus server, but an alternative setup such as Thanos/Grafana-Mimir/DataDog/Coralogix etc. Is the agent compatible?
The agent is compatible with any PromQL-compatible server, as long as it has proper access to it (see the other questions for special cases) and the required metrics are collected and accessible.
- My Prometheus/PromQL server stores metrics from multiple clusters. Does the agent support this kind of setup?
Yes. The agent has to be configured to collect only the metrics that come from the cluster it is intended for. This is important in order to avoid confusion and duplicate costs when viewing the metrics in the Anodot Cost platform.
In this kind of setup, metrics are typically labeled in a way that identifies which cluster each metric came from. For example, each metric may carry a label named "cluster" whose value identifies the source cluster.
The agent supports flexible configuration for this kind of filtering: the value of the setting is a PromQL subexpression that filters out everything else. For example, if the label that identifies a cluster is named "cluster" and the value for your cluster is "mine", the PromQL subexpression would be cluster="mine". When this subexpression is used as a string value for the METRIC_CONDITION parameter in the agent's chart, it needs to be quoted, resulting in METRIC_CONDITION: 'cluster="mine"'. This mechanism supports more complex conditions as well, as long as they can be expressed as a PromQL subexpression; see the sketch below.
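A minimal values sketch for this setting (only METRIC_CONDITION is taken from the example above; how it is nested in the chart's values may differ, so check the chart documentation):

```yaml
# Restrict the agent's queries to the cluster named "mine".
# The label name ("cluster") and value ("mine") are examples; use the labels
# that identify your cluster in your Prometheus/PromQL server.
METRIC_CONDITION: 'cluster="mine"'

# More complex PromQL subexpressions work as well, for example (illustrative):
# METRIC_CONDITION: 'cluster="mine", region="us-east-1"'
```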
- The metrics of my cluster are stored in a Prometheus server which is hosted by another cluster. Does the agent support this? Where should I deploy the agent in this case?
The agent supports this kind of setup. It is typical to deploy the agent in the same cluster it is intended to monitor, but it can be deployed anywhere as long as it has proper access to the Prometheus server (so it might be easier to deploy it on the cluster hosting the Prometheus server). Note that the agent settings that describe the cluster (cluster name, account ID, linked account, cloud provider, and so on) should describe the cluster the agent is supposed to monitor, as in the sketch below.
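As a hedged illustration (the parameter names are placeholders; the chart documentation has the actual ones), the cluster-describing settings refer to the monitored cluster, while the endpoint points at wherever the metrics are stored:

```yaml
# Placeholder parameter names, for illustration only.
prometheusUrl: http://thanos-query.monitoring.other-cluster.example.com:9090  # where the metrics are stored
clusterName: payments-prod       # the cluster being monitored, not the one hosting the agent
accountId: "123456789012"        # cloud account that owns the monitored cluster
linkedAccount: "123456789012"
cloudProvider: aws
```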
- Accessing my Prometheus/PromQL server requires a user name and a password. Does the agent support this?
Yes. The agent's configuration includes optional settings for the user name and password, which should be provided in this case.
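A minimal sketch, with placeholder parameter names (see the chart documentation for the actual ones):

```yaml
# Placeholder names for the optional basic-auth settings mentioned above.
prometheusUser: metrics-reader
prometheusPassword: "change-me"   # prefer sourcing this from a Kubernetes Secret if the chart supports it
```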
- Accessing my Prometheus/PromQL server requires special headers in the HTTP request (for example, the 'X-Scope-OrgID' header in a multi-tenant mode as in some Grafana setups). Does the agent support this?
Yes. The agent's configuration includes optional settings for request headers.
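A minimal sketch for a multi-tenant setup that requires the X-Scope-OrgID header (the parameter name is a placeholder; the header comes from the question above):

```yaml
# Placeholder name for the optional request-headers setting mentioned above.
customHeaders:
  X-Scope-OrgID: "tenant-1"
```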
- Can I run the agent without hosting a Prometheus server? My cluster is small, and hosting a Prometheus server requires too many resources.
It's beyond the scope of this document to provide guidance for such a setup, but it is possible to store metrics from a cluster in a Prometheus server outside of the monitored cluster, thereby saving your small clusters the resource overhead of hosting Prometheus. Even the agent doesn't have to run inside the monitored cluster (although it is very lightweight and can be hosted in very small clusters with no significant footprint).
- How long does it take until I can see the K8s cost breakdown in the Anodot Cost console?
The visibility of Kubernetes data in Anodot Cost depends on the availability of billing data that covers the usage data.
This means that once the agent is working properly, the next invoice covering that date must be fully processed before the data can be shown in the Anodot Cost console. This usually happens within 48 hours.