Introduction
The definition of each container may specify a request value, a reservation mechanism that guarantees that the specified amount of resources (generally CPU or memory) is available for that container. In Kubernetes, these settings are derived from the definition of the pod, which, in turn, is derived from the definition of its workload, if it is subject to one. Therefore, in terms of Kubernetes objects, the request values are generally controlled by the workload objects.
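For orientation, here is a minimal sketch of where request values live in a container definition, expressed as a Python dict mirroring the usual YAML manifest (the image name and values are hypothetical):

```python
# Dict equivalent of a pod spec fragment; "requests" is the reservation
# mechanism discussed above.
pod_spec = {
    "containers": [
        {
            "name": "app",
            "image": "example/app:1.0",  # hypothetical image
            "resources": {
                "requests": {
                    "cpu": "200m",       # 200 millicores reserved
                    "memory": "256Mi",   # 256 MiB reserved
                },
            },
        },
    ],
}
```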
A higher request value provides robustness for the specific workload (by guaranteeing that these resources are always available for it), but on the other hand, it prevents those resources from being used by other workloads, even when they are not actively used by the requesting workload. When there are not sufficient resources to run a workload, another worker node is required, which, when added, directly increases the cluster's cost. Therefore, even if the evaluation of a workload's cost, based on its resource allocation, is somewhat theoretical, it has very direct implications for the actual cost of the cluster, and rightsizing a Kubernetes workload is therefore essential for the cluster's cost optimization.
Workload rightsizing is about adjusting the workload's request for compute resources (CPU and Memory) to the actual needs of the workload, which are derived from its actual usage of those resources. There is a range of what would be considered the optimal request value, derived from one's preferred tradeoff between cost optimization and workload robustness.
Prerequisites
The recommended default setting for the CPU/Memory Aggregation function is Max. However, to use the Max or 95th Percentile metrics, Anodot Cost's Prometheus-Agent version 0.3.12 or later is required, and max-queries must not be disabled in the agent's configuration (Max and 95th Percentile are new metrics introduced in that agent version).
Alternatively, the CPU/Memory Aggregation function setting can be changed to Average, which is supported by older agent versions as well.
How it's done
The recommendation process calculates the optimized request value of the compute resource, namely CPU or Memory, based on the workload's recorded usage and the preferences for the recommendation.
Within the customized time range of days to check (counted backwards from the last day of full data availability), the workload's request value is examined. Because the most recent request value is the basis for the cost and performance prediction, the range between the most recent day and the earliest day over which that value remained unchanged is called the effective time frame.
For example, let's assume that we have a workload with a CPU request value of 200 millicores on Day 1, which remained at this value until Day 10, when it was changed to 150 millicores, and did not change at least until Day 14, which is yesterday, the last day with full data availability.
If the setting for days to check is 7 (the default), the effective time frame starts at day 10.
However, the effective time frame is bounded by the range of days to check. Therefore, if the CPU request value had been changed on day 5 and remained consistent since, the effective time frame would start at day 8, as set by the range of days to check.
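To make this concrete, here is a hedged Python sketch of how the effective time frame start could be derived; the function name and data structure are illustrative, not Anodot Cost's actual implementation:

```python
def effective_frame_start(daily_requests, days_to_check):
    """Return the 1-based day on which the effective time frame starts.

    daily_requests: the request value recorded for each day, ordered from
    oldest to newest, ending at the last day with full data availability.
    """
    window = daily_requests[-days_to_check:]  # bounded by days to check
    latest = window[-1]                       # most recent request value
    start_offset = len(window)
    for value in reversed(window):            # walk back while consistent
        if value != latest:
            break
        start_offset -= 1
    return len(daily_requests) - len(window) + start_offset + 1

# The example above: 200m on days 1-9, 150m on days 10-14, 7 days to check.
print(effective_frame_start([200] * 9 + [150] * 5, 7))   # -> 10
# Changed on day 5 and consistent since: bounded to day 8.
print(effective_frame_start([200] * 4 + [150] * 10, 7))  # -> 8
```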
The effective time frame is the reference for the workload cost and usage analysis. The workload's usage, whether it's the average or maximal value, is calculated based on that frame. So is its annual compute cost, which is the average daily cost of the workload during that frame, extrapolated to a year.
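For instance, the annual extrapolation can be sketched as follows (the values and variable names are illustrative):

```python
# Average daily cost during the effective time frame, extrapolated to a year.
frame_cost = 12.0  # hypothetical cost allocated during the frame
frame_days = 5     # hypothetical effective time frame length, in days
annual_compute_cost = (frame_cost / frame_days) * 365
print(annual_compute_cost)  # -> 876.0
```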
During that frame, the resource usage of all the workload's subordinate pods is gathered and calculated:
- Average usage: the average of the daily average usage
- Maximal usage: the maximum of the daily maximum usage
- Hourly 95th percentile: the maximum of the daily maximum of the hourly 95th percentile usage
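The three aggregations can be sketched as follows, assuming usage samples are grouped per day and per hour (the data structure and names are illustrative):

```python
from statistics import mean, quantiles

def aggregate_usage(days):
    """days: list of days; each day is a list of hours; each hour is a list
    of usage samples gathered from the workload's pods."""
    daily_avg = [mean(s for hour in day for s in hour) for day in days]
    daily_max = [max(s for hour in day for s in hour) for day in days]
    # 95th percentile of each hour's samples (needs >= 2 samples per hour),
    # then the daily maximum of those hourly values.
    daily_p95_max = [max(quantiles(hour, n=100)[94] for hour in day)
                     for day in days]
    return {
        "average": mean(daily_avg),        # average of daily average usage
        "maximum": max(daily_max),         # maximum of daily maximum usage
        "hourly_p95": max(daily_p95_max),  # max of daily max of hourly p95
    }
```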
Each value serves a different rightsizing strategy.
Basing the request value on the average usage has great cost-saving potential, but carries the risk of not having enough resources available when the usage is less predictable.
Basing the request value on the maximal observed usage is the most performance-conservative approach: it has good chances of guaranteeing sufficient availability of resources for your workload, but it requires reserving more unused resources in advance, and is therefore less cost efficient.
The maximal usage is also subject to abnormal usage spikes that may not reflect the regular work of the workload, which the resource reservation request is meant to serve. A method to mitigate abnormal spikes is to use a high percentile instead of the maximal value. This is the purpose of the hourly 95th percentile usage calculation, which provides a relatively conservative strategy that is less prone to wasting resources due to abnormal behavior.
Illustration of the three types of usage aggregations, and how the 95th percentile might be used to ignore abnormal spikes:
One of these calculated usage values, according to the usage aggregation baseline preference, is used as the basis for the recommendation. A customized buffer is added to that value in order to provide additional flexibility above the observed usage, and the result is the target request, i.e. the recommended request value. That value (for each compute resource, when relevant) is the goal of the rightsizing recommendation for a given workload.
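As a minimal sketch of this step (the parameter names are illustrative):

```python
def target_request(baseline_usage, buffer_percent):
    """Recommended request value: the chosen baseline plus the target buffer."""
    return baseline_usage * (1 + buffer_percent / 100)

# e.g. a 95th-percentile CPU baseline of 80 millicores with a 15% buffer:
print(target_request(80, 15))  # -> 92.0 millicores
```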
Let's illustrate this with the previous example of usage graphs, and we'll assume a preference of 8 days to check. The three kinds of usage aggregation are based on the effective time frame:
One of these baseline values, according to the usage aggregation baseline preference, is used for calculating the target request.
In this example, the Maximum baseline is higher than the current request, therefore cannot be used to recommend a new target request value that will have potential cost savings. This is one of the scenarios where the recommendation is ruled out.
The 95th Percentile and the Average baselines are still relevant. We'll assume the 95th percentile was set as the baseline preference. Then a target buffer is added to that value, whose size is set by the Memory target buffer preference in this example (or CPU target buffer, in case of CPU), resulting in the target value for the request.
In this example, the target value is still below the current request value. If that were not the case, it would have been another scenario for ruling out the recommendation.
The target value is the recommended request value. We can re-evaluate the workload cost with that value, based on the actual allocated cost and assuming the same behavior as during the effective time frame. This can be roughly illustrated by the difference between the current request and the recommended request.
Cost evaluation notes
When costs are mentioned in the context of this recommendation, they refer to the compute costs allocated to the workload.
The current monthly / annual cost of a workload as displayed in the recommendation details is not the cost that was actually allocated to it historically, but an extrapolation of the cost that was allocated to it during the effective time frame (extended to the period of a month / year).
With a new request value (the recommended request value), the ratio between the new and the current expected resource allocation is the basis for re-evaluating the cost: this ratio is applied to the cost allocated during the effective time frame. This provides an estimation of the workload's cost allocation, assuming that the environmental conditions observed during the effective time frame persist onwards.
Compute costs consist of the evaluation of the CPU cost and the Memory cost independently of each other, according to the same logic used in Anodot Cost's Kubernetes costs breakdown, which is affected by the compute cost weights preferences, and actual costs of the compute resources. The re-evaluation of the new cost also consists of the two parts of the compute cost, each one according to its new recommended request value, where relevant.
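A rough sketch of this re-evaluation, treating the CPU and Memory parts independently (the names and values are illustrative, not Anodot Cost's actual code):

```python
def reevaluate_cost(allocated_cost, current_request, target_request):
    """Scale each resource's allocated cost by its request-change ratio."""
    new_cost = {}
    for resource in ("cpu", "memory"):
        # Resources without a recommendation keep their current request.
        new = target_request.get(resource, current_request[resource])
        ratio = new / current_request[resource]
        new_cost[resource] = allocated_cost[resource] * ratio
    return new_cost

current = {"cpu": 200, "memory": 512}  # millicores / MiB
target = {"cpu": 120}                  # only CPU has a recommendation here
cost = {"cpu": 30.0, "memory": 20.0}   # monthly allocated cost per resource
print(reevaluate_cost(cost, current, target))
# -> {'cpu': 18.0, 'memory': 20.0}
```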
There are several important remarks regarding the evaluation of the potential cost resulting from applying the recommendation:
- Workloads are consumers of compute resources and therefore have direct implications for incurring costs. However, if the resources allocated to a workload are decreased by 10% (for example), and consequently the cost allocated to that workload is decreased by 10%, it doesn't mean that the 10% cost difference will no longer be spent by the payer. The difference in money spent only takes effect when fewer nodes are required to host these workloads as a result of rightsizing them, and that usually does not depend on a single workload. Therefore, similarly to Kubernetes cost allocation regardless of rightsizing recommendations, the evaluated potential cost savings is relative to the full cost savings potential.
- The same aspects of the general Kubernetes cost breakdown apply here as well: costs are allocated on a resource allocation basis, but this does not mean that the same allocation always incurs the same costs, since other external factors are involved: workload costs in Anodot Cost are derived from the real cloud compute resources used to host them, and their prices may vary due to different reasons, such as the type of the machine, whose resource capacity is not the only factor that determines its price.
In the context of this recommendation, the same external conditions that resulted in the current allocated cost are assumed for the cost re-evaluation based on the new expected resources allocation.
Ruling out recommendations
In order to issue a rightsizing recommendation for a workload, it has to fulfill the following conditions:
- The cluster has to be properly onboarded and integrated with Anodot Cost for Kubernetes costs breakdown.
- The workload has a sufficiently long period of consistent request settings (as collected by Anodot Cost): no shorter than the minimal effective time frame preference.
- In case the usage aggregation baseline preference is set to Maximum (the default) or 95th Percentile, this data has to be available.
- There must be at least one resource (CPU or Memory) with a recommended new request value, i.e. a "target value":
  - There is a request value set for the resource.
  - The target value, which is the baseline usage (according to usage aggregation baseline) plus the target buffer (according to CPU target buffer or Memory target buffer), has to be lower than the current request value.
- The overall savings resulting from applying the recommendation have to be no less than the Minimal cost savings - monthly quantity and Minimal cost savings - percentage preferences, when these are set to non-zero values (see the sketch below).
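These conditions can be summarized in a hedged sketch (all names and the exact checks are illustrative, not the actual implementation):

```python
def passes_rule_out(target_value, current_request,
                    savings_amount, savings_pct,
                    min_amount=0.0, min_pct=5.0):
    """Return True if a rightsizing recommendation should be issued."""
    if current_request is None:          # no request value set
        return False
    if target_value >= current_request:  # no potential savings
        return False
    if min_amount and savings_amount < min_amount:
        return False                     # below minimal monthly savings
    if min_pct and savings_pct < min_pct:
        return False                     # below minimal savings percentage
    return True
```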
Preferences
The following table describes the available preferences and how they affect the recommendation results.
| Preference | Description | Default value |
| --- | --- | --- |
| Days to check | Number of days (from yesterday backwards) to analyze the workload's usage, limiting its effective time frame. | 7 |
| Minimal effective time frame | Minimal number of days in an effective time frame, which is the range of consistent request settings. Workloads with shorter frames are not candidates for rightsizing recommendations. The longer the effective time frame (which is derived from the workload's age, the consistency of its settings, and the days to check preference), the more established the evaluation of its usage, and consequently, the more reliable the prediction derived from the recommendation analysis. Cannot be longer than Days to check. | 2 |
| Usage aggregation baseline | Type of usage aggregation used for calculating the target request value (the recommended request value). Select between Maximum (most conservative), 95th Percentile, and Average (least conservative). | Maximum |
| CPU target buffer | Percentage added to the calculated baseline usage of CPU in order to determine the target request value. | 15% |
| Memory target buffer | Percentage added to the calculated baseline usage of Memory in order to determine the target request value. | 15% |
| Minimal cost savings - monthly quantity | Minimum monthly cost savings (in the used currency unit) required from the potential savings resulting from applying the recommendation. When the potential savings are lower, the recommendation is ruled out. If Minimal cost savings - percentage is also set, both conditions are required. | 0 (unused) |
| Minimal cost savings - percentage | Minimum percentage of cost savings resulting from applying the recommendation. When the potential savings are lower, the recommendation is ruled out. If Minimal cost savings - monthly quantity is also set, both conditions are required. | 5% |
Recommendation workload details
You can find details describing the workload in a recommendation item.
- Effective time frame: Number of consecutive days the workload was consistently defined with the same request values, within the range of days to check (see the detailed and illustrated definition in the chapter above).
- Effective time frame start: The day where the effective time frame of the workload begins.
- General usage aggregation terms:
  - Average: The average of the daily average usage of all the pods, within the described duration.
  - 95th Percentile: The maximum of the daily maximum of the hourly 95th percentile usage, as sampled from all the pods, within the described duration.
  - Maximum: The maximum of the daily maximum sampled from all the pods, within the described duration.
- CPU Usage & Utilization: These values, which apply to each of the usage aggregation terms, refer to the effective time frame. CPU usage values describe the quantity of CPU used, in millicore units. Utilization values are the ratio between the usage value and the request value, if one was set; the ratio might be above 100% if the usage is above the request.
- Memory Usage & Utilization: These values, which apply to each of the usage aggregation terms, refer to the effective time frame. Memory usage values describe the quantity of RAM used, in scaling units of Bytes. Utilization values are the ratio between the usage value and the request value, if one was set; the ratio might be above 100% if the usage is above the request (see the sketch below).
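The utilization ratio is a simple quotient (a trivial sketch, with illustrative names):

```python
def utilization(usage, request):
    """Usage relative to the request; may exceed 1.0 (100%)."""
    return usage / request

print(utilization(250, 200))  # -> 1.25, i.e. 125% of the requested CPU
```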
- Memory & CPU Performance charts: These graphs display usage metrics and the request value at daily granularity, and typically extend over a greater range than the effective time frame (depending on activity and data availability). As usual, the values describe aggregations where the units describe the usage of a single pod. Usage is only counted when a pod is active (meaning, inactivity is not aggregated as a "0" value).
- Pods activity and uptime:
  - Number of pods: The number of different pods running in a day, according to the last day in the effective time frame. This value indicates how dynamic the workload is, but it doesn't indicate how long, or how simultaneous, the pods' uptime is (for that, refer to the running hours and the uptime ratio chart).
  - Daily running hours: The average number of running hours in a day for a single pod, averaged over all the workload's pods. The duration it describes matches the uptime ratio chart. 24 is the maximum value, which means that all the pods in the workload are constantly running.
  - Daily accumulated running hours: The total number of running hours per day, on average, accumulated over all the workload's pods. The duration it describes matches the uptime ratio chart.
  - Average Uptime Ratio chart: Similarly to the daily running hours, the value 1 represents full uptime, namely 24 hours a day. This chart provides the same information at daily granularity. The Single pod graph represents the average uptime ratio of all the workload's pods (and therefore 1 is the maximum value, which means that all the pods in the workload are constantly running), while the All pods accumulated graph represents the accumulated running hours (in units of "uptime ratio") per day. Since the value 1 represents full uptime, the accumulated value is similar to the number of simultaneous pods, or the equivalent number of pods running constantly throughout the day.
Simple examples (also reproduced in the sketch below):
- A workload with one pod that runs only six hours every day will be described as follows:
  - Number of pods: 1
  - Daily running hours: 6
  - Daily accumulated running hours: 6
  - Average uptime ratio - Single pod: 0.25
  - Average uptime ratio - All pods accumulated: 0.25
- A workload with one pod that runs constantly will be described as follows:
  - Number of pods: 1
  - Daily running hours: 24
  - Daily accumulated running hours: 24
  - Average uptime ratio - Single pod: 1
  - Average uptime ratio - All pods accumulated: 1
- A workload with two pods that both run 12 hours a day will be described as follows:
  - Number of pods: 2
  - Daily running hours: 12
  - Daily accumulated running hours: 24
  - Average uptime ratio - Single pod: 0.5
  - Average uptime ratio - All pods accumulated: 1
- A workload with two pods that run constantly will be described as follows:
  - Number of pods: 2
  - Daily running hours: 24
  - Daily accumulated running hours: 48
  - Average uptime ratio - Single pod: 1
  - Average uptime ratio - All pods accumulated: 2
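The relations between these metrics can be sketched as follows (the helper function is hypothetical):

```python
def uptime_metrics(pod_hours):
    """pod_hours: daily running hours, one entry per pod in the workload."""
    n = len(pod_hours)
    daily_running = sum(pod_hours) / n  # average hours for a single pod
    accumulated = sum(pod_hours)        # all pods together
    return {
        "number_of_pods": n,
        "daily_running_hours": daily_running,
        "daily_accumulated_running_hours": accumulated,
        "uptime_ratio_single_pod": daily_running / 24,
        "uptime_ratio_all_pods": accumulated / 24,
    }

print(uptime_metrics([6]))       # one pod, six hours a day
print(uptime_metrics([12, 12]))  # two pods, 12 hours each
print(uptime_metrics([24, 24]))  # two pods, constantly running
```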
Concrete example:
In this chart, we can see that the common behavior of the workload is 4 pods running simultaneously and constantly. We can see momentary increases in the number of pods on July 9 and 29, and a momentary decrease in the uptime on July 30.