Millions Wasted on Kubernetes Resources
Blog Article Published: 05/16/2023
Originally published by Sysdig.
Written by Javier Martínez.
The Sysdig 2023 Cloud-Native Security and Container Usage Report has shed some light on how organizations are managing their cloud environments. Based on real-world customers, the report is a snapshot of the state of cloud-native in 2023, aggregating data from billions of containers.
Our report retrieves data from cloud projects in the following areas:
- Number of containers that are using fewer CPU and memory than needed.
- Number of containers with no CPU limits set.
- Overallocation and estimated losses.
Limits and requests
Over the year, we have covered the importance of Limits and Requests. Simply put, they provide a way of specifying the maximum and the guaranteed amount of a computing resource for a container, respectively.
But they are more than that – they also indicate your company’s intention for a particular process. They can define the eviction tier level and the Quality of Service for the Pods running those containers.
Our study shows that:
- 49% of containers have no memory limits set.
- 59% of containers have no CPU limits set.
While setting up memory limits might have negative side effects, it’s important to set CPU limits to avoid starvation in processes or particular containers having drastic spikes of CPU consumption.
59% containers with no CPU limits
Our study showed that 59% of containers had no CPU limits set at all. Normally, adding CPU limits might lead to Throttling, but the report shows as well that on average 69% of the purchased CPU was unused, suggesting that no capacity planning was in place.
49% containers with no memory limits
Almost half of the containers had no memory limits at all. This particular case is special, since adding a limit to memory might eventually cause OOM errors.
Cloud providers give plenty of options to run applications with the ease of a click, which is a great way to kickstart the monitoring journey. However, cloud-native companies tend to allocate resources just to avoid becoming saturated, which can lead to astronomical costs.
Why does this happen?
- Urge to scale quickly
- Lack of resource consumption visibility
- Multi-tenant scaling
- Lack of Kubernetes knowledge
- Lack of capacity planning
Since CPU is the most costly resource in a cloud instance, companies might be overspending on something they are never going to use.
By using the average cost for AWS pricing on nodes based on CPU and memory, we can then calculate the average savings for companies that address these problems.
Specifically, our report showed that companies with more than 1,000 nodes could reduce their wasted resources by $10M per year.
In case the limits set are higher than the actual CPU, Kubernetes nodes will display:
Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.)
This means that Kubernetes will throttle some processes to provide higher CPU utilization.
Cost reduction strategies
By using applications to track resource usage and by performing capacity planning, companies can mitigate these costs with a clear investment/return net gain. Both Limits and Requests are useful tools that can be used to restrict the usage, but they can be cumbersome as they can lead to Pod eviction or over-commitment.
Limitranges are a useful tool to automatically assign a value range for both limits and requests for all containers within a namespace.
Both vertical autoscaling (increasing the resource size on demand) and horizontal autoscaling (increasing or decreasing the amount of Pods based on utilization) can be used to dynamically adapt to the current needs of your cloud-native solution.
Companies with multi-tenant solutions might come up with the problem that some of their projects are more demanding than others in terms of resources. Because of this, assigning the same resources might eventually cause overspending.
That’s why you can use ResourceQuotas to set a maximum amount of a resource to be consumed for all processes in a namespace.
There has been rapid growth in the number of companies investing in cloud solutions in recent years.
But with great power comes great responsibility. Cloud projects might want to find a balance for resources like CPU or memory.
Generally, they want to allocate enough so they never have saturation problems. But, on the other hand, over-allocating will lead to massive spending on unused resources.
The solution? Capacity planning, autoscaling, and visibility into costs are the best tools to take back control over your cloud-native spendings.
Trending This Week
#1 Shared Responsibility Model Explained
#2 Remote Working Strategy Desktop Virtualization or Remote Access
#3 How CASB Is Different from Web Proxy/Firewall
#4 Top Threat #5 to Cloud Computing: Insecure Software Development
#5 The Service Mesh Wars: Why Istio might not be favorite after all
Sign up to receive CSA's latest blogs
This list receives 1-2 emails a month.