
Previously, modifying resource requests or limits required destroying the pod and creating a new one with updated specifications. Applications went offline during the transition. Network connections dropped. Routine resource tuning thus required maintenance windows.
The new implementation modifies cgroup (control group) settings directly on running containers. When resource specifications change, Kubernetes updates the existing cgroup rather than recreating the pod. Applications continue running without interruption.
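As a sketch of how this looks in practice, assuming a cluster where the InPlacePodVerticalScaling feature is enabled, a resize can be issued against the pod's `resize` subresource with `kubectl patch`. The pod name (`my-app`) and container name (`app`) below are placeholders:

```shell
# Raise the CPU request and limit of a running container without
# restarting the pod, by patching the pod's "resize" subresource
# rather than the main pod spec.
kubectl patch pod my-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'

# Confirm the new values were applied to the still-running pod.
kubectl get pod my-app -o jsonpath='{.spec.containers[0].resources}'
```

Because the change goes through a dedicated subresource, the main pod spec stays immutable in the usual ways; only the resource fields are adjusted in place.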
The feature particularly benefits AI training workloads and edge computing deployments. Training jobs can now scale vertically without restarts. Edge environments gain resource flexibility without the complexity of pod recreation.
“For AI, that’s a really big training job that can be scaled and adjusted vertically, and then for edge computing, that’s really big to where there’s added complexity and actually adjusting those workloads,” Hagen said.
The feature requires cgroups v2 on the underlying Linux nodes. Kubernetes 1.35 deprecates cgroups v1 support. Most current enterprise Linux distributions include cgroups v2, but older deployments may need OS upgrades before using in-place resource adjustments.
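Whether a node is already on cgroups v2 can be checked by inspecting the filesystem type mounted at `/sys/fs/cgroup`:

```shell
# Print the filesystem type backing the cgroup mount point.
# "cgroup2fs" indicates cgroups v2 (the unified hierarchy);
# "tmpfs" indicates the legacy cgroups v1 layout.
stat -fc %T /sys/fs/cgroup
```

Nodes that report `tmpfs` will need an OS upgrade or a boot-parameter change to the unified hierarchy before in-place resizing can be used.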
Gang Scheduling supports distributed AI workloads
Among the preview features in the new release is a capability known as gang scheduling. The feature (tracked as KEP-4671) is intended to help distributed applications that require multiple pods to start simultaneously.
