What’s new in Kubernetes 1.16?

By Víctor Jiménez Cerrada - SEPTEMBER 17, 2019


Kubernetes 1.16 is almost here and it’s packed with cool new features, like ephemeral containers for easy pod debugging, support for dual-stack networking in pods, many new options for the scheduler… And we are just getting started! Here is the list of what’s new in Kubernetes 1.16.

Kubernetes 1.16 – Editor’s pick:

These are the features that look most exciting to us for this release (ymmv):

Kubernetes 1.16 core

#277 Ephemeral containers

Stage: Alpha Feature group: node You can’t add regular containers to a pod after creation, but you can run ephemeral containers, which makes them a great way to debug running pods (you can also use sysdig tools like kubectl capture or kubectl trace for that!). Right now the steps to run an ephemeral container aren’t straightforward. Once this feature is stable you may be able to run one with just kubectl debug:
kubectl debug -c debug-shell --image=debian target-pod -- bash
These containers execute within the namespace of an existing pod and have access to the file systems of its individual containers. Ephemeral containers aren’t meant to be used for regular deployments, so they have some limitations. For example, they will never be automatically restarted, and you can’t configure them like a regular container. In particular, fields like ports, livenessProbe, readinessProbe or lifecycle that imply a role in a pod are disallowed.

#563 Add IPv4/IPv6 dual-stack support

Stage: Alpha Feature group: network As the use of IPv6 increases, it’s getting more common to manage clusters with mixed IPv4 and IPv6 network configurations. Until now, a Kubernetes cluster could only run in either IPv4 or IPv6 mode. You needed the assistance of plugins to assign dual-stack addresses to a pod, and that wasn’t a convenient solution, as Kubernetes would only be aware of one address per pod. Now you can natively run your cluster in dual-stack mode. For example, you can have dual-stack pods (services still need to be either IPv4 or IPv6). To use dual-stack you need to enable the IPv6DualStack feature gate in the relevant components of your cluster, and then set up your services. You can get the full steps here.
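As a rough, hedged sketch of what trying this out could look like (the component flags and the pod name below are illustrative, not the exact setup steps): enable the feature gate on the relevant components and then check which addresses a pod received.
# Illustrative only: pass the feature gate to kubelet, kube-apiserver,
# kube-controller-manager and kube-proxy, for example:
#   --feature-gates=IPv6DualStack=true
# With dual-stack enabled, a pod lists all of its addresses in status.podIPs:
kubectl get pod mypod -o jsonpath='{.status.podIPs}'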

#752 New endpoint API

Stage: Alpha Feature group: network Until now, all the endpoints for a service were stored in one single object. In large Services with many pods, this Endpoints object can grow too big and become problematic: very large objects strain etcd and are expensive to propagate to every kube-proxy. In addition, every time there is a change in an endpoint, the whole Endpoints object is re-computed, stored and shared with all watchers. This process doesn’t scale well and can become a bottleneck in scenarios like rolling upgrades, where there is a burst of endpoint changes. The new EndpointSlice API splits the endpoints of a service into several EndpointSlice resources, solving many of the current API problems. It’s also designed to support other future features, like multiple IPs per pod.
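To give an idea of the new resource, here is a rough sketch of what an EndpointSlice could look like (the names and addresses are made up, and the exact fields of the alpha discovery.k8s.io API may differ slightly):
apiVersion: discovery.k8s.io/v1alpha1
kind: EndpointSlice
metadata:
  name: my-service-abc123
  labels:
    kubernetes.io/service-name: my-service
addressType: IP
ports:
- name: http
  port: 80
  protocol: TCP
endpoints:
- addresses:
  - "10.1.2.3"
  conditions:
    ready: true
  topology:
    kubernetes.io/hostname: node-1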

#688 Pod overhead: account resources tied to the pod sandbox, but not specific containers

Stage: Alpha Feature group: node In addition to the requested resources, your pods need some extra resources just to maintain their runtime environment. With the PodOverhead feature gate enabled, Kubernetes will take this overhead into account when scheduling a pod. The pod overhead is calculated and fixed at admission time, and it’s associated with the pod’s RuntimeClass; get the full details here.
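As an illustration, a RuntimeClass carrying an overhead could look roughly like this (the class name, handler and values are hypothetical):
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: sandboxed            # hypothetical RuntimeClass name
handler: kata                # hypothetical runtime handler configured on the nodes
overhead:
  podFixed:                  # resources added on top of the containers' requests
    memory: "120Mi"
    cpu: "250m"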

#895 Even pod spreading across failure domains

Stage: Alpha Feature group: scheduling One of the challenges of running a multi-zone cluster is spreading your pods evenly, so that high availability works correctly and resource utilization is efficient. With topologySpreadConstraints you can distribute your pods across zones, keeping the difference in pod count between zones at no more than maxSkew. Zones are formed by grouping nodes that share the same value of the topologyKey label. If we want to deploy this pod:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
…
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
…
In a cluster with this topology:
Label
        +---------------+---------------+
zone=   |     zoneA     |     zoneB     | 
        +-------+-------+-------+-------+
node=   | node1 | node2 | node3 | node4 |
        +-------+-------+-------+-------+
foo:bar | P     | P     | P _   | _     |
        +-------+-------+-------+-------+
The only way to comply with the topology constraints is for the pod to be deployed in node3 or in node4.

#950 Add pod-startup liveness-probe holdoff for slow-starting pods

Stage: Alpha Feature group: node Probes allow Kubernetes to monitor the status of your applications. You can use a livenessProbe to periodically check whether the application is still alive. One example container defines this probe:
livenessProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 3
  periodSeconds: 10
If the probe fails 3 times within 30 seconds, the container is restarted. So if the container is slow and needs more than 30 seconds to start, the probe will keep failing and the container will be restarted again and again. This new feature lets you define a startupProbe that will hold off all the other probes until the pod finishes its startup:
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10
Now our slow container has up to 5 minutes (30 checks * 10 seconds = 300s) to finish its startup.

#964 Extending RequestedToCapacityRatio priority function to support resource bin packing of extended resources

Stage: Alpha Feature group: scheduling The RequestedToCapacityRatioPriority function lets you schedule pods depending on the relative resource usage of each node. That way you can choose whether to schedule pods on the least-used nodes, or to keep filling the nodes that are already in use. The new resources property lets you further refine how that usage is calculated. By assigning weights to the node resources you can express scenarios like “CPU usage is 3 times more important than used memory”, and then schedule more pods on nodes with idle CPUs even if they don’t have that much free memory.
{
    "kind" : "Policy",
    "apiVersion" : "v1",
    …
    "priorities" : [
       …
      {
        "name": "RequestedToCapacityRatioPriority",
        "weight": 2,
        "argument": {
          "requestedToCapacityRatioArguments": {
            "shape": [
              {"utilization": 0, "score": 0},
              {"utilization": 100, "score": 10}
            ],
            "resources": [
              {"name": "intel.com/foo", "weight": 5},
              {"name": "CPU", "weight": 3},
              {"name": "Memory", "weight": 1}
            ]
          }
        }
      }
    ]
}

#894 RuntimeClass scheduling

Stage: Graduating to Beta Feature group: node The initial RuntimeClass implementation was meant for homogeneous clusters, where every node supports every RuntimeClass. This upgrade improves scheduling in heterogeneous clusters, with specialized nodes that only support a subset of the runtime classes. In these clusters, pods are now automatically scheduled only to the nodes that have support for their RuntimeClass.
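For example, a RuntimeClass can restrict its pods to the nodes that actually provide the runtime via scheduling.nodeSelector; a hedged sketch where the name, handler and label are all hypothetical:
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: sandboxed                        # hypothetical name
handler: kata                            # hypothetical handler
scheduling:
  nodeSelector:
    runtime.example.com/kata: "true"     # hypothetical label set only on supporting nodes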

#995 Kubeadm for Windows

Stage: Alpha Feature group: cluster-lifecycle Support for Windows nodes was introduced in Kubernetes 1.14; however, there wasn’t an easy way to join Windows nodes to a cluster. Starting in Kubernetes 1.16, kubeadm join will be available to Windows users with partial functionality. It will lack some features, like kubeadm init or kubeadm join --control-plane.
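In practice, joining a Windows node should look much like it does on Linux; a hedged sketch where the endpoint, token and hash are placeholders:
kubeadm join <control-plane-endpoint> --token <token> --discovery-token-ca-cert-hash sha256:<hash>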

#1043 RunAsUserName for Windows

Stage: Alpha Feature group: windows Now that Kubernetes has support for Group Managed Service Accounts, we can use the Windows-specific runAsUserName property to define which user will run a container’s entrypoint. The property lives inside the PodSecurityContext and SecurityContext structs, and it needs to follow the format DOMAIN\USER, where the domain part is optional.
apiVersion: v1
kind: Pod
…
spec:
  securityContext:
    windowsOptions:
      runAsUserName: "NT AUTHORITY\\NETWORK SERVICE"

#689 Support GMSA for Windows workloads

Stage: Graduating to Beta Feature group: windows This will allow an operator to choose a GMSA at deployment time, and run containers using it to connect to existing applications such as a database or API server without changing how the authentication and authorization are managed inside the organization. Read more in the release for 1.14 of the “What’s new in Kubernetes” series.

#492 Admission webhook

Stage: Graduating to Stable Feature group: API Until now mutating webhooks were only called once, in alphabetical order. In Kubernetes 1.15 this changed, allowing webhook re-invocation if another webhook later in the chain modifies the same object. Read more in the release for 1.15 of the “What’s new in Kubernetes” series.

#956 Add watch bookmarks support

Stage: Graduating to Beta Feature group: API The “bookmark” watch event is used as a checkpoint, indicating that all objects up to a given resourceVersion requested by the client have already been sent. The API can skip sending all these events, avoiding unnecessary processing on both sides. Read more in the release for 1.15 of the “What’s new in Kubernetes” series.
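Clients opt in to bookmarks per watch request. A minimal sketch of such a request against the API (the resourceVersion is just an example value):
curl "https://<apiserver>/api/v1/namespaces/default/pods?watch=1&allowWatchBookmarks=true&resourceVersion=10245"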

Hardware support

#693 Node topology manager

Stage: Alpha Feature group: node Machine learning, scientific computing and financial services are examples of systems that are computationally intensive or require ultra-low latency; these kinds of workloads benefit from proper resource allocation. For example, performance improves if a process runs on one isolated CPU core rather than jumping between cores or sharing time with other processes. Parallel processes also run better on cores inside the same CPU socket (in multi-socket systems). The node topology manager is a kubelet component that centralizes the coordination of hardware resource assignments. Currently this task is done by independent components (CPU manager, device manager, CNI), which sometimes results in unoptimized allocations. Only pods in the Guaranteed QoS class that have an integer CPU value are considered by the Topology Manager, like the one in this example:
…
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
      requests:
        memory: "200Mi"
        cpu: "2"
        example.com/device: "1"
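The Topology Manager is enabled per node on the kubelet; a rough sketch of opting in (the chosen policy is just an example, check the kubelet documentation for the available options):
kubelet --feature-gates=TopologyManager=true --topology-manager-policy=single-numa-node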

Configuration management

#1177 Advanced configurations with kubeadm (using Kustomize).

Stage: Alpha Feature group: cluster-lifecycle kubeadm works great for configuring most Kubernetes clusters, but it has some limitations, and some advanced use cases require extra tools. With Kustomize you can patch base configurations to obtain configuration variants, which helps manage some advanced scenarios. For example, you can have a base configuration for your service, then patch it with different limits for each of your dev, test and prod environments. Now kubeadm integrates with Kustomize. When passing patches via the --experimental-kustomize flag, kubeadm will first apply those patches to the existing configuration, then proceed as usual with the patched config.
kubeadm init --experimental-kustomize kubeadm-patches/
The flag will be renamed to just --kustomize when this feature reaches beta. Learn more and check other examples here.
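As an idea of what a patch in the kubeadm-patches/ directory might contain, here is a hypothetical strategic merge patch aimed at the kube-apiserver static Pod manifest that kubeadm generates (this assumes patches target those manifests by name; check the feature documentation for the exact semantics):
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    example.com/patched-by: kustomize   # hypothetical annotation added by the patch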

#555 Server-side apply

Stage: Graduating to Beta Feature group: API machinery This feature aims to move the logic away from kubectl apply to the apiserver, fixing most of the current workflow pitfalls and also making the operation accessible directly from the API (for example using curl), without strictly requiring kubectl or a Golang implementation. Read more in the release for 1.14 of the “What’s new in Kubernetes” series.
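For instance, once server-side apply is available, an apply can be performed directly against the API with a PATCH request; a hedged sketch where the resource, file and fieldManager name are made up:
curl -X PATCH -H "Content-Type: application/apply-patch+yaml" \
  --data-binary @my-configmap.yaml \
  "https://<apiserver>/api/v1/namespaces/default/configmaps/my-config?fieldManager=my-tool"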

Cloud providers

#980 Finalizer protection for service LoadBalancers

Stage: Graduating to Beta Feature group: network There are various corner cases where cloud resources are orphaned after the associated Service is deleted. Finalizer Protection for Service LoadBalancers was introduced to prevent this from happening. Read more in the last release of the “What’s new in Kubernetes” series.

#586 Azure availability zones

Stage: Graduating to Stable Feature group: azure Nodes in Azure will be added with the label failure-domain.beta.kubernetes.io/zone=<region>-<AZ>, and topology-aware provisioning is added for the Azure managed disks storage class. Read more in the release for 1.13 of the “What’s new in Kubernetes” series.

#604 [Azure] Cross resource group nodes

Stage: Graduating to Stable Feature group: azure Cross resource group (RG) nodes and unmanaged (such as on-prem) nodes in Azure cloud provider are now supported. Read more in the release for 1.13 of the “What’s new in Kubernetes” series.

Storage

#1122 Support CSI plugins in Windows

Stage: Alpha Feature group: storage Container Storage Interface plugins were created to allow the development of third party storage volume systems. Starting with Kubernetes 1.16, Windows nodes will be able to use the existing CSI plugins.

#989 Extend allowed PVC DataSources

Stage: Graduating to Beta Feature group: storage Using this feature, you can “clone” an existing PV. A Clone results in a new, duplicate volume being provisioned from an existing volume. Read more in the 1.15 release of the “What’s new in Kubernetes” series.
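A clone is requested through a regular PVC that points at the source volume via dataSource; a minimal sketch where the names, StorageClass and size are made up:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-pvc                      # hypothetical name for the clone
spec:
  storageClassName: csi-storageclass    # hypothetical CSI-backed StorageClass
  dataSource:
    kind: PersistentVolumeClaim
    name: source-pvc                    # hypothetical existing PVC to clone
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi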

#556 Add resizing support to CSI volumes

Stage: Graduating to Beta Feature group: storage To support resizing of CSI volumes, an external resize controller will monitor all PVCs. If a PVC meets the criteria for resizing, it will be added to the controller’s workqueue. Read more in the 1.14 release of the “What’s new in Kubernetes” series.
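From the user’s point of view, a resize is requested by simply increasing the PVC’s storage request, provided its StorageClass allows expansion; a rough sketch where the name and provisioner are hypothetical:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-expandable          # hypothetical name
provisioner: csi.example.com    # hypothetical CSI driver
allowVolumeExpansion: true      # required so PVCs of this class can be resized
With that in place, editing spec.resources.requests.storage on a PVC of this class (for example from 10Gi to 20Gi) triggers the resize.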

#596 CSI inline volume support

Stage: Graduating to Beta Feature group: storage CSI volumes can only be referenced via PV/PVC today. This works well for remote persistent volumes. This feature introduces the possibility to use CSI volumes as local ephemeral volumes as well. Read more in the 1.14 release of the “What’s new in Kubernetes” series.
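An inline CSI volume is declared directly in the pod spec instead of through a PVC; a hedged sketch where the driver name and attributes are hypothetical:
apiVersion: v1
kind: Pod
metadata:
  name: my-csi-app                       # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    csi:
      driver: inline.csi.example.com     # hypothetical ephemeral-capable CSI driver
      volumeAttributes:
        size: "1Gi"                      # hypothetical driver-specific attribute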

Kubernetes 1.16 custom resources

#95 CustomResourceDefinitions

Stage: Graduating to Stable Feature group: API We’ve been covering custom resource definitions in the 1.15 release of the “What’s new in Kubernetes” series. This feature groups the many modifications and improvements that were performed to graduate CustomResourceDefinitions to Stable in the Kubernetes 1.16 release:

#571 Subresources for custom resources

Stage: Graduating to Stable Feature group: API With this feature you can enable the Status and Scale subresources for custom resources. By adding the comment // +kubebuilder:subresource:status to your CRD definition you enable the /status subresource, which exposes the current status of your custom resource in the system.
// MySQL is the Schema for the mysqls API
// +k8s:openapi-gen=true
// +kubebuilder:subresource:status
type MySQL struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   MySQLSpec   `json:"spec,omitempty"`
    Status MySQLStatus `json:"status,omitempty"`
}
By enabling the Scale subresource, you’ll be able to check how many replicas of your custom resource are deployed vs. the desired amount. You can obtain this information from the exposed /scale subresource or by executing kubectl get on your custom resource, just like you would with Deployments. You can also use kubectl scale to adjust the number of replicas of your custom resource, as shown after the manifest below. To enable the Scale subresource you need to define the corresponding JSONPaths in the CRD:
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
…
spec:
  subresources:
    status: {}
    scale:
      specReplicasPath: .spec.replicas
      statusReplicasPath: .status.replicas
      labelSelectorPath: .status.labelSelector
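For example, with the Scale subresource enabled on the hypothetical MySQL custom resource above, adjusting its replicas could look like this (the resource and object names are made up):
kubectl scale --replicas=3 mysqls/my-database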

#575 Defaulting and pruning for custom resources

Stage: Graduating to Stable Feature group: API These two features aim to facilitate the JSON handling and processing associated with CustomResourceDefinitions. Read more in the 1.15 release of the “What’s new in Kubernetes” series.

#598 Webhook conversion for custom resources

Stage: Graduating to Stable Feature group: API Different CRD versions can have different schemas. You can now handle on-the-fly conversion between versions by defining and implementing a conversion webhook. Read more in the 1.15 release of the “What’s new in Kubernetes” series.

#692 Publish CRD OpenAPI schema

Stage: Graduating to Stable Feature group: API CustomResourceDefinition (CRD) allows the CRD author to define an OpenAPI v3 schema to enable server-side validation for CustomResources (CR). Read more in the 1.15 release of the “What’s new in Kubernetes” series.

Deprecations

#1164 Deprecate and remove SelfLink

Stage: Alpha Feature group: API The SelfLink field is present in every Kubernetes object and contains a URL representing the given object. This field does not provide any new information, and creating and maintaining it has a performance impact, so the decision has been made to progressively deprecate SelfLink by Kubernetes 1.21.

#1179 Building Kubernetes without in-tree cloud providers

Stage: Alpha Feature group: cloud-provider Cloud provider-specific code is being moved away from the core Kubernetes repository (in-tree) to its own external repositories (out-of-tree). By doing so, cloud providers will be able to develop and make releases independently from the core Kubernetes release cycle. During this transition, cloud providers are being copied out-of-tree but are still available in-tree, so developers may end up with two versions of the same cloud provider in their builds. How do you know which one of the two versions is active? With this alpha feature you can disable in-tree cloud providers to ensure your build is only using the external version.

#1206 Kubernetes metrics overhaul

Stage: Alpha Feature group: instrumentation This feature summarizes several tasks needed to align Kubernetes metrics with the Kubernetes Instrumentation Guidelines. The main tasks are renaming some metrics and changing their units to be in line with the rest of the Prometheus ecosystem. Kubernetes 1.14 marked a lot of metrics as deprecated and created replacements that follow the guidelines. Kubernetes 1.16 removes the labels pod_name and container_name from cAdvisor metrics, which duplicated the pod and container labels. Kubernetes 1.17 will disable the metrics deprecated in 1.14, and they will be finally removed in 1.18. If you want a full list of deprecated and new metrics, you can check this feature’s KEP.
That’s all folks! Exciting as always; get ready to upgrade your clusters if you intend to use any of these features. If you liked this, you might want to check out our previous “What’s new in Kubernetes” editions. And if you enjoy keeping up to date with the Kubernetes ecosystem, subscribe to our container newsletter, a monthly email with the coolest stuff happening in the cloud-native ecosystem.
