Here at Sysdig we follow Kubernetes development pretty closely. Next Tuesday the next release of our favourite orchestration tool will get out of the oven freshly baked, so this is a summary of what’s new in Kubernetes 1.12!
We have grouped the features into things that we find cool, security stuff, storage improvements, cloud provider support and other internal changes. Let’s have a look:
Some cool stuff that we are super excited about
#117: Arbitrary / Custom Metrics in the Horizontal Pod Autoscaler (Beta)
The new Horizontal Pod Autoscaler specification allows you to assign arbitrary labels to your metrics and use this information to scale.
For example, scaling on http_requests but only taking into account the […]
The HorizontalPodAutoscaler controller can fetch metrics in two different ways: direct Heapster access, and REST client access. Here at Sysdig we previously wrote about HPAs in: How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics.
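As a rough sketch of what a label-scoped metric could look like, the snippet below builds an autoscaling/v2beta2 HorizontalPodAutoscaler as a Python dict and prints it as JSON, which kubectl apply -f - accepts. The deployment name (frontend), the verb label and the target value are hypothetical, chosen only for illustration, and a metrics adapter exposing http_requests is assumed to be installed.

```python
import json


def build_hpa(deployment, metric_name, label_selector, target_average,
              min_replicas=1, max_replicas=10):
    """Return an HPA manifest that scales on a labeled per-pod metric."""
    return {
        "apiVersion": "autoscaling/v2beta2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment + "-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Pods",
                "pods": {
                    # Only series carrying these labels are considered.
                    "metric": {
                        "name": metric_name,
                        "selector": {"matchLabels": label_selector},
                    },
                    "target": {
                        "type": "AverageValue",
                        "averageValue": target_average,
                    },
                },
            }],
        },
    }


# Hypothetical values: scale "frontend" on GET requests only.
hpa = build_hpa("frontend", "http_requests", {"verb": "GET"}, "100")
print(json.dumps(hpa, indent=2))
```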
#21: Pod Vertical Scaling (Beta)
Using Vertical Scaling the resource limits assigned to a pod (or set of pods) can be dynamically determined based on analysis of historical resource utilization, amount of resources available in the cluster and real-time events, such as OOMs. This is particularly useful for pods that are costly to destroy and recreate.
#585: RuntimeClass (Alpha)
RuntimeClass is a new cluster-scoped resource that surfaces container runtime properties to the control plane. RuntimeClasses are assigned to pods through a runtimeClassName field in the PodSpec. This provides a new mechanism for supporting multiple runtimes in a cluster and/or node, and for selecting which one to use. Finally Docker and Rkt together :) See more about this in the design proposal doc.
#382: Taint node by condition (Beta)
The taint node by condition feature causes the node controller to dynamically create taints corresponding to observed node conditions. The user can choose to ignore some of the node’s problems (represented as Node conditions) by adding appropriate pod tolerations. This feature is promoted from alpha to beta in 1.12.
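To make the taint and toleration relationship a bit more concrete, here is a simplified sketch of the matching rule in Python. It is not the scheduler’s code, just our reading of the documented rule; the taint key used in the example is the one associated with the memory pressure condition, and the helper name tolerates is made up for illustration.

```python
def tolerates(toleration, taint):
    """Simplified check of whether one toleration matches one taint."""
    # An empty effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key combined with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    # Equal: both key and value have to match.
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


# Taint created by the node controller for a node under memory pressure.
memory_pressure = {
    "key": "node.kubernetes.io/memory-pressure",
    "effect": "NoSchedule",
}

# Tolerations a pod could declare to ignore that condition.
pod_tolerations = [{
    "key": "node.kubernetes.io/memory-pressure",
    "operator": "Exists",
    "effect": "NoSchedule",
}]

print(any(tolerates(t, memory_pressure) for t in pod_tolerations))
```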
#432: Mount namespace propagation (Stable)
Mount propagation allows sharing volumes mounted by a container with other containers in the same pod, or even with other pods on the same node. It means that any mounts from inside the container are reflected in the host’s mount namespace. A good application is to enable containerization of volume plugins. More info in the volumes documentation.
#495: Configurable Pod process namespace sharing (Beta)
Users can configure containers within a pod to share a common PID namespace by setting an option in the PodSpec. More on this in the Kubernetes documentation: share process namespace.
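The option in question is the shareProcessNamespace field. A minimal sketch, assuming a two-container pod with hypothetical names and images, could look like this (the manifest is printed as JSON so it can be piped to kubectl apply -f -):

```python
import json


def build_shared_pid_pod(name, containers):
    """Return a Pod manifest whose containers share one PID namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # All containers in the pod can see each other's processes.
            "shareProcessNamespace": True,
            "containers": containers,
        },
    }


# Hypothetical pod: an nginx container plus a debugging shell next to it.
pod = build_shared_pid_pod("nginx-debug", [
    {"name": "nginx", "image": "nginx"},
    {"name": "shell", "image": "busybox", "stdin": True, "tty": True},
])
print(json.dumps(pod, indent=2))
```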
#587: Resource quota API (Beta)
The quota system will identify specific resources that are limited by default. With the current behavior, resource consumption is unlimited if a quota doesn’t exist. With this feature, consumption will be denied if the quota doesn’t exist, restricting consumption of high-cost resources. More about this in the resource quota docs.
#43: Kubelet TLS bootstrap (Stable)
This is also a feature we covered in our blog post Kubernetes Security: RBAC and TLS, now graduating as stable. The kubelet can generate a private key and a certificate signing request (CSR) to send over to the cluster CA to get the corresponding certificate. You can read more info about it here.
#267: Kubelet server TLS certificate rotation (Beta)
The kubelet is able to rotate both its client and server certificates; we already wrote about this security best practice in our Kubernetes Security Guide. The certificates can be rotated automatically through the respective RotateKubeletClientCertificate and RotateKubeletServerCertificate feature flags in the kubelet, which are now enabled by default.
#366: Egress support for Network Policy (Stable)
NetworkPolicy objects support an egress section whose to rules allow or deny traffic based on IP ranges or Kubernetes metadata (e.g. namespace=”proxy”). Cluster egress mechanisms often require rewriting the source or destination IP of packets. This means that connections from pods to Service IPs that get rewritten to cluster-external IPs may or may not be subject to ipBlock-based policies (see below).
#367: IPBlock for Network Policies (Stable)
NetworkPolicy objects now support CIDR IP blocks to be configured in the rule definitions. You can combine Kubernetes-specific selectors with IP-based ones both for ingress and egress policies. For example: “allow connections from any pod in the ‘default’ namespace with the label ‘role=db’ to CIDR 10.0.0.0/24 on TCP port […]”.
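A minimal sketch of such an egress rule, combining a pod selector with an ipBlock, is shown here as a Python dict printed as JSON. The policy name and the port number 5978 are hypothetical (the port in the example above is not given); the CIDR is validated with the standard ipaddress module before the manifest is built.

```python
import ipaddress
import json


def build_egress_policy(name, pod_labels, cidr, port):
    """Return a NetworkPolicy allowing egress from matching pods to a CIDR."""
    ipaddress.ip_network(cidr)  # raises ValueError if the CIDR is malformed
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{"ipBlock": {"cidr": cidr}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


# Hypothetical name and port; selector and CIDR follow the example in the text.
policy = build_egress_policy("db-egress", {"role": "db"}, "10.0.0.0/24", 5978)
print(json.dumps(policy, indent=2))
```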
#460: Encryption at rest KMS integration (Beta)
Data encryption at rest using Google Key Management Service as an encryption provider. You can read more about it on KMS providers for data encryption.
#177: Snapshot / restore volume support for Kubernetes (CRD + external controller) (Alpha)
Similar to how the API resources PersistentVolume and PersistentVolumeClaim are used to provision volumes for users and administrators, the VolumeSnapshotContent and VolumeSnapshot API resources can be used to create volume snapshots for users and administrators. Read more about volume snapshots here.
#561: Topology aware dynamic provisioning (Beta)
This feature allows a Pod to request one or more Persistent Volumes (PV) with a topology that is compatible with the Pod’s other scheduling constraints, such as resource requirements and affinity/anti-affinity policies. Read more here about Allowed topologies.
#557: Kubernetes CSI topology support (Beta)
When we are using multi-zone clusters, pods can be spread across zones in a specific region, which means that single-zone storage backends should be provisioned in each zone. The volume binding mode controls when volume binding and dynamic provisioning should happen.
You can read more about this and its support for different cloud providers here.
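As an illustration, the sketch below builds a StorageClass that delays binding until a pod is scheduled (volumeBindingMode: WaitForFirstConsumer) and restricts provisioning to a list of zones through allowedTopologies. The class name, provisioner and zone names are hypothetical placeholders.

```python
import json


def build_topology_storage_class(name, provisioner, zones):
    """Return a StorageClass limited to the given zones."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Wait for a pod to be scheduled before binding and provisioning.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowedTopologies": [{
            "matchLabelExpressions": [{
                "key": "failure-domain.beta.kubernetes.io/zone",
                "values": list(zones),
            }],
        }],
    }


# Hypothetical class restricted to two zones of one region.
storage_class = build_topology_storage_class(
    "standard", "kubernetes.io/gce-pd", ["us-central1-a", "us-central1-b"])
print(json.dumps(storage_class, indent=2))
```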
#554: Dynamic maximum volume count (Beta)
When the dynamic volume limits feature is enabled, Kubernetes automatically determines the node type and supports the appropriate number of attachable volumes for the node and vendor.
You can read more about dynamic volume limits in the Kubernetes documentation.
Cloud providers support improvements
#586: Azure Availability Zones (Alpha)
Kubernetes 1.12 brings support for Azure availability zones. Nodes within each availability zone will be labeled with failure-domain.beta.kubernetes.io/zone=<region>-<AZ>, and the Azure managed disks storage class will be provisioned taking this into account. Read more about it on Using Availability Zones.
#513: Azure Virtual Machine Scale Sets (Stable)
This feature adds support for Azure Virtual Machine Scale Sets. This is an Azure technology which lets you create and manage a group of identical, load balanced virtual machines.
You can read more about the Azure VMSS in the Azure Documentation.
#514: Add Azure support to cluster-autoscaler (Stable)
This feature adds support for the Azure Cluster Autoscaler. As resource demands increase, the cluster autoscaler allows your cluster to grow as required. The Cluster Autoscaler does this by scaling your agent nodes based on pending pods.
You can read more about the Azure Cluster Autoscaler in the Azure Documentation for AKS.
#604: Azure cross resource group nodes (Alpha)
Adds support for cross resource group (RG) nodes and unmanaged (such as on-prem) nodes in the Azure cloud provider. The Kubernetes Azure cloud-controller-manager docs have all the info about this too.
#558: GCE PD topology support (Beta)
Adds support for Google Compute Engine Persistent Disk topology (rules to describe the accessibility of an object with respect to its location in a cluster).
In multi-zone clusters, Pods can be spread across zones in a region. Single-zone storage backends should be provisioned in the zones where Pods are scheduled. See Storage Classes: GCE PD for more details.
#567: AWS EBS topology support (Beta)
Adds topology aware dynamic provisioning support for AWS: the same as before, but now for AWS EBS. See Storage Classes: AWS EBS for more details.
#115: Easier installation and upgrades through ComponentConfig (Alpha)
In earlier Kubernetes versions, modifying the base configuration of the core cluster components (like the kubelet or the scheduler) was a delicate process that often required live patching and thus was not easily automatable.
ComponentConfig is an ongoing effort to make component configuration more dynamic and directly reachable through the Kubernetes API.
#288: Improve the multi-platform compatibility (Beta)
Kubernetes aims to support multiple architectures, including arm, arm64, ppc64le, s390x and windows platforms. Automated CI e2e conformance tests have been deployed to ensure compatibility moving forward. A related goal is to support running clusters with nodes of mixed architectures, although we are not there yet.
#612: Quota by priority (Beta)
Pods can be created at a specific priority, and you can control a pod’s consumption of system resources based on its priority by using the scopeSelector. See Pod priority preemption to know more about this.
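For instance, a ResourceQuota that only applies to pods of a given priority class could be sketched as follows. The quota name, the priority class name (high) and the limits are hypothetical values for illustration.

```python
import json


def build_priority_quota(name, priority_class, hard):
    """Return a ResourceQuota scoped to pods of one priority class."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name},
        "spec": {
            "hard": hard,
            "scopeSelector": {
                "matchExpressions": [{
                    # Only pods whose priorityClassName is listed here count.
                    "scopeName": "PriorityClass",
                    "operator": "In",
                    "values": [priority_class],
                }],
            },
        },
    }


# Hypothetical limits for pods running at the "high" priority class.
quota = build_priority_quota(
    "pods-high", "high", {"cpu": "1000", "memory": "200Gi", "pods": "10"})
print(json.dumps(quota, indent=2))
```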
#548: Schedule DaemonSet Pods by kube-scheduler (Beta)
This feature is not new, but is enabled by default in 1.12. Instead of being scheduled by the DaemonSet controller, DaemonSet pods are scheduled by the default scheduler. This means that we will see DaemonSet pods created in Pending state, and the scheduler will consider pod priority and preemption.
You can read more about how daemon pods are scheduled again in the Kubernetes doc.
#576: APIServer DryRun (Alpha)
Adds an apiserver “dry-run” parameter so that requests can be validated and “processed” without actually being persisted. The idea is to be able to send requests to modifying endpoints, and see if the request would have succeeded (admission chain, validation, merge conflicts, …) and/or what would have happened without having it actually happen. More information about this in this document from the sig-api-machinery team.
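At the HTTP level, the dry run is requested with a dryRun=All query parameter on a modifying request. The helper below only builds such a URL and does not send anything; the API server address is a made-up placeholder, and in 1.12 we would expect the DryRun feature gate to have to be enabled for the server to honor it.

```python
from urllib.parse import urlencode


def dry_run_url(api_server, path):
    """Return the URL of a modifying endpoint with dry-run turned on."""
    return api_server.rstrip("/") + path + "?" + urlencode({"dryRun": "All"})


# Hypothetical API server address; a POST to this URL would be validated
# and run through admission, but not persisted.
print(dry_run_url("https://kube-apiserver.example:6443",
                  "/api/v1/namespaces/default/pods"))
```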
#578: Server-side printing in kubectl (Stable)
kubectl get should get columns back from the server, not the client, and be able to handle this type of server response under all use-cases. Previously, the server sent bare information that kubectl processed to print the columns. Now, the server is the one that sends the headers and fields for each row, so kubectl or any other client is able to reproduce the tabular representation. More on this in the feature design proposal doc.
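To give an idea of what “any other client” has to do, here is a small sketch that renders a Table response into aligned text. A client asks for this representation with an Accept header such as application/json;as=Table;v=v1beta1;g=meta.k8s.io. The sample payload is hand-written and trimmed down to the fields the function uses, so treat it as an approximation of a real response rather than captured output.

```python
def render_table(table):
    """Render a meta.k8s.io Table response as aligned text columns."""
    headers = [col["name"].upper() for col in table["columnDefinitions"]]
    rows = [[str(cell) for cell in row["cells"]] for row in table["rows"]]
    all_lines = [headers] + rows
    # Each column is as wide as its longest header or cell.
    widths = [max(len(line[i]) for line in all_lines)
              for i in range(len(headers))]
    rendered = []
    for line in all_lines:
        padded = (cell.ljust(width) for cell, width in zip(line, widths))
        rendered.append("   ".join(padded).rstrip())
    return "\n".join(rendered)


# Hand-written sample, reduced to column definitions and row cells.
sample = {
    "kind": "Table",
    "apiVersion": "meta.k8s.io/v1beta1",
    "columnDefinitions": [
        {"name": "Name", "type": "string"},
        {"name": "Ready", "type": "string"},
        {"name": "Status", "type": "string"},
    ],
    "rows": [
        {"cells": ["web-0", "1/1", "Running"]},
        {"cells": ["db-0", "0/1", "Pending"]},
    ],
}
print(render_table(sample))
```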
#579: Updated plugin mechanism for kubectl (Beta)
Kubectl should support extensions adding new commands as well as overriding specific subcommands (at any depth). This proposal introduces the main design for a plugin mechanism in kubectl. The mechanism is a git-style system that looks for executables on a user’s $PATH whose name begins with kubectl-. This allows plugin binaries to override existing command paths and add custom commands and subcommands to kubectl.
For example, the user experience for switching namespaces could go from:
kubectl config set-context $(kubectl config current-context) --namespace=mynewnamespace
to:
kubectl set-ns mynewnamespace
set-ns would be a user-provided plugin which would call the initial kubectl config set-context command and set the namespace flag according to the value provided as the plugin’s first parameter. Have a look at the design proposal if you are curious about this feature.
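The lookup itself can be sketched in a few lines of Python. This is a simplified model of the git-style search, not kubectl’s implementation: it tries the longest command path first, and it assumes (as we understand the naming rules) that a dash inside a command word maps to an underscore in the file name, so kubectl set-ns would resolve to an executable called kubectl-set_ns. The which parameter is only there so the search can be exercised without touching the real $PATH.

```python
import shutil


def find_plugin(args, which=shutil.which):
    """Return (plugin_path, remaining_args) for the longest match, or None."""
    # Dashes inside a command word become underscores in the file name.
    words = [arg.replace("-", "_") for arg in args]
    # Try the longest candidate first, then drop trailing words one by one.
    for end in range(len(words), 0, -1):
        candidate = "kubectl-" + "-".join(words[:end])
        path = which(candidate)
        if path:
            return path, args[end:]
    return None


# Simulated $PATH with a single hypothetical plugin installed.
fake_path = {"kubectl-set_ns": "/usr/local/bin/kubectl-set_ns"}
print(find_plugin(["set-ns", "mynewnamespace"], which=fake_path.get))
```

Whatever is left after the matched command path (here mynewnamespace) would be handed to the plugin as its arguments.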
#591: Horizontal Pod Autoscaler to reach proper size faster (Beta)
The Horizontal Pod Autoscaler algorithm has been improved to reach the desired size faster. Full details on how this works here.
#592: TTL after finish (Alpha)
This feature will clean up finished jobs automatically. These are usually no longer needed in the system, and keeping them around generates more load on the API server. A TTL can now be configured so that a controller cleans them up. Read more in the Jobs docs.
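The TTL is set per Job through the ttlSecondsAfterFinished field. A minimal sketch, with a hypothetical job name, image and a TTL of 100 seconds, and assuming the alpha TTLAfterFinished feature gate is enabled in the cluster:

```python
import json


def build_job_with_ttl(name, image, command, ttl_seconds):
    """Return a Job manifest that is cleaned up ttl_seconds after finishing."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Seconds to keep the Job around once it completes or fails.
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command},
                    ],
                    "restartPolicy": "Never",
                },
            },
        },
    }


# Hypothetical job that prints a message and is removed 100 seconds later.
job = build_job_with_ttl("hello", "busybox", ["echo", "done"], 100)
print(json.dumps(job, indent=2))
```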
#593: Scheduler checks feasibility and scores a subset of all cluster nodes (Alpha)
Before Kubernetes 1.12, kube-scheduler used to check the feasibility of all the nodes in a cluster and then scored the feasible ones. Kubernetes 1.12 has a new feature that allows the scheduler to stop looking for more feasible nodes once it finds a certain number of them. This improves the scheduler’s performance in large clusters. More about this in the scheduler performance tuning docs.
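The idea can be illustrated with a toy version of the feasibility pass. This is a sketch of the early-stop behavior only, not the scheduler’s code: the function name, the percentage and min_nodes parameters and the values used in the example are our own assumptions, and the real scheduler also evaluates nodes in parallel.

```python
def find_feasible_nodes(nodes, is_feasible, percentage, min_nodes):
    """Collect feasible nodes, stopping early once enough have been found."""
    total = len(nodes)
    if percentage <= 0 or percentage >= 100 or total <= min_nodes:
        # Small cluster or feature effectively disabled: check every node.
        wanted = total
    else:
        wanted = max(total * percentage // 100, min_nodes)
    feasible = []
    for node in nodes:
        if is_feasible(node):
            feasible.append(node)
            if len(feasible) >= wanted:
                break  # enough candidates to score, skip the rest
    return feasible


# Hypothetical 1000-node cluster where every node fits the pod.
nodes = ["node-%d" % i for i in range(1000)]
candidates = find_feasible_nodes(nodes, lambda n: True, percentage=10, min_nodes=50)
print(len(candidates))
```

Only the nodes returned here would go on to the scoring phase, which is where the savings in large clusters come from.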
#614: SCTP support for Services, Pod, Endpoint, and NetworkPolicy (Alpha)
SCTP is now supported as an additional protocol alongside TCP and UDP in Pod, Service, Endpoint, and NetworkPolicy. See the issue for more info.