Release Notes 2.3.0

DKP® version 2.3 was released on August 16, 2022.

You must be a registered user and logged on to the support portal to download this product. New customers must contact their sales representative or sales@d2iq.com before attempting to download or install DKP.

Release Summary

Welcome to D2iQ Kubernetes Platform (DKP) 2.3! This release provides fixes to reported issues, integrates changes from previous releases, and maintains compatibility and support for other packages used in DKP.

Supported Versions

Any DKP cluster you attach using DKP 2.3.0 must be running a Kubernetes version in the following ranges:

Kubernetes Support	Version
DKP Minimum	1.22.0
DKP Maximum	1.23.x
DKP Default	1.23.12
EKS Default	1.22.x
AKS Default	1.23.x
GKE Default	1.22.x-1.23.x
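
As a quick pre-attach check, the range in the table above can be expressed as a small shell helper. This is a minimal sketch, not part of DKP: the function name `is_supported_k8s_version` is hypothetical, and it only compares the major and minor version against the DKP Minimum (1.22.0) and DKP Maximum (1.23.x) rows.

```shell
# Hypothetical helper (not a DKP command): report whether a Kubernetes
# version is inside the 1.22.0 through 1.23.x range listed above.
is_supported_k8s_version() {
  version="${1#v}"        # accept "v1.23.7" as well as "1.23.7"
  major="${version%%.*}"  # text before the first dot
  rest="${version#*.}"    # text after the first dot
  minor="${rest%%.*}"     # text between the first and second dot
  case "${major}.${minor}" in
    1.22|1.23) echo "supported" ;;
    *)         echo "unsupported" ;;
  esac
}

is_supported_k8s_version "1.23.12"   # prints: supported
is_supported_k8s_version "v1.21.9"   # prints: unsupported
```

The version string to check can be read from the target cluster, for example from the `serverVersion.gitVersion` field printed by `kubectl version -o json`.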

DKP 2.3 comes with support for Kubernetes 1.23, enabling you to benefit from the latest features and security fixes in upstream Kubernetes. The Kubernetes 1.23 release comes with approximately 47 enhancements. To read more about its major features, visit https://kubernetes.io/blog/2021/12/07/kubernetes-1-23-release-announcement/.

Features and Enhancements

The following improvements are included in this release.

Support for Amazon EKS

DKP 2.3 enables easier management of EKS clusters. When utilizing AWS Elastic Kubernetes Service (EKS), you must add additional services for capabilities such as multi-cluster management, operational insights, Kubecost, Flux, Velero, Prometheus, Grafana, Cert-Manager, Calico, Gatekeeper, Dex, and artificial intelligence (AI) and machine learning (ML) based on Kubeflow.

DKP brings value to EKS customers by providing all components needed for a production-ready Kubernetes environment. DKP 2.3 provides the capability to provision EKS clusters using the DKP UI. In addition, DKP 2.3 provides the ability to upgrade your EKS clusters using the DKP platform, making it possible to manage the complete lifecycle of EKS clusters from a centralized platform.

DKP adds value to Amazon EKS through features such as Time to Value, Cloud-Native Expertise, Military-Grade Security, and Lower TCO, among other features documented in the EKS infrastructure section.

Additionally, you can now upgrade EKS clusters using the CLI.

Support for GCP

Provision your Kubernetes clusters on the Google Cloud Platform with DKP 2.3. By choosing DKP as your platform for multi-cluster management, you can now centrally manage all your Kubernetes clusters in the top three public cloud providers (AWS, Azure, and GCP), which simplifies multi-cloud Kubernetes management.

Attach an existing GKE cluster to DKP

You can attach an existing GKE (Google Kubernetes Engine) cluster to DKP. After attaching, you can use DKP to examine and manage the cluster.

Improved Documentation Site

DKP 2.3 comes with consolidated documentation, where we reorganized and updated the formerly separate Konvoy and Kommander documentation into the new D2iQ Help Center. This provides you with improved search functionality and gives us the future ability to add multimedia content. Explore the new D2iQ Help Center at the same site as our previous documentation. You can still access all n-2 supported documentation at https://archive-docs.d2iq.com/.

The new organization highlights capabilities available to DKP Enterprise users only, such as EKS and AKS, which are marked with the new Enterprise badge.

Previous versions of the documentation remain available on our Archived Docs site.

Multiple Availability Zones

Availability zones (AZs) are isolated locations within data center regions where public cloud services originate and operate. DKP now supports multiple AZs. Because all the nodes in a node pool are deployed in a single AZ, you might want to create additional node pools to ensure your cluster has nodes deployed in multiple AZs. By default, the control plane nodes are created in three different zones, but the default worker nodes reside in a single AZ. You can create additional node pools in other AZs with the dkp create nodepool command.

Custom Domains and Certificates for Workload (managed and attached) Clusters

Configure a custom domain and certificate for your Managed or Attached cluster. DKP supports configuring a custom domain name per cluster, so you can access the DKP UI and other platform services via that domain. Additionally, you can provide a custom certificate for the domain, or one can be issued automatically by Let’s Encrypt (or other certificate authorities supporting the ACME protocol).

Updated Image Bundle Names

We changed the Image Bundle extensions from .tar.gz to .tar, as follows:

The Kommander Image bundle is now kommander-image-bundle-v2.3.0.tar
The DKP Catalog Image bundle is now dkp-catalog-applications-image-bundle-v2.3.0.tar
The DKP Insights Catalog Application Image bundle is now dkp-insights-image-bundle-v2.3.0.tar

Because these files are no longer in a compressed format (.gz), they do not require decompression.
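
If you have scripts that build bundle file names with the old extension, the change can be handled with a small rewrite step. The following is a minimal sketch under that assumption; the function name `new_bundle_name` is hypothetical and not part of the DKP CLI.

```shell
# Hypothetical helper: map a bundle file name that ends in .tar.gz to the
# .tar name used from DKP 2.3.0 onward; other names pass through unchanged.
new_bundle_name() {
  case "$1" in
    *.tar.gz) echo "${1%.tar.gz}.tar" ;;
    *)        echo "$1" ;;
  esac
}

new_bundle_name "kommander-image-bundle-v2.3.0.tar.gz"
# prints: kommander-image-bundle-v2.3.0.tar
```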

Updated Custom Certificate Name

When you install or create a cluster with a custom domain, the Certificate Authority (CA) automatically creates a certificate. In DKP versions 2.2 and earlier, this certificate is called kommander-traefik-acme. In this version of DKP and later, the certificate is called kommander-traefik-tls.

If you have set up automation or customization around this certificate, ensure you update the certificate name in objects that reference it.
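
One way to locate and update such references in local manifests or scripts is sketched below. Both function names are hypothetical, and the sketch assumes the references live in plain-text files on disk; review the matches before rewriting anything.

```shell
# Hypothetical helpers for updating local files that reference the old
# certificate name. Neither function talks to a cluster.

# List files under directory $1 that still mention the old name.
find_old_cert_refs() {
  grep -rl 'kommander-traefik-acme' "$1" || true
}

# Rewrite the old name to the new one in file $1, keeping its permissions.
rename_cert_refs() {
  tmp="$(mktemp)"
  sed 's/kommander-traefik-acme/kommander-traefik-tls/g' "$1" > "$tmp"
  cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Example usage (the directory name is an assumption):
#   for f in $(find_old_cert_refs ./manifests); do rename_cert_refs "$f"; done
```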

DKP Upgrades

From version 2.2 to the latest version 2.3, the following upgrades are available:

Ability to upgrade all Platform apps in CLI (non air-gapped).
Ability to upgrade all Platform apps in CLI (air-gapped).

For more information, see the Supported upgrade paths section of Upgrade DKP.

DKP Insights Alert Details

DKP Insights detects various kinds of anomalies in Kubernetes clusters and workloads and presents them as Insight Alerts in an Insights table. In this release, each insight alert now has a details page. For more information, see the DKP Insights Release Notes.

Support for Cluster-scoped Configuration and Deployments

When you enable an application for a Workspace, you deploy that application to all clusters within the Workspace. You can also choose to enable or customize an application on certain clusters within a Workspace. This enhanced functionality allows you to use DKP in a multi-cluster scenario without restricting the management of your clusters from a single workspace.

The cluster-scoped enablement and customization of applications is an Enterprise-only feature, which allows the configuration of all Workspace applications (Platform, DKP Catalog, and Custom applications) in your managed and attached clusters, regardless of your environment configuration (air-gapped or non-air-gapped).

Upgrade vSphere from the CLI

You can now upgrade the core addons and the Kubernetes version of vSphere clusters using the CLI. Refer to the vSphere upgrade sections of the documentation for more information.

Grafana Loki log retention policy

By default, Grafana Loki has a storage retention period of one week. If you want to keep log metadata and logs for a different period of time, override the ConfigMap to modify the storage retention period in Grafana Loki.

2.3.0 components and applications

The following are component and application versions for DKP 2.3.0.

Components

Component Name	Version
Cluster API Core (CAPI)	1.1.3-d2iq.5
Cluster API AWS Infrastructure Provider (CAPA)	1.4.1
Cluster API Google Cloud Infrastructure Provider (CAPG)	1.1.0
Cluster API Pre-provisioned Infrastructure Provider (CAPPP)	0.9.2
Cluster API vSphere Infrastructure Provider (CAPV)	1.2.0
Cluster API Azure Infrastructure Provider (CAPZ)	1.3.2
Konvoy Image Builder	1.19.9
containerd	1.4.13
etcd	3.4.13

Applications

Common Application Name	APP ID	Version	Component Versions
Centralized Grafana	centralized-grafana	34.9.3	chart: 34.9.3 prometheus-operator: 0.55.0
Centralized Kubecost	centralized-kubecost	0.26.0	chart: 0.26.0 kubecost: 1.95.0
Cert Manager	cert-manager	1.7.1	chart: 1.7.1 cert-manager: 1.7.1
Chartmuseum	chartmuseum	3.9.0	chart: 3.9.0 chartmuseum: 3.9.0
Dex	dex	2.9.18	chart: 2.9.18 dex: 2.31.0
Dex K8s Authenticator	dex-k8s-authenticator	1.2.13	chart: 1.2.13 dex-k8s-authenticator: 1.2.4
DKP Insights Management	dkp-insights-management	0.2.2	chart: 0.2.2 dkp-insights-management: 0.2.2
External DNS	external-dns	6.5.5	chart: 6.5.5 external-dns: 0.12.0
Fluent Bit	fluent-bit	0.19.21	chart: 0.19.20 fluent-bit: 1.9.3
Gatekeeper	gatekeeper	3.8.1	chart: 3.8.1 gatekeeper: 3.8.1
Gitea	gitea	5.0.9	chart: 5.0.9 gitea: 1.16.8
Grafana Logging	grafana-logging	6.28.0	chart: 6.28.0 grafana: 8.4.5
Grafana Loki	grafana-loki	0.48.4	chart: 0.48.4 loki: 2.5.0
Istio	istio	1.14.1	chart: 1.14.1 istio: 1.14.1
Jaeger	jaeger	2.32.2	chart: 2.32.2 jaeger: 1.34.1
Karma	karma	2.0.1	chart: 2.0.1 karma: 0.70
Kiali	kiali	1.52.0	chart: 1.52.0 kiali: 1.52.0
Knative	knative	0.4.0	chart: 0.4.0 knative: 0.22.3
Kube OIDC Proxy	kube-oidc-proxy	0.3.1	chart: 0.3.1 kube-oidc-proxy: 0.3.0
Kube Prometheus Stack	kube-prometheus-stack	34.9.3	chart: 34.9.3 prometheus-operator: 0.55.0 prometheus: 2.34.0 prometheus-alertmanager: 0.24.0 grafana: 8.4.5
Kubecost	kubecost	0.26.0	chart: 0.26.0 kubecost: 1.95.0
Kubefed	kubefed	0.9.2	chart: 0.9.2 kubefed: 0.9.2
Kubernetes Dashboard	kubernetes-dashboard	5.1.1	chart: 5.1.1 kubernetes-dashboard: 2.4.0
Kubetunnel	kubetunnel	0.0.13	chart: 0.0.13 kubetunnel: 0.0.13
Logging Operator	logging-operator	3.17.7	chart: 3.17.7 logging-operator: 3.17.7
MinIO Operator	minio-operator	4.4.25	chart: 4.4.25 minio-operator: 4.4.25
NFS Server Provisioner	nfs-server-provisioner	0.6.0	chart: 0.6.0 nfs-server-provisioner: 2.3.0
Nvidia	nvidia	0.4.4	chart: 0.4.4 nvidia-device-plugin: 0.1.4
Grafana (project)	project-grafana-logging	6.28.0	chart: 6.28.0 grafana: 8.4.5
Grafana Loki (project)	project-grafana-loki	0.48.4	chart: 0.48.4 loki: 2.5.0
Prometheus Adapter	prometheus-adapter	2.17.1	chart: 2.17.1 prometheus-adapter: 0.9.1
Reloader	reloader	0.0.110	chart: 0.0.110 reloader: 0.0.110
Thanos	thanos	0.4.6	chart: 0.4.6 thanos: 0.17.1
Traefik	traefik	10.9.1	chart: 10.9.1 traefik: 2.5.6
Traefik ForwardAuth	traefik-forward-auth	0.3.8	chart: 0.3.8 traefik-forward-auth: 3.1.0
Velero	velero	3.2.3	chart: 3.2.3 velero: 1.5.2

Known issues and limitations

The following items are known issues with this release.

Use static credentials to provision an Azure cluster

Only static credentials can be used when provisioning an Azure cluster.

When attaching GKE clusters, create a ResourceQuota to enable log collection

After you attach the GKE cluster, you can choose to deploy a stack of applications for workspace or project log collection. Once you have enabled this stack, create a ResourceQuota which is required for the logging stack to function correctly. You will have to do this manually, because some DKP versions do not properly handle this by default.
Create the following resource to enable log collection:

Execute the following command to get the namespace of your workspace on the management cluster:
```
kubectl get workspaces
```
Copy the value under the WORKSPACE NAMESPACE column for your workspace. This may NOT be identical to the Display Name of the Workspace.
Set the WORKSPACE_NAMESPACE environment variable to the name of the workspace’s namespace:
```
export WORKSPACE_NAMESPACE=<gkeattached-cluster-namespace>
```

Run the following command on your attached GKE cluster to create the resource:

```
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: fluent-bit-critical-pods
  namespace: ${WORKSPACE_NAMESPACE}
spec:
  hard:
    pods: "1G"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
EOF
```

After a few minutes, log collection is available in your GKE cluster.

This workflow only creates a ResourceQuota in the targeted workspace. Repeat these steps if you want to deploy the logging stack to additional workspaces with GKE clusters.

Resolve issues with failed HelmReleases

There is an existing issue with the Flux helm-controller that can cause HelmReleases to get "stuck" with an error message such as Helm upgrade failed: another operation (install/upgrade/rollback) is in progress. This can happen when the helm-controller is restarted while a HelmRelease is upgrading, installing, and so on.

Workaround

To ensure the HelmRelease error was caused by the helm-controller restarting, first try to suspend/resume the HelmRelease:

```
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": true}]'
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'
```

You should see the HelmRelease attempting to reconcile, and then it either succeeds (with status: 'Release reconciliation succeeded') or it fails with the same error as before. If it succeeds, the issue is resolved.

If the HelmRelease is still in the failed state, it is likely related to the helm-controller restarting. The following steps use the 'reloader' HelmRelease as the example of a stuck HelmRelease.

To resolve the issue, follow these steps:

List secrets containing the affected HelmRelease name:

```
kubectl get secrets -n ${NAMESPACE} | grep reloader

kommander-reloader-reloader-token-9qd8b                        kubernetes.io/service-account-token   3      171m
sh.helm.release.v1.kommander-reloader.v1                       helm.sh/release.v1                    1      171m
sh.helm.release.v1.kommander-reloader.v2                       helm.sh/release.v1                    1      117m
```

In this example, sh.helm.release.v1.kommander-reloader.v2 is the most recent revision.

Find and delete the most recent revision secret, which has a name of the form sh.helm.release.v1.*.<revision>:
```
kubectl delete secret -n <namespace> <most recent helm revision secret name>
```

Suspend and resume the HelmRelease to trigger a reconciliation:

```
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": true}]'
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'
```

The HelmRelease should now reconcile, and the upgrade or install should eventually succeed.
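
When a release has many revisions, picking the most recent revision secret by eye is error-prone, because v10 sorts before v2 alphabetically. The following is a minimal sketch of a numeric selection; the function name `latest_helm_revision_secret` is hypothetical, and it assumes secret names arrive one per line on standard input.

```shell
# Hypothetical helper: read secret names on stdin and print the Helm
# release secret with the highest revision for the release named in $1.
latest_helm_revision_secret() {
  release="$1"
  grep -E "^sh\.helm\.release\.v1\.${release}\.v[0-9]+$" \
    | sed -E 's/^(.*\.v([0-9]+))$/\2 \1/' \
    | sort -n \
    | tail -n 1 \
    | cut -d' ' -f2
}

# Demo with the secret names from the example output above.
printf '%s\n' \
  "kommander-reloader-reloader-token-9qd8b" \
  "sh.helm.release.v1.kommander-reloader.v1" \
  "sh.helm.release.v1.kommander-reloader.v2" \
  | latest_helm_revision_secret "kommander-reloader"
# prints: sh.helm.release.v1.kommander-reloader.v2
```

On a live cluster, the names could be supplied by a command such as `kubectl get secrets -n <namespace> --no-headers -o custom-columns=NAME:.metadata.name`. Confirm the printed name before deleting the secret.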

Fluent Bit disabled by default for DKP 2.3

Fluent Bit is disabled by default in DKP 2.3 due to memory constraints. The amount of admin logs ingested to Loki requires additional disk space to be configured on the grafana-loki-minio MinIO Tenant. Enabling admin logs may use around 2GB/day per node. See Configure the Grafana Loki MinIO Tenant for more details on how to configure the MinIO Tenant.
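
To get a feel for the disk space involved, the 2GB/day per node figure can be multiplied out over the retention period. This is a rough sketch only: the function name `estimate_admin_log_gb` and the five-node cluster are assumptions for illustration, the seven days correspond to the default one-week Grafana Loki retention, and the result ignores compression and MinIO erasure-coding overhead.

```shell
# Hypothetical helper: rough admin-log volume in GB, based on the
# "around 2GB/day per node" figure quoted above.
estimate_admin_log_gb() {
  nodes="$1"            # number of nodes producing admin logs
  retention_days="$2"   # how many days of logs are kept
  echo $(( nodes * 2 * retention_days ))
}

# Assumed example: 5 nodes with the default one-week retention.
estimate_admin_log_gb 5 7   # prints: 70
```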

If Fluent Bit is enabled on the management cluster and you want it to remain deployed after the upgrade, you must pass the --disable-appdeployments {} flag to the dkp upgrade kommander command. Otherwise, Fluent Bit is automatically disabled upon upgrade.

Configure the Grafana Loki MinIO Tenant

Additional steps are required to change the default configuration of the MinIO Tenant that is deployed with Grafana Loki, grafana-loki-minio. Using config overrides is not supported.

By default, the grafana-loki-minio MinIO Tenant is configured with 2 pools with 4 servers each, 1 volume per server, for a total of 80GB.

The MinIO usable storage capacity is always less than the actual storage amount.

Use the MinIO erasure code calculator to establish the appropriate configuration for your log storage requirement.
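
Before turning to the calculator, it can help to work out the raw (pre-erasure-coding) capacity of a pool layout. The sketch below is illustrative arithmetic only: the function name `pool_raw_gib` is hypothetical, the 10 GiB volume size is inferred from the 80GB default total spread over 8 volumes, and the extra pool uses the 4 servers with one 50Gi volume each from the example later in this section.

```shell
# Hypothetical helper: raw capacity of one MinIO server pool in GiB.
# Arguments: servers, volumes per server, volume size in GiB.
pool_raw_gib() {
  echo $(( $1 * $2 * $3 ))
}

# Default layout: 2 pools of 4 servers with 1 volume each.
# The 10 GiB volume size is an inference, not a documented value.
default_total=$(( 2 * $(pool_raw_gib 4 1 10) ))

# Example expansion: one extra pool of 4 servers with one 50 GiB volume each.
expanded_total=$(( default_total + $(pool_raw_gib 4 1 50) ))

echo "default raw capacity: ${default_total} GiB"     # prints: default raw capacity: 80 GiB
echo "expanded raw capacity: ${expanded_total} GiB"   # prints: expanded raw capacity: 280 GiB
```

Usable capacity is lower than these raw figures, so treat them as an upper bound.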

You are only able to expand MinIO storage by adding more MinIO server pools with the correct configuration. Modifying existing server pools does not work as MinIO does not support reducing storage capacity. See this MinIO Operator documentation for details.
This impacts all your AppDeployment objects that reference the grafana-loki Kommander application definition.
The changes introduced by the following procedure are wiped out upon Kommander install and upgrade.

In this example, we modify the grafana-loki-minio MinIO Tenant object in kommander-workspace (namespace: kommander).

Use this script to clone the management git repository from the Management cluster:

```
# KUBECONFIG must point to the Management cluster
export KUBECONFIG=$KUBECONFIG

# read the Gitea admin password from the admin-git-credentials secret
PASS=$(kubectl get secrets -n kommander admin-git-credentials -o go-template="{{.data.password | base64decode }}")
URL=https://gitea_admin:$PASS@$(kubectl -n kommander get ingress gitea -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'):443/dkp/kommander/git/kommander/kommander

git clone -c http.sslVerify=false "$URL" repo
```

Modify repo/services/grafana-loki/0.48.4/minio.yaml by appending a new server pool to the .spec.pools field, for example:

```
# the following adds a new server pool with 4 servers
# each server is attached to 1 PersistentVolume of 50Gi
- servers: 4
  volumesPerServer: 1
  volumeClaimTemplate:
    metadata:
      name: grafana-loki-minio
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
  resources:
    limits:
      cpu: 750m
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 768Mi
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    runAsNonRoot: false
    fsGroup: 0
```

When you are done editing, commit the changes to your local clone of the management git repository:

```
git add services/grafana-loki/0.48.4/minio.yaml
git commit # finish editing the commit message in your editor
```

Ensure that it is safe to apply the change, and then push the change to management git repository:
```
git push origin main
```

Set your WORKSPACE_NAMESPACE env variable:

```
# this is an example for kommander-workspace
export WORKSPACE_NAMESPACE=kommander
```

After the grafana-loki kustomizations reconcile, verify that the Tenant is modified as expected:

```
# this prints the .status field of the tenant
kubectl get tenants -n $WORKSPACE_NAMESPACE grafana-loki-minio -o jsonpath='{ .status }' | jq
```

Verify that the new StatefulSet is READY:

```
kubectl get sts -n $WORKSPACE_NAMESPACE -l v1.min.io/tenant=grafana-loki-minio

NAME                      READY   AGE
grafana-loki-minio-ss-0   4/4     144m
grafana-loki-minio-ss-1   4/4     144m
grafana-loki-minio-ss-2   4/4     15m
```

Restart all the StatefulSets that back this Tenant:

```
kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-0
statefulset.apps/grafana-loki-minio-ss-0 restarted
kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-1
statefulset.apps/grafana-loki-minio-ss-1 restarted
kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-2
statefulset.apps/grafana-loki-minio-ss-2 restarted
```

Verify that the MinIO Pods that back this Tenant are all online:

```
kubectl logs -n $WORKSPACE_NAMESPACE -l v1.min.io/tenant=grafana-loki-minio
...
Verifying if 1 bucket is consistent across drives...
Automatically configured API requests per node based on available memory on the system: 424
All MinIO sub-systems initialized successfully
Waiting for all MinIO IAM sub-system to be initialized.. lock acquired
Status:         12 Online, 0 Offline.
API: http://minio.kommander.svc.cluster.local

Console: http://192.168.202.223:9090 http://127.0.0.1:9090

Documentation: https://docs.min.io
...
```

FIPS upgrade from 2.2.x to 2.3.0

When you upgrade a FIPS cluster, a bug prevents the kube-proxy DaemonSet from being upgraded automatically. After completing the cluster upgrade, run the following command to finish upgrading the kube-proxy DaemonSet:

```
kubectl set image -n kube-system daemonset.v1.apps/kube-proxy kube-proxy=docker.io/mesosphere/kube-proxy:v1.23.7_fips.0
```

`Kube-oidc-proxy` not ready after upgrade

If you installed or attached a cluster in 2.1, kube-oidc-proxy is not available after upgrading to 2.3. This application is required to access the Kubernetes API (with kubectl) using SSO, so affected customers experience issues with authentication through kubectl.

To make the application available, run the following command on each cluster that was installed, created or attached in 2.1, and is now on DKP version 2.3.0. Replace <namespace> with each cluster’s workspace namespace:

```
kubectl -n <namespace> patch appdeployment kube-oidc-proxy --type=json -p '[{"op":"remove","path":"/spec/configOverrides"}]'
```

Additional resources

For more information about working with native Kubernetes, see the Kubernetes documentation.
For a full list of attributed 3rd party software, see http://d2iq.com/legal/3rd.
