Release Notes 2.3.2

DKP® version 2.3.2 was released on February 14, 2023

You must be a registered user and logged on to the support portal to download this product. New customers must contact their sales representative or sales@d2iq.com before attempting to download or install DKP.

Release Summary

Welcome to D2iQ Kubernetes Platform (DKP) 2.3.2! This release provides fixes to reported issues, integrates changes from previous releases, and maintains compatibility and support for other packages used in DKP.

DKP Fixes and Updates

The following issues are corrected or resolved in this release.

`Kube-oidc-proxy` is not Available After Upgrade

D2IQ-94629

If you installed or attached a cluster in 2.1, kube-oidc-proxy was not available after upgrading to 2.3.x. This prevented authentication via kubectl using SSO.

Failure to Upgrade Azure Clusters

D2IQ-95191

A bug in the Azure CSI driver could, in some circumstances, prevent volumes attached to cluster nodes from being detached during an upgrade, so the volumes were not available to the new nodes created by the upgrade process. This prevented pods on the upgraded cluster from starting.

CAPA Provider for EKS not Working with -worker-iam-instance-profile

D2IQ-95493

When creating an EKS cluster and specifying a specific IAM instance profile to use on worker nodes, the nodes are created with the specified IAM instance profile. However, the CAPA controller currently does not honor this role, resulting in deployment failures. To work around this issue, follow the instructions on the Grant Cluster Access page to adjust the roles the CAPA controller uses. In some instances, the IAM authenticator was missed as a prerequisite. For more information, see the Amazon EKS documentation: Installing aws-iam-authenticator.
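Because the IAM authenticator is easy to miss as a prerequisite, a quick check before creating the cluster may help. The following is a minimal sketch; `require_tool` is a hypothetical helper name, not part of DKP or the AWS tooling, and it only confirms that a binary is on the PATH.

```shell
# Sketch only: report whether a prerequisite binary is available on PATH.
# require_tool is a hypothetical helper, not a DKP or AWS command.
require_tool() {
  local tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool" >&2
    return 1
  fi
}
```

For example, `require_tool aws-iam-authenticator` returns a non-zero status when the authenticator is not installed.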

AKS Cluster Deployment hangs in a Pending State

D2IQ-94072

When deploying an AKS cluster using the DKP UI on an Azure-hosted Management cluster, the AKS cluster deployed correctly but remained in the Pending state when viewed from the UI.

Improved Documentation for Changing Calico Encapsulation Type

D2IQ-94582

The documentation for changing the encapsulation type when deploying DKP 2.3 to a pre-provisioned Azure cluster was incorrect, which prevented correct configuration of the Calico overlay network.

Kommander Installations Fail in Certain Scenarios

D2IQ-92981

When deploying DKP Applications to a centos79 pre-provisioned, air-gapped, and FIPS environment, deployment failed because the gatekeeper-update-namespace-label pod entered a crash loop.

Corrected Upgrade Documentation to Specify Correct Kubernetes version

D2IQ-93793

The documented upgrade path to DKP 2.3 failed due to an incorrectly configured flag.

KIB 1.24.x Fails and Cannot Create Ubuntu Images

D2IQ-95429

Creating an Ubuntu 1804 or 2004 image for GCP with KIB 1.24.2 or 1.24.3 failed with an error.

DKP Insights Fixes and Updates

Incorrect CSI-related Polaris Messages for GCP and Azure

D2IQ-92665

DKP Insights was displaying some incorrect CSI-related critical Polaris messages for GCP and Azure clusters, but not AWS.

Download Signature Files

You need to download an appropriate, signed signature file before you run FIPS validation. Verify which version of DKP you are running to ensure you are downloading the manifest that is compliant with the DKP release number on your system. You can use the FIPS validation tool to verify that specific components and services are FIPS-compliant by checking the signatures of the files against a signed signature file, and by checking that services are using the certified algorithms. Select the links in the Manifest URL column of the following table to obtain a valid file:

DKP version 2.3.2

Operating System version	Kubernetes version	containerd version	Manifest URL
CentOS 7.9	v1.23.12	1.4.13	v1.23.12 CentOS 7.9 Manifest
Oracle 7.9	v1.23.12	1.4.13	v1.23.12 OL 7.9 Manifest
RHEL 7.9	v1.23.12	1.4.13	v1.23.12 RHEL 7.9 Manifest
RHEL 8.2	v1.23.12	1.4.13	v1.23.12 RHEL 8.2 Manifest
RHEL 8.4	v1.23.12	1.4.13	v1.23.12 RHEL 8.4 Manifest

Supported Versions

Any DKP cluster you attach using DKP 2.3.2 must be running a Kubernetes version in the following ranges:

Kubernetes Support	Version
DKP Minimum	1.22.0
DKP Maximum	1.23.x
DKP Default	1.23.12
EKS Default	1.22.x
AKS Default	1.23.x
GKE Default	1.22.x-1.23.x

DKP 2.3 comes with support for Kubernetes 1.23, enabling you to benefit from the latest features and security fixes in upstream Kubernetes. This release comes with approximately 47 enhancements. To read more about major features in this release, visit https://kubernetes.io/blog/2021/12/07/kubernetes-1-23-release-announcement/.
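To check a cluster against the supported range before attaching it, a small helper can compare its Kubernetes version to the DKP minimum (1.22.0) and maximum (1.23.x) from the table above. This is a sketch under stated assumptions: `version_supported` is a hypothetical function name, and it inspects only the major and minor version numbers.

```shell
# Sketch only: succeed when a Kubernetes version is within 1.22.0 - 1.23.x.
# version_supported is a hypothetical helper, not a DKP command.
version_supported() {
  local version major rest minor part
  version="${1#v}"            # strip an optional leading "v"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%.*}"         # text before the next dot
  for part in "$major" "$minor"; do
    case "$part" in
      ''|*[!0-9]*) return 2 ;;   # not a numeric version component
    esac
  done
  [ "$major" -eq 1 ] && [ "$minor" -ge 22 ] && [ "$minor" -le 23 ]
}
```

One way to feed it a live value, assuming `jq` is installed, is `version_supported "$(kubectl version -o json | jq -r .serverVersion.gitVersion)"`.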

2.3.2 Components and Applications

The following are component and application versions for DKP 2.3.2:

Components

Component Name	Version
Cluster API Core (CAPI)	1.1.3-d2iq.5
Cluster API AWS Infrastructure Provider (CAPA)	1.4.1
Cluster API Google Cloud Infrastructure Provider (CAPG)	1.1.0
Cluster API Pre-provisioned Infrastructure Provider (CAPPP)	0.9.5
Cluster API vSphere Infrastructure Provider (CAPV)	1.2.0
Cluster API Azure Infrastructure Provider (CAPZ)	1.3.2
Konvoy Image Builder	1.19.14
containerd	1.4.13
etcd	3.4.13

Applications

Common Application Name	APP ID	Version	Component Versions	Helm Values	DKP Values
Centralized Grafana	centralized-grafana	34.9.3	chart: 34.9.3 prometheus-operator: 0.55.0 grafana: 8.4.5	Link	Link
Centralized Kubecost	centralized-kubecost	0.27.0	chart: 0.27.0 kubecost: 1.96.0	Link	Link
Cert Manager	cert-manager	1.7.1	chart: 1.7.1 cert-manager: 1.7.1	Link	Link
Chartmuseum	chartmuseum	3.9.0	chart: 3.9.0 chartmuseum: 3.9.0	Link	Link
Dex	dex	2.9.19	chart: 2.9.19 dex: 2.31.0	Link	Link
Dex K8s Authenticator	dex-k8s-authenticator	1.2.14	chart: 1.2.14 dex-k8s-authenticator: 1.2.4	Link	Link
DKP Insights Management	dkp-insights-management	0.2.3	chart: 0.2.3 dkp-insights-management: 0.2.3	N/A	Link
External DNS	external-dns	6.5.5	chart: 6.5.5 external-dns: 0.12.0	Link	Link
Fluent Bit	fluent-bit	0.19.24	chart: 0.19.24 fluent-bit: 1.8.15	Link	Link
Gatekeeper	gatekeeper	3.8.2	chart: 3.8.1 gatekeeper: 3.8.1	Link	Link
Gitea	gitea	5.0.9	chart: 5.0.9 gitea: 1.16.8	Link	Link
Grafana Logging	grafana-logging	6.28.0	chart: 6.28.0 grafana: 8.4.5	Link	Link
Grafana Loki	grafana-loki	0.48.5	chart: 0.48.4 loki: 2.5.0	Link	Link
Istio	istio	1.14.1	chart: 1.14.1 istio: 1.14.1	Link	Link
Jaeger	jaeger	2.32.2	chart: 2.32.2 jaeger: 1.34.1	Link	Link
Karma	karma	2.0.1	chart: 2.0.1 karma: 0.70	Link	Link
Kiali	kiali	1.52.0	chart: 1.52.0 kiali: 1.52.0	Link	Link
Knative	knative	0.4.0	chart: 0.4.0 knative: 0.22.3	Link	Link
Flux	kommander-flux	0.31.4	chart: N/A flux: 0.31.4	N/A	N/A
Kube OIDC Proxy	kube-oidc-proxy	0.3.2	chart: 0.3.1 kube-oidc-proxy: 0.3.0	Link	Link
Kube Prometheus Stack	kube-prometheus-stack	34.9.3	chart: 34.9.3 prometheus-operator: 0.55.0 grafana: 8.4.5 prometheus: 2.34.0 prometheus-alertmanager: 0.24.0	Link	Link
Kubecost	kubecost	0.27.0	chart: 0.27.0 kubecost: 1.96.0	Link	Link
Kubefed	kubefed	0.9.2	chart: 0.9.2 kubefed: 0.9.2	Link	Link
Kubernetes Dashboard	kubernetes-dashboard	5.1.1	chart: 5.1.1 kubernetes-dashboard: 2.4.0	Link	Link
Kubetunnel	kubetunnel	0.0.13	chart: 0.0.13 kubetunnel: 0.0.13	N/A	Link
Logging Operator	logging-operator	3.17.8	chart: 3.17.7 logging-operator: 3.17.7 logging-operator-logging: 3.17.7	Link	Link
Metallb	metallb	0.12.3	chart: 0.12.3 metallb: 0.8.1	Link	Link
MinIO Operator	minio-operator	4.4.25	chart: 4.4.25 minio-operator: 4.4.25	Link	Link
NFS Server Provisioner	nfs-server-provisioner	0.6.0	chart: 0.6.0 nfs-server-provisioner: 2.3.0	Link	Link
Nvidia	nvidia	0.4.4	chart: 0.4.4 nvidia-device-plugin: 0.2.0	Link	Link
Grafana (project)	project-grafana-logging	6.28.0	chart: 6.28.0 grafana: 8.4.5	Link	Link
Grafana Loki (project)	project-grafana-loki	0.48.5	chart: 0.48.4 loki: 2.5.0	Link	Link
Prometheus Adapter	prometheus-adapter	2.17.1	chart: 2.17.1 prometheus-adapter: 0.9.1	Link	Link
Reloader	reloader	0.0.110	chart: 0.0.110 reloader: 0.0.110	Link	Link
Thanos	thanos	0.4.7	chart: 0.4.6 thanos: 0.17.1	Link	Link
Traefik	traefik	10.9.3	chart: 10.9.1 traefik: 2.5.6	Link	Link
Traefik ForwardAuth	traefik-forward-auth	0.3.8	chart: 0.3.8 traefik-forward-auth: 3.1.0	Link	Link
Velero	velero	3.2.3	chart: 3.2.3 velero: 1.5.2	Link	Link

Known Issues and Limitations

The following items are known issues with this release.

Nvidia Feature Discovery Error

D2IQ-93676

After you create a new cluster to migrate Kaptain to version 2.1, the nvidia-feature-discovery-gpu-feature-discovery pod enters a CrashLoopBackOff state with an error.

Workaround

Follow these steps:

1. Place the registry details in the override file, together with Nvidia.
2. Delete the current override and replace it with the new override you created in step 1.
3. Delete the machine to force preprovisioning.
4. Rename or delete the *.toml files from the import path directory set in config.toml.
5. Restart containerd and GPU/Nvidia feature discovery.
6. Verify the Node now shows GPU resources.
7. Repeat these steps for all affected nodes.
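The verification step above can be sketched as a check of the node's allocatable resources. This assumes the GPUs are advertised under the `nvidia.com/gpu` resource name used by the NVIDIA device plugin; `node_has_gpu` is a hypothetical helper name.

```shell
# Sketch only: succeed when a node advertises at least one allocatable GPU.
# node_has_gpu is a hypothetical helper; nvidia.com/gpu is the resource name
# assumed here.
node_has_gpu() {
  local node="$1" count
  count=$(kubectl get node "$node" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}') || return 2
  case "$count" in
    ''|*[!0-9]*) return 1 ;;   # resource missing or not a plain integer
  esac
  [ "$count" -gt 0 ]
}
```

Run it once per affected node, for example `node_has_gpu <node-name> && echo "GPU resources present"`.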

Use static credentials to provision an Azure cluster

Only static credentials can be used when provisioning an Azure cluster.

When attaching GKE clusters, create a ResourceQuota to enable log collection

After you attach the GKE cluster, you can choose to deploy a stack of applications for workspace or project log collection. Once you have enabled this stack, create a ResourceQuota which is required for the logging stack to function correctly. You will have to do this manually, because some DKP versions do not properly handle this by default.
Create the following resource to enable log collection:

Execute the following command to get the namespace of your workspace on the management cluster:
```
kubectl get workspaces
```
Copy the value in the WORKSPACE NAMESPACE column for your workspace. This value may NOT be identical to the Display Name of the Workspace.
Set the WORKSPACE_NAMESPACE environment variable to the name of the workspace’s namespace:
```
export WORKSPACE_NAMESPACE=<gkeattached-cluster-namespace>
```

Run the following command on your attached GKE cluster to create the resource:

```
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: fluent-bit-critical-pods
  namespace: ${WORKSPACE_NAMESPACE}
spec:
  hard:
    pods: "1G"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
EOF
```

After a few minutes, log collection is available in your GKE cluster.

This workflow only creates a ResourceQuota in the targeted workspace. Repeat these steps if you want to deploy the logging stack to additional workspaces with GKE clusters.
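To confirm that the ResourceQuota from the steps above exists in a given workspace namespace, a check along these lines may be useful. It is a sketch only; `quota_present` is a hypothetical helper name, and it relies on nothing beyond `kubectl get` against the attached GKE cluster.

```shell
# Sketch only: succeed when the fluent-bit-critical-pods ResourceQuota exists
# in the given namespace. quota_present is a hypothetical helper.
quota_present() {
  local namespace="$1"
  kubectl get resourcequota fluent-bit-critical-pods \
    -n "$namespace" -o name >/dev/null 2>&1
}
```

For example, `quota_present "${WORKSPACE_NAMESPACE}" || echo "ResourceQuota not found"`.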

Resolve issues with failed HelmReleases

There is an existing issue with the Flux helm-controller that can cause HelmReleases to get "stuck" with an error message such as Helm upgrade failed: another operation (install/upgrade/rollback) is in progress. This can happen when the helm-controller is restarted while a HelmRelease is upgrading, installing, and so on.

Workaround

To ensure the HelmRelease error was caused by the helm-controller restarting, first try to suspend/resume the HelmRelease:

```
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": true}]'
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'
```

After you resume the HelmRelease, it attempts to reconcile and then either succeeds (with status: 'Release reconciliation succeeded') or fails with the same error as before.

If the HelmRelease is still in the failed state, the failure is likely related to the helm-controller restarting. The following steps use the 'reloader' HelmRelease as the example of a stuck release.

To resolve the issue, follow these steps:

List secrets containing the affected HelmRelease name:

```
kubectl get secrets -n ${NAMESPACE} | grep reloader
```

Example output:

```
kommander-reloader-reloader-token-9qd8b                        kubernetes.io/service-account-token   3      171m
sh.helm.release.v1.kommander-reloader.v1                       helm.sh/release.v1                    1      171m
sh.helm.release.v1.kommander-reloader.v2                       helm.sh/release.v1                    1      117m
```

In this example, sh.helm.release.v1.kommander-reloader.v2 is the most recent revision.

Find and delete the most recent revision secret, which is named in the form sh.helm.release.v1.*.<revision>:
```
kubectl delete secret -n <namespace> <most recent helm revision secret name>
```

Suspend and resume the HelmRelease to trigger a reconciliation:

```
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": true}]'
kubectl -n <namespace> patch helmrelease <HELMRELEASE_NAME> --type='json' -p='[{"op": "replace", "path": "/spec/suspend", "value": false}]'
```

The HelmRelease should reconcile, and the upgrade or install should eventually succeed.
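When a release has many revisions, picking the most recent secret by eye is error-prone. The sketch below selects the highest-numbered revision from a `kubectl get secrets` listing. `latest_helm_revision_secret` is a hypothetical helper name, and it assumes the secret names follow the `sh.helm.release.v1.<release>.v<N>` pattern shown in the example output above.

```shell
# Sketch only: read a "kubectl get secrets" listing on stdin and print the
# Helm release secret with the highest revision number for one release.
# latest_helm_revision_secret is a hypothetical helper.
latest_helm_revision_secret() {
  local release="$1"
  awk '{print $1}' \
    | grep -E "^sh\.helm\.release\.v1\.${release}\.v[0-9]+$" \
    | sed -E 's/^(.*\.v([0-9]+))$/\2 \1/' \
    | sort -n \
    | tail -n 1 \
    | cut -d' ' -f2
}
```

For example, `kubectl get secrets -n "${NAMESPACE}" | latest_helm_revision_secret kommander-reloader` prints the name to pass to `kubectl delete secret`. Sorting numerically matters once a release passes revision 9, because a plain text sort would place v10 before v9.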

Fluent Bit disabled by default for DKP 2.3

Fluent Bit is disabled by default in DKP 2.3 due to memory constraints. The amount of admin logs ingested to Loki requires additional disk space to be configured on the grafana-loki-minio MinIO Tenant. Enabling admin logs may use around 2GB/day per node. See Configure the Grafana Loki MinIO Tenant for more details on how to configure the MinIO Tenant.

If Fluent Bit is enabled on the management cluster and you want it to remain deployed after the upgrade, you must pass the --disable-appdeployments {} flag to the dkp upgrade kommander command. Otherwise, Fluent Bit is automatically disabled upon upgrade.

Configure the Grafana Loki MinIO Tenant

Additional steps are required to change the default configuration of the MinIO Tenant that is deployed with Grafana Loki, grafana-loki-minio. Using config overrides is not supported.

By default, the grafana-loki-minio MinIO Tenant is configured with 2 pools with 4 servers each, 1 volume per server, for a total of 80GB.

The MinIO usable storage capacity is always less than the actual storage amount.

Use MinIO Erasure code calculator to establish the appropriate configuration for your log storage requirement.
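As a rough illustration of the raw (pre-erasure-coding) arithmetic, the sketch below multiplies servers, volumes per server, and volume size for each pool. The 10 GB per-volume figure is an assumption inferred from the stated 80GB default total across 2 pools of 4 servers; the 50 GB figure matches the example pool added later in this procedure. `raw_pool_capacity` is a hypothetical helper name, and the result is not the usable capacity, which the erasure code calculator estimates.

```shell
# Sketch only: raw capacity of one MinIO server pool, in the unit of the
# volume size passed in. raw_pool_capacity is a hypothetical helper.
raw_pool_capacity() {
  local servers="$1" volumes_per_server="$2" volume_size="$3"
  echo $(( servers * volumes_per_server * volume_size ))
}

# Default tenant: 2 pools, 4 servers each, 1 volume per server.
# The per-volume size of 10 is an assumption derived from the 80GB total.
default_total=$(( 2 * $(raw_pool_capacity 4 1 10) ))

# Additional pool: 4 servers, 1 volume of 50 each.
new_pool=$(raw_pool_capacity 4 1 50)

echo "default raw capacity: ${default_total}"
echo "raw capacity with the extra pool: $(( default_total + new_pool ))"
```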

You are only able to expand MinIO storage by adding more MinIO server pools with the correct configuration. Modifying existing server pools does not work as MinIO does not support reducing storage capacity. See this MinIO Operator documentation for details.
This impacts all your AppDeployment objects that reference the grafana-loki Kommander application definition.
The changes introduced by the following procedure are wiped out upon Kommander install and upgrade.

In this example, we modify the grafana-loki-minio MinIO Tenant object in kommander-workspace (namespace: kommander)

Use this script to clone the management git repository from the Management cluster:

```
export KUBECONFIG=$KUBECONFIG

PASS=$(kubectl get secrets -nkommander admin-git-credentials -oyaml -o go-template="{{.data.password | base64decode }}")
URL=https://gitea_admin:$PASS@$(kubectl -n kommander get ingress gitea -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'):443/dkp/kommander/git/kommander/kommander

git clone -c http.sslVerify=false $URL repo
```

Modify repo/services/grafana-loki/0.48.4/minio.yaml by appending a new server pool to the .spec.pools field, for example:

```
# the following will add a new server pool with 4 servers
# each server is attached with 1 PersistentVolume of 50G
- servers: 4
  volumesPerServer: 1
  volumeClaimTemplate:
    metadata:
      name: grafana-loki-minio
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
  resources:
    limits:
      cpu: 750m
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 768Mi
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    runAsNonRoot: false
    fsGroup: 0
```

Commit the changes to your local clone of the management git repository when you are done editing:

```
git add services/grafana-loki/0.48.4/minio.yaml
git commit # finish the commit message editing in editor
```

Ensure that it is safe to apply the change, and then push the change to the management git repository:
```
git push origin main
```

Set your WORKSPACE_NAMESPACE env variable:

```
# this is an example for kommander-workspace
export WORKSPACE_NAMESPACE=kommander
```

After the grafana-loki kustomizations reconcile, verify that the Tenant is modified as expected:

```
# this prints the .status field of the tenant
kubectl get tenants -n $WORKSPACE_NAMESPACE grafana-loki-minio -o jsonpath='{ .status }' | jq
```

Verify that the new StatefulSet is READY:

```
kubectl get sts -n $WORKSPACE_NAMESPACE -l v1.min.io/tenant=grafana-loki-minio

NAME                      READY   AGE
grafana-loki-minio-ss-0   4/4     144m
grafana-loki-minio-ss-1   4/4     144m
grafana-loki-minio-ss-2   4/4     15m
```

Restart all the StatefulSets that back this Tenant:

```
kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-0
statefulset.apps/grafana-loki-minio-ss-0 restarted
kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-1
statefulset.apps/grafana-loki-minio-ss-1 restarted
kubectl -n $WORKSPACE_NAMESPACE rollout restart sts grafana-loki-minio-ss-2
statefulset.apps/grafana-loki-minio-ss-2 restarted
```

Verify that the MinIO Pods that back this Tenant are all online:

```
kubectl logs -n $WORKSPACE_NAMESPACE -l v1.min.io/tenant=grafana-loki-minio
...
Verifying if 1 bucket is consistent across drives...
Automatically configured API requests per node based on available memory on the system: 424
All MinIO sub-systems initialized successfully
Waiting for all MinIO IAM sub-system to be initialized.. lock acquired
Status:         12 Online, 0 Offline.
API: http://minio.kommander.svc.cluster.local

Console: http://192.168.202.223:9090 http://127.0.0.1:9090

Documentation: https://docs.min.io
...
```

FIPS upgrade from 2.2.x to 2.3.0

When you upgrade a FIPS cluster, a bug prevents the kube-proxy DaemonSet from being upgraded automatically. After completing the cluster upgrade, run the following command to finish upgrading the kube-proxy DaemonSet:

```
kubectl set image -n kube-system daemonset.v1.apps/kube-proxy kube-proxy=docker.io/mesosphere/kube-proxy:v1.23.12_fips.0
```
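To confirm the DaemonSet now references the FIPS image, a check like the following may help. It is a sketch only: `kube_proxy_image` and `kube_proxy_upgraded` are hypothetical helper names, and the container name `kube-proxy` is taken from the `kube-proxy=` argument in the command above.

```shell
# Sketch only: print the image configured for the kube-proxy container of the
# kube-proxy DaemonSet. Both helpers are hypothetical names.
kube_proxy_image() {
  kubectl get daemonset kube-proxy -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="kube-proxy")].image}'
}

# Succeed when the configured image equals the expected one.
kube_proxy_upgraded() {
  local expected="$1"
  [ "$(kube_proxy_image)" = "$expected" ]
}
```

For example, `kube_proxy_upgraded docker.io/mesosphere/kube-proxy:v1.23.12_fips.0 && echo "kube-proxy image updated"`. Running `kubectl rollout status daemonset/kube-proxy -n kube-system` first waits for the new pods to roll out.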

Additional resources

For more information about working with native Kubernetes, see the Kubernetes documentation.
For a full list of attributed 3rd party software, see http://d2iq.com/legal/3rd.
