# Setting up Kubernetes Addons

{% hint style="info" %}
See the GitHub repo for this post in [kengz/k0s-cluster](https://github.com/kengz/k0s-cluster).
{% endhint %}

After creating a Kubernetes cluster, we want to add a set of standard cluster addons for DevOps, installed using [Helm](https://helm.sh):

* [cert-manager](https://cert-manager.io/docs/installation/helm/): certificate management
* [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler): dynamically autoscales the cluster by adding or removing nodes
* [metrics-server](https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server): for monitoring and for HPA (HorizontalPodAutoscaler) to work
* [kubernetes-dashboard](https://github.com/kubernetes/dashboard#access): basic cluster monitoring (if [Lens](https://k8slens.dev/) is not available)
* [Loki (scalable)](https://github.com/grafana/loki/tree/main/production/helm/loki): to aggregate and index all logs in the cluster, with retention policy; the logs are searchable in Grafana. Additionally:
  * [promtail](https://grafana.com/docs/loki/latest/clients/promtail/) to collect logs from the nodes and ship them to Loki
  * Note: [Elasticsearch charts](https://github.com/elastic/helm-charts) (hence ELK) have been deprecated in favor of their licensed ECK; plus Loki is much easier to run and maintain
* [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack): for cluster monitoring with many useful preconfigured cluster Prometheus metrics in Grafana dashboards. Additionally:
  * [prometheus-adapter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-adapter) for custom metrics API, e.g. for HPA to scale using custom-defined metrics.
  * [prometheus-pushgateway](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-pushgateway) to push application metrics
  * [prometheus-blackbox-exporter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-blackbox-exporter) to probe endpoints for uptime monitoring

Additionally, install [Lens](https://k8slens.dev/) for GUI monitoring and access to the cluster. Get a free license to use it.

## Installations

### cert-manager

[Helm chart here](https://cert-manager.io/docs/installation/helm/).

```bash
# cert manager
helm repo add jetstack https://charts.jetstack.io
helm upgrade -i cert-manager jetstack/cert-manager -n cert-manager --create-namespace --version 'v1.12.1' --set installCRDs=true
```
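
Before creating any `Issuer` or `Certificate`, it helps to confirm that the controller, CA injector, and webhook have all rolled out. A minimal sketch, assuming the release name `cert-manager` used above (so the chart's default deployment names apply); the helper name `wait_for_cert_manager` and the 120s timeout are illustrative, not part of the chart:

```bash
# Hypothetical helper: block until the cert-manager deployments are rolled out.
# Assumes the release is named "cert-manager", as in the install command above.
wait_for_cert_manager() {
  local ns="${1:-cert-manager}"
  local deploy
  for deploy in cert-manager cert-manager-cainjector cert-manager-webhook; do
    kubectl -n "$ns" rollout status "deployment/$deploy" --timeout=120s || return 1
  done
}
```

Run `wait_for_cert_manager` after the install, then `kubectl get crds | grep cert-manager.io` to see the CRDs that `installCRDs=true` created.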

### cluster-autoscaler

[Helm chart here](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler). This component has some [custom settings](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) depending on which cloud provider is used, but the main gist is:

```bash
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm upgrade -i cluster-autoscaler autoscaler/cluster-autoscaler -n kube-system
```
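
As one example of the provider-specific settings, on AWS the chart can discover node groups by cluster name. A sketch, assuming AWS and the chart values `cloudProvider`, `awsRegion`, and `autoDiscovery.clusterName`; the wrapper name is hypothetical, the cluster name and region are placeholders you supply, and the IAM permissions the autoscaler needs are set up separately:

```bash
# Hypothetical wrapper around the install command above for an AWS cluster.
# $1 = cluster name, $2 = AWS region (both are placeholders you supply).
install_cluster_autoscaler_aws() {
  local cluster_name="$1"
  local region="$2"
  if [ -z "$cluster_name" ] || [ -z "$region" ]; then
    echo "usage: install_cluster_autoscaler_aws <cluster-name> <region>" >&2
    return 2
  fi
  helm upgrade -i cluster-autoscaler autoscaler/cluster-autoscaler -n kube-system \
    --set cloudProvider=aws \
    --set awsRegion="$region" \
    --set autoDiscovery.clusterName="$cluster_name"
}
```

For example, `install_cluster_autoscaler_aws my-cluster us-east-1`. Other providers use different values, so check the chart's README for yours.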

### metrics-server

[Helm chart here](https://github.com/kubernetes-sigs/metrics-server/tree/master/charts/metrics-server). Some Kubernetes providers have this preinstalled, but the gist is:

```bash
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade -i metrics-server metrics-server/metrics-server -n kube-system --version '3.10.0'
```
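
Once metrics-server is serving the resource metrics API, `kubectl top` works and a CPU-based HPA can be attached to a workload. A sketch, where the helper name, the deployment name you pass in, the 70% CPU target, and the replica bounds are all illustrative assumptions:

```bash
# Hypothetical helper: check the metrics API, then create a CPU-based HPA.
# $1 = deployment name, $2 = namespace (defaults to "default").
enable_cpu_hpa() {
  local deploy="$1"
  local ns="${2:-default}"
  # Fails if metrics-server is not yet serving node metrics.
  kubectl top nodes || return 1
  # Scale between 1 and 5 replicas, targeting 70% average CPU utilization.
  kubectl -n "$ns" autoscale deployment "$deploy" --cpu-percent=70 --min=1 --max=5
}
```

For example, `enable_cpu_hpa web staging`. The HPA can only compute utilization if the pods declare CPU requests.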

### kubernetes-dashboard

[Manifest here](https://github.com/kubernetes/dashboard#access) - this installs more reliably than its Helm chart. First, prepare a manifest to create a service account for dashboard access:

{% code title="./cluster/dashboard-admin-user.yaml" %}

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kubernetes-dashboard
```

{% endcode %}

Then run:

```bash
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
kubectl apply -f ./cluster/dashboard-admin-user.yaml
```

> Preferably, install [Lens](https://k8slens.dev/) for GUI monitoring and access to the cluster (it connects using kubeconfig). Get a free license to use it.

### Loki

[Helm chart here](https://github.com/grafana/loki/tree/main/production/helm/loki). [Loki](https://github.com/grafana/loki) is a Grafana project for log aggregation - it is scalable (to TBs), cheap, and simple to maintain. The logs show up in Grafana dashboards, which is a must-have for Kubernetes clusters.

{% hint style="info" %}
Elasticsearch has [archived their Helm charts](https://github.com/elastic/helm-charts) and [moved to a licensed model (ECK)](https://github.com/elastic/helm-charts/issues/1731), so I can no longer recommend it. It is also quite bloated and fragile for log aggregation alone.
{% endhint %}

First, prepare the Helm override file to:

* [configure storage](https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L253) for [scalable mode](https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#simple-scalable-deployment-mode) (s3/GCP/Azure etc., or use minio to try first)
* [configure retention](https://grafana.com/docs/loki/latest/operations/storage/retention/)

{% code title="./cluster/loki-values.yaml" %}

```yaml
# for loki scalable mode https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#simple-scalable-deployment-mode
# need to configure storage https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L253
# or try with minio first
minio:
  enabled: true
loki:
  storage:
    type: s3
    s3:
      s3: null
      endpoint: null
      region: null
      secretAccessKey: null
      accessKeyId: null
      s3ForcePathStyle: false
      insecure: false
  # configure retention https://grafana.com/docs/loki/latest/operations/storage/retention/
  # fields: https://grafana.com/docs/loki/latest/configuration/
  compactor:
    shared_store: filesystem
    retention_enabled: true
  limits_config:
    retention_period: 744h
  auth_enabled: false
```

{% endcode %}
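
The `retention_period` above is written in hours: `744h` corresponds to 31 days. A small sketch for deriving the value from a number of days (the helper name is illustrative):

```bash
# Convert a retention window in days to the hour-based duration used above.
days_to_retention() {
  echo "$(( $1 * 24 ))h"
}

days_to_retention 31   # prints 744h
```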

We'll also install [promtail](https://grafana.com/docs/loki/latest/clients/promtail/) as the agent that collects logs and ships them to Loki.

Then run:

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm upgrade -i loki grafana/loki -n logging --create-namespace --version '5.6.4' -f ./cluster/loki-values.yaml
helm upgrade -i promtail grafana/promtail -n logging --version '6.11.3'
```
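
To check that promtail is shipping logs before Grafana is installed, the Loki HTTP API can be queried through the gateway service. A sketch, assuming the `loki-gateway` service created by the `loki` release in the `logging` namespace; the helper name and the local port 3100 are arbitrary choices:

```bash
# Hypothetical smoke test: list the label names Loki has indexed so far.
# $1 = local port to forward to (defaults to 3100).
loki_labels() {
  local port="${1:-3100}"
  # Forward the gateway service (port 80) to localhost in the background.
  kubectl -n logging port-forward svc/loki-gateway "$port:80" >/dev/null 2>&1 &
  local pf_pid=$!
  sleep 2  # give the port-forward a moment to establish
  curl -s "http://localhost:$port/loki/api/v1/labels"
  local rc=$?
  # Stop the background port-forward before returning curl's exit code.
  kill "$pf_pid" 2>/dev/null
  return "$rc"
}
```

Running `loki_labels` should return a JSON document listing label names once logs have arrived; an empty list usually means promtail has not pushed anything yet.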

{% hint style="info" %}
Grafana is installed later as part of kube-prometheus-stack; we will add Loki to it as a data source for log search.
{% endhint %}

### kube-prometheus-stack

[Helm chart here](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack). This includes kube-state-metrics, node-exporter, and Grafana: they gather Kubernetes metrics from all the cluster nodes and come with many useful cluster metric dashboards preconfigured.

Prepare the Helm override file to set the Grafana admin password and add Loki as a data source for log search in Grafana:

{% code title="./cluster/prometheus-values.yaml" %}

```yaml
grafana:
  adminPassword: prom-operator
  # configure anonymous view-access
  grafana.ini:
    auth.anonymous:
      enabled: true
      org_name: Main Org.
      org_role: Viewer
    # auth:
    #   disable_login_form: true

  persistence:
    enabled: true

  ## Configure additional grafana datasources (passed through tpl)
  ## ref: http://docs.grafana.org/administration/provisioning/#datasources
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki-gateway.logging.svc.cluster.local
      version: 1
      isDefault: false
```

{% endcode %}

We will also install 3 additional Helm charts. The first is [prometheus-adapter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-adapter) for custom metrics API, e.g. for HPA to scale using custom-defined metrics.

The second is [Prometheus Pushgateway](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-pushgateway), e.g. to push application metrics. Specify a Helm override file to configure its ServiceMonitor with a matching label so it is scraped by kube-prometheus-stack:

{% code title="./cluster/pushgateway-values.yaml" %}

```yaml
serviceMonitor:
  enabled: true
  additionalLabels:
    release: prometheus
```

{% endcode %}

The third is [Blackbox Exporter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-blackbox-exporter) to probe endpoints for uptime monitoring. Specify the Helm override file to configure targets and ServiceMonitor with matching label so it is scraped by kube-prometheus-stack:

{% code title="./cluster/blackbox-values.yaml" %}

```yaml
config:
  modules:
    http_2xx:
      prober: http
      timeout: 5s
      http:
        valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
        follow_redirects: true
        preferred_ip_protocol: "ip4"
        valid_status_codes:
          - 200

serviceMonitor:
  enabled: true
  defaults:
    labels:
      # match kube-prometheus-stack scrape config
      release: prometheus
    interval: 30s
    scrapeTimeout: 30s
    module: http_2xx
  scheme: http

  targets: # lowercase only
    - name: github
      url: http://github.com/status
    - name: gitlab
      url: https://status.gitlab.com
```

{% endcode %}
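
Once the exporter is installed by the commands later in this section, a single probe can be run by hand against its `/probe` endpoint to check a target and module before relying on the ServiceMonitor. A sketch; the helper name is hypothetical, and the service name in the comment is an assumption derived from the `blackbox` release name (confirm it with `kubectl -n monitoring get svc`):

```bash
# Hypothetical helper: run one probe by hand against the exporter's /probe endpoint.
# $1 = target URL, $2 = module name (defaults to the http_2xx module defined above).
blackbox_probe() {
  local target="$1"
  local module="${2:-http_2xx}"
  # Assumes a port-forward is already running in another terminal, e.g.:
  #   kubectl -n monitoring port-forward svc/blackbox-prometheus-blackbox-exporter 9115:9115
  curl -sG "http://localhost:9115/probe" \
    --data-urlencode "target=$target" \
    --data-urlencode "module=$module"
}
```

For example, `blackbox_probe https://status.gitlab.com | grep probe_success` shows whether the probe passed.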

Now, install all of these by running:

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm upgrade -i prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace --version '46.8.0' -f ./cluster/prometheus-values.yaml
# adapter for k8s HPA custom metrics
helm upgrade -i prom-adapter prometheus-community/prometheus-adapter -n monitoring --version '4.2.0'
# pushgateway for app metrics
helm upgrade -i prom-pushgateway prometheus-community/prometheus-pushgateway -n monitoring --version '2.2.0' -f ./cluster/pushgateway-values.yaml
# blackbox exporter for uptime monitoring
helm upgrade -i blackbox prometheus-community/prometheus-blackbox-exporter -n monitoring --version '7.10.0' -f ./cluster/blackbox-values.yaml
```
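
To see the Pushgateway path working end to end, a sample metric can be pushed over HTTP and then looked up in Prometheus. A sketch; the helper name, job name, and metric name are hypothetical, and the service name in the comment is an assumption derived from the `prom-pushgateway` release name (confirm it with `kubectl -n monitoring get svc`):

```bash
# Hypothetical helper: push one sample to the Pushgateway under a job name.
# $1 = job name, $2 = metric name, $3 = value.
push_metric() {
  local job="$1"
  local metric="$2"
  local value="$3"
  # Assumes a port-forward is already running in another terminal, e.g.:
  #   kubectl -n monitoring port-forward svc/prom-pushgateway-prometheus-pushgateway 9091:9091
  printf '%s %s\n' "$metric" "$value" |
    curl -s --data-binary @- "http://localhost:9091/metrics/job/$job"
}
```

For example, `push_metric demo_job demo_metric 3.14`; the metric should appear in Prometheus after the next scrape if the ServiceMonitor label matches.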

> Grafana Dashboards can also be provisioned via ConfigMaps. See the repo linked above for more examples as the ConfigMap file is large.

## Accessing Dashboards

After installing the addons, access all the dashboards as follows:

* [Lens](https://k8slens.dev)
  * just open the app; it will use `~/.kube/config` to connect

<figure><img src="https://3782595534-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M5KBcEvOMRAQtKb5jHb%2Fuploads%2FSJj3il0yhDfgtiMUt29n%2FScreenshot%202023-06-12%20at%2011.04.49%20PM.png?alt=media&#x26;token=9eec799b-e6fa-4940-96fb-413e87e38b09" alt=""><figcaption><p>Lens dashboard</p></figcaption></figure>

* [Kubernetes Dashboard](https://github.com/kubernetes/dashboard#access)
  * get token: `kubectl -n kubernetes-dashboard create token admin-user`
  * run `kubectl proxy` and visit <http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/>

<figure><img src="https://3782595534-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M5KBcEvOMRAQtKb5jHb%2Fuploads%2FwI2uuXeZIkZPpSJUDSLb%2FScreenshot%202023-06-12%20at%2011.03.17%20PM.png?alt=media&#x26;token=c51b6e21-23ea-48ac-94d5-72387f8e694b" alt=""><figcaption><p>Kubernetes dashboard</p></figcaption></figure>

* [Grafana](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) for cluster and logging monitoring
  * data sources include kube-state-metrics, node-exporter, prometheus, and custom-added loki for logs
  * run `kubectl port-forward -n monitoring svc/prometheus-grafana 6060:80` and visit <http://localhost:6060> to find the preconfigured dashboards
  * (one-time) [import](https://grafana.com/docs/grafana/latest/dashboards/manage-dashboards/#import-a-dashboard) the [Loki Kubernetes Logs](https://grafana.com/grafana/dashboards/15141-kubernetes-service-logs/) and [Blackbox exporter](https://grafana.com/grafana/dashboards/7587-prometheus-blackbox-exporter/) dashboards
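
Anonymous access only grants the Viewer role, so importing dashboards requires logging in as `admin`. If the password in the override file has since been changed, it can be read back from the cluster. A sketch, assuming the Secret is named `prometheus-grafana` (following the `prometheus` release name, like the service above) with the key `admin-password`; the helper name is hypothetical:

```bash
# Hypothetical helper: print the Grafana admin password stored in the cluster.
grafana_admin_password() {
  kubectl -n monitoring get secret prometheus-grafana \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}
```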

<figure><img src="https://3782595534-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M5KBcEvOMRAQtKb5jHb%2Fuploads%2F9Kk5eY77F65k2J9DkmvI%2FScreenshot%202023-06-12%20at%2010.59.33%20PM.png?alt=media&#x26;token=ed8869dc-976f-40ce-9b23-7d1d8b5f5e25" alt=""><figcaption><p>One of the many Kubernetes cluster metrics dashboards</p></figcaption></figure>

<figure><img src="https://3782595534-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M5KBcEvOMRAQtKb5jHb%2Fuploads%2F2wFWUPtXvjrmtLEDDKn6%2FScreenshot%202023-06-12%20at%2010.49.54%20PM.png?alt=media&#x26;token=d729cd47-0cf0-4003-83eb-5126de90d7f7" alt=""><figcaption><p>Loki Kubernetes Logs dashboard</p></figcaption></figure>

<figure><img src="https://3782595534-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M5KBcEvOMRAQtKb5jHb%2Fuploads%2FEOD08c9HNN5ezRFy47Vm%2FScreenshot%202023-06-12%20at%2010.51.01%20PM.png?alt=media&#x26;token=63865a7e-9eb5-43e0-b3a9-622614092122" alt=""><figcaption><p>Grafana</p></figcaption></figure>

* [Prometheus](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) for cluster monitoring
  * run `kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090` and visit <http://localhost:9090>

<figure><img src="https://3782595534-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M5KBcEvOMRAQtKb5jHb%2Fuploads%2FDv5dVmD0Ua7QEjbFmZZ1%2FScreenshot%202023-06-12%20at%2011.01.47%20PM.png?alt=media&#x26;token=5a659680-373e-4c83-99b9-c8c82ddf2f21" alt=""><figcaption><p>Prometheus targets</p></figcaption></figure>

