Cluster API Provider RKE2


What is Cluster API Provider RKE2

The Cluster API brings declarative, Kubernetes-style APIs to cluster creation, configuration and management.

Cluster API Provider RKE2 (CAPRKE2) combines two provider types: a Cluster API Control Plane Provider for provisioning Kubernetes control plane nodes, and a Cluster API Bootstrap Provider for bootstrapping Kubernetes on a machine where RKE2 is used as the Kubernetes distribution.


Getting Started

Follow our getting started guide to start creating RKE2 clusters with CAPI.

Developer Guide

Check our developer guide for instructions on how to set up your dev environment in order to contribute to this project.

Get in contact

You can get in contact with us via the #capbr channel on the Rancher Users Slack.

User guide

This section contains a getting started guide to help new users utilise CAPRKE2.

Getting Started

Cluster API Provider RKE2 is compliant with the clusterctl contract, which means that clusterctl simplifies its deployment to the CAPI management cluster. In this Getting Started guide, we will be using the RKE2 provider with the Docker infrastructure provider (also called CAPD).

Prerequisites

  • clusterctl to handle the lifecycle of a Cluster API management cluster
  • kubectl to apply the workload cluster manifests that clusterctl generates
  • kind and docker to create a local Cluster API management cluster
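
You can quickly verify that these tools are installed and on your PATH with their version commands (the exact output will vary by environment):

clusterctl version
kubectl version --client
kind version
docker version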

Management Cluster

In order to use this provider, you need a management cluster available and your current KUBECONFIG context set to talk to that cluster. If you do not have a cluster available, you can create a kind cluster. These are the steps needed to achieve that:

  1. Ensure kind is installed.

  2. Create a special kind configuration file if you intend to use the Docker infrastructure provider:

    cat > kind-cluster-with-extramounts.yaml <<EOF
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    name: capi-test
    nodes:
    - role: control-plane
      extraMounts:
        - hostPath: /var/run/docker.sock
          containerPath: /var/run/docker.sock
    EOF
    
  3. Run the following command to create a local kind cluster:

    kind create cluster --config kind-cluster-with-extramounts.yaml
    
  4. Check your newly created kind cluster:

    kubectl cluster-info
    

    and you should get a result similar to this:

    Kubernetes control plane is running at https://127.0.0.1:40819
    CoreDNS is running at https://127.0.0.1:40819/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
    
    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
    

Setting up clusterctl

CAPI >= v1.6.0

No additional steps are required and you can install the RKE2 provider with clusterctl directly:

clusterctl init --core cluster-api:v1.9.5 --bootstrap rke2:v0.13.0 --control-plane rke2:v0.13.0 --infrastructure docker:v1.9.5

Next, you can proceed to creating a workload cluster.

CAPI < v1.6.0

With CAPI & clusterctl versions older than v1.6.0 you need a specific configuration. To do this, create a file called clusterctl.yaml in the $HOME/.cluster-api folder with the following content (substitute ${VERSION} with a valid semver version from the releases page, e.g. v0.5.0):

providers:
  - name: "rke2"
    url: "https://github.com/rancher/cluster-api-provider-rke2/releases/${VERSION}/bootstrap-components.yaml"
    type: "BootstrapProvider"
  - name: "rke2"
    url: "https://github.com/rancher/cluster-api-provider-rke2/releases/${VERSION}/control-plane-components.yaml"
    type: "ControlPlaneProvider"

This configuration tells clusterctl where to look for provider manifests in order to deploy provider components in the management cluster.

The next step is to run the clusterctl init command:

clusterctl init --bootstrap rke2 --control-plane rke2

This should output something similar to the following:

Fetching providers
Installing cert-manager Version="v1.10.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v1.3.3" TargetNamespace="capi-system"
Installing Provider="bootstrap-rke2" Version="v0.1.0-alpha.1" TargetNamespace="rke2-bootstrap-system"
Installing Provider="control-plane-rke2" Version="v0.1.0-alpha.1" TargetNamespace="rke2-control-plane-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

  clusterctl generate cluster [name] --kubernetes-version [version] | kubectl apply -f -

Create a workload cluster

There are some sample cluster templates available under the examples/templates folder. This section assumes you are using CAPI v1.6.0 or higher.

For this Getting Started section, we will be using the Docker samples available under the examples/templates/docker/ folder. This folder contains a YAML template file called cluster-template.yaml with environment variable placeholders that can be substituted using the envsubst tool; here we will use clusterctl to generate the manifests from these templates. Set the following environment variables:

  • NAMESPACE
  • CLUSTER_NAME
  • CONTROL_PLANE_MACHINE_COUNT
  • WORKER_MACHINE_COUNT
  • KIND_IMAGE_VERSION
  • RKE2_VERSION

for example:

export NAMESPACE=example
export CLUSTER_NAME=capd-rke2-test
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=2
export KIND_IMAGE_VERSION=v1.31.4
export RKE2_VERSION=v1.31.4+rke2r1

The next step is to substitute the values in the YAML using the following commands:

cd examples/templates/docker/
cat cluster-template.yaml | clusterctl generate yaml > rke2-docker-example.yaml

At this point, you can take some time to study the resulting YAML, then apply it to the management cluster:

kubectl apply -f rke2-docker-example.yaml

and see the following output:

namespace/example created
cluster.cluster.x-k8s.io/capd-rke2-test created
dockercluster.infrastructure.cluster.x-k8s.io/capd-rke2-test created
rke2controlplane.controlplane.cluster.x-k8s.io/capd-rke2-test-control-plane created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/controlplane created
machinedeployment.cluster.x-k8s.io/worker-md-0 created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/worker created
rke2configtemplate.bootstrap.cluster.x-k8s.io/capd-rke2-test-agent created
configmap/capd-rke2-test-lb-config created

Checking the workload cluster

After waiting several minutes, you can check the state of the CAPI machines by running the following command:

kubectl get machine -n example

and you should see output similar to the following:

NAME                                 CLUSTER          NODENAME                                 PROVIDERID                                          PHASE     AGE    VERSION
capd-rke2-test-control-plane-9kw26   capd-rke2-test   capd-rke2-test-control-plane-9kw26       docker:////capd-rke2-test-control-plane-9kw26       Running   21m    v1.31.4+rke2r1
capd-rke2-test-control-plane-pznp8   capd-rke2-test   capd-rke2-test-control-plane-pznp8       docker:////capd-rke2-test-control-plane-pznp8       Running   8m5s   v1.31.4+rke2r1
capd-rke2-test-control-plane-rwzgk   capd-rke2-test   capd-rke2-test-control-plane-rwzgk       docker:////capd-rke2-test-control-plane-rwzgk       Running   17m    v1.31.4+rke2r1
worker-md-0-hm765-hlzgr              capd-rke2-test   capd-rke2-test-worker-md-0-hm765-hlzgr   docker:////capd-rke2-test-worker-md-0-hm765-hlzgr   Running   18m    v1.31.4+rke2r1
worker-md-0-hm765-w6h5j              capd-rke2-test   capd-rke2-test-worker-md-0-hm765-w6h5j   docker:////capd-rke2-test-worker-md-0-hm765-w6h5j   Running   18m    v1.31.4+rke2r1

Accessing the workload cluster

Once the cluster is fully provisioned, you can check its status with:

kubectl get cluster -n example

and see output similar to this:

NAMESPACE   NAME             CLUSTERCLASS   PHASE         AGE   VERSION
example     capd-rke2-test                  Provisioned   22m

You can also get an “at a glance” view of the cluster and its resources by running:

clusterctl describe cluster capd-rke2-test -n example

This should produce output similar to the following:

NAME                                                            READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/capd-rke2-test                                          True                     5m8s
├─ClusterInfrastructure - DockerCluster/capd-rke2-test          True                     22m
├─ControlPlane - RKE2ControlPlane/capd-rke2-test-control-plane  True                     5m8s
│ └─3 Machines...                                               True                     20m    See capd-rke2-test-control-plane-9kw26, capd-rke2-test-control-plane-pznp8, ...
└─Workers
  └─MachineDeployment/worker-md-0                               True                     11m
    └─2 Machines...                                             True                     15m    See worker-md-0-hm765-hlzgr, worker-md-0-hm765-w6h5j

🎉 CONGRATULATIONS! 🎉 You created your first RKE2 cluster with CAPD as an infrastructure provider.

Using ClusterClass for cluster creation

This provider supports using ClusterClass, a Cluster API feature that implements an extra level of abstraction on top of the existing Cluster API functionality. The ClusterClass object is used to define a collection of template resources (control plane and machine deployment) which are used to generate one or more clusters of the same flavor.

If you are interested in leveraging this functionality, you can refer to the ClusterClass sample templates under the examples/templates folder.

As with the other sample templates, you will need to set a number of environment variables:

  • CLUSTER_NAME
  • CONTROL_PLANE_MACHINE_COUNT
  • WORKER_MACHINE_COUNT
  • KUBERNETES_VERSION
  • KIND_IP

for example:

export CLUSTER_NAME=capd-rke2-clusterclass
export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=2
export KUBERNETES_VERSION=v1.30.3
export KIND_IP=192.168.20.20

Remember that, since we are using Kind, the value of KIND_IP must be an IP address in the range of the kind network. You can check the range Docker assigns to this network by inspecting it:

docker network inspect kind
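
If you only need the subnet, you can extract it directly with Docker's --format flag; for example (the output will vary by environment):

docker network inspect kind --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}'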

The next step is to substitute the values in the YAML using the following commands:

cat clusterclass-quick-start.yaml | clusterctl generate yaml > clusterclass-example.yaml

At this point, you can take some time to study the resulting YAML, then apply it to the management cluster:

kubectl apply -f clusterclass-example.yaml

This will create a new ClusterClass template that can be used to provision one or multiple workload clusters of the same flavor. To do so, you can follow the same procedure and substitute the values in the YAML for the cluster definition:

cat rke2-sample.yaml | clusterctl generate yaml > rke2-clusterclass-example.yaml

And then apply the resulting YAML file to create a cluster from the existing ClusterClass.

kubectl apply -f rke2-clusterclass-example.yaml

Known Issues

When using CAPD < v1.6.0 unmodified, cluster creation is stuck after the first node and the API is not reachable

If you use Docker as your infrastructure provider without any modification, cluster creation will stall after provisioning the first node, and the API will not be available via the load balancer address. This is caused by the load balancer configuration used in CAPD, which is not compatible with RKE2. Therefore, it is necessary to use our own fork of v1.3.3 with a specific clusterctl configuration.

Topics

This section contains more detailed information about the features that CAPRKE2 offers and how to use them.

Air-Gapped Cluster Deployment

Introduction

By default, this provider deploys RKE2 using the online installation method. This method requires access to Rancher servers and the Docker.io registry to download the scripts, RKE2 packages and container images necessary for the installation of RKE2.

Some users might prefer an Air-Gapped installation for a variety of reasons, such as deployment in particularly secure environments, sporadic network access (like deployment to edge locations) or bandwidth preservation.

RKE2 supports Air-Gapped installation using:

  • 2 methods for node preparation: Tarball on the node, Container Image Registry

  • 2 methods for actual RKE2 installation after the node is prepared: Manual deployment, and Using install.sh from https://get.rke2.io.

Methods supported by CAPRKE2 (Cluster API Provider RKE2)

From the RKE2 Air-Gapped installation options above, CAPRKE2 has chosen the combination that offers the best tradeoff in terms of simplicity, usability and limitation of dependencies.

Node preparation

The node preparation method supported by CAPRKE2 is the tarball on the node, using custom images. The reasons behind this choice include:

  • No dependency on the environment's network infrastructure or image registry; the registry approach would still require a custom image anyway.

  • CAPI's philosophy is to accept custom-defined base images for infrastructure providers, which makes it easy to build the RKE2 pre-requisites (for a specific RKE2 version) into a custom image to be used for all deployments.

RKE2 deployment

The RKE2 deployment method supported by CAPRKE2 is the install.sh approach described in the RKE2 documentation. This approach is used because it automates a number of tasks needed to deploy RKE2, like creating the file hierarchy, unpacking the tarball, and creating systemd service units.

Since these tasks might change in the future, we prefer to rely on the upstream script from RKE2, whose latest valid version is available at https://get.rke2.io.

Pre-requisites on base image

Considering the above tradeoffs, base images used for Air-Gapped deployment need to comply with some pre-requisites in order to work with CAPRKE2. This section lists these pre-requisites:

  • Support and presence of cloud-init (ignition bootstrapping is also on the roadmap)

  • Presence of systemd (because RKE2's installation relies on systemd to start RKE2)

  • Presence of the folders /opt and /opt/rke2-artifacts with the following files inside these folders:

    • install.sh in /opt (this file has the content of the script available at https://get.rke2.io). One way to create it at build time is to run curl -sfL https://get.rke2.io > /opt/install.sh as a Linux user with write permissions to the /opt folder.

    • rke2-images.linux-amd64.tar.zst, rke2.linux-amd64.tar.gz and sha256sum-amd64.txt in the /opt/rke2-artifacts folder. These files can be downloaded for a specific version of RKE2 from its release page, for instance Release v1.23.16+rke2r1 · rancher/rke2 · GitHub for version v1.23.16+rke2r1. The files can be found under the Assets section of the page.

  • The above pre-requisites should be built into a machine image, for instance a container image for CAPD or an AMI for AWS EC2. Each infrastructure provider has its own way of defining machine images; a sketch of baking these artifacts into an image is shown after this list.
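
As an illustration only (the version, download URLs and shell steps below are assumptions based on the standard RKE2 release asset names, not an official build recipe), baking the artifacts into a base image could look like this:

# Illustrative sketch: bake the RKE2 air-gapped artifacts into a base image.
# Run as a user with write permissions to /opt.
export RKE2_VERSION="v1.23.16+rke2r1"
# GitHub release download URLs need the '+' in the tag percent-encoded.
export RKE2_TAG="${RKE2_VERSION//+/%2B}"
mkdir -p /opt/rke2-artifacts
curl -sfL https://get.rke2.io -o /opt/install.sh
curl -sfL "https://github.com/rancher/rke2/releases/download/${RKE2_TAG}/rke2-images.linux-amd64.tar.zst" -o /opt/rke2-artifacts/rke2-images.linux-amd64.tar.zst
curl -sfL "https://github.com/rancher/rke2/releases/download/${RKE2_TAG}/rke2.linux-amd64.tar.gz" -o /opt/rke2-artifacts/rke2.linux-amd64.tar.gz
curl -sfL "https://github.com/rancher/rke2/releases/download/${RKE2_TAG}/sha256sum-amd64.txt" -o /opt/rke2-artifacts/sha256sum-amd64.txt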

Configuration of CAPRKE2 for Air-Gapped use

In order to deploy RKE2 clusters in Air-Gapped mode using CAPRKE2, you need to set the field spec.agentConfig.airGapped on the RKE2ControlPlane object and spec.template.spec.agentConfig.airGapped on the RKE2ConfigTemplate object to true.
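
A minimal sketch of what this looks like in the manifests (the object names are placeholders; only the air-gapped fields are shown):

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlane
metadata:
  name: my-cluster-control-plane
spec:
  agentConfig:
    airGapped: true
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: RKE2ConfigTemplate
metadata:
  name: my-cluster-agent
spec:
  template:
    spec:
      agentConfig:
        airGapped: true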

You can check the reference implementation for CAPD, including the configuration for the CAPD custom image, in the repository's examples.

Node Registration Methods

The provider supports multiple methods for registering a new node into the cluster.

Usage

The method to use is specified in the spec of the RKE2ControlPlane. If no method is supplied then the default method of internal-first is used.

You cannot change the registration method after creation.

An example of using a different method:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlane
metadata:
  name: test1-control-plane
  namespace: default
spec:
  agentConfig:
    version: v1.26.4+rke2r1
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerMachineTemplate
    name: controlplane
  nodeDrainTimeout: 2m
  replicas: 3
  serverConfig:
    cni: calico
  registrationMethod: "address"
  registrationAddress: "172.19.0.3"

Registration Methods

internal-first

For each CAPI Machine used for the control plane, we take the internal IP address from Machine.status.addresses if it exists. If a machine has no internal IP, we use an external address instead. The address found for each machine is added to RKE2ControlPlane.status.availableServerIPs.

The first IP address listed in RKE2ControlPlane.status.availableServerIPs is then used for the join.

internal-only-ips

For each CAPI Machine used for the control plane, we take the internal IP address from Machine.status.addresses, if it exists, and add it to RKE2ControlPlane.status.availableServerIPs.

The first IP address listed in RKE2ControlPlane.status.availableServerIPs is then used for the join.

external-only-ips

For each CAPI Machine used for the control plane, we take the external IP address from Machine.status.addresses, if it exists, and add it to RKE2ControlPlane.status.availableServerIPs.

The first IP address listed in RKE2ControlPlane.status.availableServerIPs is then used for the join.
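
For any of the IP-based methods above, you can inspect which addresses a CAPI Machine reports (and therefore which ones the controller can pick from) by querying its status directly, for example:

kubectl get machine <machine-name> -n <namespace> -o jsonpath='{.status.addresses}'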

address

For this method you must supply an address in the control plane spec (i.e. RKE2ControlPlane.spec.registrationAddress). This address is then used for the join.

With this method it's expected that you have a load balancer / VIP solution sitting in front of all the control plane machines, and all join requests will be routed through it.

CIS and Pod Security Admission

In order to set a custom Pod Security Admission policy when the CIS profile is selected, you need to create a Secret with the policy content and set the appropriate fields on the RKE2ControlPlane object:

apiVersion: v1
kind: Secret
metadata:
  name: pod-security-admission-config
stringData:
  pod-security-admission-config.yaml: |
    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: PodSecurity
      configuration:
        apiVersion: pod-security.admission.config.k8s.io/v1beta1
        kind: PodSecurityConfiguration
        defaults:
          enforce: "restricted"
          enforce-version: "latest"
          audit: "restricted"
          audit-version: "latest"
          warn: "restricted"
          warn-version: "latest"
        exemptions:
          usernames: []
          runtimeClasses: []
          namespaces: [kube-system, cis-operator-system, tigera-operator]
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlane
metadata:
  ...
spec:
  ...
  files:
    - path: /path/to/pod-security-admission-config.yaml
      contentFrom:
        secret:
          name: pod-security-admission-config
          key: pod-security-admission-config.yaml
  agentConfig:
    profile: cis
    podSecurityAdmissionConfigFile: /path/to/pod-security-admission-config.yaml
    ...

Example of a Pod Security Admission configuration that allows Rancher components to run in the cluster:

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "restricted"
        enforce-version: "latest"
        audit: "restricted"
        audit-version: "latest"
        warn: "restricted"
        warn-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces: [cattle-alerting,
                     cattle-fleet-local-system,
                     cattle-fleet-system,
                     cattle-global-data,
                     cattle-impersonation-system,
                     cattle-monitoring-system,
                     cattle-prometheus,
                     cattle-resources-system,
                     cattle-system,
                     cattle-ui-plugin-system,
                     cert-manager,
                     cis-operator-system,
                     fleet-default,
                     ingress-nginx,
                     kube-node-lease,
                     kube-public,
                     kube-system,
                     rancher-alerting-drivers]

Configuring Embedded Registry in RKE2

Overview

RKE2 allows users to enable an embedded registry on control plane nodes. When the embeddedRegistry option is set to true in the serverConfig, users can configure the registry using the PrivateRegistriesConfig field. The process follows the RKE2 documentation.

Enabling Embedded Registry

To enable the embedded registry, set the embeddedRegistry field to true in the serverConfig section of the RKE2ControlPlane configuration:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlane
metadata:
  name: my-cluster-control-plane
spec:
  serverConfig:
    embeddedRegistry: true

Configuring Private Registries

Once the embedded registry is enabled, you can configure private registries using the PrivateRegistriesConfig field in RKE2ConfigSpec. This field allows you to define registry mirrors, authentication, and TLS settings.

Example:

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: RKE2Config
metadata:
  name: my-cluster-bootstrap
spec:
  privateRegistriesConfig:
    mirrors:
      "myregistry.example.com":
        endpoint:
          - "https://mirror1.example.com"
          - "https://mirror2.example.com"
    configs:
      "myregistry.example.com":
        authSecret:
          name: my-registry-secret
        tls:
          tlsConfigSecret:
            name: my-registry-tls-secret
          insecureSkipVerify: false

TLS Secret Format

When configuring the tlsConfigSecret, ensure the secret contains the following keys:

  • ca.crt – CA certificate
  • tls.key – TLS private key
  • tls.crt – TLS certificate
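
For reference, a secret in this format can be created from existing certificate files with kubectl (the secret name and file paths below are placeholders):

kubectl create secret generic my-registry-tls-secret \
  --from-file=ca.crt=./ca.crt \
  --from-file=tls.crt=./tls.crt \
  --from-file=tls.key=./tls.key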

Examples

This section contains examples of how to use the CAPRKE2 with different cloud providers and platforms.

Setting up the Management Cluster

Make sure you have set up a management cluster to use with Cluster API; you can follow the instructions in the Cluster API book.

Cluster API AWS Infrastructure Provider

Installing the AWS provider

Refer to the Cluster API book for configuring AWS credentials and setting up the AWS infrastructure provider.
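
For reference, the Cluster API book generates these credentials with the clusterawsadm helper; a minimal sketch (assuming clusterawsadm is installed and your AWS credentials and region are already configured locally) looks like:

export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)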

The next step is to run the clusterctl init command to deploy CAPRKE2 alongside the AWS infrastructure provider (make sure to provide valid AWS credentials using the AWS_B64ENCODED_CREDENTIALS environment variable):

clusterctl init --bootstrap rke2 --control-plane rke2 --infrastructure aws

Create a workload cluster

Before creating a workload cluster, you need to build an AMI for the RKE2 version that is going to be installed on the cluster. You can follow the steps in the image-builder README to build the AMI.

You will need to set the following environment variables:

export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=1
export RKE2_VERSION=v1.30.2+rke2r1
export AWS_NODE_MACHINE_TYPE=t3a.large
export AWS_CONTROL_PLANE_MACHINE_TYPE=t3a.large 
export AWS_SSH_KEY_NAME="aws-ssh-key"
export AWS_REGION="aws-region"
export AWS_AMI_ID="ami-id"

Now, we can generate the YAML files from the templates using the clusterctl generate cluster command:

clusterctl generate cluster --from https://github.com/rancher/cluster-api-provider-rke2/blob/main/examples/templates/aws/cluster-template.yaml -n example-aws rke2-aws > aws-rke2-clusterctl.yaml

After examining the resulting YAML file, you can apply it to the management cluster:

kubectl apply -f aws-rke2-clusterctl.yaml

Checking the workload cluster

After a while, you should be able to check the status of the workload cluster using clusterctl:

clusterctl describe cluster -n example-aws rke2-aws

and once the cluster is provisioned, it should look similar to the following:

NAME                                                          READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/rke2-aws                                              True                     16m
├─ClusterInfrastructure - AWSCluster/rke2-aws                 True                     25m
├─ControlPlane - RKE2ControlPlane/rke2-aws-control-plane      True                     16m
│ └─3 Machines...                                             True                     19m    See rke2-aws-control-plane-8wsfm, rke2-aws-control-plane-qgwr7, ...
└─Workers
  └─MachineDeployment/rke2-aws-md-0                           True                     18m
    └─2 Machines...                                           True                     19m    See rke2-aws-md-0-6d47bf584d-g2ljz, rke2-aws-md-0-6d47bf584d-m9z8h

Ignition based bootstrap

Note: the Ignition template is currently outdated.

Make sure that the BootstrapFormatIgnition feature gate is enabled for the CAPA manager. You can do this by changing the flag in the CAPA manager deployment:

containers:
- args:
  - --feature-gates=EKS=true,EKSEnableIAM=false,EKSAllowAddRoles=false,EKSFargate=false,MachinePool=false,EventBridgeInstanceState=false,AutoControllerIdentityCreator=true,BootstrapFormatIgnition=true,ExternalResourceGC=false
  ...
  name: manager

or by setting the following environment variable before installing CAPA with clusterctl:

export BOOTSTRAP_FORMAT_IGNITION=true

For the Ignition based bootstrap, you will also need to set the following environment variables:

export AWS_S3_BUCKET_NAME=<YOUR_AWS_S3_BUCKET_NAME>

Now you can generate manifests from the cluster template:

clusterctl generate cluster --from https://github.com/rancher/cluster-api-provider-rke2/blob/main/examples/templates/aws/cluster-template-ignition.yaml -n example-aws rke2-aws > aws-rke2-clusterctl.yaml
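
As with the non-Ignition template, review the generated manifests and then apply them to the management cluster:

kubectl apply -f aws-rke2-clusterctl.yaml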

Cluster API vSphere Infrastructure Provider

Installing the vSphere provider and creating a workload cluster

This configuration includes a kube-vip load balancer on the control plane nodes. The VIP of the load balancer for the Kubernetes API is set by CONTROL_PLANE_ENDPOINT_IP.

Prerequisites:

  • The VM template to be used for the cluster machines should be present in the vSphere environment.
  • If an air-gapped environment is required, the VM template should already include the RKE2 binaries as described in the Air-Gapped Cluster Deployment topic. CAPRKE2 uses the tarball method to install RKE2 on the machines. Any additional images, like the vSphere CPI image, should be present in the local environment too.

To initialize Cluster API Provider vSphere, clusterctl requires the following variables, which should be set in ~/.cluster-api/clusterctl.yaml as follows:

## -- Controller settings -- ##
VSPHERE_USERNAME: "<username>"                                # The username used to access the remote vSphere endpoint
VSPHERE_PASSWORD: "<password>"                                # The password used to access the remote vSphere endpoint

## -- Required workload cluster default settings -- ##
VSPHERE_SERVER: "10.0.0.1"                                    # The vCenter server IP or FQDN
VSPHERE_DATACENTER: "SDDC-Datacenter"                         # The vSphere datacenter to deploy the management cluster on
VSPHERE_DATASTORE: "DefaultDatastore"                         # The vSphere datastore to deploy the management cluster on
VSPHERE_NETWORK: "VM Network"                                 # The VM network to deploy the management cluster on
VSPHERE_RESOURCE_POOL: "*/Resources"                          # The vSphere resource pool for your VMs
VSPHERE_FOLDER: "vm"                                          # The VM folder for your VMs. Set to "" to use the root vSphere folder
VSPHERE_TEMPLATE: "ubuntu-1804-kube-v1.17.3"                  # The VM template to use for your management cluster.
CONTROL_PLANE_ENDPOINT_IP: "192.168.9.230"                    # the IP that kube-vip is going to use as a control plane endpoint
VSPHERE_TLS_THUMBPRINT: "..."                                 # sha256 thumbprint of the vcenter certificate: openssl x509 -sha256 -fingerprint -in ca.crt -noout
EXP_CLUSTER_RESOURCE_SET: "true"                              # This enables the ClusterResourceSet feature that we are using to deploy CSI
VSPHERE_SSH_AUTHORIZED_KEY: "ssh-rsa AAAAB3N..."              # The public ssh authorized key on all machines in this cluster.
                                                              #  Set to "" if you don't want to enable SSH, or are using another solution.
"CPI_IMAGE_K8S_VERSION": "v1.30.0"                            # The version of the vSphere CPI image to be used by the CPI workloads
                                                              #  Keep this close to the minimum Kubernetes version of the cluster being created.

Warning: This example uses kube-vip, and there may be upstream issues with it. We do not provide support for resolving any such issues. Use at your own risk.

Then run the following command to generate the RKE2 cluster manifests:

clusterctl generate cluster --from https://github.com/rancher/cluster-api-provider-rke2/blob/main/examples/templates/vmware/cluster-template.yaml -n example-vsphere rke2-vsphere > vsphere-rke2-clusterctl.yaml
kubectl apply -f vsphere-rke2-clusterctl.yaml

Cluster API Docker Infrastructure Provider

This page focuses on using the RKE2 provider with the Docker Infrastructure provider.

Setting up the Management Cluster

Make sure you have set up a management cluster to use with Cluster API; you can follow the instructions in the Cluster API book.

Create a workload cluster

Before creating a workload cluster, you need to set the following environment variables:

export CONTROL_PLANE_MACHINE_COUNT=3
export WORKER_MACHINE_COUNT=1
export RKE2_VERSION=v1.30.2+rke2r1
export KIND_IMAGE_VERSION=v1.30.0

Now, we can generate the YAML files from the templates using the clusterctl generate cluster command:

clusterctl generate cluster --from https://github.com/rancher/cluster-api-provider-rke2/blob/main/examples/templates/docker/cluster-template.yaml -n example-docker rke2-docker > docker-rke2-clusterctl.yaml

After examining the resulting YAML file, you can apply it to the management cluster:

kubectl apply -f docker-rke2-clusterctl.yaml

Developer Guide

This section describes the workflow for regular developer tasks, such as:

  • Development guide
  • Releasing a new version of CAPRKE2

Development

The following instructions are for development purposes.

  1. Clone the Cluster API Repo into the GOPATH

    Why clone into the GOPATH? There have been historic issues with code generation tools when they are run outside the GOPATH.

  2. Fork the Cluster API Provider RKE2 repo

  3. Clone your new repo into the GOPATH (i.e. ~/go/src/github.com/yourname/cluster-api-provider-rke2)

  4. Ensure Tilt and kind are installed

  5. Create a tilt-settings.json file in the root of the directory where you cloned the Cluster API repo in step 1.

  6. Add the following contents to the file (replace /path/to/clone/of/ with the appropriate file path and "yourname" with your GitHub account name):

    {
        "default_registry": "ghcr.io/yourname",
        "provider_repos": ["/path/to/clone/of/github.com/yourname/cluster-api-provider-rke2"],
        "enable_providers": ["docker", "rke2-bootstrap", "rke2-control-plane"],
        "kustomize_substitutions": {
            "EXP_MACHINE_POOL": "true",
            "EXP_CLUSTER_RESOURCE_SET": "true"
        },
        "extra_args": {
            "rke2-bootstrap": ["--v=4"],
            "rke2-control-plane": ["--v=4"],
            "core": ["--v=4"]
        },
        "debug": {
            "rke2-bootstrap": {
                "continue": true,
                "port": 30001
            },
            "rke2-control-plane": {
                "continue": true,
                "port": 30002
            }
        }
    }
    
  7. Open another terminal (or pane) and go to the cluster-api directory.

  8. Run the following to create a configuration for kind:

    cat > kind-cluster-with-extramounts.yaml <<EOF
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    name: capi-test
    nodes:
    - role: control-plane
      extraMounts:
        - hostPath: /var/run/docker.sock
          containerPath: /var/run/docker.sock
    EOF
    

    NOTE: if you are using Docker Desktop v4.13 or above, you will encounter issues from here on. Until a permanent solution is found, it's recommended that you use v4.12.

  9. Run the following command to create a local kind cluster:

    kind create cluster --config kind-cluster-with-extramounts.yaml
    
  10. Now start tilt by running the following command in the directory where you cloned the Cluster API repo in step 1:

    tilt up
    
  11. Press the space key to open the Tilt web UI and check that everything goes green.

Attaching the debugger

This section explains how to attach a debugger to the CAPRKE2 process running in a pod started using the above steps. By connecting a debugger to this process, you can step through the code in your IDE. This guide covers two popular IDEs: IntelliJ GoLand and VS Code.

Open the Tilt web UI and confirm the port on which the caprke2_controller is exposed. By default, it should be localhost:30001.

GoLand

  1. Go to Run -> Edit Configurations.
  2. In the dialog box that opens up, add a new configuration by clicking on + sign in top left and selecting 'Go Remote'.
  3. Enter the 'Host' and 'Port' values based on where caprke2_controller is exposed.

VS Code

  1. If you don't already have a launch.json set up, go to 'Run and Debug' in the Activity Bar and click on 'create a launch.json file'. Press the escape key; you'll be presented with the newly created launch.json file. Alternatively, create a .vscode directory in the root of your git repo and create a launch.json file in it.
  2. Insert the following configuration in the file:
    {
        "version": "0.2.0",
        "configurations": [
            {
                "name": "Connect to server",
                "type": "go",
                "request": "attach",
                "mode": "remote",
                "remotePath": "${workspaceFolder}",
                "port": 30002,
                "host": "127.0.0.1"
            }
        ]
    }
    

Insert a breakpoint, e.g. in the updateStatus method, which is responsible for updating the status of the RKE2ControlPlane resource, and run the configuration. To check that you can step into the code, create a workload cluster using the example provided in the documentation. If everything is configured correctly, code execution will halt at the breakpoint and you should be able to step through it.

Troubleshooting

Running tilt up should install a number of resources in the underlying kind cluster. If you don't see anything there, run tilt logs in a separate terminal without stopping the tilt up command that you originally started.

Common Issues

  1. Make sure you run the kind and tilt commands mentioned above from the correct directory.
  2. A go mod vendor might be required in your clone of the CAPI repo; tilt logs should make this obvious.

CAPRKE2 Releases

Release Cadence

  • CAPRKE2 minor versions (v0.2.0 versus v0.1.0) are released every 1-2 months.
  • CAPRKE2 patch versions (v0.2.2 versus v0.2.1) are released as often as weekly or bi-weekly.

Release Process

  1. Clone the repository locally:
git clone git@github.com:rancher/cluster-api-provider-rke2.git
  2. Depending on whether you are cutting a minor or patch release, the process varies.

    • If you are cutting a new minor release:

      Create a new release branch (i.e. release-X) and push it to the upstream repository.

          # Note: `upstream` must be the remote pointing to `github.com:rancher/cluster-api-provider-rke2`.
          git checkout -b release-0.2
          git push -u upstream release-0.2
          # Export the tag of the minor release to be cut, e.g.:
          export RELEASE_TAG=v0.2.0
      
    • If you are cutting a patch release from an existing release branch:

      Use the existing release branch.

          git checkout upstream/release-0.2
          # Export the tag of the patch release to be cut, e.g.:
          export RELEASE_TAG=v0.2.1
      
  3. Create a signed/annotated tag and push it:

# Create tags locally
git tag -s -a ${RELEASE_TAG} -m ${RELEASE_TAG}

# Push tags
git push upstream ${RELEASE_TAG}

This will trigger a release GitHub action that creates a release with RKE2 provider components.

  4. Mark release as ready.

Published releases are initially marked as draft. If the published version is supposed to be the latest, mark it as such while editing the release on the release page. Please note that we use semantic versioning when choosing the latest version.

  5. Perform mandatory post-release activities, which ensure the contract metadata.yaml file is up-to-date in case of a future minor/major version change.

Prepare main branch for development of the new release

The goal of this task is to bump the versions on the main branch so that the upcoming release version is used for, e.g., local development and e2e tests. We also modify the tests so that they test the previous release.

This comes down to changing occurrences of the old version to the new version, e.g. v1.5 to v1.6, and preparing metadata.yaml for a future release version:

1. Update E2E tests

Existing E2E tests that point to a specific version need to be updated to use the new version instead.

  1. Add a future release to the list of providers in test/e2e/config/e2e_conf.yaml following the format used for previous versions. This will be used as a fake provider version for testing the current state of the repository instead of the actual GitHub release.
  2. Update bootstrap/control plane versions* inside function initUpgradableBootstrapCluster in test/e2e/e2e_suite_test.go.
  3. Edit upgrade test* in test/e2e/e2e_upgrade_test.go.

*To keep the upgrade test concise and clean, and to avoid a growing list of versions, it is required to use the N-1 minor as the starting version (e.g. if releasing version v4.x, the starting version is v3.x and the upgrade is as follows: v3.x -> v4.x).

2. Add future version to metadata.yaml. For example, if v0.5 was just released, we add v0.6 to the list of releaseSeries:

apiVersion: clusterctl.cluster.x-k8s.io/v1alpha3
kind: Metadata
releaseSeries:
  - major: 0
    minor: 1
    contract: v1beta1
  - major: 0
    minor: 2
    contract: v1beta1
  ...
  ...
  ...
  - major: x
    minor: x
    contract: x

Versioning

Cluster API Provider RKE2 follows semantic versioning specification.

Example versions:

  • Pre-release: v0.2.0-alpha.1
  • Minor release: v0.2.0
  • Patch release: v0.2.1
  • Major release: v2.0.0

With the v0 release of our codebase, we provide the following guarantees:

  • A (minor) release CAN include:

    • Introduction of new API versions, or new Kinds.
    • Compatible API changes like field additions, deprecation notices, etc.
    • Breaking API changes for deprecated APIs, fields, or code.
    • Features, promotion or removal of feature gates.
    • And more!
  • A (patch) release SHOULD only include a backwards compatible set of bugfixes.

Backporting

Any backport MUST not be breaking for either API or behavioral changes.

It is generally not accepted to submit pull requests directly against release branches (release-X). However, backports of fixes or changes that have already been merged into the main branch may be accepted to all supported branches:

  • Critical bugs fixes, security issue fixes, or fixes for bugs without easy workarounds.
  • Dependency bumps for CVE (usually limited to CVE resolution; backports of non-CVE related version bumps are considered exceptions to be evaluated case by case)
  • Cert-manager version bumps (to avoid having releases with cert-manager versions that are out of support, when possible)
  • Changes required to support new Kubernetes versions, when possible. See supported Kubernetes versions for more details.
  • Changes to use the latest Go patch version to build controller images.
  • Improvements to existing docs (the latest supported branch hosts the current version of the book)

Note: We generally do not accept backports to Cluster API Provider RKE2 release branches that are out of support.

Branches

Cluster API Provider RKE2 has two types of branches: the main branch and release-X branches.

The main branch is where development happens. All the latest and greatest code, including breaking changes, happens on main.

The release-X branches contain stable, backwards compatible code. On every major or minor release, a new branch is created. It is from these branches that minor and patch releases are tagged. In some cases, it may be necessary to open PRs for bugfixes directly against stable branches, but this should generally not be the case.

Support and guarantees

Cluster API Provider RKE2 maintains the most recent release/releases for all supported APIs. "Support" in this section refers to the ability to backport and release patch versions; the backport policy is defined above.

  • The API version is determined from the GroupVersion defined in the top-level bootstrap/api/ and controlplane/api/ packages.
  • For the current stable API version (v1beta1) we support the two most recent minor releases; older minor releases are immediately unsupported when a new major/minor release is available.

Reference

This section contains reference documentation for CAPRKE2 API types.