Change Rook Ceph filestore to bluestore


Background

After deploying Rook Ceph with the default configuration, I realized that the storeType was set to filestore, and I found the official recommendation as follows:

The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.

So, because I use directories as the OSD backend storage, storeType was set to filestore.

Also, I had initially created a block pool with Replicated settings, but I wanted to switch to Erasure Coded settings for more usable capacity, because I am running Ceph in an on-premises environment without many resources.

TL;DR

Check whether any resource is using Ceph as its storage backend

Check whether any PVCs exist.

$ kubectl get pvc --all-namespaces
NAMESPACE   NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
default     mysql-pv-claim   Bound    pvc-88499d1c-e910-4535-b97b-f9be5c3ea579   20Gi       RWO            rook-ceph-block   3d18h
default     wp-pv-claim      Bound    pvc-bbd0cc4c-e5e6-4b83-b357-4368ee69728a   20Gi       RWO            rook-ceph-block   3d17h
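
To double-check which workloads still hold these claims, a quick sketch (the PVC name and namespace are taken from the output above, and the describe field name varies a bit between kubectl versions):

$ kubectl get pv
$ kubectl -n default describe pvc mysql-pv-claim | grep -iE 'used by|mounted by'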

Delete the resources that use those PVCs.

$ cd manifest/apps/example/stateful/wordpress_ceph
$ kubectl delete -f ./
service "wordpress-mysql" deleted
persistentvolumeclaim "mysql-pv-claim" deleted
deployment.apps "wordpress-mysql" deleted
service "wordpress" deleted
persistentvolumeclaim "wp-pv-claim" deleted
deployment.apps "wordpress" deleted

Change the configuration of cluster.yaml

Change the storage configuration of the Ceph cluster in cluster.yaml:

storage:
  useAllNodes: true
  useAllDevices: true
  deviceFilter: '^sd.'   # commented out in the default manifest
  config:
    storeType: bluestore # commented out in the default manifest
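
For contrast, the directory-backed layout I was moving away from looked roughly like this (a sketch of the pre-1.3 Rook syntax; /var/lib/rook is the default dataDirHostPath):

storage:
  useAllNodes: true
  useAllDevices: false
  directories:           # directory-backed OSDs default to filestore
  - path: /var/lib/rook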

Maintain the dedicated nodes for Ceph

I’ve tainted these nodes to be dedicated nodes for Ceph.

  • datateam-rookceph-01
  • datateam-rookceph-02
  • datateam-rookceph-03
$ kubectl cordon datateam-rookceph-01 && \
   kubectl cordon datateam-rookceph-02 && \
   kubectl cordon datateam-rookceph-03
$ kubectl drain datateam-rookceph-01 && \
   kubectl drain datateam-rookceph-02 && \
   kubectl drain datateam-rookceph-03
$ kubectl uncordon datateam-rookceph-01 && \
   kubectl uncordon datateam-rookceph-02 && \
   kubectl uncordon datateam-rookceph-03
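
For reference, the taint itself and the matching placement section of cluster.yaml look roughly like this. It is only a sketch: the storage-node=true label/taint is an illustrative name, not necessarily the one used in my manifests.

$ kubectl label nodes datateam-rookceph-01 datateam-rookceph-02 datateam-rookceph-03 storage-node=true
$ kubectl taint nodes datateam-rookceph-01 datateam-rookceph-02 datateam-rookceph-03 storage-node=true:NoSchedule

# cluster.yaml: let the Ceph pods tolerate the taint and pin them to the labeled nodes
placement:
  all:
    tolerations:
    - key: storage-node
      operator: Equal
      value: "true"
      effect: NoSchedule
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: storage-node
            operator: In
            values:
            - "true"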

I’ve upgraded the kernel and enabled BBR on the dedicated nodes for Rook Ceph. Reference: BBR
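
Enabling BBR itself is just two sysctls once the kernel is 4.9 or newer; a minimal sketch:

# requires a 4.9+ kernel with the tcp_bbr module available
cat <<EOF | sudo tee /etc/sysctl.d/99-bbr.conf
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
EOF
sudo sysctl --system
sysctl net.ipv4.tcp_congestion_control   # should print "bbr"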

I’ve wiped the data disks on the Rook Ceph nodes and increased their capacity.

Reapply the Rook Ceph cluster manifest

cd manifest/apps/rook/
kubectl apply -f cluster.yaml

Check whether there are any abnormal objects in the Rook Ceph namespace.

[root@datateam-k8s-control-plane-01 rook]# kubectl get all -n rook-ceph
NAME                                                                 READY   STATUS             RESTARTS   AGE
... Omitted here ...
pod/rook-ceph-mgr-a-5d4765dfb-l9clk                                  0/1     CrashLoopBackOff   7          20h
pod/rook-ceph-mon-a-6864d58cc7-gkt4c                                 1/1     Running            0          20h
pod/rook-ceph-mon-b-65fbf9b96c-k792m                                 1/1     Running            0          20h
pod/rook-ceph-mon-c-54984d88b6-cbcz8                                 1/1     Running            0          20h
pod/rook-ceph-operator-648d574f5c-m6452                              1/1     Running            0          2d
pod/rook-ceph-osd-0-6f966cc46b-wg8gq                                 0/1     CrashLoopBackOff   7          20h
pod/rook-ceph-osd-1-757cc69bc9-k6qbs                                 0/1     CrashLoopBackOff   7          20h
pod/rook-ceph-osd-2-57b88cd98f-zkfp7                                 0/1     CrashLoopBackOff   7          20h



NAME                                                            READY   UP-TO-DATE   AVAILABLE   AGE
... Omitted here ...
deployment.apps/rook-ceph-mgr-a                                 0/1     1            0           25d
deployment.apps/rook-ceph-mon-a                                 1/1     1            1           25d
deployment.apps/rook-ceph-mon-b                                 1/1     1            1           25d
deployment.apps/rook-ceph-mon-c                                 1/1     1            1           25d
deployment.apps/rook-ceph-operator                              1/1     1            1           25d
deployment.apps/rook-ceph-osd-0                                 0/1     1            0           25d
deployment.apps/rook-ceph-osd-1                                 0/1     1            0           25d
deployment.apps/rook-ceph-osd-2                                 0/1     1            0           25d
deployment.apps/rook-ceph-tools                                 1/1     1            1           24d

... Omitted here ...

So, as we can see, ceph-osd does not start as expected.

kubectl logs -f --tail=100 pod/rook-ceph-mgr-a-5d4765dfb-l9clk -n rook-ceph

Tear down Rook Ceph

After some investigation, I found that the Rook operator could identify the nodes where the Ceph cluster should be deployed, which is in line with the taint and toleration expectations.

But somehow, the initialization process was not performed. I suspect that the devices I prepared for bluestore were not recognized, even though the deviceFilter setting looks fine.

So I decided to tear down the entire Rook deployment and redeploy it to see whether that would succeed.
Here are the teardown instructions.

One of the steps is very important: cleaning up the partition table, which is what Ceph requires before it will treat a disk as an available device.

cd $HOME && mkdir -pv hack && cd hack
DISK="/dev/sdb"
tee $HOME/hack/reset_disk_usable_ceph.sh <<-EOF
#!/usr/bin/env bash
DISK="$DISK"
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
sgdisk --zap-all $DISK

# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)
rm -rf /dev/ceph-*
EOF
sh ./reset_disk_usable_ceph.sh
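
The disk reset above is only one part of the teardown. The remaining steps, roughly, are the following (assuming the standard common.yaml / operator.yaml / cluster.yaml layout from the Rook examples and the default dataDirHostPath of /var/lib/rook):

cd manifest/apps/rook/
kubectl delete -f cluster.yaml     # wait for the rook-ceph cluster resources to be removed
kubectl delete -f operator.yaml
kubectl delete -f common.yaml

# on every storage node: remove the Rook state directory (dataDirHostPath)
rm -rf /var/lib/rook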

Recreate the cluster with bluestore as the storeType backend

Just use these YAML files.
The only differences between these files and the official examples are the image location and the block storage deviceFilter; nothing major.

The YAML files are located here: Rook-Ceph-blueStore.

In the csi/rbd directory, there are files whose names contain “ec” and “replicated”, respectively.
They represent block storage backed by the two algorithms, Erasure Coded and Replicated.
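
At the pool level, the difference between the two is a single stanza in the CephBlockPool spec. A minimal sketch (the replicated pool name is illustrative; ec-data-pool matches the pool used by the ec storage class below):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicated-pool          # illustrative name
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3                      # three full copies of every object
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ec-data-pool
  namespace: rook-ceph
spec:
  failureDomain: host
  erasureCoded:
    dataChunks: 2                # usable capacity is 2/3 of raw instead of 1/3
    codingChunks: 1

For RBD, the erasure-coded pool serves as the data pool while a small replicated pool holds the metadata; that is how the ec variant of the storage class is normally wired.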

Notice:

After applying storageclass-ec_customized.yaml, note that the default PG (Placement Group) count of ec-data-pool is 8; this is too low, and a warning shows up on the Ceph Dashboard.

Here is an official tool and guide for calculating PG counts.
In my case, it should be 128 PGs.
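
The pool can be bumped in place from the toolbox pod without recreating it; a sketch:

$ TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
$ kubectl -n rook-ceph exec -it ${TOOLS_POD} -- ceph osd pool set ec-data-pool pg_num 128
$ kubectl -n rook-ceph exec -it ${TOOLS_POD} -- ceph osd pool set ec-data-pool pgp_num 128
$ kubectl -n rook-ceph exec -it ${TOOLS_POD} -- ceph osd pool get ec-data-pool pg_num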

After recreating the cluster with bluestore as the storeType backend, I used this command to verify the cluster status:

# Get OSD Pods
# This uses the example/default cluster name "rook-ceph"
OSD_PODS=$(kubectl -n rook-ceph get pods -l \
  app=rook-ceph-osd,rook_cluster=rook-ceph -o jsonpath='{.items[*].metadata.name}')

# Find node and drive associations from OSD pods
for pod in $(echo ${OSD_PODS})
do
 echo "Pod:  ${pod}"
 echo "Node: $(kubectl -n rook-ceph get pod ${pod} -o jsonpath='{.spec.nodeName}')"
 kubectl -n rook-ceph exec ${pod} -- sh -c '\
  for i in /var/lib/rook/osd*; do
    [ -f ${i}/ready ] || continue
    echo -ne "-$(basename ${i}) "
    echo $(lsblk -n -o NAME,SIZE ${i}/block 2> /dev/null || \
    findmnt -n -v -o SOURCE,SIZE -T ${i}) $(cat ${i}/type)
  done | sort -V
  echo'
done

Nothing showed up. :sweat_smile:

But I can see that bluestore is enabled via the Dashboard.

bluestore enabled
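
The same thing can be confirmed from the toolbox pod without the Dashboard; a sketch, reusing the TOOLS_POD variable from the PG section above (every OSD should report bluestore):

$ kubectl -n rook-ceph exec -it ${TOOLS_POD} -- ceph osd metadata 0 | grep osd_objectstore
# expect: "osd_objectstore": "bluestore"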

Tips

As far as the latest version (Release 1.2) is concerned, deploying more than one Ceph cluster inside a single Rook cluster is not supported.
When I tried to deploy another Ceph cluster in a new namespace, it ran into errors with the CRDs created by common.yaml, and some of the Ceph plugins would have to be reconfigured in depth; changing the namespace name alone is not enough.
So, if you are not familiar with Ceph itself, you had better plan early which nodes to deploy Ceph on, how big those nodes are, and how to expand the cluster.
As in my manifests, I use taints and node affinity to restrict Ceph to specific nodes and use /dev/[^sdb-i] as the bluestore devices.
I can add nodes, or devices on existing nodes, to adjust the scale of the cluster (see the sketch below).
Changing the PG count and the number of mons may also be needed, depending on further research into actual utilization.
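
If useAllNodes is turned off, scaling becomes a matter of listing nodes and devices explicitly in the CephCluster storage section. A sketch of what adding a device or a whole node would look like (the node and device names are examples only):

storage:
  useAllNodes: false
  useAllDevices: false
  config:
    storeType: bluestore
  nodes:
  - name: datateam-rookceph-01
    devices:
    - name: sdb
    - name: sdc                    # add another device to grow this node
  - name: datateam-rookceph-04     # or add a whole new node
    deviceFilter: '^sd.'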


Author: 少年G
Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 少年G when reposting!