Background
After deploying Rook Ceph with the default configuration, I realized that the storeType was set to filestore, and I found the official suggestion as follows:
The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.
So, because I was using directories as the OSD backend storage, the storeType was set to filestore.
At first I created a block pool with Replicated settings, but I wanted to change to Erasure Coded settings for more usable capacity, because I run Ceph in an on-premises environment and do not have many resources.
TL;DR
Check whether any resource is using Ceph as its storage backend
Check whether any PVCs exist:
$ kubectl get pvc --all-namespaces
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
default mysql-pv-claim Bound pvc-88499d1c-e910-4535-b97b-f9be5c3ea579 20Gi RWO rook-ceph-block 3d18h
default wp-pv-claim Bound pvc-bbd0cc4c-e5e6-4b83-b357-4368ee69728a 20Gi RWO rook-ceph-block 3d17h
Delete the resources that are using the PVCs:
$ cd manifest/apps/example/stateful/wordpress_ceph
$ kubectl delete -f ./
service "wordpress-mysql" deleted
persistentvolumeclaim "mysql-pv-claim" deleted
deployment.apps "wordpress-mysql" deleted
service "wordpress" deleted
persistentvolumeclaim "wp-pv-claim" deleted
deployment.apps "wordpress" deleted
Change the configuration of cluster.yaml
The storage section of cluster.yaml:
storage:
  useAllNodes: true
  useAllDevices: true
  deviceFilter: '^sd.'   # commented out in the default cluster.yaml
  config:
    storeType: bluestore # commented out in the default cluster.yaml
Maintain the dedicated nodes for Ceph
I have tainted these nodes so that they are dedicated to Ceph (a sketch of the taint commands is shown after the maintenance commands below):
- datateam-rookceph-01
- datateam-rookceph-02
- datateam-rookceph-03
$ kubectl cordon datateam-rookceph-01 && \
kubectl cordon datateam-rookceph-02 && \
kubectl cordon datateam-rookceph-03
$ kubectl drain datateam-rookceph-01 && \
kubectl drain datateam-rookceph-02 && \
kubectl drain datateam-rookceph-03
$ kubectl uncordon datateam-rookceph-01 && \
kubectl uncordon datateam-rookceph-02 && \
kubectl uncordon datateam-rookceph-03
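For reference, a rough sketch of how the taints (and the matching tolerations) can be set up; the taint key storage-node=true is just my example, the actual key has to match the tolerations in the placement section of cluster.yaml:
# Taint the dedicated nodes so that ordinary workloads are not scheduled on them
$ kubectl taint nodes datateam-rookceph-01 storage-node=true:NoSchedule && \
  kubectl taint nodes datateam-rookceph-02 storage-node=true:NoSchedule && \
  kubectl taint nodes datateam-rookceph-03 storage-node=true:NoSchedule
# The Rook/Ceph pods then need a matching toleration, e.g. under
# placement.all.tolerations (key "storage-node") in cluster.yaml.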
I upgraded the kernel and enabled BBR on the dedicated Rook Ceph nodes. Reference: BBR
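Enabling BBR only takes a couple of sysctl settings once the kernel is new enough (4.9+); a minimal sketch:
# Enable BBR congestion control (requires kernel >= 4.9)
$ echo "net.core.default_qdisc=fq" | sudo tee -a /etc/sysctl.d/99-bbr.conf
$ echo "net.ipv4.tcp_congestion_control=bbr" | sudo tee -a /etc/sysctl.d/99-bbr.conf
$ sudo sysctl --system
# Verify
$ sysctl net.ipv4.tcp_congestion_control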
I cleaned up the data on the disks of the Rook Ceph nodes and increased their capacity.
Reapply the Rook Ceph cluster manifest
cd manifest/apps/rook/
kubectl apply -f cluster.yaml
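The operator should then pick up the change; one way to watch the reconcile, assuming the example cluster name rook-ceph:
# Watch the CephCluster resource and the pods while the operator reconciles
$ kubectl -n rook-ceph get cephcluster
$ kubectl -n rook-ceph get pods -w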
Then check whether there are any abnormal objects in the rook-ceph namespace:
[root@datateam-k8s-control-plane-01 rook]# kubectl get all -n rook-ceph
NAME READY STATUS RESTARTS AGE
... Omitted here ...
pod/rook-ceph-mgr-a-5d4765dfb-l9clk 0/1 CrashLoopBackOff 7 20h
pod/rook-ceph-mon-a-6864d58cc7-gkt4c 1/1 Running 0 20h
pod/rook-ceph-mon-b-65fbf9b96c-k792m 1/1 Running 0 20h
pod/rook-ceph-mon-c-54984d88b6-cbcz8 1/1 Running 0 20h
pod/rook-ceph-operator-648d574f5c-m6452 1/1 Running 0 2d
pod/rook-ceph-osd-0-6f966cc46b-wg8gq 0/1 CrashLoopBackOff 7 20h
pod/rook-ceph-osd-1-757cc69bc9-k6qbs 0/1 CrashLoopBackOff 7 20h
pod/rook-ceph-osd-2-57b88cd98f-zkfp7 0/1 CrashLoopBackOff 7 20h
NAME READY UP-TO-DATE AVAILABLE AGE
... Omitted here ...
deployment.apps/rook-ceph-mgr-a 0/1 1 0 25d
deployment.apps/rook-ceph-mon-a 1/1 1 1 25d
deployment.apps/rook-ceph-mon-b 1/1 1 1 25d
deployment.apps/rook-ceph-mon-c 1/1 1 1 25d
deployment.apps/rook-ceph-operator 1/1 1 1 25d
deployment.apps/rook-ceph-osd-0 0/1 1 0 25d
deployment.apps/rook-ceph-osd-1 0/1 1 0 25d
deployment.apps/rook-ceph-osd-2 0/1 1 0 25d
deployment.apps/rook-ceph-tools 1/1 1 1 24d
... Omitted here ...
As we can see, the ceph-mgr and ceph-osd pods do not start as expected.
kubectl logs -f --tail=100 pod/rook-ceph-mgr-a-5d4765dfb-l9clk -n rook-ceph
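For the OSDs specifically, the rook-ceph-osd-prepare job logs are usually more telling than the OSD pods themselves, since that is where device discovery and provisioning happen; a sketch, assuming the default labels:
# The prepare jobs show whether the devices were detected and provisioned
$ kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
$ kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --tail=100
# The operator log is also worth checking
$ kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=100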
Tear down Rook Ceph
After some investigation, I found that the Rook operator correctly identified the nodes on which the Ceph cluster should be deployed, which is in line with my taint and toleration settings.
But somehow the initialization process was not performed. I suspected that the devices I had prepared for bluestore were not recognized, although the deviceFilter setting looked fine.
So I decided to tear down the entire Rook deployment and redeploy it to see whether that would succeed.
Here are the teardown instructions.
One of the steps is very important: cleaning up the partition table, which is what Ceph requires before it will treat a disk as an available device.
cd $HOME && mkdir -pv hack && cd hack
DISK="/dev/sdb"
tee $HOME/hack/reset_disk_usable_ceph.sh <<-EOF
#!/usr/bin/env bash
DISK="$DISK"
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
sgdisk --zap-all $DISK
# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)
rm -rf /dev/ceph-*
EOF
sh ./reset_disk_usable_ceph.sh
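Besides zapping the disks, the Rook teardown guide also says to remove the dataDirHostPath directory on every node, otherwise leftover monitor data can break the next deployment; a sketch, assuming the default path:
# Run on each Ceph node after deleting the cluster; /var/lib/rook is the default dataDirHostPath
rm -rf /var/lib/rook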
Recreate the cluster with bluestore as the storeType backend
Just use these YAML files.
The only differences between these files and the official examples are the image location and the block storage deviceFilter, nothing major.
The YAML files are located here: Rook-Ceph-blueStore.
In the csi/rbd directory there are files whose names contain "ec" and "replicated", respectively.
They represent block storage backed by the two data-protection schemes: Erasure Coded and Replicated.
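For reference, a rough sketch of what the erasure-coded variant defines (the pool name and the 2+1 chunk profile follow the Rook examples; your storageclass-ec_customized.yaml may use different values, and RBD additionally needs a small replicated metadata pool referenced by the StorageClass pool/dataPool parameters):
# Erasure-coded data pool, 2 data chunks + 1 coding chunk
$ cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ec-data-pool
  namespace: rook-ceph
spec:
  failureDomain: host
  erasureCoded:
    dataChunks: 2
    codingChunks: 1
EOF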
Notice:
After applying storageclass-ec_customized.yaml, note that the default PG (Placement Group) number of ec-data-pool is 8; this is too few, and you will see warning logs on the Ceph Dashboard.
Here is an official tool and guidance for PG calculation.
In my case, it should be 128 PGs.
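One way to raise it is from the rook-ceph-tools pod, assuming the ec-data-pool name from the manifest:
# Open a shell in the toolbox pod
$ kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod \
    -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}') -- bash
# Inside the toolbox, bump the PG count of the EC data pool
ceph osd pool set ec-data-pool pg_num 128
ceph osd pool set ec-data-pool pgp_num 128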
After recreating the cluster with bluestore as the storeType backend, I used this command to verify the cluster status:
# Get OSD Pods
# This uses the example/default cluster name "rook-ceph"
OSD_PODS=$(kubectl -n rook-ceph get pods -l \
  app=rook-ceph-osd,rook_cluster=rook-ceph -o jsonpath='{.items[*].metadata.name}')
# Find node and drive associations from OSD pods
for pod in $(echo ${OSD_PODS})
do
  echo "Pod: ${pod}"
  echo "Node: $(kubectl -n rook-ceph get pod ${pod} -o jsonpath='{.spec.nodeName}')"
  kubectl -n rook-ceph exec ${pod} -- sh -c '\
    for i in /var/lib/rook/osd*; do
      [ -f ${i}/ready ] || continue
      echo -ne "-$(basename ${i}) "
      echo $(lsblk -n -o NAME,SIZE ${i}/block 2> /dev/null || \
        findmnt -n -v -o SOURCE,SIZE -T ${i}) $(cat ${i}/type)
    done | sort -V
    echo'
done
Nothing showed up. :sweat_smile: (Probably because the script checks the legacy /var/lib/rook/osd* directory layout, while the new ceph-volume-based bluestore OSDs keep their data elsewhere.)
But I can see that bluestore is enabled via the Dashboard.
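An alternative check that does not depend on the on-disk layout is to ask Ceph itself from the toolbox pod (see above); every OSD reports its object store type in its metadata:
# From the toolbox: each OSD should report "osd_objectstore": "bluestore"
ceph osd metadata | grep '"osd_objectstore"'
ceph osd tree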
Tips
As far as the latest version (Release 1.2) is concerned, it is not supported to deploy more than one Ceph cluster within a single Rook installation.
When I tried to deploy another Ceph cluster in a new namespace, errors came up around the CRDs created by common.yaml, and some Ceph plugins would need to be configured in depth; it is not just a matter of changing the namespace name.
So, if you are not familiar with Ceph itself, you had better plan early which nodes to deploy Ceph on, how large those nodes should be, and how to expand the cluster.
As in my manifest settings, I use taints and node affinity to restrict Ceph to specific nodes, and a deviceFilter (e.g. ^sd[b-i], i.e. /dev/sdb through /dev/sdi) to select the bluestore devices.
I can add nodes, or devices on existing nodes, to adjust the scale of the cluster.
Changing the PG number and the number of mons may also be needed, depending on further observation of utilization.
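For the mon count, that is just spec.mon.count in cluster.yaml; one way to change it on a running cluster is to edit the CephCluster resource and let the operator reconcile the change:
# Edit the live CephCluster and adjust spec.mon.count; the operator adds/removes mons
$ kubectl -n rook-ceph edit cephcluster rook-ceph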