In this post we will configure and start Ceph on a cluster using Ansible and Docker. We assume that you are either using the Docker images provided by the Ceph team on Docker Hub or that you built your own images (we describe that process in Docker: Building Ceph Images) and pushed them to a registry (again, we talk about that in Docker: Distributing Ceph Images).
There are many tools that deploy Ceph on a cluster; they format disks, set up keyrings, write configuration files, and start the daemons. Doing all of this by hand is tedious and error-prone. Most people use ceph-deploy, but it installs many packages on the host machines. Instead, we use Docker.
Pre-requisites: a CloudLab account.
Set up your SSH keys using the Getting Started directions on the CloudLab website. Then deploy CloudLab nodes using the CephFS Profile. These nodes automatically install Docker on boot, using our install script. After the nodes finish booting (i.e., their status on the CloudLab GUI reads READY), push your SSH keys using our pushkeys.sh convenience script. Assuming that your SSH keys are in ~/.ssh, you can use:
~$ wget https://raw.githubusercontent.com/michaelsevilla/ceph-popper-template/master/hardware/cloudlab/pushkeys.sh
~$ chmod 750 pushkeys.sh
~$ export CLOUDLAB_NODES="<IP0> <IP1> ..."
~$ ./pushkeys.sh
[... snip ...]
DONE!
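Before moving on, it is worth a quick check that passwordless SSH now works to every node. This check is our own suggestion (not part of the template), reusing the same CLOUDLAB_NODES list from above:
# Each node should print its hostname without asking for a password.
~$ for ip in $CLOUDLAB_NODES; do ssh -o BatchMode=yes "$ip" hostname; done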
ceph-ansible is a tool that configures hardware and software for Ceph. We forked the project and made it less dependent on docker-py.
Because our lab follows the Popper Convention to make our research reproducible, we save the deploy code, configurations, and benchmarks for all our experiments in a separate repository. We provide a starter repository for administrators called ceph-popper-template. Log into the head node of the CloudLab cluster and do some setup:
~$ ssh <IP0>
~$ git clone --recursive https://github.com/michaelsevilla/ceph-popper-template.git
~$ cd ceph-popper-template/pipelines/baseline
This repository has submodules that point to ceph-ansible and our own custom roles; configuration files for our Ceph setup; and helper scripts written in bash that deploy Ceph and run the benchmarks. For more details, see the README; for the “baseline” and “pipelines” terminology, see the Popper Convention.
The hosts file
Here we specify the IPs and remote users in your cluster. Ansible reads node IP addresses from a hosts file, so edit it to describe your CloudLab cluster:
diff --git a/pipelines/baseline/hosts b/pipelines/baseline/hosts
index ab16db9..0f8cf1a 100644
--- a/pipelines/baseline/hosts
+++ b/pipelines/baseline/hosts
@@ -1,11 +1,13 @@
[osds]
-<CHANGE ME>
+node-1 ansible_ssh_user=msevilla
+node-2 ansible_ssh_user=msevilla
+node-3 ansible_ssh_user=msevilla
[mons]
-<CHANGE ME>
+node-0 ansible_ssh_user=msevilla
[mdss]
-<CHANGE ME>
+node-4 ansible_ssh_user=msevilla
[clients]
-<CHANGE ME>
+node-5 ansible_ssh_user=msevilla
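Before going further, you can confirm that Ansible can reach every node in the hosts file. A minimal check, assuming ansible is installed on the head node (the template itself normally runs Ansible inside a Docker container via run.sh):
# Every host in the inventory should answer with "pong".
$ ansible all -i hosts -m ping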
ceph-ansible requires proper hostnames; if you are using something like CloudLab, make sure that your hostnames are the short names and not the FQDN:
msevilla@node-1:~$ hostname
node-1.msevilla-qv20111.cephfs-pg0.wisc.cloudlab.us
msevilla@node-1:~$ sudo hostname node-1
msevilla@node-1:~$ hostname
node-1
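If you have several nodes, you can trim the FQDN on all of them in one pass instead of logging into each one. A sketch run from your workstation with the same CLOUDLAB_NODES list (assumes passwordless sudo, which CloudLab provides):
# Set each node's hostname to its short form (everything before the first dot).
~$ for ip in $CLOUDLAB_NODES; do ssh "$ip" 'sudo hostname $(hostname -s)'; done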
The ansible directory
The ansible directory has code for deploying Ceph and its components:
msevilla@node-0:~/experiment/pipelines/baseline$ ls -alh ansible/
total 32K
drwxr-xr-x 4 msevilla cephfs-PG0 4.0K Feb 13 10:04 .
drwxr-xr-x 5 msevilla cephfs-PG0 4.0K Feb 13 10:01 ..
-rw-r--r-- 1 msevilla cephfs-PG0 286 Feb 13 00:29 ansible.cfg
-rw-r--r-- 1 msevilla cephfs-PG0 1001 Feb 13 09:57 ceph.yml
-rw-r--r-- 1 msevilla cephfs-PG0 3.0K Feb 13 00:29 cleanup.yml
drwxr-xr-x 4 msevilla cephfs-PG0 4.0K Feb 13 10:02 group_vars
-rw-r--r-- 1 msevilla cephfs-PG0 133 Feb 13 00:29 monitor.yml
drwxr-xr-x 2 msevilla cephfs-PG0 4.0K Feb 13 00:29 workloads
The *.yml files are Ansible playbooks that start and configure components: ceph.yml starts Ceph, cleanup.yml tears Ceph down, and monitor.yml starts daemons that monitor performance. We separate these components into different playbooks so users can mix and match Ceph services. The workloads directory has scripts for running the baseline benchmarks. The other files and directories are Ansible configuration files used by the playbooks.
Users can change ansible/ceph.yml to specify which Ceph daemons to launch in the cluster. It uses the hosts file we set up above and is based on the ceph-ansible site file.
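For reference, the overall shape of such a playbook mirrors the ceph-ansible site file: each host group from the hosts file is mapped to the matching ceph-ansible role. The sketch below shows that shape only; the actual ceph.yml in the template may differ:
---
- hosts: mons
  become: True
  roles:
    - ceph-mon

- hosts: osds
  become: True
  roles:
    - ceph-osd

- hosts: mdss
  become: True
  roles:
    - ceph-mds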
The vars.yml file
High-level configurations are in vars.yml. This file is heavily commented. For CloudLab, uncomment all blocks labeled “Uncomment for CloudLab”:
diff --git a/pipelines/baseline/vars.yml b/pipelines/baseline/vars.yml
index 7b85b19..9854801 100644
--- a/pipelines/baseline/vars.yml
+++ b/pipelines/baseline/vars.yml
@@ -24,13 +24,13 @@
#cloudlab: false
## uncomment for cloudlab
-#ceph_repository: https://github.com/michaelsevilla/ceph.git
-#ceph_docker_version: a241c6f
-#ceph_docker_username: michaelsevilla
-#ceph_docker_imagename: ceph
-#ceph_wait_status: "HEALTH_OK"
-#cloudlab: true
-#kraken: false
+ceph_repository: https://github.com/michaelsevilla/ceph.git
+ceph_docker_version: a241c6f
+ceph_docker_username: michaelsevilla
+ceph_docker_imagename: ceph
+ceph_wait_status: "HEALTH_OK"
+cloudlab: true
+kraken: false
################
# MON #
@@ -45,10 +45,10 @@
#ceph_mon_docker_subnet: 192.168.140.0/24
## uncomment for cloudlab
-#monitor_interface: eth3
-#ceph_mon_docker_interface: eth3
-#ceph_mon_docker_subnet: 10.10.1.1
-#public_network: 10.10.1.1/24
+monitor_interface: eth3
+ceph_mon_docker_interface: eth3
+ceph_mon_docker_subnet: 10.10.1.1
+public_network: 10.10.1.1/24
################
# OSD #
@@ -62,16 +62,16 @@
#ceph_pgs: [8]
## uncomment for cloudlab
-#ceph_pgs: [64, 128]
+ceph_pgs: [64, 128]
#<CHANGE ME> # select one of the below scenarios
## uncomment to use a real disk (for msevilla's cluster/cloudlab)
-#ceph_osd_docker_extra_env: "CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE"
-#ceph_osd_docker_prepare_env: "CEPH_DAEMON=OSD_CEPH_DISK_PREPARE,OSD_FORCE_ZAP=1"
-#ceph_osd_docker_devices:
-# - /dev/sdc
-#osd_data_size: 400
+ceph_osd_docker_extra_env: "CEPH_DAEMON=OSD_CEPH_DISK_ACTIVATE"
+ceph_osd_docker_prepare_env: "CEPH_DAEMON=OSD_CEPH_DISK_PREPARE,OSD_FORCE_ZAP=1"
+ceph_osd_docker_devices:
+ - /dev/sdc
+osd_data_size: 400
## uncomment to use a directory (for singlenode)
#ceph_osd_docker_extra_env: "DUMMY=DUMMY"
@@ -83,7 +83,7 @@
#mount_point: <CHANGE ME> # where CephFS should mount
## uncomment for all clusters
-#mount_point: "/cephfs-baseliner"
+mount_point: "/cephfs-baseliner"
####################
# experiment stuff #
@@ -96,5 +96,5 @@
#repetitions: 1
## uncomment for multinode (cluster/cloudlab)
-#ceph_radosbench_cmd: "rados -p cephfs_metadata bench --show-time 120 write"
-#repetitions: 3
+ceph_radosbench_cmd: "rados -p cephfs_metadata bench --show-time 120 write"
+repetitions: 3
Verify that all of your changes match your cluster. In particular, make sure the network and disk settings are correct using:
# for *interface, ceph_mon_docker_subnet, public_network use the command:
$ ifconfig
# for ceph_osd_docker_device use the command:
$ sudo fdisk -l
We configured our cluster with 1 MDS, 3 OSDs, and 1 MON. We bumped the number of placement groups to 128, as per the Placement Group Preselection Guide.
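As a sanity check on that number, the usual rule of thumb is roughly (number of OSDs x 100) / replica count, rounded up to the next power of two (here we assume the default replica count of 3):
# (3 OSDs * 100) / 3 replicas = 100, which rounds up to the next power of two: 128.
$ echo $(( 3 * 100 / 3 ))
100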
The ansible/group_vars directory
To configure the Ceph cluster with the variables in the Ceph configuration file documentation, change the Ansible group_vars/all file. We also need to specify the image we built in the Docker: Building Ceph Images blog post. These settings can be specified in each daemon's configuration file, but for simplicity we put everything in the global variables file.
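For example, Ceph options that would otherwise live in ceph.conf can be injected through ceph-ansible's ceph_conf_overrides variable in group_vars/all. A sketch (these exact keys are illustrative, not the contents of the template's file, and our fork may differ):
---
ceph_docker_username: michaelsevilla
ceph_docker_imagename: ceph
ceph_conf_overrides:
  global:
    osd pool default size: 3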
The run script copies the configuration files to the Ceph Ansible directory and deploys Ceph on the cluster using our Docker image.
$ ./run.sh
After Ceph finishes starting up we should see:
cluster e9570dd8-03ad-45f0-8a74-ec9b3bb7095f
health HEALTH_OK
monmap e2: 1 mons at {node-0=10.10.1.1:6789/0}
election epoch 4, quorum 0 node-0
fsmap e5: 1/1/1 up {0=node-4=up:active}
mgr no daemons active
osdmap e19: 3 osds: 3 up, 3 in
flags sortbitwise,require_jewel_osds,require_kraken_osds
pgmap v27: 256 pgs, 2 pools, 2523 bytes data, 20 objects
91706 MB used, 1091 GB / 1180 GB avail
256 active+clean
recovery io 221 B/s, 0 keys/s, 2 objects/s
client io 1638 B/s wr, 0 op/s rd, 6 op/s wr
Great! It looks like Ceph is healthy and running.
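You can re-check the cluster state yourself at any time from the monitor node; a sketch (the exact container name depends on how ceph-ansible named the mon container on your node):
# On the MON node (node-0): find the mon container, then ask Ceph for its status.
$ docker ps --format '{{.Names}}' | grep mon
$ docker exec <mon-container> ceph -s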
We can also use run.sh to check the health of our Ceph deployment by benchmarking the RADOS write speed, network bandwidth, and disk bandwidth. We also enable monitoring:
diff --git a/pipelines/baseline/run.sh b/pipelines/baseline/run.sh
index 55fda4d..7f14eeb 100755
--- a/pipelines/baseline/run.sh
+++ b/pipelines/baseline/run.sh
@@ -27,11 +27,11 @@ DOCKER="docker run -it --rm $NETW $DIRS $ANSB $CODE $WORK michaelsevilla/ansible
rm -rf results || true
mkdir results
-$DOCKER cleanup.yml ceph.yml
+$DOCKER cleanup.yml ceph.yml monitor.yml
# Uncomment to run baseline
-#$DOCKER /workloads/radosbench.yml \
-# /workloads/netbench.yml \
-# /workloads/osdbench.yml
+$DOCKER /workloads/radosbench.yml \
+ /workloads/netbench.yml \
+ /workloads/osdbench.yml
exit 0
Ceph provides the Calamari monitoring framework, but when we started our work with Ceph it was neither open source nor production quality. We built our own monitoring framework using Docker containers. The images package the following tools:
Graphite: database for storing metrics, API for receiving metrics, and a GUI for visualization
Collectl: command line tool that monitors common server resource utilizations
ceph-stats: a command line tool (written by us) that packages Ceph performance metrics and sends them to Graphite using netcat (see the sketch after this list).
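The last piece relies on Graphite's plaintext protocol: a metric is just a line of the form "path value timestamp" written to the carbon port (2003 by default). A hypothetical one-liner showing the idea (the metric name and GRAPHITE_HOST are placeholders, not what ceph-stats actually sends):
# Ship one fake Ceph counter to Graphite's plaintext (carbon) port.
$ echo "ceph.osd.0.op_w_latency 12 $(date +%s)" | nc $GRAPHITE_HOST 2003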
All you have to do is add an IP for Graphite:
diff --git a/pipelines/baseline/hosts b/pipelines/baseline/hosts
index ab16db9..ff50ed5 100644
--- a/pipelines/baseline/hosts
+++ b/pipelines/baseline/hosts
@@ -1,11 +1,16 @@
[clients]
<CHANGE ME>
node-0 ansible_ssh_user=msevilla
+
+[graphite]
+node-0 ansible_ssh_user=msevilla
We can add any of these monitoring services to any node using ansible/monitor.yml:
---
- hosts: graphite
become: True
roles:
- monitor/graphite
- hosts: osds
become: True
roles:
- monitor/collectl
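For example, to also collect system metrics on the clients, a similar stanza could be appended (a sketch, assuming the monitor/collectl role works the same way on client nodes):
- hosts: clients
  become: True
  roles:
    - monitor/collectl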
These tools are configured in the same way that the Ceph daemons are configured: with the variables in the ansible/group_vars directory. We look at the Graphite configuration as an example:
$ cat ansible/group_vars/graphite
---
webui_port: 8082
These variables can be changed if you need to use different ports on your cluster. To understand how they are used, we need to look at the deploy code for the service; in this case, the Ansible deploy code for Graphite here.
We see that WEBUI_PORT is passed into the Docker container as an environment variable. To understand what the container does with this environment variable, we look at the Graphite image source code; taking a peek, we see that it is launched with the entrypoint for the Docker image saved in the srl-roles repository. That entrypoint shows that WEBUI_PORT is passed as a command line argument:
./bin/run-graphite-devel-server.py --port $WEBUI_PORT /opt/graphite
Welcome to the world of Ansible/Docker – things are linked all over the place and figuring out what calls what takes us on a journey that traverses multiple repositories and content hosting services. :)
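Putting the chain together, the role ends up doing something morally equivalent to the following (a sketch; the image name, network flag, and container name here are assumptions, not the exact task from srl-roles):
# Launch Graphite with the web UI port taken from ansible/group_vars/graphite.
$ docker run -d --name graphite --net=host -e WEBUI_PORT=8082 michaelsevilla/graphite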
To ensure that the cluster is healthy, re-run ./run.sh, which is packaged with our experiment template repository. This runs sanity checks inspired by the Ceph blog post and benchmarks RADOS write throughput, network bandwidth, and disk IO.
$ ./run.sh
If you log in to one of the Ceph OSD nodes, you should see:
:~$ docker ps
IMAGE COMMAND NAMES
michaelsevilla/collectl "/entrypoint.sh" collectl
piha.soe.ucsc.edu:5000/ceph/daemon:master "/entrypoint.sh" issdm-0-osd-devsde
While the benchmarks are running, we can explore the different metrics using Graphite – we have a video showing off some of the powerful visualization features.
When the benchmarks finish, the results are stored in the results directory.
Reading result files by hand is hard, especially when the formats differ. We package the scripts for parsing and graphing results in a Python notebook. To interact with the results and graphs, run:
$ cd ceph-popper-template/pipelines/
$ ./jupyter.sh
The graphs are in baseline/visualize/visualize.ipynb; if you open it up you will see the parsing code and graphs for the results. The notebook for our results is rendered by GitHub here (but note that GitHub does not make the notebook interactive). We have a screenshot of the notebook below:
Awesome. Now we can monitor and measure our cluster.