Understanding Ceph - An Open Source Distributed Storage Platform

Ceph is a software-based, distributed storage platform that runs on commodity hardware. To understand why Ceph is efficient, let's first look at what commodity hardware is. Commodity computers are built by multiple vendors from components that follow a common open standard. Compared to a supermicrocomputer, commodity computers are low cost, and their open standards mean there is little differentiation among different vendors' products. A Ceph storage cluster runs on these commodity computers and uses the well-known CRUSH (Controlled Replication Under Scalable Hashing) algorithm to distribute data across the cluster and to scale it. The main goal of Ceph development is to provide extremely scalable object, block, and file-based storage. Ceph offers a single storage platform that handles all three types of storage (object, block, and file). It is designed to scale up to the exabyte level and has built-in fault tolerance and data replication to keep data safe and consistent.
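To make the idea of hash-based placement concrete, here is a minimal Python sketch. It is not the CRUSH algorithm itself: it uses rendezvous (highest-random-weight) hashing as a simpler stand-in, and the names `place`, `_score`, and the `osd.N` identifiers are chosen for illustration only. What it shares with the description above is that any client can compute where an object lives from the object name and the list of storage devices, without asking a central lookup table.

```python
import hashlib
from typing import List


def _score(object_name: str, osd_id: str) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct OSDs for an object by ranking hash scores.

    Illustrative stand-in for CRUSH: real CRUSH also understands failure
    domains (hosts, racks) and device weights, which this sketch ignores.
    """
    if replicas > len(osds):
        raise ValueError("not enough OSDs for the requested replica count")
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
    print(place("my-object", cluster))
    # Adding a device only moves objects for which the new device ranks highly.
    print(place("my-object", cluster + ["osd.5"]))
```

Because the ranking of the existing devices does not change when a device is added, only a fraction of objects move during scaling, which is the property that makes this family of algorithms attractive for growing clusters.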

History of Ceph

Ceph was created as an open source project by Sage Weil. He started it in 2004, and the software became available under an open source license in 2006. Weil later founded Inktank Storage, a company that continued Ceph’s development until Red Hat purchased Inktank and brought Ceph’s development in-house. The first major stable release of Ceph was launched in 2012. In October 2014, the development team released “Giant”, the seventh major stable release of Ceph. Development is still ongoing as the project continues to mature.

A Ceph Storage Cluster consists of two types of daemons:

  • OSD Daemon
  • Ceph Monitor

Ceph OSD Daemon

Object Storage Devices (OSDs) are an important part of a Ceph-based cluster. OSDs store the actual contents of files and other data, using a file system on the underlying disks. The OSD daemon manages these disks in the cluster. It is responsible for storing data on a local file system and providing access to that data over the network to different client software and access methods. It also handles adding and removing disks, disk partitioning, lower-level space and security functions, and data replication across disks.

Ceph Monitor

The Ceph Monitor is the daemon that monitors the cluster as a whole; it maintains the master copy of the cluster map. If you have a Ceph-based cluster up and running, you will need the Ceph Monitor on a daily basis to see your cluster's health and status. Daily monitoring involves tasks such as checking the overall state of the OSDs and the status of file system or block-level data. You can also use this daemon to review load balancing and data replication details.

To better understand how a Ceph cluster works, let’s see how it handles all three types of storage.

Ceph Object Storage

When data is written to Ceph, its built-in mechanisms automatically stripe and replicate the data across the cluster. Ceph’s object storage can be accessed not only through the native Ceph APIs but also through REST-based APIs that are compatible with Amazon S3. Ceph’s object storage is built on RADOS (Reliable Autonomic Distributed Object Store). The RADOS service is an integral part of the storage mechanism; it can scale to thousands of hardware devices (commonly referred to as "nodes") by using the management software that is already installed on those nodes.
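The striping and replication described above can be illustrated with a small, self-contained Python sketch. It is a toy model, not Ceph code: nodes are plain dictionaries, the object naming scheme is made up for this example, and replicas are placed round-robin instead of by CRUSH. The tiny `object_size` default is only there to keep the demo readable.

```python
from typing import Dict, FrozenSet, List


def stripe(data: bytes, object_size: int) -> List[bytes]:
    """Split a payload into fixed-size chunks (the last one may be shorter)."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]


def write_replicated(
    name: str,
    data: bytes,
    nodes: Dict[str, Dict[str, bytes]],
    object_size: int = 4,
    replicas: int = 2,
) -> Dict[str, List[str]]:
    """Store each chunk on `replicas` different nodes; return the placement."""
    node_ids = sorted(nodes)
    if replicas > len(node_ids):
        raise ValueError("not enough nodes for the requested replica count")
    placement: Dict[str, List[str]] = {}
    for index, chunk in enumerate(stripe(data, object_size)):
        object_name = f"{name}.{index:08x}"  # hypothetical naming scheme
        # Round-robin placement is an assumption made for this sketch.
        targets = [node_ids[(index + r) % len(node_ids)] for r in range(replicas)]
        for target in targets:
            nodes[target][object_name] = chunk
        placement[object_name] = targets
    return placement


def read_back(
    placement: Dict[str, List[str]],
    nodes: Dict[str, Dict[str, bytes]],
    failed: FrozenSet[str] = frozenset(),
) -> bytes:
    """Reassemble the payload, skipping any node listed in `failed`."""
    parts = []
    for object_name in sorted(placement):
        holder = next(n for n in placement[object_name] if n not in failed)
        parts.append(nodes[holder][object_name])
    return b"".join(parts)


if __name__ == "__main__":
    cluster: Dict[str, Dict[str, bytes]] = {"node-a": {}, "node-b": {}, "node-c": {}}
    layout = write_replicated("demo", b"hello ceph world", cluster)
    print(layout)
    # The data is still readable after losing one node.
    print(read_back(layout, cluster, failed=frozenset({"node-a"})))
```

With two replicas per chunk spread over three nodes, losing any single node still leaves one copy of every chunk, which is the basic reason replication gives fault tolerance.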

Ceph Block Storage

This storage mode lets users mount Ceph as a thinly provisioned block device. The RADOS service is also used at the block storage level to ensure scalability. Librados is involved at this level; it is a software library that client applications use to communicate with the storage servers or nodes. Librados is a native C library with bindings for other languages such as Python, and it is open source, so you can tweak and extend it to suit your own needs for communicating with Ceph nodes. Block storage uses the "RADOS Block Device", also known as RBD, to integrate with the back end. RBD builds on the functionality of librados and can also be used to snapshot and restore data in the cluster.
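"Thinly provisioned" means that a block device advertises its full size up front but only consumes backing storage for regions that have actually been written. The following Python sketch models that behavior with an in-memory dictionary of objects. The class `ThinImage` and its methods are hypothetical names for this illustration and are not part of librados or RBD; the 4 MiB default object size is likewise just a parameter of the sketch.

```python
from typing import Dict


class ThinImage:
    """Toy thin-provisioned block image backed by a dict of objects."""

    def __init__(self, size: int, object_size: int = 4 * 1024 * 1024) -> None:
        self.size = size                # advertised size in bytes
        self.object_size = object_size  # size of each backing object
        self.objects: Dict[int, bytearray] = {}  # object index -> contents

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside image bounds")
        pos = 0
        while pos < len(data):
            index, within = divmod(offset + pos, self.object_size)
            length = min(self.object_size - within, len(data) - pos)
            # A backing object is allocated only on first write to its range.
            obj = self.objects.setdefault(index, bytearray(self.object_size))
            obj[within:within + length] = data[pos:pos + length]
            pos += length

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read outside image bounds")
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, self.object_size)
            n = min(self.object_size - within, length - pos)
            obj = self.objects.get(index)
            # Never-written regions read back as zeros.
            out += obj[within:within + n] if obj is not None else bytes(n)
            pos += n
        return bytes(out)

    def allocated_bytes(self) -> int:
        """Backing storage actually consumed so far."""
        return len(self.objects) * self.object_size


if __name__ == "__main__":
    image = ThinImage(size=1024, object_size=16)
    image.write(30, b"abcdef")  # spans two backing objects
    print(image.read(28, 10))
    print(image.allocated_bytes(), "of", image.size, "bytes allocated")
```

Splitting the image into fixed-size objects is also what lets a block device be spread over many nodes, since each object can be placed independently.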

Ceph File Storage

CephFS is a POSIX (Portable Operating System Interface)-compliant file system for Ceph-based storage clusters. It provides services that map directories and files to the objects stored within RADOS, so CephFS and RADOS work hand in hand. RADOS dynamically distributes the data evenly among the different nodes. The file system is designed to scale to very large amounts of data and offers strong data safety. Ceph is well known as a file storage cluster because of its performance and scalability. Please note that you can also use btrfs or ext4 with Ceph, but for that Red Hat recommends using a recent Linux kernel (v3.14 or later).
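The mapping from files and directories to objects can be pictured with a toy model. In the Python sketch below, a metadata table maps each path to an inode number and a size, while the file contents live in a flat object store keyed by inode and chunk index. The class `ToyFS`, the naming pattern for objects, and the tiny object size are all assumptions made for illustration; this is not how CephFS is implemented internally, only the general shape of the idea.

```python
from typing import Dict, List, Tuple


class ToyFS:
    """Toy file layer: path metadata on one side, flat objects on the other."""

    def __init__(self, object_size: int = 4) -> None:
        self.object_size = object_size
        self.metadata: Dict[str, Tuple[int, int]] = {}  # path -> (inode, size)
        self.objects: Dict[str, bytes] = {}  # flat store standing in for RADOS
        self._next_inode = 1

    def _object_names(self, inode: int, size: int) -> List[str]:
        count = -(-size // self.object_size)  # ceiling division
        # Hypothetical naming: hex inode, then zero-padded chunk index.
        return [f"{inode:x}.{i:08x}" for i in range(count)]

    def write_file(self, path: str, data: bytes) -> None:
        if path in self.metadata:
            self.delete_file(path)  # simplest possible overwrite semantics
        inode = self._next_inode
        self._next_inode += 1
        for i, name in enumerate(self._object_names(inode, len(data))):
            start = i * self.object_size
            self.objects[name] = data[start:start + self.object_size]
        self.metadata[path] = (inode, len(data))

    def read_file(self, path: str) -> bytes:
        inode, size = self.metadata[path]
        return b"".join(self.objects[n] for n in self._object_names(inode, size))

    def delete_file(self, path: str) -> None:
        inode, size = self.metadata.pop(path)
        for name in self._object_names(inode, size):
            del self.objects[name]


if __name__ == "__main__":
    fs = ToyFS(object_size=4)
    fs.write_file("/docs/readme.txt", b"0123456789")
    print(sorted(fs.objects))
    print(fs.read_file("/docs/readme.txt"))
```

Keeping path metadata separate from the data objects means the object store never needs to understand directories; it only sees uniformly named chunks that it can distribute across nodes.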


Ceph, operating under the Red Hat umbrella, offers cost-effective, easy-to-manage, and highly reliable data clustering features. Red Hat is actively working on it, and its ongoing development ensures that bug fixes and new features are added on a regular basis. It is open source and easily modifiable to your needs.
