How to Set Up Cassandra Replication on Ubuntu 16.04

If your data is not well suited to a relational database, chances are you are looking for a NoSQL solution. NoSQL options are diverse: Aerospike, MongoDB, Redis, and many others try to solve the Big Data problem in different ways. In this article we will concentrate on replication with Cassandra. The database takes its name from Greek mythology: Cassandra was the seer who always predicted the future correctly, but nobody believed her. Likewise, the creators of this database predict that NoSQL will eventually supplant relational databases, but they don't expect RDBMS folks to believe them.


To follow this article, you should have three nodes, each set up using our previous Cassandra setup guide. All three nodes should be up and running, and you should have three terminal windows with an SSH session open to each one. Once you have that, we can start connecting the Cassandra nodes into one cluster.

Building a cluster

Logged in as the cassandra user, edit the Cassandra configuration on each of the three nodes. The file is called cassandra.yaml:

nano ~/conf/cassandra.yaml

The settings below need to be configured on all three servers. The seeds line can be entered on one server and then copied to the others, but listen_address and rpc_address must be set to each server's own IP address.

cluster_name: 'Test Cluster'

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "your-server-ip,your-server-ip-2,your-server-ip-3"

listen_address: your-server-ip

rpc_address: your-server-ip
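Since the same handful of keys has to be right on every node, it can help to print them back after editing. The following is a minimal sketch, assuming the configuration lives at ~/conf/cassandra.yaml as above; the function name show_cluster_settings is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the cluster-related settings from a cassandra.yaml.
# Usage: show_cluster_settings [path-to-cassandra.yaml]
show_cluster_settings() {
    conf="${1:-$HOME/conf/cassandra.yaml}"
    # Match the four keys edited above. Leading whitespace and "- " are allowed
    # because the seeds line is nested under seed_provider. Commented-out
    # lines (starting with #) are not matched.
    grep -E '^[[:space:]]*(- )?(cluster_name|seeds|listen_address|rpc_address):' "$conf"
}
```

Running show_cluster_settings on each node and comparing the results should show identical cluster_name and seeds values, while listen_address and rpc_address differ per server.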

To set up the endpoint snitch, paste this one-liner on all three nodes:

sed -i 's/endpoint_snitch: SimpleSnitch/endpoint_snitch: GossipingPropertyFileSnitch/g' ~/conf/cassandra.yaml
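Before running the one-liner against the live file, the substitution can be tried on a scratch copy. This sketch assumes GNU sed (the default on Ubuntu); the temporary file and its single line of content are invented for the demonstration.

```shell
#!/bin/sh
# Try the snitch substitution on a throwaway file, not the real config.
scratch=$(mktemp)
printf '%s\n' 'endpoint_snitch: SimpleSnitch' > "$scratch"

# Same expression as the one-liner above, pointed at the scratch file.
sed -i 's/endpoint_snitch: SimpleSnitch/endpoint_snitch: GossipingPropertyFileSnitch/g' "$scratch"

# Print the resulting snitch line so it can be inspected.
snitch_line=$(grep '^endpoint_snitch:' "$scratch")
echo "$snitch_line"
rm -f "$scratch"
```

After running the command for real, grep '^endpoint_snitch:' ~/conf/cassandra.yaml on each node shows whether the change took effect. If the file was configured with a different snitch to begin with, the sed expression matches nothing and leaves the file unchanged.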

Then use this command to append the auto_bootstrap line to the end of the file:

echo 'auto_bootstrap: false' >> ~/conf/cassandra.yaml
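Note that running the echo command twice appends the line twice. A slightly more defensive variant, sketched here with a hypothetical function name ensure_auto_bootstrap_false, appends the line only when no auto_bootstrap key is present yet.

```shell
#!/bin/sh
# Hypothetical helper: append 'auto_bootstrap: false' only if the key is absent.
# Usage: ensure_auto_bootstrap_false [path-to-cassandra.yaml]
ensure_auto_bootstrap_false() {
    conf="${1:-$HOME/conf/cassandra.yaml}"
    if grep -q '^auto_bootstrap:' "$conf"; then
        echo "auto_bootstrap already set in $conf, leaving it alone"
    else
        echo 'auto_bootstrap: false' >> "$conf"
    fi
}
```

Called without arguments it works on ~/conf/cassandra.yaml, so it can be pasted on all three nodes in the same way as the original command.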

The snitch we set up uses the datacenter name dc1, which does not match the datacenter1 used in the rest of this guide, so let's fix that on all three nodes:

sed -i 's/dc=dc1/dc=datacenter1/g' ~/conf/

Restart all three nodes if needed. After that, running sh bin/nodetool status should list all three nodes as members of the cluster.
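To check the result without reading the table by eye, the lines starting with UN (Up/Normal) can be counted. This is a small sketch; count_up_normal is a name invented here, and it reads nodetool status output from standard input.

```shell
#!/bin/sh
# Hypothetical helper: count nodes reported as UN (Up/Normal).
# Reads `nodetool status` output on stdin and prints the count.
count_up_normal() {
    # Node lines start with the two-letter status/state code, e.g. "UN".
    # grep -c exits non-zero when the count is 0, so mask that with || true.
    grep -c '^UN' || true
}
```

With all three nodes joined, sh bin/nodetool status | count_up_normal should print 3. A lower number means at least one node is down or still joining.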


Next, we need to connect to the cqlsh console from one of the nodes to another node. Type the address of the server and port 9042 after cqlsh, like this:

cqlsh ip.addr.of.node 9042

Running cqlsh on its own, which tries to connect to localhost, will not work, because rpc_address is now set to the server's IP address.

Replication setup

If you wonder why we changed the default snitch configuration, here is the explanation. Cassandra has two general replication strategies, SimpleStrategy and NetworkTopologyStrategy. The first works with the default snitch; the second relies on the datacenter and rack information supplied by the snitch we have set. We need this more advanced strategy if we want the cluster to scale easily. With it, you can add more nodes in another datacenter and span the cluster across the globe.

So inside the cqlsh console we need to type this:

CREATE KEYSPACE linoxide WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1' : 3};

This creates a new keyspace named linoxide that uses NetworkTopologyStrategy and keeps 3 replicas in datacenter1.
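If the keyspace name, datacenter, or replica count varies between environments, the statement can be generated rather than typed. A rough sketch follows; keyspace_cql is a hypothetical helper, not part of Cassandra, and it does no validation of its arguments.

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE KEYSPACE statement using NetworkTopologyStrategy.
# Usage: keyspace_cql <keyspace> <datacenter> <replicas>
keyspace_cql() {
    printf "CREATE KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s' : %s};\n" "$1" "$2" "$3"
}

# Reproduces the statement used in this article.
keyspace_cql linoxide datacenter1 3
```

One possible way to run the result without opening the interactive console is cqlsh's -e option, for example cqlsh -e "$(keyspace_cql linoxide datacenter1 3)" ip.addr.of.node 9042.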

OK, let's see what we created. The first line is the command; the rest is output.

SELECT * FROM system_schema.keyspaces;
 keyspace_name      | durable_writes | replication
--------------------+----------------+-------------
 linoxide           | True           | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '3'}
 system_auth        | True           | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
 system_schema      | True           | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
 system_distributed | True           | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
 system             | True           | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
 system_traces      | True           | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}

Let's exit cqlsh and run the nodetool command once more to see the change in the cluster.

nodetool status
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 250.7 KiB 256 100.0% 34689c1e-939c-4bd3-8774-ac4534880744 rack1
UN 188.02 KiB 256 100.0% 7542e062-d6d3-473a-b79c-4f5e11547c1f rack1
UN 236.58 KiB 256 100.0% 2f10690c-1e6e-4297-bda6-c3fb36279495 rack1

Notice that every node now owns 100% of the data, up from 66 percent earlier. That is due to the replication factor of 3 we set: there is now one copy of the data on each node.
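The arithmetic behind those percentages is simple when tokens are evenly spread: each node holds roughly RF/N of the data, capped at 100%, where RF is the replication factor and N is the number of nodes in the datacenter. A minimal sketch of that calculation (effective_ownership is a name made up for this example, and the even token spread is an assumption):

```shell
#!/bin/sh
# Hypothetical helper: approximate per-node effective ownership in percent,
# assuming evenly distributed tokens.
# Usage: effective_ownership <replication_factor> <node_count>
effective_ownership() {
    awk -v rf="$1" -v n="$2" 'BEGIN {
        share = rf / n
        if (share > 1) share = 1   # a node cannot own more than all the data
        printf "%.1f\n", share * 100
    }'
}

effective_ownership 2 3   # replication factor 2 on three nodes
effective_ownership 3 3   # replication factor 3 on three nodes
```

The first call prints 66.7 and the second prints 100.0, which is in line with the jump seen in the nodetool status output.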


So there we have it: a Cassandra cluster set up with replication. From here you can add more nodes, racks, and datacenters, import an arbitrary amount of data, and change the replication factor in all or some of the datacenters. For details on how to do this, refer to the official Cassandra documentation. I hope this guide helped you dive into the future of database technology and that you have decided to believe Cassandra. Thank you for reading and have a nice day.
