Apache Cassandra: The Crash-Proof NoSQL Database (Part 1)

The last time I wrote on NoSQL databases, in February 2011, the technology was already booming. Today, these databases have changed the way developers think about building applications, making them look beyond RDBMS back-ends to handle data on a massive scale. Some data models that were impractical with conventional databases are now possible with NoSQL databases and clustering. One such NoSQL database is Cassandra, which Facebook released as open source in 2008 and which later became an Apache project.

Cassandra's central and most enticing feature is that it is decentralized and has no single point of failure. It is a column-oriented database whose distributed design was inspired by and based on Amazon's Dynamo. The decentralized design lets it ride out almost any outage that affects only part of the cluster, while the column family-based design allows for richer, more complex data models that resemble Google's BigTable. This amalgamation of features from Dynamo and BigTable has made Cassandra a top-notch choice for production environments in various organizations, including Facebook, where it was created.

Another concept that is important to Cassandra is eventual consistency, which is increasingly discussed in the context of Brewer's CAP theorem, covered in the earlier article. Eventual consistency, as its name suggests, offers large performance benefits by assuming that consistency does not need to be guaranteed immediately at all points in the database, and that it can be relaxed to some extent. Cassandra achieves this with what is known as a tunable consistency model: a consistency level is specified with each operation, so that an operation can be deemed successful even if the data has not yet been written to all replicas.
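To make the tunable consistency idea concrete, here is a minimal Python sketch of how a consistency level could translate into a number of replicas. The level names ONE, QUORUM and ALL are real Cassandra consistency levels, but the helper functions replicas_required and reads_see_latest_write are hypothetical names made up for this illustration, not part of any Cassandra API.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replica responses needed before an operation counts as successful."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for levels in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(levels, reads_see_latest_write(*levels, replication_factor=3))
```

With three replicas, reading and writing at ONE is fast but only eventually consistent, while QUORUM for both guarantees that at least one replica in every read has seen the latest write.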

Architecture

Cassandra builds its architecture from so many components that it is difficult to go through all the bits and pieces without missing anything. The terms discussed here are those that provide an insight into the inner workings of the database. Cassandra's architecture is geared towards avoiding a single point of failure in the cluster, so that as much data as possible stays accessible if any part of the cluster fails. It uses techniques that resemble peer-to-peer networking to achieve a failure-tolerant data distribution model. Hence, no node in a Cassandra cluster acts as a master of the others, and coordination among the nodes is achieved with the help of the Gossip failure detection protocol, which is used to determine the availability of each node in the cluster.

Gossip is managed by the Gossiper present on each node, which periodically initiates ‘Gossip’ communications with random nodes to check their availability. As a result, every node performs the same functions as the others, and there are no designated roles for a particular function.
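The following Python sketch is a toy simulation of this idea, not Cassandra's Gossiper. Each node bumps its own heartbeat every round and swaps heartbeat tables with one random peer; a peer whose heartbeat has not advanced for a while is suspected to be down. The Node class, the simulate function and the fixed timeout are all assumptions made for the example (Cassandra itself uses a more adaptive failure detector than a fixed timeout).

```python
import random


class Node:
    def __init__(self, name, cluster_names):
        self.name = name
        self.alive = True
        # Highest heartbeat seen per peer, and the round in which it last advanced.
        self.heartbeats = {n: 0 for n in cluster_names}
        self.last_update = {n: 0 for n in cluster_names}

    def tick(self, round_no):
        """Advance this node's own heartbeat."""
        self.heartbeats[self.name] += 1
        self.last_update[self.name] = round_no

    def merge(self, other_heartbeats, round_no):
        """Keep the newest heartbeat known for every peer."""
        for peer, beat in other_heartbeats.items():
            if beat > self.heartbeats[peer]:
                self.heartbeats[peer] = beat
                self.last_update[peer] = round_no

    def suspects(self, round_no, timeout):
        """Peers whose heartbeat has not advanced within `timeout` rounds."""
        return {p for p, seen in self.last_update.items()
                if round_no - seen > timeout}


def gossip_round(nodes, round_no, rng):
    for node in nodes:
        if not node.alive:
            continue
        node.tick(round_no)
        peer = rng.choice([n for n in nodes if n is not node])
        if peer.alive:  # a dead peer simply does not answer
            node.merge(peer.heartbeats, round_no)
            peer.merge(node.heartbeats, round_no)


def simulate(rounds=100, fail_at=10, seed=1):
    rng = random.Random(seed)
    names = ["a", "b", "c", "d"]
    nodes = [Node(n, names) for n in names]
    for round_no in range(1, rounds + 1):
        if round_no == fail_at:
            nodes[-1].alive = False  # node "d" crashes
        gossip_round(nodes, round_no, rng)
    return nodes


if __name__ == "__main__":
    for node in simulate()[:3]:
        print(node.name, "suspects", node.suspects(round_no=100, timeout=30))
```

No node has a special role here: every node runs the same loop, and knowledge about a failed peer spreads through the exchanged tables alone.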

Each node in Cassandra is part of a ring, which is how the topology of a Cassandra cluster is represented. Each node in the cluster is assigned a token, which defines the part of the data for which it is responsible. Which data goes to which node is determined by the Partitioner, which places row keys on the ring according to the partitioning strategy chosen. The default strategy is random partitioning, which uses consistent hashing to distribute row keys. The other strategy available is the Byte-Ordered Partitioner, which orders row keys according to their raw bytes.

AntiEntropy is then used to synchronize the replicas of the data to the newest version by periodically comparing checksums. Merkle trees are used in Cassandra to implement AntiEntropy, just as in Dynamo, but in a slightly different way. For more details, see the respective documentation.
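As an illustration of the ring and of random partitioning, here is a small Python sketch of consistent hashing. The Ring class and its method names are hypothetical. The key is hashed with MD5, and replicas are taken to be the next nodes clockwise on the ring, which is a simplifying assumption rather than a description of every replication strategy Cassandra offers.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring (128-bit MD5 value)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens):
        # node_tokens maps a node name to the token assigned to that node.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def replicas_for_token(self, token, replication_factor=1):
        """Owner is the first node whose token is >= the key's token,
        wrapping around to the start of the ring; further replicas follow
        clockwise."""
        start = bisect.bisect_left(self._tokens, token)
        count = min(replication_factor, len(self._entries))
        size = len(self._entries)
        return [self._entries[(start + i) % size][1] for i in range(count)]

    def replicas_for(self, key, replication_factor=1):
        return self.replicas_for_token(token_for(key), replication_factor)


if __name__ == "__main__":
    step = 2 ** 128 // 3  # evenly spaced tokens for three nodes
    ring = Ring({"n1": 0, "n2": step, "n3": 2 * step})
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.replicas_for(key, replication_factor=2))
```

Because keys are hashed before being placed, neighbouring row keys end up scattered around the ring, which is what spreads the load evenly. A byte-ordered scheme would use the raw key bytes as the token instead, keeping keys in sorted order.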

Reads and writes

When a node receives a read request from a client, it first checks the consistency level specified in the request, which determines the number of replicas that need to be contacted. Once the replicas respond with the requested data, the responses are compared to determine the most recent version to report back. If the consistency level is one of the weaker ones, the latest version is reported back immediately and out-of-date replicas are updated afterwards. If it is one of the stronger consistency levels, a read repair operation is performed first, after which the data is reported back.
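The read path can be sketched in a few lines of Python. This is a simplified model under stated assumptions: Replica, Versioned and coordinate_read are hypothetical names, versions are compared by a plain integer timestamp, and the repair is done synchronously before returning, which corresponds to the stronger-consistency case described above.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int


class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def read(self, key):
        return self.store.get(key)

    def write(self, key, versioned):
        # Only accept data that is newer than what is already stored.
        current = self.store.get(key)
        if current is None or versioned.timestamp > current.timestamp:
            self.store[key] = versioned


def coordinate_read(replicas, key, required):
    """Contact `required` replicas, return the newest value, repair the rest."""
    responses = [(replica, replica.read(key)) for replica in replicas[:required]]
    found = [seen for _, seen in responses if seen is not None]
    if not found:
        return None
    newest = max(found, key=lambda v: v.timestamp)
    for replica, seen in responses:
        if seen is None or seen.timestamp < newest.timestamp:
            replica.write(key, newest)  # read repair of a stale replica
    return newest.value


if __name__ == "__main__":
    r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
    r1.write("k", Versioned("old", 1))
    r2.write("k", Versioned("new", 2))
    print(coordinate_read([r1, r2, r3], "k", required=2))
    print(r1.read("k"))
```

Only the replicas that were actually contacted get repaired, so a higher consistency level also means more replicas are brought up to date by each read.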

In the case of a write operation, the consistency level is again used to determine the number of nodes that must respond with an acknowledgement of success before the whole operation is deemed successful. If a node required for consistency is not available, mechanisms like ‘hinted handoff’ are used to bring it up to date once it comes back online. The complete flow for a write operation involves components like the commit log, Memtables, SSTables, etc. The commit log is the first failover protection mechanism: the operation is written there first, so that the written data can be recovered in case of a failure. The Memtables then act as an in-memory database, where all the data is updated until it is flushed to disk in the form of SSTables. Compaction is then performed periodically to merge the data from several SSTables into a single file.
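The write flow on a single node can be modelled with a short Python sketch. Everything here is a simplification made for illustration: StorageEngine and its methods are hypothetical names, the "disk" is just a list of dictionaries, and the flush threshold counts keys rather than bytes.

```python
class StorageEngine:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory view of the most recent writes
        self.sstables = []     # flushed, immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, for recovery
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Persist the memtable as a sorted table and start a fresh one."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            if key in table:
                return table[key]
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))] if merged else []

    def recover(self):
        """Rebuild the memtable after a crash by replaying the commit log."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


if __name__ == "__main__":
    engine = StorageEngine()
    for key, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4"), ("d", "5")]:
        engine.write(key, value)
    engine.compact()
    print(engine.sstables, engine.memtable)
```

Writes never modify an existing SSTable, which is why compaction is needed: it is the step that discards superseded values and keeps the number of files to read from small.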
