Compaction in Apache Cassandra refers to the operation of merging multiple SSTables into a single new one. It mainly does the following.
- Merge keys
- Combine columns
- Discard tombstones
Compaction is done for two purposes.
- Bound the number of SSTables consulted on reads. Cassandra's write model allows multiple versions of a row to exist in different SSTables. At read time, the versions are read from their SSTables and merged into one. Compaction reduces the number of SSTables to consult and therefore improves read performance.
- Reclaim space taken by obsolete data in SSTables.
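The read-time merge described above can be sketched in Python. This is an illustrative toy, not Cassandra's actual implementation: each SSTable contributes a map of column name to (value, timestamp), the newest timestamp wins per column, and a `None` value stands in for a tombstone.

```python
# Toy model of merging row versions from several SSTables at read time.
# Each SSTable holds {column: (value, timestamp)}; last write wins.

def merge_row_versions(versions):
    """Merge per-SSTable column maps into one row, newest timestamp wins."""
    merged = {}
    for columns in versions:
        for name, (value, ts) in columns.items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    # Discard tombstones (deleted columns) from the final result.
    return {n: v for n, (v, _) in merged.items() if v is not None}

# Row appears in three SSTables written at different times.
sstable1 = {"email": ("a@old.com", 1)}
sstable2 = {"email": ("a@new.com", 3), "city": ("NYC", 2)}
sstable3 = {"city": (None, 4)}  # tombstone: 'city' was deleted

print(merge_row_versions([sstable1, sstable2, sstable3]))
# -> {'email': 'a@new.com'}
```

Compaction performs essentially the same merge once, writing the result to a new SSTable, so later reads no longer need to do it per query.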
After compaction, the old SSTables are marked as obsolete. They are deleted asynchronously when the JVM performs a GC, or when Cassandra restarts, whichever happens first. Cassandra may force a deletion if it detects that disk space is running low. It is also possible to force a deletion from JConsole.
Size Tiered Compaction
When there are N (by default 4) similar-sized SSTables for a column family, a compaction merges them into one. Suppose we have four SSTables of size X; a compaction merges them into a single SSTable of size Y (roughly 4X). When we have four SSTables of size Y, a compaction merges them into an SSTable of size Z, and so on.
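This tiering behavior can be modeled in a few lines of Python. The sketch below is an assumption-laden toy, not Cassandra internals: whenever N equal-sized SSTables accumulate, they are merged into one SSTable N times as large, and the check repeats.

```python
# Toy model of size-tiered compaction.
N = 4  # min_threshold: how many similar-sized SSTables trigger a merge

def flush(sstables, size):
    """Add a freshly flushed SSTable, then compact any tier of N equal sizes."""
    sstables.append(size)
    while True:
        for s in set(sstables):
            if sstables.count(s) >= N:
                for _ in range(N):
                    sstables.remove(s)
                sstables.append(s * N)  # merged SSTable is ~N times larger
                break  # re-check: the merge may complete a bigger tier
        else:
            return sstables

tables = []
for _ in range(16):   # 16 flushes of 1-unit SSTables
    flush(tables, 1)
print(tables)         # -> [16]: sixteen 1s -> four 4s -> one 16
```

Note the cascade on the 16th flush: four 1-unit tables merge into a 4, which completes a tier of four 4s, which merge into a single 16.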
There are a few problems with this approach.
- Space. A compaction creates a new SSTable, and both the new and old SSTables exist during the compaction. In the worst case (no deletes or overwrites), a compaction requires twice the on-disk space used by the SSTables being compacted.
- Performance. There is no guarantee on how many SSTables a row may be spread across.
Leveled Compaction
Leveled Compaction was introduced in Cassandra 1.0 and is based on leveldb (reference 4). This compaction strategy creates fixed-size SSTables (5MB by default) and groups them into levels of different size capacities. Each level is 10 times as large as the previous one.
When an SSTable is created at L0, it is immediately compacted with the SSTables in L1. When L1 fills up, extra SSTables are promoted to L2, and any subsequent SSTables appearing in L1 are compacted with the SSTables in L2. Within a level, Leveled Compaction guarantees that SSTables do not overlap, meaning a row can appear at most once per level. However, a row can still appear on multiple levels.
Leveled Compaction resolves the concerns of size tiered compaction.
- Space. A compaction only requires about 10 times the fixed SSTable size (roughly 50MB) of temporary space.
- Performance. It guarantees that 90% of all reads are satisfied from a single SSTable, assuming row sizes are nearly uniform. The worst case is bounded by the total number of levels (for 10TB of data, there are 7 levels).
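The 7-levels figure can be checked with a back-of-envelope calculation, using the 5MB SSTable size and 10x level growth assumed above.

```python
# Back-of-envelope check of the "7 levels for 10TB" claim:
# find the smallest n such that the cumulative capacity of
# levels 1..n holds the total data set.

def levels_needed(total_bytes, sstable_bytes=5 * 1024**2, fanout=10):
    n, capacity = 0, 0
    while capacity < total_bytes:
        n += 1
        capacity += sstable_bytes * fanout ** n  # level n holds 10^n SSTables
    return n

print(levels_needed(10 * 1024**4))  # 10TB -> 7
```

Levels 1 through 6 hold roughly 50MB + 500MB + 5GB + 50GB + 500GB + 5TB, which is still under 10TB, so a seventh level is needed.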
When to Use Which
Leveled compaction essentially trades more disk I/O for a guarantee on how many SSTables a row may be spread across. Below are the cases where Leveled Compaction can be a good option.
- Low latency read is required
- High read/write ratio
- Rows are frequently updated. If size-tiered compaction is used, a row will spread across multiple SSTables.
Below are the cases where size-tiered compaction can be a good option.
- Disk I/O is expensive
- Write-heavy workloads.
- Rows are written once. If rows are written once and never updated, each row naturally ends up in a single SSTable. There is no point in using the more I/O-intensive leveled compaction.
References
1. Cassandra SSTable: http://wiki.apache.org/cassandra/MemtableSSTable
2. Leveled Compaction in Apache Cassandra: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
3. Cassandra: The Definitive Guide.
4. leveldb: http://code.google.com/p/leveldb/
5. When to Use Leveled Compaction: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction