Hash-sharded Indexes

On this page Carat arrow pointing down

If you are working with a table that must be indexed on sequential keys, you should use hash-sharded indexes. Hash-sharded indexes distribute sequential traffic uniformly across ranges, eliminating single-range hot spots and improving write performance on sequentially-keyed indexes at a small cost to read performance.

Warning:

This is an experimental feature. The interface and output are subject to change.

How hash-sharded indexes work

CockroachDB automatically splits ranges of data in the key-value store based on the size of the range, and on the load streaming to the range. To split a range based on load, the system looks for a point in the range that evenly divides incoming traffic. If the range is indexed on a column of data that is sequential in nature (e.g., an ordered sequence, or a series of increasing, non-repeating TIMESTAMPs), then all incoming writes to the range will be the last (or first) item in the index and appended to the end of the range. As a result, the system cannot find a point in the range that evenly divides the traffic, and the range cannot benefit from load-based splitting, creating a hot spot at the single range.

Hash-sharded indexes solve this problem by distributing sequential data across multiple nodes within your cluster, eliminating hotspots. The trade-off to this, however, is a small performance impact on reading sequential data or ranges of data, as it's not guaranteed that sequentially close values will be on the same node.

Hash-sharded indexes create a physical STORED computed column, known as a shard column. CockroachDB uses this shard column, as opposed to the sequential column in the index, to control the distribution of values across the index. The shard column is hidden by default but can be seen with SHOW COLUMNS.

Warning:

When dropping a hash-sharded index, the shard column will also be dropped. This will require a rewrite of the table.

For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see our Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads blog post.

Create a hash-sharded index

To create a hash-sharded index, set the experimental_enable_hash_sharded_indexes session variable to on. Then, add the optional USING HASH WITH BUCKET_COUNT = n_buckets clause to a CREATE INDEX statement, to an INDEX definition in a CREATE TABLE statement, or to an ALTER PRIMARY KEY statement. When this clause is used, CockroachDB creates n_buckets computed columns, shards the index into n_buckets shards, and then stores each index shard in the underlying key-value store with one of the computed column's hash as its prefix.

To change the bucket size of an existing hash-sharded primary key index, use an ALTER PRIMARY KEY statement with a USING HASH WITH BUCKET_COUNT = n_buckets clause that specifies the new bucket size and the existing primary key columns.

Examples

For an example of a hash-sharded index, see Create a hash-sharded secondary index.

See also


Yes No
On this page

Yes No