Data Definition
CQL stores data in tables, whose schema defines the layout of said
data in the table, and those tables are grouped in keyspaces. A
keyspace defines a number of options that applies to all the tables it
contains, most prominently of which is the
replication strategy <replication-strategy>
used by the keyspace. It
is generally encouraged to use one keyspace by application, and thus
many cluster may define only one keyspace.
This section describes the statements used to create, modify, and remove those keyspace and tables.
Common definitions
The names of the keyspaces and tables are defined by the following grammar:
keyspace_name: name table_name: keyspace_name '.' ] name name: unquoted_name | quoted_name unquoted_name: re('[a-zA-Z_0-9]\{1, 48}') quoted_name: '"' unquoted_name '"'
Both keyspace and table name should be comprised of only alphanumeric
characters, cannot be empty and are limited in size to 48 characters
(that limit exists mostly to avoid filenames (which may include the
keyspace and table name) to go over the limits of certain file systems).
By default, keyspace and table names are case insensitive (myTable
is
equivalent to mytable
) but case sensitivity can be forced by using
double-quotes ("myTable"
is different from mytable
).
Further, a table is always part of a keyspace and a table name can be
provided fully-qualified by the keyspace it is part of. If is is not
fully-qualified, the table is assumed to be in the current keyspace
(see USE statement
<use-statement>
).
Further, the valid names for columns is simply defined as:
column_name: identifier
We also define the notion of statement options for use in the following section:
options: option ( AND option )* option: identifier '=' ( identifier | constant | map_literal )
CREATE KEYSPACE
A keyspace is created using a CREATE KEYSPACE
statement:
create_keyspace_statement: CREATE KEYSPACE [ IF NOT EXISTS ] keyspace_name WITH options
For instance:
CREATE KEYSPACE excelsior
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
CREATE KEYSPACE excalibur
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 3}
AND durable_writes = false;
Attempting to create a keyspace that already exists will return an error
unless the IF NOT EXISTS
option is used. If it is used, the statement
will be a no-op if the keyspace already exists.
The supported options
are:
name | kind | mandatory | default | description |
---|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
The replication
property is mandatory and must at least contains the
'class'
sub-option which defines the
replication strategy <replication-strategy>
class to use. The rest of
the sub-options depends on what replication strategy is used. By
default, Cassandra support the following 'class'
:
SimpleStrategy
A simple strategy that defines a replication factor for data to be
spread across the entire cluster. This is generally not a wise choice
for production because it does not respect datacenter layouts and can
lead to wildly varying query latency. For a production ready strategy,
see NetworkTopologyStrategy
. SimpleStrategy
supports a single
mandatory argument:
sub-option | type | since | description |
---|---|---|---|
|
|
|
|
NetworkTopologyStrategy
A production ready replication strategy that allows to set the replication factor independently for each data-center. The rest of the sub-options are key-value pairs where a key is a data-center name and its value is the associated replication factor. Options:
sub-option | type | since | description |
---|---|---|---|
|
|
|
|
|
|
|
|
Note that when ALTER
ing keyspaces and supplying replication_factor
,
auto-expansion will only add new datacenters for safety, it will not
alter existing datacenters or remove any even if they are no longer in
the cluster. If you want to remove datacenters while still supplying
replication_factor
, explicitly zero out the datacenter you want to
have zero replicas.
An example of auto-expanding datacenters with two datacenters: DC1
and
DC2
:
CREATE KEYSPACE excalibur
WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor' : 3}
DESCRIBE KEYSPACE excalibur
CREATE KEYSPACE excalibur WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
An example of auto-expanding and overriding a datacenter:
CREATE KEYSPACE excalibur
WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor' : 3, 'DC2': 2}
DESCRIBE KEYSPACE excalibur
CREATE KEYSPACE excalibur WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '2'} AND durable_writes = true;
An example that excludes a datacenter while using replication_factor
:
CREATE KEYSPACE excalibur
WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor' : 3, 'DC2': 0} ;
DESCRIBE KEYSPACE excalibur
CREATE KEYSPACE excalibur WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
If transient replication has been enabled, transient replicas can be
configured for both SimpleStrategy
and NetworkTopologyStrategy
by
defining replication factors in the format
'<total_replicas>/<transient_replicas>'
For instance, this keyspace will have 3 replicas in DC1, 1 of which is transient, and 5 replicas in DC2, 2 of which are transient:
CREATE KEYSPACE some_keysopace
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : '3/1'', 'DC2' : '5/2'};
USE
The USE
statement allows to change the current keyspace (for the
connection on which it is executed). A number of objects in CQL are
bound to a keyspace (tables, user-defined types, functions, …) and the
current keyspace is the default keyspace used when those objects are
referred without a fully-qualified name (that is, without being prefixed
a keyspace name). A USE
statement simply takes the keyspace to use as
current as argument:
use_statement: USE keyspace_name
ALTER KEYSPACE
An ALTER KEYSPACE
statement allows to modify the options of a
keyspace:
alter_keyspace_statement: ALTER KEYSPACE keyspace_name WITH options
For instance:
ALTER KEYSPACE Excelsior
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 4};
The supported options are the same than for
creating a keyspace <create-keyspace-statement>
.
DROP KEYSPACE
Dropping a keyspace can be done using the DROP KEYSPACE
statement:
drop_keyspace_statement: DROP KEYSPACE [ IF EXISTS ] keyspace_name
For instance:
DROP KEYSPACE Excelsior;
Dropping a keyspace results in the immediate, irreversible removal of that keyspace, including all the tables, UTD and functions in it, and all the data contained in those tables.
If the keyspace does not exists, the statement will return an error,
unless IF EXISTS
is used in which case the operation is a no-op.
CREATE TABLE
Creating a new table uses the CREATE TABLE
statement:
create_table_statement: CREATE TABLE [ IF NOT EXISTS ] table_name : '(' : column_definition : ( ',' column_definition )* : primary_key ')' ] : ')' table_options ] column_definition: column_name cql_type [ STATIC ] [ PRIMARY KEY] primary_key: partition_key clustering_columns ] partition_key: column_name : | '(' column_name ( ',' column_name )* ')' clustering_columns: column_name ( ',' column_name )* table_options: COMPACT STORAGE table_options ] : | CLUSTERING ORDER BY '(' clustering_order ')' table_options ] : | options clustering_order: column_name (ASC | DESC) ( ',' column_name (ASC | DESC) )*
For instance:
CREATE TABLE monkeySpecies (
species text PRIMARY KEY,
common_name text,
population varint,
average_size int
) WITH comment='Important biological records';
CREATE TABLE timeline (
userid uuid,
posted_month int,
posted_time uuid,
body text,
posted_by text,
PRIMARY KEY (userid, posted_month, posted_time)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
CREATE TABLE loads (
machine inet,
cpu int,
mtime timeuuid,
load float,
PRIMARY KEY ((machine, cpu), mtime)
) WITH CLUSTERING ORDER BY (mtime DESC);
A CQL table has a name and is composed of a set of rows. Creating a
table amounts to defining which columns
<column-definition>
the rows will be composed, which of those columns
compose the primary key <primary-key>
, as well as optional
options <create-table-options>
for the table.
Attempting to create an already existing table will return an error
unless the IF NOT EXISTS
directive is used. If it is used, the
statement will be a no-op if the table already exists.
Column definitions
Every rows in a CQL table has a set of predefined columns defined at the
time of the table creation (or added later using an
alter statement<alter-table-statement>
).
A column_definition
is primarily comprised of the name of the column
defined and it’s type <data-types>
, which restrict which values are
accepted for that column. Additionally, a column definition can have the
following modifiers:
STATIC
-
it declares the column as being a
static column <static-columns>
. PRIMARY KEY
-
it declares the column as being the sole component of the
primary key <primary-key>
of the table.
Static columns
Some columns can be declared as STATIC
in a table definition. A column
that is static will be “shared” by all the rows belonging to the same
partition (having the same partition key <partition-key>
). For
instance:
CREATE TABLE t (
pk int,
t int,
v text,
s text static,
PRIMARY KEY (pk, t)
);
INSERT INTO t (pk, t, v, s) VALUES (0, 0, 'val0', 'static0');
INSERT INTO t (pk, t, v, s) VALUES (0, 1, 'val1', 'static1');
SELECT * FROM t;
pk | t | v | s
----+---+--------+-----------
0 | 0 | 'val0' | 'static1'
0 | 1 | 'val1' | 'static1'
As can be seen, the s
value is the same (static1
) for both of the
row in the partition (the partition key in that example being pk
, both
rows are in that same partition): the 2nd insertion has overridden the
value for s
.
The use of static columns as the following restrictions:
-
tables with the
COMPACT STORAGE
option (see below) cannot use them. -
a table without clustering columns cannot have static columns (in a table without clustering columns, every partition has only one row, and so every column is inherently static).
-
only non
PRIMARY KEY
columns can be static.
The Primary key
Within a table, a row is uniquely identified by its PRIMARY KEY
, and
hence all table must define a PRIMARY KEY (and only one). A
PRIMARY KEY
definition is composed of one or more of the columns
defined in the table. Syntactically, the primary key is defined the
keywords PRIMARY KEY
followed by comma-separated list of the column
names composing it within parenthesis, but if the primary key has only
one column, one can alternatively follow that column definition by the
PRIMARY KEY
keywords. The order of the columns in the primary key
definition matter.
A CQL primary key is composed of 2 parts:
-
the
partition key <partition-key>
part. It is the first component of the primary key definition. It can be a single column or, using additional parenthesis, can be multiple columns. A table always have at least a partition key, the smallest possible table definition is:CREATE TABLE t (k text PRIMARY KEY);
-
the
clustering columns <clustering-columns>
. Those are the columns after the first component of the primary key definition, and the order of those columns define the clustering order.
Some example of primary key definition are:
-
PRIMARY KEY (a)
:a
is the partition key and there is no clustering columns. -
PRIMARY KEY (a, b, c)
:a
is the partition key andb
andc
are the clustering columns. -
PRIMARY KEY ((a, b), c)
:a
andb
compose the partition key (this is often called a composite partition key) andc
is the clustering column.
The partition key
Within a table, CQL defines the notion of a partition. A partition is simply the set of rows that share the same value for their partition key. Note that if the partition key is composed of multiple columns, then rows belong to the same partition only they have the same values for all those partition key column. So for instance, given the following table definition and content:
CREATE TABLE t (
a int,
b int,
c int,
d int,
PRIMARY KEY ((a, b), c, d)
);
SELECT * FROM t;
a | b | c | d
---+---+---+---
0 | 0 | 0 | 0 // row 1
0 | 0 | 1 | 1 // row 2
0 | 1 | 2 | 2 // row 3
0 | 1 | 3 | 3 // row 4
1 | 1 | 4 | 4 // row 5
row 1
and row 2
are in the same partition, row 3
and row 4
are
also in the same partition (but a different one) and row 5
is in yet
another partition.
Note that a table always has a partition key, and that if the table has
no clustering columns
<clustering-columns>
, then every partition of that table is only
comprised of a single row (since the primary key uniquely identifies
rows and the primary key is equal to the partition key if there is no
clustering columns).
The most important property of partition is that all the rows belonging to the same partition are guarantee to be stored on the same set of replica nodes. In other words, the partition key of a table defines which of the rows will be localized together in the Cluster, and it is thus important to choose your partition key wisely so that rows that needs to be fetch together are in the same partition (so that querying those rows together require contacting a minimum of nodes).
Please note however that there is a flip-side to this guarantee: as all rows sharing a partition key are guaranteed to be stored on the same set of replica node, a partition key that groups too much data can create a hotspot.
Another useful property of a partition is that when writing data, all the updates belonging to a single partition are done atomically and in isolation, which is not the case across partitions.
The proper choice of the partition key and clustering columns for a table is probably one of the most important aspect of data modeling in Cassandra, and it largely impact which queries can be performed, and how efficiently they are.
The clustering columns
The clustering columns of a table defines the clustering order for the
partition of that table. For a given partition <partition-key>
, all
the rows are physically ordered inside Cassandra by that clustering
order. For instance, given:
CREATE TABLE t (
a int,
b int,
c int,
PRIMARY KEY (a, b, c)
);
SELECT * FROM t;
a | b | c
---+---+---
0 | 0 | 4 // row 1
0 | 1 | 9 // row 2
0 | 2 | 2 // row 3
0 | 3 | 3 // row 4
then the rows (which all belong to the same partition) are all stored
internally in the order of the values of their b
column (the order
they are displayed above). So where the partition key of the table
allows to group rows on the same replica set, the clustering columns
controls how those rows are stored on the replica. That sorting allows
the retrieval of a range of rows within a partition (for instance, in
the example above, SELECT * FROM t WHERE a = 0 AND b > 1 and b ⇐ 3
)
to be very efficient.
Table options
A CQL table has a number of options that can be set at creation (and,
for most of them, altered
<alter-table-statement>
later). These options are specified after the
WITH
keyword.
Amongst those options, two important ones cannot be changed after
creation and influence which queries can be done against the table: the
COMPACT STORAGE
option and the CLUSTERING ORDER
option. Those, as
well as the other options of a table are described in the following
sections.
Compact tables
Warning
Since Cassandra 3.0, compact tables have the exact same layout
internally than non compact ones (for the same schema obviously), and
declaring a table compact only creates artificial limitations on the
table definition and usage. It only exists for historical reason and is
preserved for backward compatibility And as
==== Reversing the clustering order The clustering order of a table is defined by the
The Note that this option is basically a hint for the storage engine to change the order in which it stores the row but it has 3 visible consequences: # it limits which ==== Other table options review (misses cdc if nothing else) and link to proper categories when appropriate (compaction for instance) A table supports the following options:
===== Speculative retry options By default, Cassandra read coordinators only query as many replicas as
necessary to satisfy consistency levels: one for consistency level
Pre-4.0 speculative Retry Policy takes a single string as a parameter,
this can be Examples of setting speculative retry are:
Or,
The problem with these settings is when a single host goes into an
unavailable state this drags up the percentiles. This means if we are
set to use In 4.0 the values (case-insensitive) discussed in the following table are supported:
As of version 4.0 speculative retry allows more friendly params
(CASSANDRA-13876).
The
The text component is case insensitive and for Some examples:
Two values of the same kind cannot be specified such as
Note that frequently reading from additional replicas can hurt cluster
performance. When in doubt, keep the default
===== Compaction options The All default strategies support a number of
===== Compression options The
For instance, to create a table with LZ4Compressor and a chunk_lenth_in_kb of 4KB:
===== Caching options Caching optimizes the use of cache memory of a table. The cached data is
weighed by size and access frequency. The
For instance, to create a table with both a key cache and 10 rows per partition:
===== Read Repair options The
The available read repair settings are: ==== Blocking The default setting. When ==== None When ===== Other considerations:
== ALTER TABLE Altering an existing table uses the alter_table_statement: ALTER TABLE table_name alter_table_instruction alter_table_instruction: ADD column_name cql_type ( ',' column_name cql_type )* : | DROP column_name ( column_name )* : | WITH options For instance:
The
|
Dropping a column assumes that the timestamps used for the value of this column are "real" timestamp in microseconds. Using "real" timestamps in microseconds is the default is and is strongly recommended but as Cassandra allows the client to provide any timestamp on any table it is theoretically possible to use another convention. Please be aware that if you do so, dropping a column will not work correctly.
Once a column is dropped, it is allowed to re-add a column with the same name than the dropped one unless the type of the dropped column was a (non-frozen) column (due to an internal technical limitation). ==== == DROP TABLE
Dropping a table uses the DROP TABLE
statement:
drop_table_statement: DROP TABLE [ IF EXISTS ] table_name
Dropping a table results in the immediate, irreversible removal of the table, including all data it contains.
If the table does not exist, the statement will return an error, unless
IF EXISTS
is used in which case the operation is a no-op.
TRUNCATE
A table can be truncated using the TRUNCATE
statement:
truncate_statement: TRUNCATE [ TABLE ] table_name
Note that TRUNCATE TABLE foo
is allowed for consistency with other DDL
statements but tables are the only object that can be truncated
currently and so the TABLE
keyword can be omitted.
Truncating a table permanently removes all existing data from the table, but without removing the table itself.