Cassandra has been one of the fastest-growing NoSQL databases, with strong adoption across a wide variety of use cases. Users value Cassandra's screaming performance, high scalability, security, and zero downtime. On the flip side, developers often express concerns about the complex data modeling process. In this article, we examine how Astra DB eases the traditional data modeling requirements for building applications on Apache Cassandra or DataStax Enterprise (DSE).
Data Modeling
The Cassandra Data Model Process
One of the most challenging aspects of using Cassandra is the need to understand the data modeling considerations. A typical application requires several days (or weeks!) of careful analysis to:
a) design all the required queries to power the application and
b) design the optimized data model for that set of queries.
Changes to existing queries, new queries added to support new functionality, and similar revisions all require further modeling effort as the application evolves over time.
The Astra Data Model Process
For Astra, DataStax has put considerable effort into helping new developers by standardizing on the open-source Stargate API. Astra DB provides out-of-the-box access to REST API, GraphQL API, and JSON Document API endpoints.
The Document API, in particular, allows developers to model their data as JSON documents, similar to other document databases such as MongoDB. This frees developers from having to deal with the Cassandra schema (partition keys, clustering columns, etc.) and eliminates much of the angst of the data modeling process.
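As a rough illustration of what "modeling data as JSON documents" can look like in practice, the sketch below builds (but does not send) an HTTP request that would store one document in a collection. The path pattern and the `X-Cassandra-Token` header follow the Stargate Document API v2 as we understand it; the base URL, token, namespace, collection, and the helper name `build_create_document_request` are hypothetical, so check them against the current Astra DB documentation before relying on them.

```python
import json
import urllib.request


def build_create_document_request(base_url, token, namespace, collection, document):
    """Build (but do not send) a POST request that stores one JSON document.

    Assumption: the endpoint follows the Stargate Document API v2 layout,
    /api/rest/v2/namespaces/{namespace}/collections/{collection}.
    """
    url = f"{base_url}/api/rest/v2/namespaces/{namespace}/collections/{collection}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Cassandra-Token": token,  # application token (assumed header name)
            "Content-Type": "application/json",
        },
    )


# Hypothetical values, for illustration only.
user_document = {
    "name": "Ada",
    "email": "ada@example.com",
    "address": {"city": "London"},  # nested fields need no schema change
}
request = build_create_document_request(
    base_url="https://astra.example.com",
    token="my-app-token",
    namespace="app",
    collection="users",
    document=user_document,
)
print(request.get_method(), request.full_url)
```

Note that the document carries a nested `address` field without any table definition, partition key, or clustering column being declared first. To actually send the request, it could be passed to `urllib.request.urlopen` with real credentials.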
For immediate testing, the CQL console built into the Astra DB UI allows for rapid data loading and querying as soon as a database spins up. This gives developers the freedom to do what they need to do without first having to learn traditional Cassandra data modeling.
Indexing
In databases of any kind, indexing is heavily used to speed up queries. However, indexing in Cassandra is quite different from indexing in traditional databases. The primary index, consisting of the partition key and clustering columns, is often not sufficient to meet the needs of growing applications.
Secondary Indexes
Historically, secondary indexes in Cassandra were never very effective. In particular, they do not work well on columns with very low or very high cardinality. Data modelers got around this by creating a separate table for each query pattern, which leads to significant write amplification, data redundancy, and additional code complexity to manage the parallel writes.
One option for DSE customers is to create Solr Indexes for this purpose, but there are tradeoffs including increased write latency and write amplification.
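To make the write amplification of the table-per-query workaround concrete, here is a minimal in-memory sketch. It does not talk to Cassandra; plain dictionaries stand in for tables, and the table names, columns, and the `write_user` helper are all hypothetical. The point is only that one logical write fans out into one physical write per query pattern.

```python
from collections import defaultdict

# Hypothetical query patterns for a "users" entity. In a query-driven model,
# each pattern gets its own table, keyed by the column that query filters on.
QUERY_TABLES = {
    "users_by_id": "user_id",
    "users_by_email": "email",
    "users_by_city": "city",
}


def write_user(tables, user):
    """Write one logical record to every query table; return the physical write count."""
    writes = 0
    for table_name, key_column in QUERY_TABLES.items():
        partition_key = user[key_column]
        # The full record is duplicated into each table (data redundancy).
        tables[table_name][partition_key].append(user)
        writes += 1
    return writes


# tables[table_name][partition_key] -> list of rows
tables = defaultdict(lambda: defaultdict(list))
user = {"user_id": "u1", "email": "ada@example.com", "city": "London"}
physical_writes = write_user(tables, user)
print(physical_writes)  # 3 physical writes for 1 logical write
```

Every new query pattern adds another entry to `QUERY_TABLES`, and with it another write per record plus application code to keep the copies consistent.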
Storage Attached Index
DataStax introduced Storage Attached Index (SAI) in 2020. It is a distributed, highly scalable index that is available on all platforms: Astra DB, Apache Cassandra, and DataStax Enterprise 6.8.3. However, customers who are using an older version of DSE do not have access to SAI. SAIs add column-level indexes to any column and almost any Cassandra data type. They offer the following benefits.
Storage Attached Index Benefits
Better Performance: SAIs reduce disk space usage and offer faster read and write times. SAIs integrate with Cassandra's storage engine, indexing both in-memory memtables and on-disk SSTables as they are written and resolving the differences at read time.
No index rebuild: SAIs are fully compatible with a feature known as zero-copy streaming, a method used to quickly move data through the cluster. This means that as new nodes are bootstrapped or decommissioned, the indexes do not need to be rebuilt.
Query on any field: With SAIs, queries are no longer restricted to the partition and clustering keys of the table; any field can be used to query data. This feature allows for completely new ways to store and model data in Cassandra. Custom tables built for niche query patterns, and query-driven design in general, no longer dictate how tables and data models are built.
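As a sketch of how "query on any field" might look, the snippet below generates the CQL for an SAI on a non-key column, followed by a query that filters on that column. The `CREATE CUSTOM INDEX ... USING 'StorageAttachedIndex'` form is the one we understand Astra DB and DSE 6.8 to accept; other platforms may use a different syntax, so verify against the documentation for your version. The keyspace, table, column, index naming scheme, and the `sai_index_statement` helper are hypothetical.

```python
def sai_index_statement(keyspace, table, column):
    """Return CQL that creates a Storage Attached Index on one column.

    Illustration only: identifiers are interpolated directly, so this must
    not be used with untrusted input.
    """
    index_name = f"{table}_{column}_sai"  # hypothetical naming convention
    return (
        f"CREATE CUSTOM INDEX IF NOT EXISTS {index_name} "
        f"ON {keyspace}.{table} ({column}) "
        "USING 'StorageAttachedIndex';"
    )


# Hypothetical table app.users whose primary key is user_id; city is a regular column.
create_index = sai_index_statement("app", "users", "city")

# With the index in place, a query can filter on city even though it is
# not part of the primary key.
query = "SELECT * FROM app.users WHERE city = 'London';"

print(create_index)
print(query)
```

The statements can be pasted into the CQL console or executed through a driver. Compared with the table-per-query approach, the data lives in a single table and the index is maintained by the database rather than by application code.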
Conclusion
Astra DB provides all the power and scalability that Cassandra is known for, but without the fuss and training requirements, changing the way people start developing on Cassandra. With Astra DB, you can pick and use one of several APIs, which results in a faster and more streamlined development lifecycle.
If you are considering a new project with Astra DB, or migrating an existing Cassandra/DataStax Enterprise application to Astra, we can help. Please contact us for more information.