Exploring Azure Cosmos DB — Part I
Before getting started with Azure Cosmos DB, let's see how a NoSQL database compares to an SQL database, which will help us understand why and when to use Cosmos DB.
Introduction to Cosmos DB
Cosmos DB belongs to the NoSQL database category. It is a fully managed, globally distributed, multi-model NoSQL database service offered by Microsoft Azure as a PaaS (Platform as a Service). It provides low latency, high availability, and configurable consistency, and it supports multi-region (geo) replication.
Azure Cosmos DB supports five database APIs, with SDKs for many programming languages and platforms. The API we choose to interact with the Cosmos DB service depends on the data model we intend to use with our database. The following list summarizes the five APIs offered for Azure Cosmos DB, based on the four NoSQL data models:
5 APIs of Cosmos DB
1. SQL API: Supports JSON documents and SQL-based queries.
2. MongoDB API: Supports the MongoDB APIs and JSON documents.
3. Gremlin API: Supports the Gremlin API with graph-based node and edge data representations.
4. Cassandra API: Supports the Cassandra API for wide-column data representations.
5. Table API: Supports the Table API for Table Storage with premium capabilities.
Although all of these endpoints are supported, we should review our use cases carefully before deciding on an API, to avoid blockers later. The MongoDB, Cassandra, and Gremlin APIs have limitations in the functions they offer. The most commonly recommended option is the SQL API, which has an actively maintained GitHub repo.
Cosmos DB resource model
We can classify the Cosmos DB resource model into four layers: Account, Database, Container, and Item.
The following diagram shows the generalized hierarchy of Cosmos DB:
1st layer-Account: The foundation of every Cosmos DB deployment is an ‘account’, which represents a set of configurations that we can use to perform administration tasks, such as:
- Global data replication
- Consistency levels
- Backup strategy
- Private endpoints
- Firewall rules
An account can contain multiple databases.
2nd layer-Database: It's like a lightweight logical “box” that encapsulates data access and capacity allocation. A database can be of type key-value, column, document, or graph, which are the four NoSQL data models mentioned earlier. Each database can hold a set of containers.
3rd layer-Container: A container is where the data is stored. It is a schema-agnostic container of entities, stored procedures, user-defined functions (UDFs), and triggers. Depending on the Cosmos DB API selected, a container is realized as a table, collection, or graph. A few main features of a container are:
- Containers hold JSON documents together with the associated JavaScript logic: stored procedures, triggers, and user-defined functions.
- Throughput, measured in Request Units (RUs), can be provisioned manually or scaled automatically with autoscale.
- A ‘Time to live’ (TTL) option automatically deletes items from a container after a set period.
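To make the ‘Time to live’ feature a little more concrete, here is a minimal sketch of the expiry rule in plain Python. It is not SDK code: the function name `is_expired` and its parameters are made up for illustration. It assumes the documented behaviour that an item's `_ts` property holds its last-modified time in epoch seconds, that an optional per-item `ttl` overrides the container default, and that `-1` means "never expire".

```python
from typing import Optional


def is_expired(item: dict, container_default_ttl: Optional[int], now: int) -> bool:
    """Return True if the item would be eligible for TTL deletion at time `now`.

    `item["_ts"]` is the last-modified timestamp in epoch seconds.
    `item["ttl"]` is an optional per-item override, in seconds.
    `container_default_ttl` is None when TTL is not enabled on the container.
    (Hypothetical helper for illustration; the real deletion runs inside the service.)
    """
    # With no default configured on the container, TTL is off and nothing expires.
    if container_default_ttl is None:
        return False
    # A per-item `ttl` takes precedence over the container default.
    ttl = item.get("ttl", container_default_ttl)
    # -1 means "never expire".
    if ttl == -1:
        return False
    return now >= item["_ts"] + ttl


session = {"id": "s-1", "_ts": 1_000}
print(is_expired(session, container_default_ttl=60, now=1_030))  # still inside the 60 s window
print(is_expired(session, container_default_ttl=60, now=1_060))  # window has elapsed
```

The point of the sketch is the precedence order: container setting first, then the item override, then the timestamp comparison.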
4th layer-Item: A Cosmos DB item can be a document in a collection, a row in a table, or a node or edge in a graph. Basically, items are the data content inside a container.
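The four layers can be pictured as nested objects. The sketch below models them with plain Python dataclasses. It only illustrates the hierarchy and is not how the Cosmos DB SDK represents these resources; all class names, field names, and sample values (`demo-account`, `shop`, `orders`, `/customerId`) are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Container:
    """3rd layer: holds the items and is created with a partition key path."""
    name: str
    partition_key_path: str
    items: List[dict] = field(default_factory=list)  # 4th layer: the items


@dataclass
class Database:
    """2nd layer: a logical grouping of containers."""
    name: str
    containers: Dict[str, Container] = field(default_factory=dict)


@dataclass
class Account:
    """1st layer: the top-level resource that owns the databases."""
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)


# Build a tiny hierarchy: account -> database -> container -> item.
account = Account("demo-account")
db = account.databases.setdefault("shop", Database("shop"))
orders = db.containers.setdefault("orders", Container("orders", "/customerId"))
orders.items.append({"id": "1", "customerId": "c-42", "total": 19.99})

print(account.databases["shop"].containers["orders"].items[0]["id"])
```

Reading the last line from left to right walks the same path as the layers described above: account, database, container, item.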
What is an Azure Cosmos DB partition key?
Choosing the partition key is a crucial decision that you need to make when creating the DB containers, because it directly affects the performance and latency of queries.
In terms of writing efficient queries, Cosmos DB lets you group the items in a collection by a shared property, which is determined by the partition key. Partition keys are the core element in distributing your data efficiently across logical and physical sets, so that queries against the database complete quickly.
Since you cannot change the partition key once the container is created, you need to choose it during the design phase of the application architecture.
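Because the choice is hard to undo, it can help to check candidate keys against sample data before creating the container. The sketch below is one rough way to do that: it measures what share of the items would land in the largest group for a given property. The helper name `largest_partition_share` and the sample order documents are hypothetical, and a real evaluation would also look at query patterns and request volume, not just item counts.

```python
from collections import Counter
from typing import Iterable


def largest_partition_share(items: Iterable[dict], key: str) -> float:
    """Fraction of all items that share the most common value of `key`.

    A value close to 1.0 means most items would pile into one group (skewed);
    a small value means the items spread out more evenly.
    """
    counts = Counter(item[key] for item in items)
    total = sum(counts.values())
    return max(counts.values()) / total


# Hypothetical sample data: four orders, three of them from the same country.
sample_orders = [
    {"id": "1", "customerId": "c-1", "country": "LK"},
    {"id": "2", "customerId": "c-2", "country": "LK"},
    {"id": "3", "customerId": "c-3", "country": "LK"},
    {"id": "4", "customerId": "c-4", "country": "US"},
]

print(largest_partition_share(sample_orders, "country"))     # 0.75
print(largest_partition_share(sample_orders, "customerId"))  # 0.25
```

In this sample, `country` would put three quarters of the data in one group, while `customerId` spreads it evenly, which makes `customerId` the better candidate from a distribution point of view.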
Physical partitioning:
Cosmos DB is designed to scale horizontally based on the distribution of data between physical partitions. These partitions are self-sufficient nodes which can be separately deployed and are synchronized and coordinated by a central gateway.
Logical partitioning:
A logical partition is the set of documents that share the same partition key value, and all documents of a logical partition are stored in the same physical partition.
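The relationship between the two kinds of partition can be sketched in a few lines of Python. This is a simplified model, not the service's actual algorithm: it hashes the partition key value with SHA-256 and uses a modulo to pick a physical partition, whereas Cosmos DB manages hash ranges internally. The function names, the `customerId` key, and the partition count of 4 are assumptions for the example.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def physical_partition_for(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a physical partition index (simplified)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    # Same key value -> same hash -> same physical partition, every time.
    return int.from_bytes(digest[:8], "big") % partition_count


def place_items(
    items: Iterable[dict], key: str, partition_count: int
) -> Dict[int, Dict[str, List[dict]]]:
    """Group items as {physical partition: {partition key value: [items]}}."""
    layout: Dict[int, Dict[str, List[dict]]] = defaultdict(lambda: defaultdict(list))
    for item in items:
        value = item[key]
        layout[physical_partition_for(value, partition_count)][value].append(item)
    return layout


# 50 hypothetical documents spread over 5 customers (5 logical partitions).
docs = [{"id": str(i), "customerId": f"c-{i % 5}"} for i in range(50)]
layout = place_items(docs, "customerId", partition_count=4)

for physical, logical in sorted(layout.items()):
    print(physical, sorted(logical))
```

Each inner key is one logical partition, and it only ever appears under a single physical partition. Several logical partitions can share one physical partition, but a logical partition is never split across two.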
How to create an Azure Cosmos DB
You can refer to the step-by-step guide below to create a Cosmos DB account, write a sample JSON document, and run a simple query in the Cosmos DB Data Explorer.
You have reached the end of this blog post. I hope you now have a basic understanding of the core concepts in Cosmos DB, and of why and when to use it in real-world applications. In the next part of this series, I will guide you through building a simple Java project that writes bulk document data to Cosmos DB and queries the persisted data. Stay tuned!