Exploring Azure Cosmos DB — Part I

Oct 2, 2022

Before getting started with Azure Cosmos DB, let's look at what a NoSQL database is compared with an SQL database, which will help us understand why and when to use Cosmos DB.

Introduction to Cosmos DB

Cosmos DB belongs to the NoSQL database category. It is a fully managed, globally distributed, multi-model NoSQL database service offered by Microsoft Azure as a PaaS (Platform as a Service). It provides low latency, high availability and configurable consistency, and it supports multi-region (geo) replication.

Azure Cosmos DB supports five database APIs, with SDKs for many programming languages and platforms. The API we choose to interact with the Cosmos DB service depends on the data model that we intend to use with our database. The following list summarizes the five APIs offered for Azure Cosmos DB, which are based on the four NoSQL data models:

5 APIs of Cosmos DB

1. SQL API: Supports JSON documents and SQL-based queries.

2. MongoDB API: Supports the MongoDB API and JSON documents.

3. Gremlin API: Supports the Gremlin API with graph-based node and edge data representations.

4. Cassandra API: Supports the Cassandra API for wide-column data representations.

5. Table API: Supports the Table API for Table Storage with premium capabilities.
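To make the relationship between the five APIs and the four data models easier to see, here is a small lookup sketch. It is only an illustration of the list above: the dictionary and the helper function are hypothetical names of my own, not identifiers from any Cosmos DB SDK.

```python
from typing import List

# Illustrative lookup only: these strings mirror the list above and are
# not SDK identifiers.
API_DATA_MODEL = {
    "SQL API": "document",
    "MongoDB API": "document",
    "Gremlin API": "graph",
    "Cassandra API": "wide-column",
    "Table API": "key-value",
}


def apis_for_model(model: str) -> List[str]:
    """Return the APIs (sorted by name) that expose the given data model."""
    return sorted(api for api, m in API_DATA_MODEL.items() if m == model)


print(apis_for_model("document"))            # ['MongoDB API', 'SQL API']
print(sorted(set(API_DATA_MODEL.values())))  # ['document', 'graph', 'key-value', 'wide-column']
```

Five APIs map onto four data models because both the SQL API and the MongoDB API work with documents.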

Although all of these endpoints are supported, we should carefully review our use cases before deciding on an API, to avoid possible future blockers. The MongoDB, Cassandra and Gremlin APIs have limitations in the functions they offer. The most recommended one is the SQL API, which has an active, maintained GitHub repo.

Cosmos DB resource model

We can classify the Cosmos DB resource model into four layers: Account, Database, Container and Item.

The generalized hierarchy of Cosmos DB runs from the account down to the item: Account > Database > Container > Item.

1st layer-Account: The foundation of every Cosmos DB deployment is an ‘account’, which represents a set of configurations. Using the account, we can perform administration tasks such as:

Global data replication
Consistency levels
Backup strategy
Private endpoints
Firewall rules

An account can contain multiple databases.

2nd layer-Database: It's like a light logical “box” encapsulating data access and capacity allocation. A database can be of type key-value, column, document or graph, which are the four NoSQL data models mentioned above. Moreover, each database can have a set of containers.

3rd layer-Container: A container is where the data is stored. It is a schema-agnostic container of entities, stored procedures, user-defined functions (UDFs) and triggers. Depending on the Cosmos DB API selected, a container is realized as a table, collection or graph. A few main features of a container are:

Containers hold JSON documents together with the associated JavaScript logic: stored procedures, triggers and user-defined functions.
Throughput, measured in Request Units (RUs), can be scaled either automatically (autoscale) or manually by specifying a value.
A ‘Time to live’ option automatically deletes items from the container after a set period.
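The ‘Time to live’ feature can be reasoned about with a small model. The sketch below is my own simplification, not Cosmos DB code: the function name and parameters are hypothetical, and it assumes that the TTL is counted in seconds from the item's last modification, that an item-level value overrides the container default, and that -1 means "never expire". The exact boundary (whether an item expires at or just after the TTL) is also an assumption.

```python
from typing import Optional


def is_expired(last_modified: int, now: int,
               container_default_ttl: Optional[int],
               item_ttl: Optional[int] = None) -> bool:
    """Simplified TTL model; all timestamps and TTLs are in seconds."""
    if container_default_ttl is None:
        # TTL is not enabled on the container: nothing expires,
        # and any item-level value is ignored.
        return False
    # An item-level TTL, when present, overrides the container default.
    ttl = item_ttl if item_ttl is not None else container_default_ttl
    if ttl == -1:
        # -1 is treated as "never expire".
        return False
    return now - last_modified >= ttl


print(is_expired(100, 200, container_default_ttl=50))               # True
print(is_expired(100, 120, container_default_ttl=50))               # False
print(is_expired(100, 200, container_default_ttl=-1, item_ttl=30))  # True
```

The third call shows why a container default of -1 is useful: expiry is switched on for the container, but only items that carry their own TTL value are ever removed.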

4th layer-Item: A Cosmos DB item can be a document in a collection, a row in a table, or a node or edge in a graph. Basically, items comprise the data content inside a container.
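The four layers can be pictured as nested objects. The sketch below models them with plain Python dataclasses; the class names, the field names and the sample values ("demo-account", "shop", "orders", "/customerId") are all hypothetical and chosen only for illustration. It does not use or imitate the Cosmos DB SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Container:
    name: str
    partition_key_path: str
    items: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class Database:
    name: str
    containers: Dict[str, Container] = field(default_factory=dict)


@dataclass
class Account:
    name: str
    databases: Dict[str, Database] = field(default_factory=dict)


# Build one branch of the hierarchy, layer by layer.
account = Account("demo-account")
db = account.databases.setdefault("shop", Database("shop"))
orders = db.containers.setdefault("orders", Container("orders", "/customerId"))
orders.items.append({"id": "1", "customerId": "c-42", "total": 19.99})

# Walk the hierarchy: account -> database -> container -> item.
for database in account.databases.values():
    for container in database.containers.values():
        for item in container.items:
            print(account.name, database.name, container.name, item["id"])
# prints: demo-account shop orders 1
```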

What is Azure Cosmos DB partition key?

Choosing the partition key is a crucial decision that you need to make when creating a container, as it directly affects the performance and latency of queries.

To support efficient queries, Cosmos DB groups the items in your container by the value of a property that you designate as the partition key. Partition keys are the core element in distributing your data efficiently across logical and physical partitions, so that queries against the database complete quickly.

Since you cannot change the partition key once the container is created, you need to choose it during the design phase of the application architecture.
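The effect of the partition key on queries can be illustrated with an in-memory sketch. Everything here is hypothetical: the sample items, the "customerId" property used as the partition key, and the helper function. It only shows the idea that a query scoped to one partition key value reads a single group, while a query without the key has to look at every group.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical sample items; "customerId" plays the role of the partition key.
items = [
    {"id": "1", "customerId": "c-1", "total": 10},
    {"id": "2", "customerId": "c-2", "total": 25},
    {"id": "3", "customerId": "c-1", "total": 40},
]


def group_by_partition_key(docs: List[Dict[str, Any]],
                           key: str) -> Dict[str, List[Dict[str, Any]]]:
    """Group documents that share the same partition key value."""
    partitions = defaultdict(list)
    for doc in docs:
        partitions[doc[key]].append(doc)
    return dict(partitions)


logical = group_by_partition_key(items, "customerId")

# A query that specifies the partition key only reads one group.
single = [d["id"] for d in logical["c-1"]]

# A query without the partition key has to visit every group.
cross = sorted(d["id"] for part in logical.values() for d in part
               if d["total"] > 20)

print(single)  # ['1', '3']
print(cross)   # ['2', '3']
```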

Physical partitioning:

Cosmos DB is designed to scale horizontally based on the distribution of data between physical partitions. These partitions are self-sufficient nodes which can be separately deployed and are synchronized and coordinated by a central gateway.

Logical partitioning:

A logical partition is the set of documents that share the same partition key value, and all of them are stored in the same physical partition.
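One way to picture how logical partitions end up on physical partitions is hash-based placement. The sketch below is a conceptual stand-in only: SHA-256 and the modulo step are my assumptions for illustration and are not the hash function or range scheme that Cosmos DB uses internally. The function name and the sample keys are hypothetical as well.

```python
import hashlib


def physical_partition(partition_key: str, physical_count: int) -> int:
    """Map a partition key value to a physical partition index.

    Stand-in scheme: hash the key and take it modulo the partition count.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


keys = ["c-1", "c-2", "c-3", "c-1"]
placement = [(key, physical_partition(key, 3)) for key in keys]
for key, index in placement:
    print(key, "-> physical partition", index)
```

Because the mapping depends only on the key value, both occurrences of "c-1" land on the same physical partition, which is the property that keeps a logical partition together.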

How to create an Azure Cosmos DB

You can refer to the step-by-step guide below to create a Cosmos DB account, write a sample JSON document and run a simple query in the Cosmos DB Data Explorer.

Quickstart - Create Azure Cosmos DB resources from the Azure portal

APPLIES TO: SQL API Azure Cosmos DB is Microsoft's globally distributed multi-model database service. You can use Azure…

learn.microsoft.com
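To give an idea of what a sample JSON document and a simple query look like, here is a small sketch. The item and its field names are hypothetical, and the list comprehension is only an in-memory equivalent of the query text, so nothing here connects to Cosmos DB. The query string is written in the SQL API dialect, where "c" is an alias for the container.

```python
import json

# Hypothetical sample items; the field names are illustrative.
sample_item = {
    "id": "1",
    "category": "personal",
    "name": "groceries",
    "isComplete": False,
}
other_item = {"id": "2", "category": "work", "name": "report", "isComplete": True}
items = [sample_item, other_item]

# Query text of the kind you could run in the Data Explorer.
query = 'SELECT * FROM c WHERE c.category = "personal"'

# In-memory equivalent of the query above, for illustration only.
result = [item for item in items if item["category"] == "personal"]

print(json.dumps(sample_item, indent=2))
print(query)
print([item["id"] for item in result])  # ['1']
```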

You have reached the end of this blog post. I hope you got a basic understanding of the core concepts in Cosmos DB and of why and when to use it in real-world applications. In the next part of this blog series, I will guide you through building a simple Java project that writes bulk document data to Cosmos DB and queries the persisted data. Stay tuned!

References

Choose an API in Azure Cosmos DB

APPLIES TO: SQL API Cassandra API Gremlin API Table API Azure Cosmos DB API for MongoDB Azure Cosmos DB is a fully…

learn.microsoft.com

Quickstart - Create Azure Cosmos DB resources from the Azure portal

APPLIES TO: SQL API Azure Cosmos DB is Microsoft's globally distributed multi-model database service. You can use Azure…

learn.microsoft.com

An Introduction and Tutorial for Azure Cosmos DB

Azure Cosmos DB is a globally distributed, JSON-based database delivered as a 'Platform as a Service' (PaaS) in…

www.infoq.com

Azure Cosmos DB Partitions & Partition Keys Simplified

Performance and speed are crucial to the processing of any application running heavier or light workloads. The most…

parveensingh.com


Written by Yasara Yasawardhana
