System Design Tutorial

Coming soon

Kapil Gupta — Sun, 16 Oct 2022 16:09:44 GMT

This is System Design Tutorial, a newsletter about learning system design.

Design Dropbox or Google Drive

Kapil Gupta — Wed, 30 Mar 2022 12:01:19 GMT

In this article, we will learn how to design a cloud storage service. This will help you understand how such services work behind the scenes and help you answer system design questions like "How to Design Google Drive", "How to Design Dropbox" or "How to Design OneDrive".

Cloud storage services have become very popular because they allow you to access your data anywhere, at any time, and on multiple devices without manually transferring data or maintaining disks.

Challenges

Heavy Write Volume - The read-to-write ratio is roughly 1:1.
ACIDity requirement - Strong ACID compliance is required. If we delete a file in a folder and then share the folder with other people, the operations must be applied in that order. It should never happen that the folder is shared first and people can see the deleted file for some time before it vanishes.
Data Deduplication - Files should be deduplicated to prevent multiple copies of the same file on the server. To achieve deduplication, Dropbox-like services break files into chunks of 4MB and maintain a unique identifier for each chunk.
Delta Upload - Only the parts of a file that have changed should be synced to the remote servers. For this, the client application should be intelligent enough to identify the chunks that have changed.

Requirement Gathering

Features of Dropbox, Google Drive and OneDrive

Upload, Download, Update and Delete of all files and folders in the workspace.
Automatic versioning and synchronization of data among different clients - Desktop, Mobile, Web etc.
Ability to share files and folders with other people.
Support for offline operations such as adding, deleting, or updating files and folders in the workspace. These changes should be synced to the server and other clients once you are back online.
Security and Permissions - Data should be secured using encryption and should be visible only to people who have permission to view the files.

Above are the functional requirements of the Dropbox service. Here are some non-functional requirements:

High Availability - The service should always be up so that users can read their data at any time.

Estimations

According to the stats, Dropbox has around 600M users, out of which we can assume that 100M users are active daily.

If each active user adds or modifies close to 100 files per day on average, we will have a total of 10B uploads daily.

Storage required per day, assuming an average file size of 1MB: 10B * 1MB = 10PB
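The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation; the 1MB average file size is the assumption implied by the storage formula, and the variable names are illustrative.

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
DAILY_ACTIVE_USERS = 100_000_000   # 100M daily active users (assumed in the text)
FILES_PER_USER_PER_DAY = 100       # files added or modified per user per day
AVG_FILE_SIZE_BYTES = 1_000_000    # 1 MB average file size (assumption)
PB = 10**15

uploads_per_day = DAILY_ACTIVE_USERS * FILES_PER_USER_PER_DAY
storage_per_day_bytes = uploads_per_day * AVG_FILE_SIZE_BYTES

print(f"uploads/day: {uploads_per_day:,}")                  # 10,000,000,000
print(f"storage/day: {storage_per_day_bytes / PB:.0f} PB")  # 10 PB
```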

High Level Design

To upload and synchronize your files with Dropbox, you designate a folder that acts as the workspace. Any files added, modified, or deleted in the workspace are synchronized to the Dropbox server and to your other desktop and mobile devices.

Detailed Design

Client Application

The client application is responsible for monitoring changes in the workspace. It interacts with the synchronization service to process metadata updates, such as a change in a file's name or contents. It is also responsible for indexing files, sending updated chunks to the cloud storage, and retrieving chunks when other clients have updated a file.

Major components of the client application:

Watcher : monitors the files and folders in the workspace for any Create, Update or Delete operation and informs the Indexer component so that it can handle the changes.
Chunker : splits the files and the incremental changes into chunks of a suitable size (~4MB). These chunks can later be joined in the same order to reconstruct the original file. This component detects which parts of a file have changed and transmits only those parts to the storage server.
Indexer : listens to the Watcher and maintains the information about the chunks of files. Once the chunks are successfully stored in cloud storage, the Indexer also syncs this data with the Metadata storage server through the synchronization service.
Internal Database : maintains a record of the chunks and associated metadata. It allows offline operation when the client is not connected to the Dropbox server.
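To make the Chunker's delta detection concrete, here is a minimal sketch that splits a file into fixed-size chunks, hashes each chunk, and compares the hash lists of two versions. The function names are hypothetical, and SHA-256 with fixed-size chunking is an assumption made for illustration; the text does not say which hash or chunking scheme Dropbox uses.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, the chunk size mentioned in the text


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: list[str], new: list[str]) -> list[int]:
    """Return the indexes of chunks in `new` that are modified or newly added."""
    return [
        i for i, digest in enumerate(new)
        if i >= len(old) or old[i] != digest
    ]


# Usage with a tiny chunk size so the example is easy to follow.
old_hashes = chunk_hashes(b"aaaabbbbcccc", chunk_size=4)
new_hashes = chunk_hashes(b"aaaaXXXXccccdddd", chunk_size=4)
print(changed_chunks(old_hashes, new_hashes))  # [1, 3]
```

Only chunks 1 and 3 would need to be uploaded here. Note that fixed-size chunking handles in-place edits and appends well, but an insertion near the start of a file shifts every later chunk boundary.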

Metadata Storage

Responsible for maintaining the metadata of the files stored in the cloud storage. It includes information about users, workspaces, file versions, and chunks. This information can be stored in a SQL database owing to the strong ACID requirements for this data: two users who are part of the same workspace should see a consistent view of the files and folders. The synchronization service uses this metadata to synchronize data among different workspaces across multiple clients.

Synchronization Service

One of the critical components of the cloud storage design. It processes all the updates made by a client to a file and synchronizes those updates across all the subscribers. It updates the client's local database so that it stays in sync with the Metadata storage on the server. All Dropbox clients, including Desktop, Mobile and Web clients, talk to the synchronization service to get updates from the server or push updates to the server. This way, all clients are in sync with the master copy that is stored in the Dropbox cloud. When the client is offline, all updates are stored locally, and when the client comes back online, the synchronization service syncs the data to the Metadata storage; the changes are subsequently pushed to other clients or shared workspace users. It is also possible that two clients have changed the same file while offline, so the service should handle such conflicts. Dropbox handles such scenarios by creating a conflicted copy and saving it with the editor’s username and the save date. Users are then required to resolve the conflict manually.

Cloud Chunk Storage

All the chunks of the files are uploaded to the cloud storage, and their location is maintained in the metadata. To reduce load on the Dropbox servers, the client talks directly to the cloud storage to retrieve the user's data. A good option for the cloud storage is Amazon Simple Storage Service (S3).

Take a file and divide it into 4MB blocks. Each block is identified by the hash of its content, so identical blocks map to the same object in storage.
This gives block-level deduplication.
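Block-level deduplication can be sketched with a content-addressed store, where the storage key of a block is the hash of its content. The ChunkStore class and the upload/download helpers below are hypothetical in-memory stand-ins for the real cloud storage, and SHA-256 is an assumed hash function.

```python
import hashlib


class ChunkStore:
    """In-memory stand-in for content-addressed chunk storage (illustrative only)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        """Store a chunk under its SHA-256 digest; identical chunks are stored once."""
        key = hashlib.sha256(chunk).hexdigest()
        self._blocks.setdefault(key, chunk)
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self) -> int:
        return len(self._blocks)


def upload(store: ChunkStore, data: bytes, chunk_size: int) -> list[str]:
    """Split data into chunks, store them, and return the ordered list of chunk keys."""
    return [store.put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def download(store: ChunkStore, keys: list[str]) -> bytes:
    """Rebuild a file by joining its chunks in order."""
    return b"".join(store.get(key) for key in keys)


store = ChunkStore()
keys_a = upload(store, b"AAAABBBB", chunk_size=4)
keys_b = upload(store, b"AAAACCCC", chunk_size=4)
print(len(store))               # 3, because the shared block "AAAA" is stored once
print(download(store, keys_b))  # b'AAAACCCC'
```

The ordered key list per file is exactly the kind of information the metadata storage would need to keep in order to reconstruct the file.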

server_file_journal - a log of the changes that happened to a file.

PreProcessing :
Since we are dealing with large files, it is a good idea to divide the data into chunks so that upload and download can happen in parallel. Chunking also allows partial uploads/downloads to resume after a loss of network connectivity and makes efficient use of bandwidth and storage space, as we will see later.
Before upload, a large file is broken into multiple chunks. Each chunk has metadata that allows us to recreate the file later. Hence we need to store the name (hash of the chunk content), ordering information, size, last modification date, etc.

Synchronization : Our client application will be installed on desktop and mobile devices. The client will monitor the folders within the workspace and synchronize the data by interacting with the Synchronization service using file metadata.

Let's assume we have the file upload client installed on a computer or mobile device.
The desktop client application monitors the folders that are identified as workspace or sync folders and synchronizes them with the remote Cloud Storage.
• Watcher monitors the sync folders and notifies the Indexer of any action performed by the user, for example when the user creates, deletes, or updates files or folders.
• Chunker splits the files into smaller pieces called chunks. To reconstruct a file, chunks are joined back together in the correct order. A chunking algorithm can detect the parts of a file that have been modified by the user and transfer only those parts to the Cloud Storage, saving cloud storage space, bandwidth, and synchronization time.
• Indexer processes the events received from the Watcher and updates the internal database with information about the chunks of the modified files. Once the chunks are successfully submitted to the Cloud Storage, the Indexer communicates with the Synchronization Service using the Message Queuing Service to update the Metadata Database with the changes.
• Internal Database keeps track of the chunks, files, their versions, and their location in the file system.

Metadata Service

The Metadata service is responsible for maintaining the metadata about the chunks of the stored files.
Here we need to maintain strong data consistency to reliably maintain the files for the user.

Design Twitter

Kapil Gupta — Wed, 30 Mar 2022 11:59:42 GMT

Twitter is one of the largest social networks, where users can post and read tweets (short text messages with a 140-character limit), with support for photos and videos.
Users of the Twitter service can also follow their family, friends, or celebrities to get the latest updates about them.
In Twitter, there are two types of timelines:
User Timeline - a list of all tweets that the user has tweeted.

Home Timeline or News Feed - the temporal merge of the user timelines of the people you follow, with certain business rules around it (complex logic and multiple signals are used for this, along with settings such as hiding retweets from a particular user).

Requirement Gathering

Publish Tweets: The user should be able to tweet and see the news feed on their home page.
News Feed: The user should be able to see tweets/posts from the users that they follow.
Relevance: The home timeline/news feed should be relevant and sorted in reverse chronological order.
Follow/Unfollow: The user can follow/unfollow other users.
Notifications: The user should be notified about the latest events that are relevant to them.
Search: The user can search for tweets, hashtags, and people.

Apart from the basic features above, we can support the following requirements.

Trending Topics
Retweet
Comments
HashTag support for posting and searching for tweets.
Recommendations.

Above are the functional requirements of the Twitter design. Here are some non-functional requirements:

High Availability - The service should always be up so that users can post and read tweets.
Latency - Users of a service like Twitter rely on it for quick updates on what is happening in the world, hence the goal is to minimize latency. It is unacceptable for a tweet to take more than 5 seconds to reach the followers.
Eventual Consistency - We are ok if a particular tweet is available to one user before another.

Challenges

Read Heavy System: Twitter is a read-heavy system, since the number of reads (timeline requests) is much higher than the number of writes (posted tweets). We need to keep the latency as low as possible so that updates reach the followers in time.

Tip: Whenever we are designing a read-heavy system, we need to make efficient use of caching and precomputing, which will help us reduce the latency to a minimum.

Estimations

Total Number of Users on platform -> 1 Billion
Total Number of Daily Active Users -> 100 Million
Total Number of New Tweets / Day -> 150 Million
Total Number of Follows per user -> 200
Total Number of Timelines generated per day -> 100 Million * 5
Total Number of Tweets viewed per day -> 100 Million * 5 * 20 tweets = 10 Billion

Storage:

Storage required to store 1 tweet -> 140 char * 2 Bytes / char + 20 Bytes for Metadata -> 300 Bytes
Storage required for storing tweets per day -> 300*150 Million = 45000 Million Bytes = 45 GB
Avg Storage required for one photo -> 1MB
Avg Storage required for one video -> 15MB

Assuming 1/10 of tweets contain one photo and one video:
Storage required for photos and videos -> (1 + 15) MB * 150 Million / 10 => 240 Million MB -> 240 TB/day

Bandwidth Estimates

Total ingress -> 240 TB/day -> ~2.8 GB/sec
Total egress -> 10 Billion * 300 Bytes + 10 Billion / 10 * 1 MB + 10 Billion / 10 * 15 MB ~= 16 PB/day ~= 185 GB/sec
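The storage and bandwidth figures above can be checked with a short script. This is just the same arithmetic written out, using decimal units (1 MB = 10^6 bytes); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 3600

tweets_per_day = 150_000_000
tweet_bytes = 140 * 2 + 20            # 300 bytes per tweet, as estimated in the text
media_bytes = 1_000_000 + 15_000_000  # one photo (1 MB) plus one video (15 MB)

text_storage_per_day = tweets_per_day * tweet_bytes          # bytes
media_storage_per_day = tweets_per_day // 10 * media_bytes   # 1 in 10 tweets carries media

views_per_day = 100_000_000 * 5 * 20                         # 10 billion tweet views
egress_per_day = views_per_day * tweet_bytes + views_per_day // 10 * media_bytes

print(f"text storage:  {text_storage_per_day / 1e9:.0f} GB/day")
print(f"media storage: {media_storage_per_day / 1e12:.0f} TB/day")
print(f"write rate:    {tweets_per_day / SECONDS_PER_DAY:,.0f} tweets/sec")
print(f"read rate:     {views_per_day / SECONDS_PER_DAY:,.0f} tweets/sec")
print(f"ingress:       {media_storage_per_day / SECONDS_PER_DAY / 1e9:.1f} GB/sec")
print(f"egress:        {egress_per_day / SECONDS_PER_DAY / 1e9:.0f} GB/sec")
```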

API's

Our service can expose the following REST API for posting a tweet and getting the news feed

POST /users/{userId}/tweet
{
	"api_key": "test_key",
	"user_id": "2244994945",
	"tweet_text": "My First Tweet",
	"entities": {
       "media": [
          {
          "id_str": "1494379920126095365",
          "media_url": "https://pbs.twimg.com/media/FL0anqqXsAU-KDH.jpg",
          "type": "photo"
        }
      ]
    }
}

api_key(String): The API key of the client making the request. It is used for access checks, rate limiting, and analytics.
tweet_text(String) : the tweet text, up to 140 characters.
user_id(String) : unique identifier of the user.
media (Object): media related to the tweet.

Response: The API returns the created tweet as a JSON response with an appropriate status code.

HTTP OK 200
{
      "user_id": "2244994945",
      "created_at": "2020-02-14T19:00:55.000Z",
      "id": "1494379925427662851",
      "tweet_text": "My First Tweet",
     "entities": {
       "media": [
          {
          "id_str": "1494379920126095365",
          "media_url": "https://pbs.twimg.com/media/FL0anqqXsAU-KDH.jpg",
          "type": "photo"
        }
      ]
    }
}

In case of an error, an appropriate HTTP status code will be returned.

GET /users/{userId}/timeline

Response: API will return the timeline of the user.

HTTP OK 200

{
	"tweets" : [
	  {
	    "id_str": "1494379925427662851",
	    "tweet_text": "This counts as an open source contribution, right?? 😂😂😝 https://t.co/5alFRLFYvR",
	    "entities": {
	      "media": [
	        {
	          "id_str": "1494379920126095365",
	          "media_url": "https://pbs.twimg.com/media/FL0anqqXsAU-KDH.jpg",
	          "url": "https://t.co/5alFRLFYvR",
	          "display_url": "pic.twitter.com/5alFRLFYvR",
	          "expanded_url": "https://twitter.com/EddyVinckk/status/1494379925427662851/photo/1",
	          "type": "photo"
	        }
	      ]
	    },
	    "source": "Twitter for iPhone",
	    "user_id_str": "633958151",
	    "retweet_count": 17,
	    "favorite_count": 236,
	    "reply_count": 15,
	    "quote_count": 5,
	    "conversation_id_str": "1494379925427662851",
	    "lang": "en"
	  }
	, .....
	]
}

Database Schema

We need to store the user data, tweet data, user_follower data, etc. Here is the representation of the data.

High-level design

As is evident from the estimations, we need to be able to store around 1700 tweets per second and read around 115k tweets per second. This is a very read-heavy system. Also, major events or popular celebrity tweets can cause very spiky traffic, hence we need to scale our system accordingly.

The figure shows the basic high-level design for the Twitter service. Requests land on the load balancers, which distribute the traffic to the application servers.
Media files like photos and videos are uploaded to a separate media server, and their metadata is stored with the tweet.

At a high level, we will need the following components in our Newsfeed service:

Web servers: receive the request for publishing the post via the REST API. The web servers check for valid authentication and authorization before allowing users to post. Besides this, there can be additional validations, for example character limit, rate limiting, and anti-abuse checks.
Tweet service: Incoming tweets are processed by the Tweet service. Tweets are written to a NoSQL DB like Cassandra.
Media Service: It is responsible for storing the media like photos and videos associated with the tweets. Media files are uploaded separately and then linked to the tweet.
Newsfeed generation service: It is responsible for creating the relevant news feed for all the active users. The feed is updated continuously, and new feed items are reflected in the news feed of the user.
Notification service: It is responsible for notifying users about updates from the people they follow.
Analytics service: It is used to analyze tweets and generate trending items.

Home Timeline/ Newsfeed Generation:

The home timeline or news feed is generated from the tweets of the people that the user follows.
To generate the timeline, we follow the steps below.
Query the social graph to get the list of users (followees) that the current user follows.
Get the user timeline of every followee.
Merge the tweets from these timelines and rank them, by default in reverse chronological order, optionally adjusted by other ranking parameters/signals.
Store the feed in the cache and return the first page of tweets.
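The steps above can be sketched as a merge of per-user timelines. The in-memory dictionaries below are hypothetical stand-ins for the social graph and the tweet store, the caching step is omitted, and only plain reverse chronological ordering is shown (no extra ranking signals).

```python
import heapq
from itertools import islice

# Hypothetical in-memory data standing in for the social graph and tweet store.
FOLLOWING = {"alice": ["bob", "carol"]}
USER_TIMELINES = {
    # Each user timeline is kept newest-first as (timestamp, tweet_id) pairs.
    "bob": [(105, "b3"), (101, "b2"), (90, "b1")],
    "carol": [(110, "c2"), (95, "c1")],
}


def home_timeline(user: str, page_size: int = 20) -> list[str]:
    """Merge the followees' timelines and return the first page, newest first."""
    followees = FOLLOWING.get(user, [])
    timelines = [USER_TIMELINES.get(followee, []) for followee in followees]
    # heapq.merge lazily merges inputs that are already sorted in the same order.
    merged = heapq.merge(*timelines, key=lambda item: item[0], reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, page_size)]


print(home_timeline("alice", page_size=4))  # ['c2', 'b3', 'b2', 'c1']
```

Because the merge is lazy, only as many tweets as the requested page size are pulled from the followee timelines.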

There are two approaches to generate the news feed:
Online Generation: We generate the news feed when the user comes online and loads the home page.

Advantages: Simpler to implement.

Disadvantages: This approach is not scalable, as news feed generation is an expensive process. Also, as new tweets constantly come in, we need a mechanism to rank and add new posts to the feed.

We will need to constantly check for updates from all the followees even though they might not have posted any tweets.

Offline Generation: Instead of generating the news feed online when the user logs in, we can generate it beforehand offline, cache it, and serve it to the user as soon as they log in.
We can choose to keep the 200 latest tweets for each user, allowing the user to scroll through 10 pages of 20 tweets each on their home page.

Advantages : Fast: Users need not wait for news feed generation.
Scalable: Since the system is not loaded when the user comes online, this approach is more scalable. Also, we need not store the news feed for all the users in the cache. We can store the timelines of the users that were active in the last 15 or 30 days. As we scale our product, we can add intelligent systems that generate the news feed by predicting the login patterns of the user.

Disadvantage : Think about the scenario where a celebrity with millions of followers tweets. This can lead to the regeneration of millions of timelines and put a heavy load on the system.

Fanout

When someone tweets, the home timelines of all their followers are affected, although only the timelines of active users need to be updated right away.

Fanout is the process of delivering a user's tweet to all of their followers. There are two fanout models: fanout on write (also called the push model) and fanout on read (also called the pull model). Both models have pros and cons. We explain their workflows and explore the best approach to support our system.

Push Based (Fanout on write) - When a user publishes a post, we push the post to all of their followers. There is a persistent connection between the server and the client over which the feed is pushed (sockets are used for push; at any given point there are millions of such sockets getting data from the push cluster at Twitter). There is a lot more processing on the write path to figure out where to route a particular tweet.

Issues with this approach: As discussed in the feed generation part, when a celebrity with millions of followers tweets, it can put a heavy load on the system.

Pull Based (Fanout on read) - When the user comes online, the client (web or mobile app) pulls the data at certain intervals or when the user refreshes the page.
Issues with this approach: New data is shown only after a pull request is issued, and some pulls might return no data.

Hybrid Approach - Use a combination of the pull-based and push-based approaches. We push data for users with a limited number of followers, and we can limit even that to the followers who are currently online. For celebrity users, the client pulls the updates.
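A minimal sketch of the hybrid approach follows. The follower threshold, the in-memory structures, and the function names are all assumptions made for illustration; a real system would use a far larger threshold and persistent caches, and would also restrict the push to online followers.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems would use a much larger number
FEED_LIMIT = 200         # number of precomputed tweets kept per user, as in the text

followers = defaultdict(set)         # author -> users following the author
following = defaultdict(set)         # user -> authors the user follows
home_cache = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # user -> pushed (ts, text)
celebrity_tweets = defaultdict(list)                        # author -> (ts, text)


def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)


def post_tweet(author: str, ts: int, text: str) -> None:
    if len(followers[author]) >= CELEBRITY_THRESHOLD:
        # Fanout on read: store once, followers pull it when they load the feed.
        celebrity_tweets[author].append((ts, text))
    else:
        # Fanout on write: push into every follower's precomputed feed.
        for user in followers[author]:
            home_cache[user].appendleft((ts, text))


def read_timeline(user: str) -> list[str]:
    items = list(home_cache[user])
    for author in following[user]:
        items.extend(celebrity_tweets.get(author, []))
    return [text for _, text in sorted(items, reverse=True)]


for fan in ("u1", "u2", "u3"):
    follow(fan, "star")
follow("u1", "bob")
post_tweet("bob", 1, "hello from bob")    # pushed to u1's cache
post_tweet("star", 2, "hello from star")  # stored once, pulled on read
print(read_timeline("u1"))  # ['hello from star', 'hello from bob']
```

The celebrity's tweet costs one write regardless of follower count, while regular users' tweets are already sitting in their followers' caches at read time.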

The above image shows a detailed overview of the feed publishing mechanism.

References
https://www.infoq.com/presentations/Twitter-Timeline-Scalability/

Design Notification System

Kapil Gupta — Wed, 30 Mar 2022 11:58:13 GMT

A notification system is a system that sends notification alerts to users. A notification alert is a piece of information that provides an update to the user, for example an update about an ongoing order, a new offering, a meeting reminder, a bill due alert, an OTP notification, etc.

Notification systems are an integral part of any mobile or web application and act as a channel to communicate with users. Notifications can be of multiple types: mobile push notification, email notification, SMS, WhatsApp chat. Let us call them different notification channels.

Functional Requirements

Notification Types: Push (Desktop, iOS, and Android), SMS, and email.

Notification Preferences: The system should respect user preferences or settings. For example, a user who has opted out of a particular type of notification should no longer receive those notifications.

Notification Scale: Our notification system should be able to handle a scale of close to 10 Million notifications per day.

Notification Prioritization: Certain messages like OTPs are higher priority, whereas promotional messages are lower priority.

Non Functional Requirements

Always Available: Since we are using the Notification System to send critical information to the customer, the system should be always available.

Latency: Notifications should be delivered in almost real-time. Slight delays are acceptable when the system is under peak workload.

Pluggable: Our notification system should allow additional channels without much effort and redesign.

Scalable: the system should be scalable to cater to increased loads without much increase in latency.

Client Services: The services on the left represent different services that want to send notification messages via our Notification service. It can be an auth service trying to send an OTP over the SMS or email channel, or it could be an order service trying to send order updates.
Application Servers: expose the APIs that client services can use to initiate notifications. They also:
Validate incoming messages for valid emails, phone numbers, etc.
Verify user settings to check whether the user has opted out of such notifications.
Fetch other data required to send the notification, for example notification templates, device info, etc.
Push the message to the appropriate message queue for processing.
Cache: used for caching the user settings, device info, and notification templates to prevent frequent DB trips.
DB: used to get the info in case of a cache miss.

Message queues: Asynchronous way of handling the messages, messages are buffered in the queue till they are picked by the workers for processing. Different queues are used for different channels like email, SMS, etc.
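As a rough sketch of how an application server could check preferences and route a message to a per-channel queue, the example below uses one priority queue per channel so that OTPs are picked up before promotional messages. The priority values, the opt-out set, and the enqueue function are assumptions made for illustration, and in-process queues stand in for a real message broker.

```python
import itertools
import queue

PRIORITY = {"otp": 0, "transactional": 1, "promotional": 2}  # lower value = more urgent (assumed)
CHANNEL_QUEUES = {
    "sms": queue.PriorityQueue(),
    "email": queue.PriorityQueue(),
    "push": queue.PriorityQueue(),
}
OPTED_OUT = {("user-2", "promotional")}  # hypothetical user settings: (user_id, category)
_sequence = itertools.count()            # tie-breaker keeps FIFO order within a priority


def enqueue(channel: str, user_id: str, category: str, body: str) -> bool:
    """Validate the request, respect opt-outs, and push to the channel's queue."""
    if channel not in CHANNEL_QUEUES or category not in PRIORITY:
        raise ValueError("unknown channel or category")
    if (user_id, category) in OPTED_OUT:
        return False  # the user opted out of this kind of notification
    CHANNEL_QUEUES[channel].put((PRIORITY[category], next(_sequence), user_id, body))
    return True


enqueue("sms", "user-1", "promotional", "Sale starts in 10 minutes!")
enqueue("sms", "user-1", "otp", "Your OTP is 123456")
skipped = enqueue("sms", "user-2", "promotional", "Sale starts in 10 minutes!")

first = CHANNEL_QUEUES["sms"].get_nowait()
print(first[2], first[3])  # user-1 Your OTP is 123456
print(skipped)             # False
```

Adding a new channel in this sketch only requires a new entry in CHANNEL_QUEUES plus a worker for it, which is in line with the pluggability requirement.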

Workers: the set of servers that pull messages from the message queues and process them one by one by sending them to the corresponding delivery services. Here are a few examples:

SMS
Email
Mobile Push Notifications
a) iOS b) Android

API

POST https://notification-api.example.com/api/v1/sms

Request body

{
	"user_ids" : [
			"ee0ad2aa-2f71-11ec-8d3d-0242ac130003", 
			"980ea932-76ee-4816-b46e-4c26d99f0f9a",
			"5cea9e39-f2f1-4277-aeaf-5eb344f4a7bd"
		    ],
	"sub": "The Big Festival sale starts in 10 minutes!",
	"body": {
		"type": "text/plain",
		"value": "Don't miss the latest deals, visit abcd.com now!"
	}

}

Reliability

Our notification system should be reliable so that customers do not miss any update meant for them. Even at peak traffic, we should avoid any data loss, although some delay is acceptable. For this, we can rely on acknowledgment-based delivery semantics: a worker acknowledges a message only after it has processed the message successfully, so an unacknowledged message stays in the queue and is retried. We can also keep an append-only log of all notifications at the worker level for auditing purposes.
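The acknowledgment idea can be sketched with a simple worker loop. Here a deque stands in for the message queue, "acknowledging" means not putting the message back, and the retry limit and function names are assumptions made for illustration.

```python
from collections import deque


def run_worker(pending: deque, send, max_attempts: int = 3) -> list[tuple]:
    """Process messages; a message is dropped from the queue only after a successful send."""
    audit_log = []  # append-only record of every delivery attempt
    while pending:
        message = pending.popleft()
        attempts = message.get("attempts", 0) + 1
        try:
            send(message)
        except Exception:
            audit_log.append((message["id"], "failed", attempts))
            if attempts < max_attempts:
                # Not acknowledged: put the message back so it is retried later.
                pending.append({**message, "attempts": attempts})
            continue
        audit_log.append((message["id"], "delivered", attempts))
    return audit_log


calls = {"count": 0}


def flaky_send(message: dict) -> None:
    """Hypothetical provider call that fails on the very first attempt."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("provider unavailable")


log = run_worker(deque([{"id": "n1"}, {"id": "n2"}]), flaky_send)
print(log)
# [('n1', 'failed', 1), ('n2', 'delivered', 1), ('n1', 'delivered', 2)]
```

This gives at-least-once behavior: nothing is lost on a transient failure, but a message may be sent twice if a failure happens after the provider has already accepted it.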

Analytics

Analytics is an important component for notification systems as it will help us determine how effective our system is by collecting engagement statistics like open rate and click rate.
This will provide great feedback about the customer needs and help us understand the customer behavior effectively.

Design URL Shortener

Kapil Gupta — Wed, 30 Mar 2022 11:57:17 GMT

A URL shortening service creates a compact version of a long URL, called a tinyURL or shortURL. These URLs can be shared as an alias for the long URL, and when someone hits the shortURL they are redirected to the original long URL. The benefits of short URLs include better readability, convenience for the user, and saved space. They are much more user-friendly when displayed on screen, shared, or tweeted.

URL shortening is also used for tracking and analytics, to measure the performance of different campaigns and affiliates. Some URL shortening services also allow you to provide custom domains to promote your own brand.

Example:

https://www.google.com/imgres?imgurl=https%3A%2F%2Fnationalzoo.si.edu%2Fsites%2Fdefault%2Ffiles%2Fnewsroom%2F649a1243-cropped.jpg&imgrefurl=https%3A%2F%2Fnationalzoo.si.edu%2Fanimals%2Fnews%2Fcelebrating-national-panda-day&tbnid=ExJDq27RAbY-3M&vet=12ahUKEwiMnrW6pqbqAhUyFrcAHeVmDxsQMygDegUIARDYAQ..i&docid=WNx1NKHAfQ_wJM&w=4819&h=2410&q=panda%20image&ved=2ahUKEwiMnrW6pqbqAhUyFrcAHeVmDxsQMygDegUIARDYAQ

can be shortened to this

https://bit.ly/3i8Yitn

Requirements and Goals of the System

Requirements for a URL Shortening service.

Functional Requirements:

Create a unique short URL for any given URL.
When a user hits the shortened URL, they should be redirected to the original long URL.
A shortened URL must expire after a configurable expiration time.
Based on their subscription, users should be able to pick a custom domain for the links to promote their own brand.

Non-Functional Requirements:

Our redirect service, which redirects any short URL to the original URL, must be highly available. Otherwise users won't be able to access the required URL.
Redirection should have minimum latency; otherwise it will be a bad user experience and users might switch back to the long URL.

Good to have features

Our service should provide analytics on the usage of a shortened URL, like the total number of times it was hit, the countries where it was hit, etc.
We can also expose our services through a REST endpoint to allow programmatic access to developers, supporting bulk requests.

Estimations

Traffic estimates :

We can assume that we will have around 10M new URL shortenings per day. There will be many more redirections to the original URL than creations of new shortened URLs. Hence let's assume a 1:100 write-to-read ratio.

Total number of new URLs shortened per day = 10M
URL shortening queries per second = 10M / (24 * 3600) ~= 116, which we round up to 120 QPS

Total number of redirection requests per day = 100 * 10M = 1B requests/day
URL redirection queries per second = 100 * 120 QPS ~= 12K QPS

Storage estimates:
Assuming that we have 10M new URLs shortened per day and keep them for 5 years,

Total number of records stored over 5 years = 10M * 365 * 5 ~= 10M * 400 * 5 = 20 Billion

Assuming 1 KB storage space per record,
Total storage required for the data = 20B * 1KB = 20TB

Assuming 20% of the daily read requests are cached,
Total cache memory required = 0.2 * 12K * 3600 * 24 * 1KB ~= 200GB
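The same estimates can be computed without the intermediate rounding. This is only the arithmetic from the text written out, using decimal units (1 KB = 1000 bytes); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 3600

new_urls_per_day = 10_000_000
reads_per_write = 100
record_bytes = 1_000          # 1 KB per record
retention_years = 5

write_qps = new_urls_per_day / SECONDS_PER_DAY
reads_per_day = new_urls_per_day * reads_per_write
read_qps = reads_per_day / SECONDS_PER_DAY

records_total = new_urls_per_day * 365 * retention_years
storage_bytes = records_total * record_bytes
cache_bytes = reads_per_day // 5 * record_bytes   # 20% of daily reads cached

print(f"write QPS: {write_qps:.0f}")                    # about 116, rounded to 120 in the text
print(f"read QPS:  {read_qps:,.0f}")                    # about 11.6K, rounded to 12K in the text
print(f"records:   {records_total / 1e9:.2f} billion")  # 18.25, rounded to 20 in the text
print(f"storage:   {storage_bytes / 1e12:.2f} TB")
print(f"cache:     {cache_bytes / 1e9:.0f} GB")
```

The exact values come out slightly below the rounded figures, which is fine for capacity planning since rounding up leaves headroom.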

API's

Our service can expose the following REST APIs for creating and deleting short URLs.

generateShortURL(String longURL, String customDomain, String apiKey):

longURL(String) : The URL which needs to be shortened.
customDomain(String) : Custom domain to be used for the shortened URL. This could be a limited feature based upon the subscription of the client.
apiKey(String) : The API key of the client making the API request, used for access checks, rate limiting, and analytics.

The API will provide a JSON response with the generated short URL.

deleteURL(String shortURL, String apiKey):

shortURL(String) : The URL which needs to be deleted.
apiKey(String) : The API key of the client making the API request, used for access checks, rate limiting, and analytics.

The API will provide a JSON response with an appropriate status indicating whether the URL was deleted.

Database

We need to store billions of records over time. Also, our service is read-heavy.

What about the consistency requirements? This is a very simple use case where we only create a new record for the shortened URL and never update it afterwards. Hence we don't have strict consistency requirements.

What about the availability requirements? Our service should be highly available; otherwise users will not be able to access the original URL, which will result in loss of business. Hence we want high availability.

A natural choice is a NoSQL database like Cassandra, MongoDB or DynamoDB, because we need a database that can scale easily. We can store the URLs in the following collection. To retrieve a URL quickly, we can add an index on the hash column.


    // URL Collection Sample Document:

    {
        "_id": "507f191e810c19729de860ea",
        "hash": "3i8Yitn",
        "original_url": "https://www.google.com/imgres?imgurl",
        "creation_date": "2020 Jul 11 16:54:20 UTC+5:30",
        "expiration_date": "2021 Jul 11 16:54:20 UTC+5:30",
        "user_id": ""
    }

    // User Collection Sample Document:
    {
        "_id": "507f1f77bcf86cd799439011",
        "name": "John Doe",
        "subscription_tier": "",
        "creation_date": "2019 Jul 11 16:54:20 UTC+5:30",
        "last_login_date": "2020 June 11 16:54:20 UTC+5:30"
    }

High Level System Design

Approach

Since we want to convert a long URL to a short one, we can choose to use 6-8 characters to represent our tinyURL. Let's say we have a character set of 64 characters (0-9, a-z, A-Z, -, .). We can then have a total of:

    64^6 ~= 68 billion
    64^7 ~= 4.39 trillion
    64^8 ~= 281 trillion

Based upon our estimations, we expect around 20 billion URLs over 5 years, so 6 characters are enough to represent our tinyURL.

The naive approach to convert the URL is to maintain a counter and increment it every time we have a new request for a tinyURL. However, this scheme is not suitable for a scalable service. Several servers will compete to get the new value of the counter, and it will become a bottleneck.

Another approach could be to generate a random number. This approach could scale, as different servers could generate random numbers in parallel. In practice, collisions are extremely unlikely if we use a good random number generator. We can use a UUID generator. However, a UUID has 36 characters; if we take only the first six characters, the chance of collisions increases. In such a scenario we can check whether the first six characters are already used. If they are, we can use the next 6 characters and so on until we get a unique tinyURL.
The problem with this approach is that if you make several requests for converting the same long URL, it will result in different tinyURLs.

To overcome the above problem, we could use a popular hash algorithm (like SHA-2 or MD5) to generate a hash of the original URL. We can encode the hash value in base 64 and use the first six characters of the encoded value. This way we would always generate the same tinyURL for different requests for the same long URL. But this approach could also generate duplicates for different URLs. We can use the same approach described above: check whether the tinyURL is already used. If it is used and belongs to the same long URL, we return it; if it belongs to a different long URL, we use the next six characters and repeat the process.
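The hash-based scheme with collision handling can be sketched as follows. A dictionary stands in for the database, SHA-256 (a SHA-2 variant) is used as the hash, and the URL-safe base 64 alphabet from the Python standard library ('-' and '_' as the extra characters) is an assumption that differs slightly from the character set listed earlier. The shorten function is hypothetical.

```python
import base64
import hashlib

KEY_LENGTH = 6
url_by_key: dict[str, str] = {}  # stand-in for the database: short key -> long URL


def shorten(long_url: str) -> str:
    """Return a 6-character key derived from the URL's hash, skipping taken windows."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    # Try the first six characters, then the next six, and so on.
    for start in range(0, len(encoded) - KEY_LENGTH + 1, KEY_LENGTH):
        key = encoded[start:start + KEY_LENGTH]
        existing = url_by_key.get(key)
        if existing is None:
            url_by_key[key] = long_url  # free key: claim it
            return key
        if existing == long_url:
            return key                  # same URL was shortened before
        # Otherwise the key belongs to a different URL: try the next window.
    raise RuntimeError("no free key found for this URL")


first = shorten("https://example.com/some/long/path")
again = shorten("https://example.com/some/long/path")
print(first == again)  # True: the same long URL always maps to the same key
```

In a real deployment the check-and-insert would need to be atomic in the database, since two servers could otherwise claim the same key at the same time.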

Scaling the Key Generation.

The problem with the above approach is that key generation can become a bottleneck as we scale our service. It would be time-consuming to check whether the required key is already present in the database.

We can overcome this limitation by generating the keys in advance. We can have a dedicated key generation service that creates ranges of keys in advance. These ranges are distributed across multiple nodes and loaded in memory, hence no processing time is spent on hashing or encoding the URL.

Since each server has its own unique set of keys, we don't need to worry about any duplicates or coordination between the servers. When the request comes in to shorten a particular URL, the server will pick one of the keys and assign it to the URL, by making an entry into the database. It will then mark the key as used, or remove it from the memory.
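One way to picture this is a key generation service that hands out disjoint numeric ranges, with each application server encoding numbers from its own range into 6-character keys. The class names, range size, and sequential allocation below are assumptions made for illustration; the alphabet is the 64-character set listed earlier.

```python
import string
import threading

# 64 characters: 0-9, a-z, A-Z, '-', '.' (the character set from the text).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-."


def encode(number: int, length: int = 6) -> str:
    """Encode a non-negative integer as a fixed-width base-64 string."""
    chars = []
    for _ in range(length):
        number, digit = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


class KeyGenerationService:
    """Hands out disjoint ranges of key numbers (hypothetical, single process)."""

    def __init__(self, range_size: int = 1000) -> None:
        self._range_size = range_size
        self._next_start = 0
        self._lock = threading.Lock()

    def allocate_range(self) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += self._range_size
        return range(start, start + self._range_size)


class AppServer:
    """Keeps its own in-memory range and asks for a new one when it runs out."""

    def __init__(self, kgs: KeyGenerationService) -> None:
        self._kgs = kgs
        self._keys = iter(())

    def next_key(self) -> str:
        number = next(self._keys, None)
        if number is None:
            self._keys = iter(self._kgs.allocate_range())
            number = next(self._keys)
        return encode(number)


kgs = KeyGenerationService(range_size=3)
server_1, server_2 = AppServer(kgs), AppServer(kgs)
keys = [server_1.next_key(), server_2.next_key(), server_1.next_key()]
print(keys)  # ['000000', '000003', '000001']
```

Sequential keys like these are easy to guess, so a production service might shuffle or otherwise scramble the numbers within a range before handing them out.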

What happens if a server dies? If the server dies, the range of keys it held will be lost. However, that is acceptable, since we have a huge number of keys.

What happens if the Key Generation service fails? Since we now depend on the key generation service, it should have replicas to keep it highly available in the event of node failure.

URL Redirection

When the user enters the tinyURL in the browser, the request will hit our API, which will fetch the data from the database based upon the tinyURL key. If the key is not present, the user will be shown a "404 Not Found" HTTP error; otherwise, the user will be redirected to the original URL that was shortened.

URL Expiration

Since we need to store data for the shortened URLs, we can choose to clean up URLs that were created long ago, based upon an expiration date and the frequency of usage. This expiration date could depend on the subscription of the user who created the URL.

For free tier users, we can have an expiration time of 6 months or 1 year from the last used time, before the data is purged. Similarly, we can have a longer expiration time for paid users according to their subscription plan.

We can run a scheduled job every 24 hours that will clean up the expired URLs from the database. The job should run at a time when user traffic is low to limit load on the database. For example, we can choose to run the job at midnight in each region. We can also reuse the keys from the expired links.
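The cleanup job could look roughly like the sketch below. The retention windows, the `plan` and `last_used` field names, and the in-memory `urls` dictionary are all assumptions for illustration; a real job would run a batched delete against the database.

```python
from datetime import datetime, timedelta

# Assumed retention windows per plan; the real values are a product decision.
RETENTION = {"free": timedelta(days=180), "paid": timedelta(days=730)}


def purge_expired(urls: dict, now: datetime) -> list:
    """Remove entries unused for longer than their plan allows; return freed keys."""
    expired = [
        key
        for key, entry in urls.items()
        if now - entry["last_used"] > RETENTION[entry["plan"]]
    ]
    for key in expired:
        del urls[key]
    return expired  # these keys can be returned to the pool for reuse
```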

Data Partitioning and Replication

In order to scale our DB, we need to introduce a partitioning scheme to store URLs. This will distribute our load across different database servers.

We can do one of the following:

Partitioning based upon Range: We can do range-based partitioning by distributing URLs across partitions based upon the first letter of the URL. Although this is a simple approach, it can lead to unbalanced partitions. If we have a lot of URLs that start with 'G', the server that stores these URLs can become a hotspot.
Partitioning based upon Hash: Under this scheme, we can distribute the URLs among different DB partitions based upon a hash value. If our hash function ensures uniform distribution, the partitions will be balanced. We can use consistent hashing to reduce problems related to uneven load and to limit how much data moves when servers are added or removed.
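As an illustration of hash-based partitioning, here is a small consistent hashing ring. It is a sketch under stated assumptions: the node names, the number of virtual nodes, and the use of MD5 as the ring hash are arbitrary choices for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node is placed at many points to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        # First ring entry at or after the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

When a node is removed, only the keys that lived on that node move to a different server; all other keys keep their placement.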

Caching

Since our application is read heavy, we should cache the hot URLs.
We can assume that roughly 20% of the URLs will contribute to 80% of the traffic. We can store a mapping of the shortened URL to the original URL.

To store 20% of the hot URLs, we need about 200GB of memory. We can use any popular caching system like Redis or Memcached. We can create a distributed caching system to scale our caching layer using more servers.

Caching will reduce the load on the DB servers and improve the overall performance of the system.
As discussed in the Caching tutorial, we can use a suitable cache eviction policy like Least Recently Used (LRU).

Similarly, we can use a suitable cache write policy. With the write-around cache policy, data is initially written directly to the database, bypassing the cache. The first read of recent data results in a cache miss, so the data is read from the database. At that point the data is put into the cache.
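The combination of LRU eviction and write-around can be sketched as below. The `LRUCache` and `UrlStore` classes are illustrative only, and a plain dictionary stands in for the database; in practice Redis or Memcached would play the role of the cache.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


class UrlStore:
    def __init__(self, cache: LRUCache):
        self.cache = cache
        self.db = {}  # stand-in for the database

    def write(self, key, long_url):
        self.db[key] = long_url  # write-around: the cache is bypassed on writes

    def read(self, key):
        value = self.cache.get(key)
        if value is None:  # cache miss: read from the database, then populate
            value = self.db.get(key)
            if value is not None:
                self.cache.put(key, value)
        return value
```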

Yelp System Design

Kapil Gupta — Wed, 30 Mar 2022 11:56:03 GMT

Yelp is a platform for crowd-sourced local business reviews and listings. Locations like restaurants, bars, and other businesses have dedicated pages, where users can read or submit reviews and ratings. Businesses can provide information about themselves on their pages so that people can discover their services and timings. In addition to reviews, Yelp also provides the ability to upload images and videos related to a location.

Users can search for any place or nearby attractions like restaurants and theaters, as well as events. Other similar services include TripAdvisor (for travel) and Zomato (for restaurants).

Functional Requirements

Business Profile - Yelp provides dedicated pages for each business. Business owners or customers should be able to create a page for a business.
Search by Location or GPS coordinates - Users should be able to search for attractions by providing a location, and should also be able to search for nearby places.
Reviews and Rating System - Users should be able to provide reviews for the places they have visited, services they have availed. Users should be able to provide ratings in terms of stars out of 5.
Photo and Video upload - Users should be able to add photos and videos related to the place.

Non Functional Requirement

High Availability: Our search service should be highly available.
Scalable: There will be peak demand during the holiday season or at certain tourist attractions, and the system should scale to handle it.
Low Latency: Users should be able to access information as fast as possible.

Additional Requirements

Fake Reviews Detection: The system should be able to detect fake reviews by business owners that try to increase their ratings unethically.
Image and Video Processing: These services process the images and videos uploaded to provide the best photos to its users.

Estimations

We need to store location entries. Assume that we will be storing around 500M places, with 20% growth in the number of places each year. We can safely assume that we don't need more than 1MB (excluding photos and videos) to store the metadata related to a location.

Our system will be read-heavy. We can assume a 1:1000 write to read ratio, with a peak of around 100k concurrent users.

Database Schema

We can store the following data in a relational database. We can have a place/location table to store the place information

Location Table :

id : integer
name : varchar(256)
geohash : varchar(12)
description : varchar(512)
address : varchar (1024)
category_id: integer

Similarly, we will have tables for users, reviews, ratings, photos, and videos. We can store the photos and videos in cloud blob storage like AWS S3 and maintain a reference in the photo and video tables respectively.

APIs

Our service can expose the following REST API for searching nearby places.

search(String searchTerm, Map searchParams, String apiKey):

searchTerm(String) : The term the user is searching for; it could be the name or location of the business.
searchParams(Map) : The search criteria, including the location for which the search needs to be performed, the search radius, and additional filter params like categories, number of results to return, sorting order, etc.

    sample searchParams :

    {
        "location": [34.3, -118.243],
        "categories": ["restaurant", "bar", "dine-out"],
        "searchRadius": 5,
        "maxResults": 100,
        "sortCriteria": 1,
        "sortAscending": true
    }

apiKey(String) : The API key for the client making the API request, used for access check, rate limiting, and analytics purpose.

The API will return a JSON response containing the list of places matching the search criteria.

We can expose additional APIs to add or modify listings, add reviews, add ratings, and upload photos/videos to a listing or a particular review.

High-Level Design

At a high level, the key feature of the Yelp system design is to find nearby places/businesses with minimum latency and sort them by distance, because everyone wants to do business with the nearest store.

How to store the geographical context in the database.

Naive solution - A naive solution is to store the latitude and longitude of every POI (point of interest) and compare them with the user's location. A nearby search then becomes a range query against the database for rows where the latitude is between xi and xj and the longitude is between yi and yj.

This approach will not scale given the huge number of places that users can query for.

GeoHash: To design a scalable system that provides geolocation data, we can use the concept of Geohash, a sequence of characters that identifies a particular region.

GeoHash essentially encodes the latitude and longitude information about a place into a string. The world map is divided into a rectangular grid system. Each rectangle is identified by a hash string and is further divided into 32 rectangles following a hierarchical structure, with each level having one more character than the hash of the parent rectangle. The "Geohash alphabet" (32ghs) uses all digits 0-9 and almost all lower case letters except "a", "i", "l" and "o".
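To make the encoding concrete, here is a short sketch of the standard Geohash algorithm: the longitude and latitude ranges are halved alternately (starting with longitude), each halving contributes one bit, and every five bits become one character of the 32-symbol alphabet. The function name `geohash_encode` is chosen for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the Geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

For example, `geohash_encode(42.6, -5.6, precision=5)` gives `"ezs42"`. A longer hash of the same point always starts with the shorter one, which is what makes prefix-based lookups possible.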

Geohash represents a boundary and not a point. Since our mission is to calculate the nearby places, it is well suited for the application. Also, there will be some corner scenarios that we need to consider while determining the neighbors as shown in the below picture.

Geohashes are popular for spatial indexing and search applications, though they were initially used only as part of a URL-shortening service. If you have never heard of the concept behind Geohash, I highly recommend viewing the following video.

https://www.youtube.com/watch?v=UaMzra18TD8

Let's return to our Yelp System Design scenario for finding nearby places and restaurants.

The naive algorithm to find the nearest restaurants and businesses would be to find the points with the same geohash as the location of the user. As discussed above, due to corner cases, this finds some of the nearby points of interest but, depending on the distance we want to search, likely leaves out nearby points of interest that are in neighboring geohash bins.

To get a correct and more comprehensive set of points of interest, we have to get all the points in the current bin as well as all the points in the surrounding geohash bins, i.e. the 8 immediate neighboring bins. Then we calculate the distance from the center to each of these points.

Using a Geohash-based approach to find a point on the map, we can handle a very large dataset, since on entering a particular geohash bin we are essentially discarding the remaining bins, which shrinks the problem by a factor of 32 at every level. Users of a Yelp-like service need to see the results in real time, hence we need to store and index the geohash of the places along with the associated reviews and ratings. Since the location data rarely changes, we should optimize the index for read efficiency.
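Once the candidate points from the nine bins are collected, the distance step can be sketched with the haversine formula. This is one common choice rather than the only option; the candidate tuple layout `(name, lat, lon)` and the function names are assumptions for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(user, candidates, radius_km):
    """Filter candidates (name, lat, lon) to the radius and sort by distance."""
    scored = [
        (haversine_km(user[0], user[1], lat, lon), name)
        for name, lat, lon in candidates
    ]
    return sorted((dist, name) for dist, name in scored if dist <= radius_km)
```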

We can use a reasonable Geohash size of 12 characters to represent any place on earth with good precision. We can store the Geohash for locations in the database.

Storing GeoHash

Let’s see what are different ways to store this data and find out which method will suit best for our use cases.

Custom Approach

Since this is hierarchical data, we can store it in the form of a tree structure. Each node will represent a geohash and will have up to 32 children to model the geohash-like structure.
We will have additional tables to store the metadata for any place corresponding to each geohash.

Building the tree structure

We will start with the root node, which represents the whole map. The root node will have 32 children. We will take the 12 character geohash of the place we want to insert into the tree and create a node for each character. For example, to insert a place represented by the geohash "9q8zhuyj1ccb", we would create a node under the root node to represent the grid "9", under that node we would create a child node to represent "9q", and so on.

Querying the grid for a place

We can start from the root node and search downward for the required node. At each node, we move to the child node with the next character in the geohash. Once we reach the leaf node, that is the required node representing the point of interest. If at any point we don't find the required node, then the point of interest does not exist in the grid.

Finding out the neighbors

To find the neighbors of any point, we can go to the parent node or grandparent node, depending upon the radius for which we want to search for neighbors, find all the points under them, calculate their distance from the point of interest, and then sort according to the given criteria.

We can optimize the data structure for finding the neighbors by connecting the leaf nodes with a doubly linked list to allow quick forward and backward iteration among the neighboring places.

After getting the nearby geohashes, we can query our metadata table to find the details corresponding to those places.
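The insert, lookup, and neighbor steps described above can be sketched as a small trie. The class names `GeoNode` and `GeoTrie`, the `place_ids` field, and the sample geohashes are hypothetical. Collecting everything under a shared prefix corresponds to going up to a parent or grandparent node.

```python
class GeoNode:
    def __init__(self):
        self.children = {}   # next geohash character -> GeoNode (at most 32)
        self.place_ids = []  # filled in only at the node where a geohash ends


class GeoTrie:
    def __init__(self):
        self.root = GeoNode()

    def insert(self, geohash, place_id):
        node = self.root
        for ch in geohash:  # one level per character
            node = node.children.setdefault(ch, GeoNode())
        node.place_ids.append(place_id)

    def _descend(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def find(self, geohash):
        node = self._descend(geohash)
        return list(node.place_ids) if node else []

    def places_under(self, prefix):
        """All places whose geohash starts with prefix (an ancestor's subtree)."""
        start = self._descend(prefix)
        if start is None:
            return []
        found, stack = [], [start]
        while stack:
            node = stack.pop()
            found.extend(node.place_ids)
            stack.extend(node.children.values())
        return found
```

The ids returned by `places_under` would then be looked up in the metadata table and ranked by distance.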

Memory Requirements
Since each leaf node represents a place using a Geohash of 12 characters, the total memory needed to store the tree structure for 500M places will be:

12 * ~1Byte * 500M = 6GB

Since we also have internal nodes in the tree, assuming around 1/3 internal nodes with 32 children, they will occupy:

500M * 1/3 * 32 * ~1 Byte = ~5GB

Use NoSQL offerings like Redis, ElasticSearch, MongoDB to store the Geospatial data

Instead of writing custom logic to implement Geospatial search, we can use some popular NoSQL databases like Redis, Elasticsearch to achieve the same functionality.

For example, we can use the geospatial indexing feature of Redis to quickly implement this feature, offloading the indexing, searching, and sorting work to Redis with very few lines of code.
We can use the Geo Set to work with spatial data in Redis. Redis provides commands like GEOADD, GEODIST, GEORADIUS, and GEORADIUSBYMEMBER (newer Redis versions also offer GEOSEARCH).

More information can be found at the link below:
Working with Geospatial Data in Redis

Similar support is provided by the Elasticsearch and MongoDB databases. Going with Elasticsearch also gives us full-text search, which is useful because we need to search by text as well as by geographic coordinates.

Working with Geospatial Data in ElasticSearch

Working with Geospatial Data in MongoDB

Data Partitioning

If we are using NoSQL offerings like Redis, MongoDB, Elasticsearch, etc., we can easily scale horizontally as we add new places.

If we are building a custom application using the tree-based structure, we can use the following schemes for partitioning our data.

Partitioning based on regions: We can divide the complete tree into multiple parts based upon the region. Places in the same region will be stored on the same node. Before storing a new location, we first need to find out which nodes store the data for that region, and then we can store the new location on those nodes. Querying the data works the same way.
The problem with this approach is that certain regions, like San Francisco, are densely populated and will have more restaurants and other points of interest. This results in an uneven distribution of data and puts more load on some servers than others. To prevent this problem, we might partition these hot regions further or use consistent hashing.
Partitioning based on Hash: We can use consistent hashing to decide which server will hold the data of a region. We can use a hash function to hash the geohash of a place and then distribute the data uniformly across multiple servers.
To fetch all the nearby places, we will get the data from all servers and aggregate it.

Processing the photos and videos: Yelp stores the photos and videos and applies deep learning models to enhance them on the go.
More information can be found on the Yelp Engineering Blog.

Yelp Engineering Blog

Spam detection: Similarly, Yelp-like services use machine learning models to detect fake reviews and ratings.

Caching

As our service is read-heavy, around 20% of places will be searched frequently and will account for roughly 80% of search queries. We can use a caching solution like Memcached or Redis to store this data and speed up those queries.
To deal with hot places, we can introduce a cache in front of our database. We can use an off-the-shelf solution like Memcached, which can store all data about hot places. Before hitting the backend database, application servers can quickly check whether the cache has that place. Based on clients’ usage patterns, we can adjust how many cache servers we need. For the cache eviction policy, Least Recently Used (LRU) seems suitable for our system.

WebSockets

Kapil Gupta — Wed, 30 Mar 2022 11:51:55 GMT

WebSocket is essentially a thin transport layer built on top of TCP/IP that provides a persistent bidirectional communication channel between a client and a server.
The initial connection is established like a normal HTTP request and is then upgraded to a lightweight, real-time bidirectional channel, which allows the server to push messages to the connected clients.
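As a small illustration of the upgrade step, the server proves it understood the request by deriving a `Sec-WebSocket-Accept` value from the client's `Sec-WebSocket-Key` header, as specified in RFC 6455. The helper names below are made up for this sketch, and it only builds the response headers; it does not implement framing or a real server.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1(
        (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """Build the HTTP 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

In practice a WebSocket library performs this handshake for you.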

WebSockets allow exchanging messages in any format that both client and server agree on, for example JSON, XML, and many more.

Before WebSockets, achieving the same functionality required constantly polling the server, which generally means high latency and extra load on the backend servers.

Since around 2010, WebSockets have been supported on all major platforms, including web and mobile devices. The standard for the WebSocket Protocol (RFC 6455 – The WebSocket Protocol) was published by IETF in 2011.
Backend servers can go for any client authentication mechanism for example cookie-based, HTTP, or TLS authentication.

WebSockets Use Cases

WebSockets are used in a myriad of applications that demand a low-latency, real-time connection for exchanging data between client and server.

Notification/ Chat messages
Multiplayer online games
Live Sports commentary / ticker
Realtime Social Network updates.
Realtime monitoring

Considerations while using WebSockets

Security Issues: WebSockets are prone to common security vulnerabilities linked to the HTTP protocol, such as DDoS, authentication and authorization issues, sniffing, and Cross-Site WebSocket Hijacking (similar to CSRF, Cross-Site Request Forgery).
Connection Issues: A WebSocket connection, once terminated, does not recover automatically, so reconnection needs to be handled. However, reconnection logic is generally provided by the available client-side libraries.

WebSockets Implementations

There are plenty of popular WebSocket implementations/libraries available in the most popular languages, which makes it easy to get up and running.

Javascript : ws, Socket.IO, SockJS
Java : javax.websocket-api, java.net.Socket, Jetty
Ruby : EventMachine, websocket-client-simple, em-websocket
Python : pywebsocket, Tornado

SQL vs NoSQL Database

Kapil Gupta — Wed, 30 Mar 2022 11:50:47 GMT

A key aspect of any Large scale system is the ability to handle a large amount of Data.

A database is used to persist information that will be useful later on. Broadly, there are two categories of databases available to store data: SQL and NoSQL. Which one to use depends on the needs of the system. For many years, SQL or relational databases were the standard way to store information. As systems became more and more complex and data grew many-fold, NoSQL databases gained ground over the last decade and emerged as the primary data store for many applications. To understand which database is better for your needs, let's look at the differences between these two.

SQL or Relational Databases

A SQL or Relational database stores data in an organized set of tables with rows and columns. All the data related to a particular entity is stored in one or more tables. A row stores all the related values for a particular instance of an entity or object, and each column stores one attribute of that object. Data is queried using SQL syntax and JOINs, and CRUD operations are governed by a schema and transactions.

NoSQL Databases

A NoSQL database is a non-relational database. In contrast to a relational database, where data is stored in well-defined tabular relations and Structured Query Language is used to access data, a NoSQL database uses different mechanisms for storage and retrieval. Different flavors of NoSQL use their own storage methods. Based on the problem they are solving, NoSQL databases store data in a wide variety of forms like key-value pairs, documents, columnar forms, graphs, or special time-sequenced events.

SQL v/s NoSQL

Schema

SQL: Data is nicely organized in appropriate tables according to the predefined schema, which reduces redundant information. Data conforms to the applied constraints. Requires a lot of initial thought to minimize changes later which might require migration of existing data and related downtime. Structured data prevents developers from sloppily adding data, and constraints prevent the corruption of data due to software bugs. This translates to more effort for developers initially.

NoSQL: The schema is flexible. Depending upon the type of NoSQL database, we might add or skip different columns and data in each document. This gives developers a flexible data model and allows for easy iteration through the development process, without any downtime. On the flip side, it can lead to data corruption if constraints and checks are not implemented correctly at the application layer.

Storage

SQL: Data is typically stored across multiple tables in normalized forms to prevent duplication of data.

NoSQL: Data is stored in a nested form, with everything related to an entity stored in a single document. Though this introduces data redundancy, it provides very fast access.

Data Retrieval:

SQL: Due to structured data, the Relational database provides a powerful SQL(Structured Query Language) interface for Data definition, Data Control, and Data Manipulation. The split structure allows us to join data in any way. Numerous join types are available and can be done with any number of tables in any way.

NoSQL: Since data for a particular entity is stored together, data access is simple and does not require any complex queries.
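The contrast in storage and retrieval can be illustrated with a toy example. It uses Python's built-in `sqlite3` module for the relational side and a plain dictionary of JSON strings as a stand-in for a document store. The table names, the `user:1` key, and the sample data are invented for this sketch.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         item TEXT);
    INSERT INTO users  VALUES (1, 'Asha');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor');
""")

# Relational style: data lives in normalized tables and is recombined with a JOIN.
rows = conn.execute(
    "SELECT u.name, o.item FROM users u "
    "JOIN orders o ON o.user_id = u.id ORDER BY o.id"
).fetchall()

# Document style: everything about the user is nested in one record,
# fetched with a single lookup by key.
documents = {
    "user:1": json.dumps(
        {"name": "Asha", "orders": [{"item": "keyboard"}, {"item": "monitor"}]}
    )
}
doc = json.loads(documents["user:1"])
```

The relational version avoids duplicating the user's name across orders, while the document version answers the "user with orders" question in one read.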

Performance :

SQL: With proper use of indexes and query tuning, it is possible to query very high volumes of data with reasonable performance.

NoSQL: Depending upon the type of NoSQL database, certain queries and access patterns are extremely optimized and fast. For example, a key-value database provides fast lookups, and a column-based database provides fast aggregation capabilities.

Scalability:

SQL: SQL databases are powerful and flexible but constrained in terms of scaling. Scaling a SQL database vertically requires expensive hardware. If we instead distribute a SQL database over multiple servers using data partitioning, performance suffers for certain queries and joins. This can also be attributed to the fact that relational databases came into existence in the early 1970s, when the scale and traffic that we witness today were unheard of.

NoSQL: These are built for web-scale applications and are easily scalable horizontally. NoSQL databases typically run on commodity hardware rather than custom, expensive machines, while still allowing fast retrieval of data at high scale. Since most NoSQL databases were created in recent times, they were developed to be cloud-native and with a distributed approach in mind. Almost all popular cloud vendors provide managed solutions that scale very easily with growing traffic.

Data Integrity:

SQL: Majority of SQL databases offer strong ACID compliance and ensure consistency and reliability of data that is stored, thus making them relevant for transactional data even today.

NoSQL: Unlike their SQL counterparts, most NoSQL databases offer weaker or loosely defined ACID guarantees. They are mainly designed for scale and hence focus on Availability and Partition Tolerance by compromising on the consistency of the data. Always dig deep into the guarantees offered by the database that you are using before making any assumptions.

After discussing all the differences between SQL and NoSQL, it is essential to note that in recent times the lines between SQL and NoSQL have been blurring.

SQL providing NoSQL like Features

SQL databases like PostgreSQL and MySQL now provide support for storing, manipulating, and querying JSON Data. Similarly we can use PostgreSQL HStore to store key-value pairs.

NoSQL providing SQL like Features

MongoDB provides support for Joins and Transactions, although the underlying implementation and guarantees can be very different.

Choosing between SQL vs NoSQL

While designing a system or during a system design interview, it is of utmost importance to decide on the right database for storing data as it will allow you to get good performance out of your system. We need to choose the database that is the best fit for a particular use case.

For example, if you are designing a banking application or another financial services application, a relational database is the natural choice because of the data guarantees and security it offers, which are a prime concern for this use case.

On the other hand, If you are designing a web crawler or search engine application, it is not advisable to use a Relational database due to the unstructured nature of web data and the scale at which you need to operate.

It is a common source of confusion for new and inexperienced developers whether to choose SQL or NoSQL while doing the required system design. So how should you approach this?

Always think in terms of the access patterns or operations that you will need. Does the database support them as a core feature?
What kind of guarantees and isolation level (for transactions) do you need for your application?

Here is a list of Popular Use cases along with the most suitable Database types for each use case.

Sno | Use case | Choice of Database
1 | Cache | NoSQL in-memory datastore: Redis, Memcached
2 | Banking Application | SQL Database
3 | Social Network | NoSQL Graph Database: Neo4j
4 | Facebook Ad Platform | NoSQL columnar Database: Cassandra, BigTable

Below is a cheatsheet created by Satish Chandra Gupta that can help you in picking the correct DB for your use case.

Here is a list of all the Popular Database and their rankings - DB Ranking

Scaling

Kapil Gupta — Wed, 30 Mar 2022 11:48:44 GMT

Scaling means increasing the system capacity to cope with increased load, for example a higher number of requests or more data.

Need for Scaling

Suppose our system is currently handling 100 requests/min and the traffic suddenly increases to 1000 requests/min. Even though our system might be able to handle the traffic, the performance might become unacceptable due to the increased load. Hence we will need to scale our system to support the additional load with acceptable performance.

The performance of a system can be measured using throughput and response time.

Type of Scaling

Vertical Scaling (Scaling Up)

Increasing the capacity of the current machine by improving its specifications, such as attaching more RAM, adding a better CPU, upgrading from HDD to SSD, adding more storage capacity, or increasing network bandwidth.

Advantages:

Architecture - Relatively simple, with fewer issues to worry about, like inter-service dependencies, network partitions etc.
Maintenance - Easy to maintain due to fewer moving parts.

Disadvantages:

Cost - High-end hardware gets disproportionately more expensive as its specifications increase.
Scope - Limited scope of Scaling due to physical limitation of size.
Downtime - Risk of fault or complete downtime due to hardware failure.

Horizontal Scaling (Scaling Out)

Adding more machines and distributing the load among them.

Advantages:

Cost - Cost efficient due to the use of commodity hardware.
Scope - Theoretically unlimited scope of scaling.
Resilience and Fault Tolerance - Due to redundancy, if a few nodes go down, other nodes take up the requests.

Disadvantages:

Architecture - Complex to design, due to fallacies of distributed systems.
Maintenance - Difficult to observe and maintain due to the many moving parts.

Which approach is the best ?

There is no single correct answer to this. The choice of how to scale depends on the particular system. The bottleneck for the system may be the number of requests, the volume of data, the access patterns, or something else.
Depending upon whether you are targeting sub-200ms response times on your APIs or trying to manage petabytes of data, the system design and scaling strategies will be entirely different.
It is not possible to find an approach that fits all.

In general, it is easy to distribute a stateless system across multiple machines. Managing and synchronizing state within shared systems can introduce a lot of additional complexity and becomes hard to design and maintain.

It is a good practice to use a hybrid approach. To keep it simple, we can first scale vertically by using relatively powerful machines until we reach a point where it makes sense to use a large number of smaller machines.

Also, scaling is an ongoing process. A system designed for a particular load may not be able to handle 5x or 10x the load. At each milestone, we need to monitor, optimize, and re-architect for every order-of-magnitude increase in load.

Rate Limiting

Kapil Gupta — Wed, 30 Mar 2022 11:42:15 GMT

Rate limiting means throttling the number of requests to your service from a particular source (user, device, IP, location, etc.) to some maximum limit. Requests submitted over the limit are either immediately rejected or delayed. Rate limiting allows us to create resilient services that can handle the various scenarios discussed below.

A simple example of rate limiting is that you are allowed to enter 3 incorrect passwords before your bank account is locked for online transactions for a day. Similarly, the GitHub API allows around 5000 requests from a user account per hour. After that, you will get an error asking you to wait for some time before sending another request.
Services typically send a 503 (Service Unavailable) or 429 (Too Many Requests) HTTP status code when the limit is exceeded.

Advantages of Rate Limiting

Security: Rate limiting is implemented to mitigate DDoS attacks, which overwhelm the servers and are costly for any company in terms of both money and customer experience. Rate limiting your APIs also limits the amount of data that gets exposed if security is somehow compromised.
Protection against faulty clients: Rate limiting is also useful to protect your system against faulty or malicious client software, which might send a lot of requests due to a software bug.
Cost Optimization: Rate limiting computationally intensive APIs helps optimize the cost of infrastructure.
Maintain Quality: If you have a public API, implementing rate limits offers a better experience for all users by distributing the number of requests that can be served. It prevents the Noisy Neighbour problem, where one user utilizes so much of the shared resources that other users see higher latency or higher failure rates.
Maintain Priority: Several APIs have a paid version and a free version. By assigning lower rate limits to free users, you can make sure that paid users get a better experience for their money.

Types of Rate Limiting

Due to ever-increasing loads on the servers of popular services, almost all companies use rate limiting in their services. If you are also creating such services, it is very important that you add some form of rate limiting to protect your infrastructure and customer base.

We can implement rate limiting based upon various methods and parameters that can be defined when setting rate limits. Based on the security and business requirements, we can choose one of the following criteria.

User rate limiting: This is the most popular criterion used for rate limiting. Based on the number of requests from a particular user's IP or API key, requests will be throttled once the limit is reached. Users will receive the appropriate status code to reflect that their requests are throttled.
Geographic rate limiting: Based upon security or business requirements, we can set rate limits based on geographic regions. This could reduce the likelihood of DDoS attacks and other suspicious activity.
Resource-based rate limiting: We can also employ flexible rate limits based upon the amount of resources available, like CPU, network bandwidth, etc. When resources are constrained, we can reduce the rate limits. Once the resources are available, we can increase the rate limits.
Hybrid rate limiting: We can combine certain characteristics of the above rate limiting types to achieve optimum results. For example, a user can send 100 requests per second from a particular IP address in a geographic region.

Requirement Gathering.

Functional Requirements

For a given request, return a boolean value of whether the request is throttled or not.

Non-Functional Requirements

Low Latency - We want the rate limiting module to be as fast as possible, otherwise it adds latency to the processing time of the request.
Accurate - Our rate limiting module should be as accurate as possible. It should not be throttling requests which should have actually gone through.
Scalable - Our service should be highly scalable to support a growing number of requests.

In case the rate limiting service is not available, the system should not block the processing of the request.

Rate Limiting Algorithms

Token Bucket Algorithm - In this algorithm, each bucket has a maximum capacity of n tokens and is refilled at a constant rate, for example 10 tokens per second. When a request for our API is received, tokens are withdrawn from the bucket if available. If the required number of tokens is not available, the request is rejected or delayed. Eventually, the bucket is refilled with tokens and the client can make more requests to our API. Note that, depending on the operations in the API request, one request might need more than one token.
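
To make the token bucket concrete, here is a minimal single-process sketch. The class name TokenBucket, its parameters, and the injectable clock are assumptions made for illustration; a production limiter would also need thread safety and one bucket per user.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the behavior can be checked deterministically
        self.tokens = float(capacity)  # assumption: the bucket starts full
        self.last_refill = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def allow(self, cost=1):
        """Withdraw `cost` tokens and return True if available, else return False."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The cost parameter covers the case where one request needs more than one token. Refilling lazily on each call avoids running a background timer.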

Leaky Bucket Algorithm - This is one of the simplest rate limiting algorithms. It uses one bucket per user (or per whatever factor rate limiting is applied on), which holds the incoming requests of that user. The bucket has a fixed size corresponding to the number of requests we want to allow. When a request is received, it is put into the bucket. Requests are picked up from the other end of the bucket at a steady rate for processing, so the bucket behaves more like a queue. If the number of requests at any point in time exceeds the size of the bucket, the excess requests overflow and are dropped. A drawback is that old requests can queue up if they take a long time to process, preventing recent requests from being processed.
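
The queue-like behavior of the leaky bucket can be sketched as below. LeakyBucket and its method names are hypothetical, and the sketch assumes that some worker calls leak() at a fixed rate.

```python
from collections import deque


class LeakyBucket:
    """Fixed-size FIFO bucket: requests beyond `capacity` are dropped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def add(self, request):
        """Queue the request, or return False if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False  # overflow: the request is not considered
        self.queue.append(request)
        return True

    def leak(self):
        """Return the oldest queued request for processing, or None if empty."""
        return self.queue.popleft() if self.queue else None
```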

Fixed Window Algorithm - As the name suggests, in this algorithm, we fix a time window over which the rate limiting is imposed. Irrespective of the actual time the request came in, the count is maintained for fixed time windows of x seconds.

Consider the example in the above image, where the rate limit is 3 requests per second and the first request came in after 500 ms. From 500 ms to 1500 ms, the server actually serves 4 requests, which exceeds the limit. A fixed-window rate limiter can therefore allow extra requests when they arrive near the boundary between two time windows. The next algorithm overcomes this issue with the Fixed Window Algorithm.
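
The boundary problem is easy to reproduce with a small fixed-window counter. This is a sketch under assumptions: FixedWindowLimiter is a hypothetical name, timestamps are passed in as milliseconds, and old window counters are never cleaned up.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Counts requests per user in fixed windows of `window_ms` milliseconds."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.counts = defaultdict(int)  # (user, window index) -> requests served

    def allow(self, user, now_ms):
        # The window index depends only on the clock, not on when requests arrive.
        key = (user, now_ms // self.window_ms)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit=3, window_ms=1000)
# Two requests late in window 0 and two early in window 1 are all served,
# so 4 requests pass between 500 ms and 1500 ms despite the limit of 3.
results = [limiter.allow("user-1", t) for t in (500, 800, 1100, 1400)]
```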

Rolling Window Algorithm - In the Fixed Window Algorithm, the reference time window is fixed. With a rolling window, the time window starts when the first request is received. The rate limit is imposed on the number of requests that came in from the start of the window to the end of the window. This improves accuracy around the boundary of the time window.

As shown in the image above, the time window now starts at 500 ms when the first request comes in, and it allows us to serve only 3 requests (R1, R2, R3) for that second. Thus request R4 is throttled. After 1 second is complete, a new time window starts once a new request comes in.
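
The rolling window described above can be sketched as follows. RollingWindowLimiter is a hypothetical name, and the request times for R1 to R4 are assumed values chosen to match the 500 ms example.

```python
class RollingWindowLimiter:
    """The window for a user starts at that user's first request."""

    def __init__(self, limit, window_ms):
        self.limit = limit
        self.window_ms = window_ms
        self.windows = {}  # user -> (window start in ms, requests served)

    def allow(self, user, now_ms):
        start, count = self.windows.get(user, (None, 0))
        if start is None or now_ms - start >= self.window_ms:
            # No window yet, or the previous one is complete: start a new one here.
            self.windows[user] = (now_ms, 1)
            return True
        if count >= self.limit:
            return False
        self.windows[user] = (start, count + 1)
        return True


limiter = RollingWindowLimiter(limit=3, window_ms=1000)
# R1..R3 are served, R4 is throttled, and a request at 1500 ms opens a new window.
results = [limiter.allow("user-1", t) for t in (500, 800, 1100, 1400, 1500)]
```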

High Level Design for Rate Limiter.

The above diagram shows the High level design for a rate limiting system. When the Webserver receives the request it would ask the Rate Limiter service, whether the request should be served or throttled. If the request is not throttled, it will be forwarded to one of the API servers.

Distributed Rate Limiter Design

The benefits of rate limiters are most apparent in a distributed system, where we operate at a large scale and want to use our resources efficiently. In a distributed system, there will be multiple API servers serving requests from users.

Due to the number of requests, we need to scale our rate limiter service by distributing it over multiple nodes. Each of the algorithms described above stores some kind of count that constrains the number of requests that will be served. How can we achieve this if our counters/buckets are distributed over multiple nodes?

Communication between the hosts is the key here. We want to implement the rate limit at a global level, hence we need a way to synchronize among different cluster nodes.

We can use a global data store to maintain the counters for each window and user. But a global data store is a bottleneck, as all the nodes will flock to it to get the count. In the worst case, it can also lead to race conditions: if one node has read the value of the counter but not yet updated it, another node could read the same value, and the final value will be incremented by only one instead of two.

We could avoid race conditions by introducing locks. However, that could degrade the performance of the system and it will not scale well. Querying the central data store will add the additional overhead of several milliseconds for each request.

Can we do better?
Instead of querying the centralized data store for each request, we can maintain the counters locally. Let's take an example: if our rate limit is 4 requests per second per user, each server can enforce a limit of 4 requests per second per user on its own, as shown in the example below.

Global Limit : 4 req/sec/user

Let's say Server1 received 2 requests in a particular second, Server2 received 1 request and Server3 received 1 request. Since each server is maintaining the counters locally, if 2 more requests come to Server1, it would allow them. Thus our rate limiter has served 6 requests in that second for the user.

This scheme leads to relaxed rate limits. However, each node can eventually synchronize its counters with a central data store as part of a synchronization cycle. If a particular node served more requests than allowed in a time window, it receives a negative counter value for the next time window, and hence the average number of requests served per second per user stays the same as the original limit. We can configure the interval between synchronization cycles to achieve optimal results.
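
The Server1/Server2/Server3 scenario can be reproduced with a small in-memory sketch. CentralStore and LocalLimiter are hypothetical names, the sketch covers a single time window only, and it leaves out the negative carry-over into the next window.

```python
class CentralStore:
    """Stands in for the central data store holding per-user totals."""

    def __init__(self):
        self.totals = {}

    def add(self, user, n):
        self.totals[user] = self.totals.get(user, 0) + n


class LocalLimiter:
    """Counts locally and reconciles with the central store on sync()."""

    def __init__(self, store, limit):
        self.store = store
        self.limit = limit
        self.local = {}         # requests served since the last sync
        self.known_global = {}  # per-user totals as of the last sync

    def allow(self, user):
        used = self.known_global.get(user, 0) + self.local.get(user, 0)
        if used >= self.limit:
            return False
        self.local[user] = self.local.get(user, 0) + 1
        return True

    def sync(self):
        # Push local counts, then pull the totals seen by all nodes so far.
        for user, n in self.local.items():
            self.store.add(user, n)
        self.local = {}
        self.known_global = dict(self.store.totals)


store = CentralStore()
s1, s2, s3 = (LocalLimiter(store, limit=4) for _ in range(3))
# Server1 serves 2, Server2 serves 1, Server3 serves 1, then Server1 serves 2 more.
served = [s1.allow("u"), s1.allow("u"), s2.allow("u"), s3.allow("u"),
          s1.allow("u"), s1.allow("u")]
for server in (s1, s2, s3):
    server.sync()
```

Before the sync, all 6 requests are served even though the limit is 4. After the sync, every node sees a total at or above the limit and throttles the user.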

References and interesting reads.

Rate Limiting Techniques

Alternative Approach to Rate Limiting

NoSQL

Kapil Gupta — Wed, 30 Mar 2022 11:33:45 GMT

A NoSQL database is a non-relational database. In a relational database, data is stored in well-defined tabular relations and Structured Query Language (SQL) is used to access it. A NoSQL database uses different mechanisms for storage and retrieval and generally does not provide SQL-based data access, hence the name "Non-SQL". Some NoSQL implementations do provide an SQL-like interface, so NoSQL is also sometimes read as "Not Only SQL".

NoSQL databases don't use relational tables or fixed schemas. Different flavors of NoSQL use their own storage methods. Based on the problem they are solving, NoSQL databases store data in a wide variety of forms, such as key-value pairs, documents, column families, graphs, or special time-sequenced events.

NoSQL databases are typically designed to favor Availability and Partition tolerance over Consistency. However, different vendors provide options to increase the level of consistency by trading off other attributes.

NoSQL databases have gained a lot of popularity in the past decade, as they are designed to overcome limits of scale and to scale horizontally. Many were created by large web companies, and a lot of the popular ones have been open-sourced. NoSQL databases are highly prevalent in the cloud data storage model. Major cloud vendors like AWS, GCP, and Azure provide managed NoSQL solutions with enterprise-level support and tooling.

Characteristics of NoSQL

Scale: NoSQL databases can store data at very large scale. They are designed and optimized for specific use cases at a scale that is hard to achieve with a relational database, while still allowing fast retrieval for specific queries. NoSQL databases are used for problems like indexing the web for search engines such as Google and Bing, predicting customer behavior for analytics and ads on platforms like Google and Facebook, or backing recommendation systems for Netflix.

Flexibility and Ease of Use: Another reason for their widespread popularity is flexibility. Thanks to flexible schemas, it is easy to get started without putting much thought into the data structure upfront.
OpenSource and Community Driven: Most of the popular NoSQL databases are open source and provide community editions.

Types of NoSQL Database

Following are the popular types of NoSQL database:

Key-Value NoSQL: These databases store huge lists of key-value pairs, where the key represents the field or attribute name, and the value represents the value of that field. They mostly store hot datasets for caching or lookup purposes and provide extremely fast access through in-memory storage options. The most popular key-value stores include DynamoDB, Redis, Memcached, and Voldemort.
Document-Oriented NoSQL: Data is stored in the form of documents, which are further grouped into collections. In contrast to a relational database, where each row stores data for a fixed set of columns, the structure of each document is flexible and does not need to match other documents. These databases are fast to query and easily scalable. The most popular document datastores are MongoDB, AWS DynamoDB (which can act as both a key-value and a document store), Couchbase, and Elasticsearch.
Columnar NoSQL: Data is stored in column families. Unlike a conventional row-oriented database, this kind of database lets us read specific column values without reading entire rows. It is best used for storing ragged (sparse) datasets for the purpose of aggregation. These databases are extremely fast and scalable and are suited for analytical applications. The most popular columnar databases are Cassandra (originally developed at Facebook), and Bigtable and BigQuery on Google Cloud Platform.
Graph NoSQL: Data is stored in graph-based structures that hold entities, their properties, and the relationships between them. These databases are extremely fast for relationship queries and easily scalable. Examples include Neo4j and AWS Neptune.
Speciality NoSQL: These are specialty databases created and optimized for certain special types of data, like time-based events, IoT events, and blockchain ledger data.
Examples of this type of database include Google Cloud Platform Firebase/Firestore, AWS IoT, and AWS QLDB.

Trends in NoSQL Landscape

As NoSQL databases become widely used, vendors are adding multiple features within the same offering. For example, AWS DynamoDB can act as both a key-value and a document database, and Redis supports both in-memory storage and persistence on disk. New features are also being added for elastic scaling, backup, and monitoring.

Messaging System

Kapil Gupta — Wed, 30 Mar 2022 11:32:37 GMT

A messaging system is a common approach to transferring data between systems and applications. A producer generates a message containing some information, which is transmitted to the consumers.

We can do this via direct communication over a TCP connection between producer and consumer. However, such a system allows communication between exactly one producer and one consumer. Also, what happens if producers produce messages faster than the consumers can consume them? And what happens if consumer nodes go down?

Depending upon the application, a popular way to send a message is via a message broker or a queue.

Message Brokers

Message brokers (or Queues) run as servers sitting between producers and consumers. Producers can produce messages to the message broker. On the other hand, consumers can receive messages from the broker. Thus it helps to decouple different interacting systems providing them an asynchronous way of messaging.

One or more producers can communicate with one or more consumers using such messaging systems.

Producers can now produce data at any rate without worrying about the rate at which data is consumed or who is consuming it.

These systems can tolerate consumers going down and coming back up. Messages can be persisted in the broker from the time they are produced until they are consumed, or even longer, depending on the implementation.

Messaging Patterns

Two common ways of handling messages are load balancing and fan-out.

Load Balancing

Each message in the queue is delivered to exactly one of the consumers. In general, multiple consumers will be present in the system parallelizing the processing of the messages from the queue. Once the message is extracted by the consumer, it is removed from the queue. This pattern is generally used to process messages that result in long-running tasks and you want to achieve some parallel processing.

For example, your credit card company generates a statement of transactions at the end of the billing cycle and sends an email with the statement. Such a system can be implemented by a scheduler that fetches user information from the DB and puts one message per customer in the queue. A separate application receives these messages, fetches data about the user's transactions, crunches the numbers, generates the statement, and delivers the email to the customer's inbox.

Fan-out

Each message in the broker is delivered to all the consumers, allowing multiple consumers to listen to the same message and perform different tasks. This pattern is used to process messages that have to be handled in multiple ways by different systems.

For example: While logging in to some applications, we receive the same passcode via text and email. Such a system can be implemented by sending all passcode messages through a messaging system and having a text messaging application and an email application listen to the same messages.

Hybrid

We can combine the two patterns by using two consumer groups (each containing multiple consumers) listening to the same messages, such that each consumer group receives all messages but, within a consumer group, only one of the nodes receives each message. Kafka uses such an implementation.
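
The hybrid pattern can be illustrated with a toy in-memory broker. This is a sketch of the idea only, not of how Kafka works internally; the Broker class, the group names, and the round-robin choice within a group are assumptions.

```python
from itertools import cycle


class Broker:
    """Fan-out across consumer groups, load balancing within each group."""

    def __init__(self):
        self.groups = {}  # group name -> endless rotation over its consumers

    def subscribe_group(self, name, consumers):
        self.groups[name] = cycle(consumers)

    def publish(self, message):
        # Every group sees the message; one consumer per group handles it.
        for rotation in self.groups.values():
            consumer = next(rotation)
            consumer(message)


log = []
broker = Broker()
broker.subscribe_group("sms", [lambda m: log.append(("sms-1", m)),
                               lambda m: log.append(("sms-2", m))])
broker.subscribe_group("email", [lambda m: log.append(("email-1", m))])
broker.publish("passcode-A")
broker.publish("passcode-B")
```

Each passcode reaches both the "sms" and the "email" group (fan-out), while the two consumers in the "sms" group take turns (load balancing).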

Benefits of using a Messaging System:

Buffering: Messaging systems buffer messages by persisting them in the broker (in memory or on disk) while they wait to be processed by the consumers. This allows our systems to absorb sudden traffic spikes up to some extent.
Message Delivery Guarantees: Using acknowledgments, messages are marked as processed only after consumers have received and processed them, which allows retries in failure scenarios.
Easy Scaling: Because messaging systems are asynchronous, fault tolerant, and configurable, it is rather easy to scale software systems that use them for communication, allowing multiple producers to communicate with multiple consumers.
Separation of concerns: Using a messaging system allows us flexibility in architecture by making the producers and consumers independent.

Considerations while using a Messaging System:

Back Pressure.

What happens if our system is under heavy sustained load? Large amounts of messages flow into the broker and the queue starts to grow significantly. Depending on the server configuration, if the queue becomes larger than the available memory, it will result in cache misses and expensive disk reads. Having the entire system degrade under such circumstances is not the way to build scalable and fault-tolerant systems. Hence, if the arrival rate is higher than what the system can cope with, we can apply back pressure: limiting the queue size prevents throughput degradation and maintains good response times for message processing. Once the broker is full, it can reject new messages, for example by returning the HTTP status code 503 (Service Unavailable), indicating that the producer should try again later.

Here is an interesting read on this topic:

Apply back pressure
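
A bounded queue is the simplest way to see back pressure in action. The sketch below uses Python's standard queue module; the enqueue function, the tiny limit of 2, and the 202 code for accepted messages are assumptions for illustration.

```python
import queue


def enqueue(broker_queue, message):
    """Accept the message if there is room, otherwise signal back pressure."""
    try:
        broker_queue.put_nowait(message)
        return 202  # accepted for later processing
    except queue.Full:
        return 503  # service unavailable: the producer should retry later


broker_queue = queue.Queue(maxsize=2)  # hypothetical, deliberately small limit
statuses = [enqueue(broker_queue, m) for m in ("m1", "m2", "m3")]
```

Rejecting early keeps the queue within memory, at the cost of pushing retry logic to the producers.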

Acknowledgement and redelivery

If a consumer crashes while processing a message, the message may be lost. To prevent this, message brokers implement acknowledgments: the consumer must confirm to the broker that it has finished processing the message, and only then does the broker mark it as consumed and remove it from the queue.

If the broker does not receive the acknowledgment, it assumes that the message was not processed successfully and redelivers it until it is acknowledged. We will learn more about this when we discuss implementations of different messaging systems like Kafka, RabbitMQ, etc.
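
As a rough sketch of acknowledgment and redelivery, the toy queue below keeps delivered messages "in flight" until they are acknowledged. AckQueue and its methods are hypothetical; real brokers such as Kafka and RabbitMQ use their own, different mechanisms.

```python
from collections import deque


class AckQueue:
    """Messages stay in flight until acknowledged; unacked ones are requeued."""

    def __init__(self):
        self.ready = deque()
        self.in_flight = {}  # delivery id -> message awaiting acknowledgment
        self.next_id = 0

    def publish(self, message):
        self.ready.append(message)

    def deliver(self):
        """Hand the next message to a consumer, or return None if there is none."""
        if not self.ready:
            return None
        message = self.ready.popleft()
        self.next_id += 1
        self.in_flight[self.next_id] = message
        return self.next_id, message

    def ack(self, delivery_id):
        # The consumer finished processing: forget the message for good.
        self.in_flight.pop(delivery_id, None)

    def requeue_unacked(self):
        """Assumed to run when a consumer is considered dead, e.g. after a timeout."""
        for delivery_id in sorted(self.in_flight):
            self.ready.append(self.in_flight.pop(delivery_id))
```

With this scheme a message can be processed more than once, so consumers should be able to tolerate duplicates.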

Load Balancing

Kapil Gupta — Wed, 30 Mar 2022 11:19:31 GMT

Load Balancing means efficiently distributing the network traffic across multiple machines to balance out the load and prevent any hotspots. Load balancers also keep track of servers which are not functional and avoid sending requests to those machines. When a server comes back up or new servers are added, the load balancer starts distributing traffic to them as well.

Apart from routing, Load balancers might also do other activities like SSL termination, which means SSL traffic is decrypted by the load balancer and then passed on to the server. This saves the work of SSL decryption on the web servers improving performance and reducing latency.

Load balancing can happen at Layer 4 (L4- Transport) or Layer 7 (L7- Application) of the OSI Model. Also, there are different kinds of Load Balancers.

Hardware Load Balancer or Software Load Balancer

Hardware-based load balancers are high-performance solutions in which specialized processors are used to achieve optimum performance. To handle increased load, you need to buy more machines. Hence they are generally costly and require specialized maintenance. Due to these limitations, they are not very popular. The most widely used ones are Citrix NetScaler and F5.

Software-based load balancers run on generic hardware, are easy to scale, and are flexible to configure. Some of the popular ones come as installable software, and others are provided as a service called Load Balancer as a Service (LBaaS). Popular software-based load balancers include Nginx, Varnish, HAProxy, and LVS. Popular LBaaS offerings include AWS ELB and Stratoscale LBaaS. LBaaS offerings are managed by the cloud vendors and handle fault tolerance and elasticity for the users.

Check out the following video on how Facebook handles billions of requests using their unique load balancing techniques.

Load Balancing Layer

Load balancing can be introduced at any layer in the application to increase scalability, depending on the number of requests served.

Frontend Layer - The most common use case of the load balancers is at the frontend of the application, between the client and the webserver. This helps to increase the number of requests that can be served by the system. Also, SSL Termination is done here to save CPU cycles of the application server.
Application Layer - Load Balancers are placed between the webserver which is taking the requests and the application servers which are doing CPU intensive tasks. This helps in the proper utilization of the application servers without overloading them.
Persistence Layer - Load Balancers are placed between the Application Layer and the Persistence Layer to serve more data requests without overloading Database servers.

Load Balancing Algorithms

Several popular load balancing algorithms are used by software-based load balancers. Depending on the type of system and workload, a particular scheme may be better suited than others.

Round Robin: In this scheme, the load balancer forwards requests to the nodes sequentially, one after another in rotation. This is simple to implement. However, it is not very efficient, as it doesn't account for the current load on a server, which may already be heavily loaded when a new request is assigned. Hence it is mostly suited for simpler use cases where requests are not processor intensive.
Least Connection: Requests are forwarded to the server that has the fewest connections, or is serving the fewest requests, at that point in time. This delivers better performance than round robin, but it is still not suited for workloads where a single request might load the server.
Least Response Time: Requests are forwarded to the server with the fewest active connections and the lowest response time, where response time is measured as the time the server takes to respond to a dummy request.
Hash-Based: Requests are distributed based on a hash of a defined key, for example the request URL or the client IP address.
Custom Load Algorithm: In this scheme, the load balancer decides which server to forward the request to based on a combination of server metrics such as memory usage, CPU usage, and response time.
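
Three of the schemes above fit in a few lines each. The function names and the server labels are assumptions, and a real load balancer would also track server health and update connection counts as requests start and finish.

```python
import hashlib
from itertools import cycle


def round_robin(servers):
    """Return a picker that walks through the servers in rotation."""
    rotation = cycle(servers)
    return lambda: next(rotation)


def least_connections(active):
    """Pick the server with the fewest active connections (server -> count)."""
    return min(active, key=active.get)


def hash_based(servers, key):
    """Map a key such as a client IP or URL to the same server every time."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["s1", "s2", "s3"]
pick = round_robin(servers)
order = [pick() for _ in range(4)]
least_loaded = least_connections({"s1": 5, "s2": 2, "s3": 7})
```

Note that the simple modulo in hash_based remaps most keys when the number of servers changes.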

Summary

Load balancers have become an essential component of any large-scale system, as they help balance load across multiple machines. As systems become more complex and more popular and traffic volume surges, load balancers act as traffic cops that route requests systematically, preventing uneven load and performance issues. We discussed different types of load balancers and algorithms. In a system design interview, a particular configuration may be better suited than another, depending on the problem at hand.

CDN

Kapil Gupta — Wed, 30 Mar 2022 11:18:08 GMT

A Content Delivery Network (CDN) is a network of geographically distributed servers that deliver content at high speed. It provides quick transfer of static content such as files (HTML, JS, text, etc.), images, and videos.
CDNs play a vital role in designing modern web applications and serve a majority of web traffic in recent times for sites like Instagram, Twitter, Netflix, Amazon, and many more.

Popular CDN offerings include Cloudflare, Akamai, and AWS CloudFront.

Working of CDN

When a user requests a webpage or any other content, the content is delivered from a CDN server close to the user. The greater the distance between the user and the CDN server, the longer the content takes to reach the user and the slower the load time.

If the content is not available on the CDN, it is fetched from the backend (origin) servers. Content on the CDN can have an expiry set by the application or by configuration. When a particular piece of content is requested, it is returned with a TTL (time-to-live) value in its headers, indicating when it expires.
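
The fetch-on-miss and expiry behavior can be sketched as a small cache in front of an origin. EdgeCache, fetch_from_origin, and the single TTL for all content are assumptions; real CDNs rely on HTTP caching headers and many more rules.

```python
import time


class EdgeCache:
    """Serves cached content until its TTL expires, then refetches from the origin."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be checked deterministically
        self.entries = {}   # path -> (content, expiry time)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the origin is not contacted
        # Miss or expired entry: go back to the origin and cache the result.
        content = self.fetch_from_origin(path)
        self.entries[path] = (content, now + self.ttl_seconds)
        return content
```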

For example, if you are located in Asia, data fetched from a server in Europe will take more time than the same data fetched from a server in Asia itself due to the distance the data has to travel.

Beyond static content delivery, CDNs are nowadays also capable of serving dynamic content. To generate dynamic content, scripts run on the CDN server rather than on the backend server. These scripts generate the content based on variables and events like time, location, or input data from APIs, and cache it. For example, Cloudflare Workers offers serverless JavaScript functions that are executed on the Cloudflare CDN.

Advantages of using CDN

Faster Load times - Since files are fetched from a server closer to the user, load times are drastically improved. This leads to a better user experience overall, which in turn increases user engagement and reduces bounce rates.
Reduced Infrastructure cost - Since most requests do not reach the backend servers, the load on those servers decreases, which reduces infrastructure cost.
Increased Efficiency - By using techniques like minification and file compression, CDNs reduce the size of the files that are transferred to the client. CDNs can also speed up TLS/SSL-based sites through connection reuse and TLS false start.
Security - If configured correctly, CDN might help in mitigating security issues like DDoS attacks.

Though there are so many advantages of using CDN, we need to be careful while using a CDN and configuring it. Here are some of the aspects we should keep in mind.

Need - Do we need a CDN? If our content is not accessed frequently, or is refreshed very quickly, a CDN might not be of any use.
Cost - The major CDN providers mentioned in the introduction section have pricing based on the number of requests or the amount of data in/out. Hence we should be careful about the cost to prevent massive billings.
Fallback - Clients should be coded in such a way that if CDN is not available, they should be able to connect to backend servers for the content.
Cache Expiry - We should be mindful when setting the expiry for the content. If the expiry is too long, the content might be stale. If it is too short, content is reloaded unnecessarily from the origin servers to the CDN.

Caching

Kapil Gupta — Wed, 30 Mar 2022 11:07:13 GMT

Caching means storing data in a cache, which is a high-speed, low-latency data store. A cache stores a limited amount of information that is very likely to be used in the immediate future. The caching layer typically stores data in memory (RAM) and hence avoids accessing secondary storage (disk), which is an expensive operation in terms of time. Caching makes optimal use of available resources by exploiting the principle of locality of reference: programs tend to access the same set of memory locations repeatedly over a short period of time.

Caching Layers

In modern large-scale systems that serve billions of requests, caching layers may exist at multiple levels depending on the needs: at the front end, between the load balancer and the application server, to cache response data of frequent requests, or between the application server and the persistence layer, to prevent costly disk access.

Caching at scale

It is easy to maintain and use a cache for small-scale applications that run on a single node, but what happens when we scale the application to multiple nodes? We have the following options:

Global Cache

This is a simpler option where we maintain a single global cache space, which is used by all nodes. For every request, the application queries this cache to retrieve data. A global cache is simpler and effective up to a certain scale, however, it could become a potential performance bottleneck and point of failure, if the number of requests increases beyond the maximum capacity of the cache. This is where the distributed cache comes in.

Distributed cache

For a distributed system application, we need a distributed cache. We distribute the cached data to multiple nodes. When a request comes in, a lookup happens to determine on which machine the data is cached and then that node fulfills the cache request.

A distributed cache is very effective and performant and capable of handling requests for very large scale systems, however, it is more complex to maintain, as it has all the intricacies of a distributed system.

Examples of Cache systems used in Distributed Systems:
Redis
Memcached
Aerospike DBS
Apache Ignite

Cache Write Policies

Before reading the data from a cache, we need to write it to the cache when the data is first requested so that we can later serve the data in less time. The following policies are used to write and maintain data in the cache.

--Write through cache : As the name suggests, in this scheme data is written to the database through the cache, and a write operation is confirmed only if the writes to both the database and the cache succeed. This ensures strong data consistency between the cache and persistence layers, but makes write operations slower due to the cache write overhead.
This scheme is suited for systems that are read-heavy and have far fewer write operations than reads.

--Write around cache : In this scheme, data is written to the database while bypassing the cache. Hence the write operation is faster. However, reading recently written data results in a cache miss, and the data has to be read from the disk.

The write-around scheme is good for use cases where the recently written data is not read frequently.

--Write back cache : In this scheme, data is written only to the cache, and the write operation is then confirmed to the client. This results in very fast write operations, as we don't wait for the data to be written to the database. Subsequently, the data is written to the database asynchronously in a separate thread. All read requests are served from the cache, so readers always see the latest written data.

This scheme is well suited for write-heavy applications. However, it poses a significant risk of data loss if the cache goes down before the data is written to the database. This risk can be minimized through replication across multiple nodes in a distributed system.
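
The three write policies differ only in where a write lands first, which the sketch below makes explicit. CachedStore uses plain dictionaries as stand-ins for the cache and the database, and the flush method stands in for the asynchronous writer; all of these names are assumptions.

```python
class CachedStore:
    """Dict-based stand-ins for a cache and a database."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.dirty = set()  # keys written to the cache but not yet to the database

    def write_through(self, key, value):
        # Confirmed only after both the database and the cache are updated.
        self.db[key] = value
        self.cache[key] = value

    def write_around(self, key, value):
        # Bypass the cache; drop any stale cached copy of the key.
        self.db[key] = value
        self.cache.pop(key, None)

    def write_back(self, key, value):
        # Confirm after the cache write; the database is updated later by flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Stand-in for the asynchronous writer that persists dirty keys."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db[key]  # cache miss: read from the database
        self.cache[key] = value
        return value
```

If the process dies between write_back and flush, the dirty keys are lost, which is the data loss risk mentioned above.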

Cache Eviction policies

Since caches can hold only a portion of the data, we need to discard data that is no longer useful. How do we determine what data is useful and what is not? Following are the most popular cache eviction policies:

-- Least Recently Used (LRU): Evicts the items that were used least recently; in other words, the items that were used recently are kept. This is the most popular eviction policy, as it provides good performance while remaining simple.
-- Most Recently Used (MRU): The opposite of LRU; this policy discards the most recently used items first.
-- Least Frequently Used (LFU): Evicts items based on how often they are accessed; items with lower access frequency are evicted before more frequently used ones.
-- First In First Out (FIFO): One of the simpler strategies, where the item that was added first is evicted first, like in a queue.
-- Last In First Out (LIFO): The item that was added last is evicted first, like in a stack.
-- Random Replacement (RR): This eviction policy discards a random item.
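
As an illustration of the most common policy, here is a minimal LRU cache built on Python's OrderedDict. The class name LRUCache and the choice to return None on a miss are assumptions for this sketch.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used item once `capacity` is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used item


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.put("c", 3)  # evicts "b"
```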

Besides these classical approaches, there are some modern eviction policies that provide better hit rates and concurrency, like LIRS, Window-TinyLFU, and ARC. More information can be found at this link.

Content Delivery Network (CDN)

Content Delivery Networks are powerful systems distributed geographically, which are mostly used to cache and serve static content and media for any application. Requests for static content are first sent to the CDN and served from there. Only in case of a miss is the request directed to the application servers. Due to their distributed and powerful networks, CDNs today serve a majority of web traffic and allow very fast transfer of static media.

ACID, BASE AND CAP Theorem

Kapil Gupta — Wed, 30 Mar 2022 10:58:00 GMT

ACID

ACID is an acronym for Atomicity, Consistency, Isolation, and Durability coined by Andreas Reuter and Theo Härder as a set of properties of database transactions that every database storage engine should strive for. ACID compliance guarantees the validity of data in events of errors, hardware, or power failures.

Atomicity refers to All or nothing which means all the changes which are part of a single transaction are performed or everything is rolled back and none of these changes take effect.

Consistency refers to guarantee that all data that is written to the database will conform to defined schema and constraints at the time of saving the data.

Isolation refers to the ability of a database to isolate data among transactions providing an independent view of the data. Thus if multiple transactions are executed concurrently, these should not interfere or see intermediate or incorrect data and the result should be the same as if they were run sequentially. Isolation levels are configurable in most DBMS, providing control to Database Administrators to decide the level of isolation. Most DBMS provides the following levels of Isolation: Serializable, Repeatable reads, Read committed, Read uncommitted.

Durability refers to the permanent nature of the data stored by a transaction once it succeeds. After the transaction is committed, its changes survive crashes and power failures, and subsequent reads fetch the last written data.

BASE

BASE stands for Basically Available, Soft state, Eventual consistency.

Basically Available means the system is available most of the time and every working node responds to requests in a reasonable amount of time.
Soft State means the state of the system might change over time, even without any new input.
Eventual Consistency means that, if no new updates arrive, the system becomes consistent after some time: the data on different nodes will eventually reflect the same value.
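The three properties above can be sketched with a toy simulation: a write lands on one replica, another replica briefly serves stale data, and a background exchange brings all replicas to the same value. The `Replica` class, the last-writer-wins merge rule, and `anti_entropy_round` are simplifying assumptions for illustration; real systems use more careful versioning and conflict resolution.

```python
class Replica:
    """Holds one value plus the version (logical timestamp) that wrote it."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.value = None

    def write(self, version, value):
        self.version, self.value = version, value

    def merge(self, other):
        # Last-writer-wins: adopt the peer's value if it is newer.
        if other.version > self.version:
            self.version, self.value = other.version, other.value


def anti_entropy_round(replicas):
    """Every replica pulls state from every other replica once."""
    for a in replicas:
        for b in replicas:
            if a is not b:
                a.merge(b)


replicas = [Replica("n1"), Replica("n2"), Replica("n3")]
replicas[0].write(version=1, value="v1")  # only n1 has seen the write so far
stale_read = replicas[2].value            # n3 still answers, with old data
anti_entropy_round(replicas)              # state changes without new input
converged = [r.value for r in replicas]
```

Here `n3` stays available while stale (basically available), the replicas change state during the background round without any client write (soft state), and all of them end up with the same value (eventual consistency).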

The CAP Theorem

The CAP theorem, proposed by Eric Brewer in 2000, states that a networked shared-data system can guarantee (strongly support) only two of the following three properties:

Consistency: Data should be linearizable across all nodes of a distributed system, so that every node returns the same data after a successful write operation.

Availability: Every request served by a non-failing node must result in a response in a reasonable amount of time.

Partition tolerance: In the case of a network partition, where messages between nodes are lost or delayed, the system continues to operate.

The CAP theorem is a tool for making design choices while building a distributed system. It is, however, an oversimplification that is widely misunderstood and has received criticism over the years. Any distributed system that communicates over a network is inherently prone to partitions, so partition tolerance is not optional; the real choice is the tradeoff between consistency and availability when a partition occurs. Also, in modern distributed systems the notion of availability depends on latency: it is not acceptable to call a system available if it takes more than 100 seconds to respond.
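The consistency-versus-availability choice during a partition can be sketched with a toy two-node register. `TwoNodeRegister`, `Unavailable`, and the `"CP"`/`"AP"` mode flag are hypothetical names for this illustration only. In CP mode the node refuses the write it cannot replicate; in AP mode it accepts the write and the two nodes diverge.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to preserve consistency."""


class TwoNodeRegister:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"n1": "v0", "n2": "v0"}
        self.partitioned = False  # True when the nodes cannot reach each other

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to both nodes.
            self.values = {n: value for n in self.values}
        elif self.mode == "CP":
            # Cannot reach the peer, so refuse rather than diverge.
            raise Unavailable(f"{node} cannot replicate during a partition")
        else:
            # AP: accept locally; the peer keeps serving the old value.
            self.values[node] = value

    def read(self, node):
        return self.values[node]


def write_during_partition(mode):
    """Return (write accepted?, value on n1, value on n2) under a partition."""
    reg = TwoNodeRegister(mode)
    reg.partitioned = True
    try:
        reg.write("n1", "v1")
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, reg.read("n1"), reg.read("n2")


cp_result = write_during_partition("CP")
ap_result = write_during_partition("AP")
```

The CP register gives up availability (the write is rejected) but both nodes still agree; the AP register stays available but the nodes return different values until the partition heals and they reconcile.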

Here is an interesting article from Martin Kleppmann on how the CAP theorem uses very narrow definitions that cannot describe modern distributed systems.