Before we dive into the question of why we should use databases, let us consider the question-
Why should we consider a database and not just any other file system?
- Databases store data in data structures like B+ Trees, LSM Trees and SSTs so that it is easy to perform CRUD operations.
- They also provide interfaces like queries or APIs to manipulate the data while if we use a file system, it is likely that we have to parse the data to perform meaningful operations on them which can be very complex.
The various databases available can be grouped into the following two solutions:
They differ fundamentally on the basis of their kind of information they store and the way they store the data.
SQL databases have a structured format (or as we call them, schemas) while NoSQL databases are unstructured and generally follow a distributed architecture. NoSQL was developed as a response to web data, the need for processing unstructured data, and the need for faster processing.
SQL databases store the data in the traditional row and column format. Each row is an entity and the various columns indicate the various properties of the entity.
Examples: MySQL, Postgres
Why so many relations between tables?
- Reduce redundancy in tables.
- Ensure consistency.
- Make operations less expensive.
Why won’t SQL databases solve our every problem?
- Because they are not horizontally scalable by a huge factor.
- Joins become very expensive as the data grows and then stored across multiple devices.
NoSQL databases do not follow the traditional follow the pattern of storing data in rows and columns. They store the data as key-value pairs (DynamoDB), documents (MongoDB) or using data structures like graphs (InfiniteGraph). The data in NoSQL is mostly partitioned horizontally by default which ensures availability.
Key differences between SQL and NoSQL databases
As we mentioned earlier, SQL databases store data in form of rows and columns while NoSQL databases store data in form of key-value pairs, documents or various data structures like graphs.
SQL databases or relational databases have to adhere to a strict schema. This essentially means that for each row data has to exist in each of its columns. The schema can be altered but it’s not hassle free.
NoSQL databases on the other hand offer dynamic schemas. This means that for each ‘row’, some columns may or may not exist and they can be added in real time.
Relational databases offer Structured Query Language to access and modify data on the tables whereas the NoSQL databases generally offer APIs to modify and query the database.
SQL databases are generally scaled vertically which makes maintaining it a much costlier affair than maintaining a NoSQL database which can be scaled horizontally.
- Reliability or ACID Compliancy (Atomicity, Consistency, Isolation, Durability)
When operating and handling traffic on a large scale one has to often choose either scalability & performance or reliability. SQL databases trade reliability for scale and performance, i.e., they guarantee the ACID properties. NoSQL databases on the other hand offer high scalability and performance by sacrificing ACID properties.
Some key features of NoSQL databases:
- They are generally distributed by design.
- Each host stores almost equal amount of data so the storage stays balanced.
- Data is generally replicated across multiple hosts so that in the event of an outage, the whole system does not go down.
- They abide by the CAP theorem.
- They have no fixed schema.
P.S. God forbid this happens, but if you need joins in a NoSQL environment then you’ll have to write your own custom logic.
Hey! Should I go for SQL or NoSQL?
If everyone had the same shoe size, the world be a much simpler place! The trade-offs for the use case should be taken into account before a decision is taken to use either of the databases.
When should I use a SQL Database?
- If a system has to be designed where the ACID properties are important. E-Commerce websites that handle payment gateways and deal with a database must use a SQL database.
- The scale of the traffic handled is not very high.
- The data has a consistent schema.
- Joins are important in the performed operations.
When should I use a NoSQL Database?
- If the scale at which the systems operate is very large and the databases are the only bottleneck in scaling up the system.
- When your data has no particular structure and adding or deleting ‘columns’ might have a regular occurrence.
- If you’re in a team that focuses on rapid development. NoSQL databases do not need to be prepped ahead of time and thus allow the schema to modified on the fly.
- When joins are not preferred and should be avoided.
NoSQL bases are not read optimised because the whole blob is read contrary to the fact that SQL databases only read the required data.
Choose a good sort key while making a NoSQL database.