Monday, August 11, 2014

Why NoSQL databases?


As you probably know, adoption of NoSQL databases is on the rise, and more and more NoSQL data stores are being developed. So, what is the secret? Why is adoption rising? Here are some of the key points I can think of:

- Some applications do not need all the complex features an RDBMS offers; instead, they need to scale. Ranking pages by search keyword is one example. NoSQL databases scale out (horizontal scaling, i.e. adding more servers) better than RDBMSs.
- RDBMSs are poor at handling unstructured data such as documents, user reviews, and comments. Certain NoSQL databases, such as MongoDB and CouchDB, are built specifically for such purposes.
- For rapid development, there is a growing desire to avoid designing the data model/schema up front. In MongoDB, for example, the developer specifies the schema in the application code itself.
- Simpler APIs to access the data. Key-value stores such as Redis and Berkeley DB, for example, offer very simple store and fetch methods.
- Most NoSQL databases are open source, and they work and scale better on commodity servers than RDBMSs do.
- Lower latency than RDBMSs for certain operations.
- Some applications do not need strict consistency guarantees and can live with weaker, eventual consistency (eventually everyone sees the same data). Twitter/Facebook feeds, for example, do not need to be perfectly consistent across all users at every moment. NoSQL databases take advantage of this to scale better.
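To make the scale-out and simple-API points more concrete, here is a minimal Python sketch of a key-value store that spreads keys over several servers by hashing them. Everything here is an assumption for illustration: `Node` and `ShardedKVStore` are hypothetical names, each "server" is just an in-memory dict, and keys are assigned with simple hash-modulo partitioning rather than the scheme any particular product uses.

```python
from __future__ import annotations

import hashlib


class Node:
    """One commodity server holding a slice of the keys (an in-memory dict here)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}


class ShardedKVStore:
    """Toy key-value store that partitions keys across nodes by hashing them."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = [Node(name) for name in node_names]

    def _node_for(self, key: str) -> Node:
        # Hash the key and map the digest onto one node (hash-modulo partitioning).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        # The whole write API: store a value under a key.
        self._node_for(key).data[key] = value

    def get(self, key: str) -> str | None:
        # The whole read API: fetch a value by key (None if absent).
        return self._node_for(key).data.get(key)


store = ShardedKVStore(["node-a", "node-b", "node-c"])
for i in range(1000):
    store.put(f"page:{i}", f"rank-{i}")

print(store.get("page:42"))                # rank-42
print([len(n.data) for n in store.nodes])  # keys are spread across all three nodes
```

Adding capacity here means adding nodes rather than buying a bigger machine. One caveat of plain modulo partitioning is that changing the number of nodes remaps most keys, which is why distributed stores commonly use consistent hashing instead.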

So is it all good with NoSQL? Should we just switch from RDBMS to NoSQL? Not quite. NoSQL is not for every application; in my estimate, it does not fit 90% of current business use cases. So, before you decide to use NoSQL, think twice about your requirements. Here are some pointers to consider before you commit to long-term adoption of NoSQL:

Some dangers of NoSQL:
- No schema - NoSQL databases such as MongoDB do not enforce fixed schemas, and schemas are not defined inside the database; they live in the application code. While this flexibility looks great, in the long run it can create management issues if the schemas are not documented properly, since there is no fixed schema in the database to look at.
- Denormalized data - relationships between data must be maintained in the application code, which can create data quality issues (for example, duplicated copies of the same value drifting out of sync).
- Inconsistent APIs across NoSQL products (unlike SQL, there is no common standard) - switching between products is not easy. So, fully understand the capabilities and limitations of the NoSQL database you are going to use before you commit to it.
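The denormalization danger is easy to reproduce even without a database. In the minimal Python sketch below, plain dicts stand in for documents (an assumption; no real NoSQL client is used, and `users`, `posts`, and the helper functions are hypothetical). The author's name is copied into every post, so the application code alone is responsible for keeping those copies in sync.

```python
from __future__ import annotations

# Hypothetical documents: the author's name is copied (denormalized) into each
# post instead of being looked up from the user record with a join.
users = {"u1": {"_id": "u1", "name": "Alice"}}
posts = [
    {"_id": "p1", "author_id": "u1", "author_name": "Alice", "title": "Hello"},
    {"_id": "p2", "author_id": "u1", "author_name": "Alice", "title": "NoSQL"},
]


def rename_user_buggy(user_id: str, new_name: str) -> None:
    # Updates the user document but forgets the copies embedded in posts.
    users[user_id]["name"] = new_name


def rename_user(user_id: str, new_name: str) -> None:
    # The application, not the database, must update every copy.
    users[user_id]["name"] = new_name
    for post in posts:
        if post["author_id"] == user_id:
            post["author_name"] = new_name


def stale_posts() -> list[str]:
    """Return ids of posts whose embedded author name disagrees with the user record."""
    return [p["_id"] for p in posts if p["author_name"] != users[p["author_id"]]["name"]]


rename_user_buggy("u1", "Alice Smith")
print(stale_posts())  # ['p1', 'p2'] -> a data quality issue
rename_user("u1", "Alice Smith")
print(stale_posts())  # []
```

Nothing in the data model stops the buggy version from shipping; a normalized relational design would store the name only once and join it in when needed.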

NoSQL is not for you when you want:
- Strong consistency guarantees
- To execute complex queries (for example, MongoDB does not support joins or transactions)
- To maintain normalized data in the database to enforce data integrity
- Strong compliance and security features - NoSQL databases currently lack rigorous security controls
- To use transactions
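To illustrate what giving up strong consistency means in practice, here is a minimal Python sketch of an eventually consistent store. It is a toy under stated assumptions: the class names are hypothetical, replicas are in-memory dicts, a write is acknowledged after reaching only the first replica, and conflicts are resolved by last-writer-wins on a single version counter. Real systems use more elaborate mechanisms, such as timestamps or vector clocks.

```python
from __future__ import annotations


class Replica:
    """One copy of the data: key -> (version, value)."""

    def __init__(self) -> None:
        self.data: dict[str, tuple[int, str]] = {}

    def write(self, key: str, version: int, value: str) -> None:
        # Last-writer-wins: keep whichever value has the higher version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key: str) -> str | None:
        entry = self.data.get(key)
        return entry[1] if entry else None


class EventuallyConsistentStore:
    """Writes land on replica 0 immediately; the other replicas catch up on sync()."""

    def __init__(self, n_replicas: int = 3) -> None:
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending: list[tuple[str, int, str]] = []
        self.version = 0

    def write(self, key: str, value: str) -> None:
        self.version += 1
        self.replicas[0].write(key, self.version, value)  # acknowledged right away
        self.pending.append((key, self.version, value))   # replicated later

    def read(self, key: str, replica: int) -> str | None:
        return self.replicas[replica].read(key)

    def sync(self) -> None:
        # Background replication: push every pending write to the other replicas.
        for key, version, value in self.pending:
            for r in self.replicas[1:]:
                r.write(key, version, value)
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("feed:bob", "post #1")
print(store.read("feed:bob", replica=0))  # post #1
print(store.read("feed:bob", replica=2))  # None (stale: not replicated yet)
store.sync()
print(store.read("feed:bob", replica=2))  # post #1 (replicas have converged)
```

A reader that hits replica 2 before `sync()` runs sees an empty feed. That is acceptable for a social feed, but not for data that must always be read back exactly as it was last written.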

While the adoption of NoSQL databases continues to rise, it is important to understand that a single NoSQL database or RDBMS may not meet all of your organization's requirements. Depending on the type of data you are working with (e.g. structured data, documents, key-value pairs) and the type of workload (e.g. OLTP, OLAP, batch processing, search, stream processing), you may have to use multiple databases in your organization. The industry has recognized this need, which is why we increasingly hear about polyglot persistence: using multiple databases, each suited to a different kind of data or workload, to build applications. We will look into polyglot persistence in another blog post.
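As a small preview of the idea, here is a minimal Python sketch of polyglot persistence inside one application. The choice of stores is an assumption for illustration: an in-memory SQLite database (via Python's built-in `sqlite3` module) holds orders that need transactions and constraints, while a plain dict stands in for a key-value store such as Redis for session data. `place_order` and the table layout are hypothetical.

```python
from __future__ import annotations

import sqlite3

# Relational store for orders: transactions and NOT NULL constraints matter here.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
)

# Key-value store for sessions: a dict standing in for something like Redis.
sessions: dict[str, dict[str, str]] = {}


def place_order(session_id: str, total: float) -> int:
    """Look up the user in the session store, then record the order relationally."""
    customer = sessions[session_id]["user"]  # simple fetch by key
    with orders_db:  # commits on success, rolls back if an exception is raised
        cur = orders_db.execute(
            "INSERT INTO orders (customer, total) VALUES (?, ?)", (customer, total)
        )
    return cur.lastrowid


sessions["s-123"] = {"user": "alice"}
order_id = place_order("s-123", 42.5)
row = orders_db.execute(
    "SELECT customer, total FROM orders WHERE id = ?", (order_id,)
).fetchone()
print(row)  # ('alice', 42.5)
```

Each kind of data goes to the store whose strengths match it: quick key lookups for sessions, and an atomic, constrained insert for orders.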
