Discussion on: What's the difference between a SQL and a NoSQL database?

View post

A SQL database is a database that implements SQL, and almost always a "relational database", which means data is stored in (many) tables, with a fixed set of columns, and a changeable set of rows (the values in the row, and the number of rows, can change, but the names of columns are mostly constant, and every row in the same table has the same set of attributes or columns). Many vendors (oracle, microsoft, ibm, mysql, postgresql) provide a SQL database product, and the SQL language to query and update data has been standardized (and the standard receives updates and maintenance) since the 1980's, so you are only ever lightly locked into any product, you could always ask (in SQL) for all of your data from the database, and push it all (using SQL, again) into another vendor's database system, as long as you didn't rely too heavily on any product-specific extensions. Finding and working with your data will be almost the exact same no matter which system you're using, but managing these systems might be very different.

A NoSQL database usually includes any database system that doesn't use the relational model. Document databases like MongoDB store all of the information about an entity in a single JSON document, making retrieving all of the relevant information about the item a single fetch, graph databases or key-value stores like caches also fit in the NoSQL name. There are a lot of ideas, but not a common shared interface.

One of the key guarantees that most SQL databases give, and many NoSQL databases do not, is the idea of an atomic transaction, basically ensuring that either all of the changes, or none of the changes, in a set of updates, is applied. A good example of when you might want this kind of guarantee would be a bank transfer between two accounts, with a transfer receipt logged to the ledger table. It would be an error to add money to my account while leaving the money in your account, and it would be an error to have removed money from your account without also adding it to mine, and we never want to have moved money without preserving a record of it (with the amount, a timestamp, and both account numbers) - if we can't successfully do all three updates at the same time, we don't want to do any of them. Additionally, no other query should show only part of the changes and not the other parts.

One of the major benefits you can get from relaxing that restriction is much faster updates, at the cost of sometimes getting out of date information. Because you don't need to have strict guarantees about the correctness of the data, you can distribute the system to multiple machines in multiple places. It's also much simpler if you're storing objects in your system to have representations that look like your objects, instead of possibly needing to update many tables when something changes, and searching or aggregating data from many tables when fetching an object from the database.