diff --git a/4-3-NOSQL.md b/4-3-NOSQL.md new file mode 100644 index 0000000..da6ec9f --- /dev/null +++ b/4-3-NOSQL.md @@ -0,0 +1,292 @@ +# Cloud Databases + +## Cloud Databases + +### Traditional database + +#### Intro + +- Most traditional DB are relational +- Relational data model has been developed to serve application that focus on + consistency and reliability +- Data consistency and reliability are guaranteed by ACID properties + +### ACID properties + +- Atomicity: No partial transaction, if it fail, rolls back +- Consistency: Changes made by transaction respect all database integrity + constraints, and database remain in consistent state +- Isolation: Transactions can not see changes made by concurrent transactions +- Durability: Changes stored permanently, persistent data + +### NOSQL Database System + +- Not relational database, called (Not Only SQL) +- Features: + - Relax one of ACID properties + - Key-Value data model, for storing data + - schemeless, or scheme lacks the constraints + - Uses data partitioning, has data replication and distribution + - Has no joins, and use data sharding + +### NOSQL vs. relation DB + +| Comparison | NoSQL | Relational Database | +| ----------------------- | ------------------- | ------------------- | +| Type of data | Mainly unstructured | Structured only | +| Volume of data | High volume | Low volume | +| Single point of failure | No | Yes | +| Data access | Many locations | Few locations | + +### Advantages of NOSQL + +- Cheap +- Data replication and partition, leading to no single point of failure +- Easy to distribute +- No schema reqired + +### Issues with relational database + +- Vertical scaling is hard +- Horizontal scaling will break ACID + +## CAP + +### Theorem + +#### Content + +- A distributed system can't have all of the three +- Consistency of storage: having the same value for all copies, every read + receive the most write or error +- Availability: Request received from non-failing node, results in a response +- Partitioning tolerance: System continues to operate despite a number of + messages being dropped by network + +#### NoSQL and CAP + +- Cloud databases's integrity are not expected +- Availability is characterized by a small latency +- No Clear border between availability and network partitions + +#### Trade offs + +- SQL: Strong consistency and availability, but not partition tolerance +- NoSQL, for example Dynamo: availability and partition tolerance, but weak + consistency + - High latency losts revenues, so sacrificed strong consistency + - Now has a model of eventual consistency + +### Relaxing Availability: + +- Not possible, we want our website to be available at all time + +### Consistency + +#### Definition + +- Strong consistency: replicas duplicate linearly in same total order + - No apparent conflicts + - The traditional way + - ACID SQL database are strongly consistent +- Distributed and ACID: Need to form a **concensus** by synchronizing the + replica states + +#### Eventual Consistency + +- May not converge to same total order +- Update accepted by local node, and propagated to other nodes +- No synchronization needed + - Eventually all replicas are updated, if there's no update for a long + enough time + +#### Breakdown of a example + +- TODO: read the diagrams from P25 + +#### Example: Facebook + +- When checking out a story, it may not be available at first, because the + consistency is not yet achieved +- Because it's impossible without eventual consistency, with 2.7 billion users. +- This reduces the load and improve the availability + +#### Example: Dropbox + +- Based on personal syncing, not real time collaboration, so immediate + consistency is not that important +- Eventual consistency, your updates are propagated + +#### Example: ATM + +- Event ATM uses eventual consistency +- Because of higher availability needed +- You can overdraw money, if the machined is partitioned from network +- It has a limit on the amount of withdraw, and you will be charged + +#### Strong consistency + +- Counterpart of Eventual consistency +- All read must return data from latest write +- Very hard to achieve with high availability and high partition tolerance + +## Replication of data + +### Data partitioning and replication + +- Reasons: + - Amount of data exceeds the capacity of a single machine + - Scaling for load balancing + - Ensure reliability and availability by replication +- NoSQL databases usually maximize availability when partitioning +- Techniques to achieve this + - Memory caches + - Separating RW + - Clustering and sharding + +### Caches + +- Memory caches are transient, partly partitioned and replicated in-memory + databases +- Replicates most requested parts to main memories of a number of servers +- Advantages: + - Fast response to clients + - Off-loading of database servers + - Counterpart of Eventual consistency + - All read must return data from latest write + - Very hard to achieve with high availability and high partition tolerance + +### Separating Reads from Writes + +- Master server dedicated to **writes** +- Some replicas servers (Slaves) dedicated to **reads** +- Master replicates updates to slaves +- Error handling: if the master crashes + - Before completing replication: Write operation is lost + - Otherwise: Most up to date slave take the master role + +### Clustering + +- HA (High availability) clusters are **groups** of computers that support + server applications +- Using redundant computers in groups to provide continued service, in case one + of the components fail, to elimimate **Single Point of Failure** (SPOF) +- **Failover**: When an HA cluster detects a fault, the application is + immediately restarted on another system without requiring intervention + +### Sharding + +- Data partitioning scheme, that + - data that is accessed and modified frequently are stored on the same shard + - Each shard is on a different node +- To spread load across different servers +- This technique is not appropriate for relational database, since they are + normalized +- Popular with non-relational databases + +## Cloud DB Services + +### Types of NOSQL, and examples + +- Key value databases: + - Single value or row, indexed by a key + - `Memcached`, `vertica` +- Column Family databases + - A sparse, distributed, persistent multidimensional sorted **map** + - `Google BigTable`, `Apache Cassandra` +- Document databases + - Multi-field documents or objects with JSON access + - `MongoDB` +- Graph databases + - Nodes, edges, properties + - `Neo4J`, `sones` + +### Comparing solutions based on CAP + +- CA: `RDBMS` +- CP: `BigTable`, `MongoDB` +- AP: `Dynamo`, `Cassandra` + +### Examples + +#### Memcached + +- Features + - High performance distributed in-memory caching service + - Similar to hash table, manages objects + - Use standard `get` and `set` +- To serve requests for read-heavy workloads +- Used in `Facebook`, `LinkedIn` +- Example pseudocode: + ```text + public String getFoo(String fooId) + String foo = memcached.get(fooId) + if foo != null return foo + foo = fetchFooFromDatabase(fooId) + memcached.set(fooId, foo) + return foo + ``` +- Size: + - Has set number of bits 128 bits + - Each value no more than 1MB + - LRU policy: when full, new one use old one + - Least used item removed + - This way, frequently accessed items are in cache, regardless of + capacity +- Clustering: + - Multiple nodes, each one of them has different keys + - Servers are not aware there are multiple of them (transparent) + +#### Cassandra + +##### Features: + +- Elastic scaling: read and write increases linearly +- Replication, per node and per data center +- Decentralized +- Column based, multi dimensional hash table without join capabilities +- Consistency is tunable + +##### Consistent hashing: Point on a circle + +- Each database object is mapped to a point on circle, by hashing key +- Node is also mapped to a point on the same circle +- To find an object, it: + - Hashes the object's key to a point on the circle + - Walk clockwise, until it encounters the first node, the node will store + the object +- Benefits: reduce the **impact** of node entering and leaving the ring +- Node changes: redistribute existing data objects + - Leave: node in the clockwise direction will store objects that belong to + left code + - Add: mapped to a point, map objects that should belong to it to the node + +##### Replication + +- Replication factor: how many copies of my data exists: data item replicated at + N hosts, this is associated to a key. And replicated nodes are chosen + clockwise +- Coordinator node: replication of the data items +- Consistency level: how recent and in-sync all replicas of a row of data are. + It prefers availability over consistency, and let you tune how consistent it + would be, replica consistency levels: + - ANY: one replica node, including handoff + - ONE: one node + - QUORUM: N / 2 + 1 + - `LOCAL_QUORUM`: local datacenter + - `EACH_QUORUM`: each datacenter + - ALL: all replicas +- Consistency policies: Rack unaware, rack aware, datacenter aware + +##### Data Model + +- 3D hash table: +- A table: column families +- Column family: row keys inside +- Row key: paired with name-value columns + +## Example + +### EBay + +- Managed by cassandra