# Cloud Databases ## Cloud Databases ### Traditional database #### Intro - Most traditional DB are relational - Relational data model has been developed to serve application that focus on consistency and reliability - Data consistency and reliability are guaranteed by ACID properties ### ACID properties - Atomicity: No partial transaction, if it fail, rolls back - Consistency: Changes made by transaction respect all database integrity constraints, and database remain in consistent state - Isolation: Transactions can not see changes made by concurrent transactions - Durability: Changes stored permanently, persistent data ### NOSQL Database System - Not relational database, called (Not Only SQL) - Features: - Relax one of ACID properties - Key-Value data model, for storing data - schemeless, or scheme lacks the constraints - Uses data partitioning, has data replication and distribution - Has no joins, and use data sharding ### NOSQL vs. relation DB | Comparison | NoSQL | Relational Database | | ----------------------- | ------------------- | ------------------- | | Type of data | Mainly unstructured | Structured only | | Volume of data | High volume | Low volume | | Single point of failure | No | Yes | | Data access | Many locations | Few locations | ### Advantages of NOSQL - Cheap - Data replication and partition, leading to no single point of failure - Easy to distribute - No schema reqired ### Issues with relational database - Vertical scaling is hard - Horizontal scaling will break ACID ## CAP ### Theorem #### Content - A distributed system can't have all of the three - Consistency of storage: having the same value for all copies, every read receive the most write or error - Availability: Request received from non-failing node, results in a response - Partitioning tolerance: System continues to operate despite a number of messages being dropped by network #### NoSQL and CAP - Cloud databases's integrity are not expected - Availability is characterized by a small latency - No Clear border between availability and network partitions #### Trade offs - SQL: Strong consistency and availability, but not partition tolerance - NoSQL, for example Dynamo: availability and partition tolerance, but weak consistency - High latency losts revenues, so sacrificed strong consistency - Now has a model of eventual consistency ### Relaxing Availability: - Not possible, we want our website to be available at all time ### Consistency #### Definition - Strong consistency: replicas duplicate linearly in same total order - No apparent conflicts - The traditional way - ACID SQL database are strongly consistent - Distributed and ACID: Need to form a **concensus** by synchronizing the replica states #### Eventual Consistency - May not converge to same total order - Update accepted by local node, and propagated to other nodes - No synchronization needed - Eventually all replicas are updated, if there's no update for a long enough time #### Breakdown of a example - TODO: read the diagrams from P25 #### Example: Facebook - When checking out a story, it may not be available at first, because the consistency is not yet achieved - Because it's impossible without eventual consistency, with 2.7 billion users. - This reduces the load and improve the availability #### Example: Dropbox - Based on personal syncing, not real time collaboration, so immediate consistency is not that important - Eventual consistency, your updates are propagated #### Example: ATM - Event ATM uses eventual consistency - Because of higher availability needed - You can overdraw money, if the machined is partitioned from network - It has a limit on the amount of withdraw, and you will be charged #### Strong consistency - Counterpart of Eventual consistency - All read must return data from latest write - Very hard to achieve with high availability and high partition tolerance ## Replication of data ### Data partitioning and replication - Reasons: - Amount of data exceeds the capacity of a single machine - Scaling for load balancing - Ensure reliability and availability by replication - NoSQL databases usually maximize availability when partitioning - Techniques to achieve this - Memory caches - Separating RW - Clustering and sharding ### Caches - Memory caches are transient, partly partitioned and replicated in-memory databases - Replicates most requested parts to main memories of a number of servers - Advantages: - Fast response to clients - Off-loading of database servers ### Separating Reads from Writes - Master server dedicated to **writes** - Some replicas servers (Slaves) dedicated to **reads** - Master replicates updates to slaves - Error handling: if the master crashes - Before completing replication: Write operation is lost - Otherwise: Most up to date slave take the master role ### Clustering - HA (High availability) clusters are **groups** of computers that support server applications - Using redundant computers in groups to provide continued service, in case one of the components fail, to elimimate **Single Point of Failure** (SPOF) - **Failover**: When an HA cluster detects a fault, the application is immediately restarted on another system without requiring intervention ### Sharding - Data partitioning scheme, that - data that is accessed and modified frequently are stored on the same shard - Each shard is on a different node - To spread load across different servers - This technique is not appropriate for relational database, since they are normalized - Popular with non-relational databases ## Cloud DB Services ### Types of NOSQL, and examples - Key value databases: - Single value or row, indexed by a key - `Memcached`, `vertica` - Column Family databases - A sparse, distributed, persistent multidimensional sorted **map** - `Google BigTable`, `Apache Cassandra` - Document databases - Multi-field documents or objects with JSON access - `MongoDB` - Graph databases - Nodes, edges, properties - `Neo4J`, `sones` ### Comparing solutions based on CAP - CA: `RDBMS` - CP: `BigTable`, `MongoDB` - AP: `Dynamo`, `Cassandra` ### Examples #### Memcached - Features - High performance distributed in-memory caching service - Similar to hash table, manages objects - Use standard `get` and `set` - To serve requests for read-heavy workloads - Used in `Facebook`, `LinkedIn` - Example pseudocode: ```text public String getFoo(String fooId) String foo = memcached.get(fooId) if foo != null return foo foo = fetchFooFromDatabase(fooId) memcached.set(fooId, foo) return foo ``` - Size: - Has set number of bits 128 bits - Each value no more than 1MB - LRU policy: when full, new one use old one - Least used item removed - This way, frequently accessed items are in cache, regardless of capacity - Clustering: - Multiple nodes, each one of them has different keys - Servers are not aware there are multiple of them (transparent): **Smart Clients & Dumb Servers** - Client has function that maps keys to servers, while - Servers are not aware there are multiple of them #### Cassandra ##### Features: - Elastic scaling: read and write increases linearly - Replication, per node and per data center - Decentralized - Column based, multi dimensional hash table without join capabilities - Consistency is tunable ##### Consistent hashing: Point on a circle - Each database object is mapped to a point on circle, by hashing key - Node is also mapped to a point on the same circle - To find an object, it: - Hashes the object's key to a point on the circle - Walk clockwise, until it encounters the first node, the node will store the object - Benefits: reduce the **impact** of node entering and leaving the ring - Node changes: redistribute existing data objects - Leave: node in the clockwise direction will store objects that belong to left code - Add: mapped to a point, map objects that should belong to it to the node ##### Replication - Replication factor: how many copies of my data exists: data item replicated at N hosts, this is associated to a key. And replicated nodes are chosen clockwise - Coordinator node: replication of the data items - Consistency level: how recent and in-sync all replicas of a row of data are. It prefers availability over consistency, and let you tune how consistent it would be, replica consistency levels: - ANY: one replica node, including handoff need to reply, in order for write to success - ONE: one node - QUORUM: N / 2 + 1 - `LOCAL_QUORUM`: local datacenter - `EACH_QUORUM`: each datacenter - ALL: all replicas - Consistency policies: Rack unaware, rack aware, datacenter aware ##### Data Model - 3D hash table: - A table: column families - Column family: row keys inside - Row key: paired with name-value columns ## Example ### EBay - Managed by cassandra