293 lines
9 KiB
Markdown
293 lines
9 KiB
Markdown
# Cloud Databases
|
|
|
|
## Cloud Databases
|
|
|
|
### Traditional database
|
|
|
|
#### Intro
|
|
|
|
- Most traditional DB are relational
|
|
- Relational data model has been developed to serve application that focus on
|
|
consistency and reliability
|
|
- Data consistency and reliability are guaranteed by ACID properties
|
|
|
|
### ACID properties
|
|
|
|
- Atomicity: No partial transaction, if it fail, rolls back
|
|
- Consistency: Changes made by transaction respect all database integrity
|
|
constraints, and database remain in consistent state
|
|
- Isolation: Transactions can not see changes made by concurrent transactions
|
|
- Durability: Changes stored permanently, persistent data
|
|
|
|
### NOSQL Database System
|
|
|
|
- Not relational database, called (Not Only SQL)
|
|
- Features:
|
|
- Relax one of ACID properties
|
|
- Key-Value data model, for storing data
|
|
- schemeless, or scheme lacks the constraints
|
|
- Uses data partitioning, has data replication and distribution
|
|
- Has no joins, and use data sharding
|
|
|
|
### NOSQL vs. relation DB
|
|
|
|
| Comparison | NoSQL | Relational Database |
|
|
| ----------------------- | ------------------- | ------------------- |
|
|
| Type of data | Mainly unstructured | Structured only |
|
|
| Volume of data | High volume | Low volume |
|
|
| Single point of failure | No | Yes |
|
|
| Data access | Many locations | Few locations |
|
|
|
|
### Advantages of NOSQL
|
|
|
|
- Cheap
|
|
- Data replication and partition, leading to no single point of failure
|
|
- Easy to distribute
|
|
- No schema reqired
|
|
|
|
### Issues with relational database
|
|
|
|
- Vertical scaling is hard
|
|
- Horizontal scaling will break ACID
|
|
|
|
## CAP
|
|
|
|
### Theorem
|
|
|
|
#### Content
|
|
|
|
- A distributed system can't have all of the three
|
|
- Consistency of storage: having the same value for all copies, every read
|
|
receive the most write or error
|
|
- Availability: Request received from non-failing node, results in a response
|
|
- Partitioning tolerance: System continues to operate despite a number of
|
|
messages being dropped by network
|
|
|
|
#### NoSQL and CAP
|
|
|
|
- Cloud databases's integrity are not expected
|
|
- Availability is characterized by a small latency
|
|
- No Clear border between availability and network partitions
|
|
|
|
#### Trade offs
|
|
|
|
- SQL: Strong consistency and availability, but not partition tolerance
|
|
- NoSQL, for example Dynamo: availability and partition tolerance, but weak
|
|
consistency
|
|
- High latency losts revenues, so sacrificed strong consistency
|
|
- Now has a model of eventual consistency
|
|
|
|
### Relaxing Availability:
|
|
|
|
- Not possible, we want our website to be available at all time
|
|
|
|
### Consistency
|
|
|
|
#### Definition
|
|
|
|
- Strong consistency: replicas duplicate linearly in same total order
|
|
- No apparent conflicts
|
|
- The traditional way
|
|
- ACID SQL database are strongly consistent
|
|
- Distributed and ACID: Need to form a **concensus** by synchronizing the
|
|
replica states
|
|
|
|
#### Eventual Consistency
|
|
|
|
- May not converge to same total order
|
|
- Update accepted by local node, and propagated to other nodes
|
|
- No synchronization needed
|
|
- Eventually all replicas are updated, if there's no update for a long
|
|
enough time
|
|
|
|
#### Breakdown of a example
|
|
|
|
- TODO: read the diagrams from P25
|
|
|
|
#### Example: Facebook
|
|
|
|
- When checking out a story, it may not be available at first, because the
|
|
consistency is not yet achieved
|
|
- Because it's impossible without eventual consistency, with 2.7 billion users.
|
|
- This reduces the load and improve the availability
|
|
|
|
#### Example: Dropbox
|
|
|
|
- Based on personal syncing, not real time collaboration, so immediate
|
|
consistency is not that important
|
|
- Eventual consistency, your updates are propagated
|
|
|
|
#### Example: ATM
|
|
|
|
- Event ATM uses eventual consistency
|
|
- Because of higher availability needed
|
|
- You can overdraw money, if the machined is partitioned from network
|
|
- It has a limit on the amount of withdraw, and you will be charged
|
|
|
|
#### Strong consistency
|
|
|
|
- Counterpart of Eventual consistency
|
|
- All read must return data from latest write
|
|
- Very hard to achieve with high availability and high partition tolerance
|
|
|
|
## Replication of data
|
|
|
|
### Data partitioning and replication
|
|
|
|
- Reasons:
|
|
- Amount of data exceeds the capacity of a single machine
|
|
- Scaling for load balancing
|
|
- Ensure reliability and availability by replication
|
|
- NoSQL databases usually maximize availability when partitioning
|
|
- Techniques to achieve this
|
|
- Memory caches
|
|
- Separating RW
|
|
- Clustering and sharding
|
|
|
|
### Caches
|
|
|
|
- Memory caches are transient, partly partitioned and replicated in-memory
|
|
databases
|
|
- Replicates most requested parts to main memories of a number of servers
|
|
- Advantages:
|
|
- Fast response to clients
|
|
- Off-loading of database servers
|
|
- Counterpart of Eventual consistency
|
|
- All read must return data from latest write
|
|
- Very hard to achieve with high availability and high partition tolerance
|
|
|
|
### Separating Reads from Writes
|
|
|
|
- Master server dedicated to **writes**
|
|
- Some replicas servers (Slaves) dedicated to **reads**
|
|
- Master replicates updates to slaves
|
|
- Error handling: if the master crashes
|
|
- Before completing replication: Write operation is lost
|
|
- Otherwise: Most up to date slave take the master role
|
|
|
|
### Clustering
|
|
|
|
- HA (High availability) clusters are **groups** of computers that support
|
|
server applications
|
|
- Using redundant computers in groups to provide continued service, in case one
|
|
of the components fail, to elimimate **Single Point of Failure** (SPOF)
|
|
- **Failover**: When an HA cluster detects a fault, the application is
|
|
immediately restarted on another system without requiring intervention
|
|
|
|
### Sharding
|
|
|
|
- Data partitioning scheme, that
|
|
- data that is accessed and modified frequently are stored on the same shard
|
|
- Each shard is on a different node
|
|
- To spread load across different servers
|
|
- This technique is not appropriate for relational database, since they are
|
|
normalized
|
|
- Popular with non-relational databases
|
|
|
|
## Cloud DB Services
|
|
|
|
### Types of NOSQL, and examples
|
|
|
|
- Key value databases:
|
|
- Single value or row, indexed by a key
|
|
- `Memcached`, `vertica`
|
|
- Column Family databases
|
|
- A sparse, distributed, persistent multidimensional sorted **map**
|
|
- `Google BigTable`, `Apache Cassandra`
|
|
- Document databases
|
|
- Multi-field documents or objects with JSON access
|
|
- `MongoDB`
|
|
- Graph databases
|
|
- Nodes, edges, properties
|
|
- `Neo4J`, `sones`
|
|
|
|
### Comparing solutions based on CAP
|
|
|
|
- CA: `RDBMS`
|
|
- CP: `BigTable`, `MongoDB`
|
|
- AP: `Dynamo`, `Cassandra`
|
|
|
|
### Examples
|
|
|
|
#### Memcached
|
|
|
|
- Features
|
|
- High performance distributed in-memory caching service
|
|
- Similar to hash table, manages objects
|
|
- Use standard `get` and `set`
|
|
- To serve requests for read-heavy workloads
|
|
- Used in `Facebook`, `LinkedIn`
|
|
- Example pseudocode:
|
|
```text
|
|
public String getFoo(String fooId)
|
|
String foo = memcached.get(fooId)
|
|
if foo != null return foo
|
|
foo = fetchFooFromDatabase(fooId)
|
|
memcached.set(fooId, foo)
|
|
return foo
|
|
```
|
|
- Size:
|
|
- Has set number of bits 128 bits
|
|
- Each value no more than 1MB
|
|
- LRU policy: when full, new one use old one
|
|
- Least used item removed
|
|
- This way, frequently accessed items are in cache, regardless of
|
|
capacity
|
|
- Clustering:
|
|
- Multiple nodes, each one of them has different keys
|
|
- Servers are not aware there are multiple of them (transparent)
|
|
|
|
#### Cassandra
|
|
|
|
##### Features:
|
|
|
|
- Elastic scaling: read and write increases linearly
|
|
- Replication, per node and per data center
|
|
- Decentralized
|
|
- Column based, multi dimensional hash table without join capabilities
|
|
- Consistency is tunable
|
|
|
|
##### Consistent hashing: Point on a circle
|
|
|
|
- Each database object is mapped to a point on circle, by hashing key
|
|
- Node is also mapped to a point on the same circle
|
|
- To find an object, it:
|
|
- Hashes the object's key to a point on the circle
|
|
- Walk clockwise, until it encounters the first node, the node will store
|
|
the object
|
|
- Benefits: reduce the **impact** of node entering and leaving the ring
|
|
- Node changes: redistribute existing data objects
|
|
- Leave: node in the clockwise direction will store objects that belong to
|
|
left code
|
|
- Add: mapped to a point, map objects that should belong to it to the node
|
|
|
|
##### Replication
|
|
|
|
- Replication factor: how many copies of my data exists: data item replicated at
|
|
N hosts, this is associated to a key. And replicated nodes are chosen
|
|
clockwise
|
|
- Coordinator node: replication of the data items
|
|
- Consistency level: how recent and in-sync all replicas of a row of data are.
|
|
It prefers availability over consistency, and let you tune how consistent it
|
|
would be, replica consistency levels:
|
|
- ANY: one replica node, including handoff
|
|
- ONE: one node
|
|
- QUORUM: N / 2 + 1
|
|
- `LOCAL_QUORUM`: local datacenter
|
|
- `EACH_QUORUM`: each datacenter
|
|
- ALL: all replicas
|
|
- Consistency policies: Rack unaware, rack aware, datacenter aware
|
|
|
|
##### Data Model
|
|
|
|
- 3D hash table:
|
|
- A table: column families
|
|
- Column family: row keys inside
|
|
- Row key: paired with name-value columns
|
|
|
|
## Example
|
|
|
|
### EBay
|
|
|
|
- Managed by cassandra
|