add 3-2, took 2hr and more
Commit b56bb2278e (parent dd6c4dad47)

Changes in `3-1-map-reduce.md`:

#### Running

- Shuffle, moving data from mappers to reducers, and sort, ordering the outputs
  - Shuffle:
    - Every intermediate key-value pair generated by a mapper is **collected**
    - Pairs are **partitioned**, and each partition is sorted by key (a
      hash-partitioning sketch follows this section)
    - A combiner runs on each partition, combining the pairs of each key into a
      key - list of values pair
    - Data is shuffled and sent to each reducer
  - Sort:
    - Each reducer copies data from the mappers
    - The downloaded data are **merged** and **sorted** as input for the
      reducer: key - list of values, sorted by key
- Runtime
  - Partitions input
  - Schedules executions
  - Handles load balancing
  - Shuffles, partitions, and sorts intermediate data
  - Handles failures
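
As a rough illustration of the partitioning step, the sketch below mirrors the
hash-partitioning scheme Hadoop uses by default; the class and method names
here are illustrative, not Hadoop's API:

```java
// Sketch: every intermediate key is mapped to one of numReduceTasks
// partitions, so all pairs with the same key reach the same reducer.
public class PartitionSketch {
    static int getPartition(String key, int numReduceTasks) {
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("apple", 4));  // same key ...
        System.out.println(getPartition("apple", 4));  // ... same partition
        System.out.println(getPartition("banana", 4)); // may differ
    }
}
```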

New file `3-2-hadoop.md` (211 lines):

# Hadoop Architecture

## Features

- Designed to run on clusters of PCs
- Scales up linearly
- Suitable for local networks or data centers
- Design principles:
  - Data is distributed around the network
  - Computation is sent to the data: code is shipped to run on the nodes
    holding the data
  - Basic architecture is master / worker
- Offers the following:
  - Redundant, fault-tolerant data storage
  - Parallel computation framework
  - Job coordination

## Structure of a MapReduce job

- **Job**: a program to be executed across the **entire** dataset
  - Packaged as a jar file, with all the code needed
  - Each job is assigned a cluster-unique ID
  - Data attached to the job is replicated across the entire cluster
- **Task**: an execution on a slice of data
- **Task Attempt**: an instance of a local execution

## Job execution flow

- Split data into computing chunks
- Assign each chunk to a node manager
- Run many mappers
- [Shuffle and sort](/3-1-map-reduce.md#running)
- Run many reducers
- Results from the reducers form the job output (the driver sketch below wires
  these steps together)
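
A minimal job driver, as a sketch of this flow; it assumes the standard
`org.apache.hadoop.mapreduce` API, with the identity `Mapper` / `Reducer` base
classes standing in for a real job's implementations. Paths are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobFlowSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "job-flow-sketch");
        job.setJarByClass(JobFlowSketch.class); // the jar shipped to the nodes

        // Identity map / reduce; a real job sets its own subclasses here.
        job.setMapperClass(Mapper.class);   // "run many mappers"
        job.setReducerClass(Reducer.class); // "run many reducers"

        // Input is split into chunks; output is one file per reducer.
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        // Shuffle and sort happen between the map and reduce phases.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```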

## The Optional Combiner

- The bottleneck in map-reduce frameworks:
  - Map and reduce jobs scale close to linearly, so they are not the bottleneck
  - The potential bottleneck is the shuffle and sort step (between map and
    reduce)
    - Data needs to be copied over the network
    - Mappers emit lots of keys, and sorting them is costly
- A combiner can be set to execute before shuffling and sorting
- Reason:
  - Acts as a preliminary reducer
  - Executed at each mapper node, just before sending all the pairs for
    shuffling
  - **Reduces** the amount of data emitted by the mapper, to improve efficiency
- Restrictions and rules:
  - **Cannot** be mandatory: the job should work correctly without it
  - **Idempotent**: the number of times the combiner is applied shouldn't
    change the output
  - No **side effects**, or it won't be idempotent
  - **Preserves** the keys: it can't change keys, which would disrupt the
    **sort** order or the **partitioning**
- Example of using reducer code as the combiner:

  ```java
  // Same logic as the reducer: sums the partial counts for one key.
  public void combine(String key, List<Integer> values) {
      int sum = 0;
      for (Integer count : values) {
          sum += count;
      }
      emit(key, sum); // emit: the framework output call, as used in these notes
  }
  ```
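
To register a combiner, the driver points the job at a reducer class; a minimal
sketch, assuming the standard `org.apache.hadoop.mapreduce` API (the identity
`Reducer` stands in for a real implementation like the one above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class CombinerWiringSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "with-combiner");
        // Reuse the reduce logic as the optional combiner: it must be
        // idempotent, side-effect free, and key-preserving.
        job.setCombinerClass(Reducer.class);
        job.setReducerClass(Reducer.class);
    }
}
```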

- TODO review the combiner diagram

## Apache Hadoop

### Architecture of Hadoop

- Executes on nodes connected by a network
- Each node runs a set of daemons
  - Computing:
    - `ResourceManager`
    - `NodeManager`
  - Storage:
    - `NameNode`
      - `SecondaryNameNode` as backup
    - `DataNode`
- Nodes are in a master / slave architecture
  - Master node: `NameNode`, `ResourceManager`
    - Aware of the slave nodes
    - Receives external requests
    - Decides the work split among slaves
    - Notifies slaves
  - Slave node, also called worker node: `DataNode`, `NodeManager`
    - Executes the tasks received from the master

### What Hadoop Does

- Resource management: tracks the existence and availability of resources
- Job allocation: decides the resources needed for a job, and the split of work
- Job execution: runs the job, makes sure it completes, deals with failures

## Job execution: YARN

### Intro

- Estimates how many map and reduce tasks are needed for a job, based on the
  _input dataset_ and the _job definition_
- Ideally, a different node for each map / reduce task

### Deciding the number of workers

#### Mapper Parallelization

- A different input split is processed by each mapper
- The input data size is known
- Number of mappers: input size $/$ split size (see the sketch after this list)
- If the input is small but spread across many files, the files won't be split
  and more mappers will be used, one per file
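
A back-of-the-envelope sketch of the mapper-count rule; pure illustration, not
a Hadoop API. Splits are computed per file, which is why many small files mean
many mappers:

```java
// Number of mappers = sum over files of ceil(file size / split size),
// with at least one mapper per file.
public class MapperCountSketch {
    static long mappersFor(long[] fileSizes, long splitSize) {
        long mappers = 0;
        for (long size : fileSizes) {
            mappers += Math.max(1, (size + splitSize - 1) / splitSize);
        }
        return mappers;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // One 640 MB file, 64 MB splits -> 10 mappers.
        System.out.println(mappersFor(new long[] {640 * mb}, 64 * mb));
        // Ten 1 MB files -> also 10 mappers, one per file.
        long[] small = new long[10];
        java.util.Arrays.fill(small, mb);
        System.out.println(mappersFor(small, 64 * mb));
    }
}
```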

#### Reducer

- The number of reducers is **user defined**, since it's hard to figure out
  automatically (a sketch follows this list)
- Keys are partitioned among the reducers; partitioning too much leads to
  overhead in shuffle and sort
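
Since the reducer count is user defined, the driver sets it explicitly; a
minimal sketch using the standard `Job.setNumReduceTasks`:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "reducer-count");
        // Too many partitions adds shuffle-and-sort overhead;
        // too few limits parallelism.
        job.setNumReduceTasks(4);
    }
}
```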

### Execution Daemons

#### `ResourceManager`

- On the master: one per cluster
- Responsibility:
  - Receives job requests from clients
  - Creates an `ApplicationMaster` per job to manage it
  - Allocates **containers** on slave nodes, with the assigned resources
  - Monitors the health of the `NodeManager` nodes

#### `NodeManager`

- Responsibility:
  - Coordinates the execution of tasks on its node
  - Sends health information to the `ResourceManager`

#### `ApplicationMaster`

- Only one per job
- Responsibility: job allocation and job execution
  - Implements a specific framework, for example MapReduce
  - Negotiates with the `ResourceManager` on the resources required
  - Decides which node will run which task, in the containers given by the
    `ResourceManager`
  - Destroyed when the job completes

## Storage: HDFS

### Definition

- Hadoop Distributed File System
- This is the storage for Hadoop's input and output
- Features:
  - Tailored for MapReduce jobs
  - Large block size (64 MB; see the sketch below)
  - Not a POSIX-compliant file system
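
The defaults a client sees can be read through the `FileSystem` API; a small
sketch (the path is a placeholder, and the printed values come from the
cluster's configuration files):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDefaultsSketch {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        Path root = new Path("/");
        System.out.println("block size:  " + fs.getDefaultBlockSize(root));
        System.out.println("replication: " + fs.getDefaultReplication(root));
    }
}
```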

### Data distribution: key element of map reduce

- Job code (jars) is moved to where the data is stored
- Blocks are replicated on the cluster, by default **three** times, to ensure
  **reliability**

### Storage daemon

- `DataNode`: many per cluster
  - Stores blocks from HDFS
  - Reports the blocks it stores to the `NameNode`
- `NameNode`: one per cluster
  - Keeps the index and location of every block
  - Does no computation, because that load would be too heavy
  - Single point of failure
- `SecondaryNameNode`
  - Communicates directly with the `NameNode`
  - Stores a backup of the index table

### Data Replication

- Format: CSV; example:

  | `Filename` | `numReplicas` | `block-ids` |
  | ---------- | ------------- | ----------- |
  | part-0     | r:2           | {1,3}       |
  | part-1     | r:3           | {2,4,5}     |

- Definition: creating and maintaining multiple copies of data across
  different nodes in HDFS (a per-file sketch follows this section)
- Significance:
  - Fault tolerance
  - Data availability
  - System reliability
  - Supports parallel processing
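
The replication factor can also be set per file; a minimal sketch using
`FileSystem.setReplication` (the path is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Ask HDFS to keep three copies of this file's blocks,
        // matching the default replication factor.
        boolean accepted = fs.setReplication(new Path("/data/part-0"), (short) 3);
        System.out.println("replication change accepted: " + accepted);
    }
}
```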

### Failure recovery

- Identifying failure: by a missed heartbeat signal
- Replication: the `NameNode` initiates re-replication once a node failure is
  detected
- Maintaining integrity: uses the predetermined replication factor
- Mitigating potential disruptions: dynamic data management

### Operation

- TODO: See the graph on p59
- Input data:
  - Mappers are assigned input splits from the HDFS input path (default 64 MB)
  - Data locality: the `ApplicationMaster` tries to assign each mapper to where
    its data is stored
- Output data:
  - Copied to HDFS, one file per reducer
  - Replicated