commit 40f6f6b436 (parent 28b481f821): 3-4, 1hr
@@ -172,8 +172,6 @@
 **collected**
 - Pairs are **partitioned** to a list of values, each partition sorted
   by key
-- Combiner run on each partition, to combine the key-value pairs with
-  key- list of values
 - Data is shuffled and sent to each reducer
 - Sort:
   - Reducer copy data from mapper
3-4-map-reduce-reliability-perf.md (new file, 110 lines)
@@ -0,0 +1,110 @@
# Map Reduce Reliability and Performance

## Hadoop Performance

### The concept of speedup

- $$S(n) = \frac{T(1)}{T(n)}$$ where $T(n)$ is the time taken with $n$ processors
- Speedup is problem dependent, as well as architecture dependent
- If a job is fully parallelized, the speedup is linear:
  [Amdahl's law](/1-4-scalability.md#amdahls-law-important)
### Calculating speedup

- Calculating the maximum possible theoretical speedup (ignoring IPC and
  workload imbalance):
  - According to Amdahl's law, we let $n$ go to infinity, and the speedup
    becomes $$S = \frac{1}{\alpha}$$ where $\alpha$ is the serial fraction
    of the job
  - This means that if most of the work is serialized, the speedup is low
- The real speedup is lower, because of communication overhead and workload
  imbalance
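A quick worked example of this bound (numbers chosen purely for illustration):
with a serial fraction $\alpha = 0.05$,

$$S(n) = \frac{1}{\alpha + \frac{1 - \alpha}{n}}, \qquad
S(100) = \frac{1}{0.05 + 0.95/100} \approx 16.8, \qquad
S_{\max} = \frac{1}{0.05} = 20$$

so even with unlimited processors, the job can never run more than 20 times
faster.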
### Hadoop performance metrics

- Latency: the time between starting a job and delivering the output
  (execution time), which includes:
  - Job setup
  - Reading from HDFS
  - Lock contention from concurrent tasks
- Throughput: the amount of data processed per second (bytes per second)
  - HDFS throughput (IO or network bound, usually not CPU bound)
  - Disk throughput
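Since throughput is just a ratio, a one-line worked example (all numbers
hypothetical):

$$\text{throughput} = \frac{\text{data processed}}{\text{execution time}}
= \frac{1\ \text{TB}}{2000\ \text{s}} = 500\ \text{MB/s}$$

Note this is aggregate cluster throughput; latency can still be poor if job
setup or HDFS reads dominate those 2000 seconds.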
### Optimizing for performance

- Focus on bottlenecks
- Reduce the number of key-value pairs emitted by the mapper
- Reduce network traffic
- Simplify [shuffle and sort](/3-1-map-reduce.md#running)
  - Sorting in the _shuffle and sort_ stage
- Avoid unnecessary Java object creation; reuse `Writable` objects when
  possible (see the sketch below)
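A minimal word-count mapper sketching the last point (class and field names
are illustrative): the `Text` and `IntWritable` instances are allocated once
per task and reused across `map()` calls, instead of being created per record.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Allocated once per task, not once per record: Hadoop serializes the
    // Writable's contents at context.write() time, so reuse is safe.
    private final Text word = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);           // reuse the same Text instance
            context.write(word, ONE);  // no new objects on the hot path
        }
    }
}
```

Pairing this with `job.setCombinerClass(...)` (typically the reducer itself,
when the operation is associative and commutative) is the usual way to cut the
number of pairs that reach the shuffle.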
### Problems in load balancing

- Data skew
  - Not every task processes the same amount of data
  - In general, mappers are less likely to have skew, since the input splits
    should be balanced
  - Reducers are more likely to have skew, since the number of values for a
    key is hard to predict
  - The partitioner can be trained to provide a balanced split (see the
    sketch below)
    - For example, use an initial sampling of the data to train it
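One way to act on that idea, as a hand-rolled sketch (the hot-key list is a
hypothetical example; in practice it would come from a sampling pass over the
input): route known hot keys to dedicated reducers and hash everything else
over the rest.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Skew-aware partitioner: dedicated reducers for hot keys,
// plain hashing for the remaining key space.
public class SkewAwarePartitioner extends Partitioner<Text, IntWritable> {

    // Hypothetical hot keys; in practice learned from an input sample.
    private static final List<String> HOT_KEYS =
            Arrays.asList("the", "of", "and");

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        int hash = key.hashCode() & Integer.MAX_VALUE; // force non-negative
        if (numPartitions <= HOT_KEYS.size()) {
            return hash % numPartitions; // too few reducers: plain hashing
        }
        int hot = HOT_KEYS.indexOf(key.toString());
        if (hot >= 0) {
            return hot; // one dedicated reducer per hot key
        }
        // Remaining keys share the reducers after the dedicated ones.
        return HOT_KEYS.size() + hash % (numPartitions - HOT_KEYS.size());
    }
}
```

It would be registered with `job.setPartitionerClass(SkewAwarePartitioner.class)`;
for full range partitioning, Hadoop's own `TotalOrderPartitioner` plus
`InputSampler` implement the sampling idea directly.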
### Summary

- Input dataset: size, and number of records
- Mapper: number of records, effect of the combiner
- Reducer: data skew: keys with too many records
## Hadoop Reliability

### High Availability (HA)

- A characteristic of a system
- Ensures an agreed level of operational performance for a higher-than-normal
  period
- Features
  - Fault tolerance: the system continues to operate correctly in the event
    of failure
  - No single point of failure
  - Graceful degradation: when some component fails, the system temporarily
    works with worse performance
- Calculating availability (a worked example follows this section):
  [Calculation](/1-4-scalability.md#availaility-important)
- Testing: Chaos Monkey: an open-source tool that simulates real-world
  failures, to test the resilience of the infrastructure
  - Helps surface potential problems before they become actual problems
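A quick worked example of the redundancy math (the standard
independent-failure model, not taken from the linked notes): if one node is
available $A = 0.99$ of the time, $n$ replicas are all unavailable only
$(1 - A)^n$ of the time:

$$A_{\text{total}} = 1 - (1 - A)^n, \qquad
n = 2: 99.99\%, \qquad n = 3: 99.9999\%$$

which is one reason HDFS's default replication factor of 3 holds up well
against independent node failures.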
### Error Management

- Errors are part of a job execution
- Examples:
  - Data integrity
    - `DataNode` verifies the checksum before writing
    - Clients verify the checksum of the blocks they read
    - When an error is found, it is reported to the `NameNode`:
      - The block is marked as corrupt
      - Clients are redirected to other replicas
      - Re-replication is scheduled
  - Task failure
    - Environment problem, e.g. the Java version
    - When an error or a hang occurs, the task is marked as failed and
      reported back to the `NodeManager`
    - The `ApplicationMaster` or `ResourceManager` tries to re-schedule the
      task on a different node
    - The job is declared failed once the maximum number of retries is
      reached (configurable, see the snippet below)
  - `NodeManager` failure
    - The `NodeManager` checks the health of its node and reports to the
      `ResourceManager`
    - When it fails, the RM can't detect any heartbeat from it
      - The node is marked as killed and reported back to the AM
    - The AM tries to re-run all the hosted containers on other nodes, after
      negotiating with the RM
      - Completed map tasks are also rescheduled, since map results are
        not stored in HDFS
  - `ResourceManager` or `NameNode` failures
    - Single point of failure
    - No way of automatic recovery
    - Need manual intervention:
      - The secondary `NameNode` has a backup copy of the index table
      - The `ResourceManager` stores the state of jobs, and can re-launch
        them when restarted
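The retry limit mentioned under task failure is an ordinary job setting; a
minimal sketch of tuning it through the `Job` API (the property names
`mapreduce.map.maxattempts` and `mapreduce.reduce.maxattempts` are standard
Hadoop configuration keys, defaulting to 4):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetryConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // How many times a failed map/reduce task is retried on another
        // node before the whole job is declared failed (default: 4).
        conf.setInt("mapreduce.map.maxattempts", 4);
        conf.setInt("mapreduce.reduce.maxattempts", 4);

        // Tolerate a small percentage of failed map tasks without failing
        // the job; useful when a few bad records are acceptable.
        conf.setInt("mapreduce.map.failures.maxpercent", 0);

        Job job = Job.getInstance(conf, "retry-config-demo");
        // ... set mapper, reducer, input/output paths as usual ...
    }
}
```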