3-4, 1hr
This commit is contained in:
parent
28b481f821
commit
40f6f6b436
|
@ -172,8 +172,6 @@
|
|||
**collected**
|
||||
- Pairs are **partitioned** to a list of values, each partition sorted
|
||||
by key
|
||||
- Combiner run on each partition, to combine the key-value pairs with
|
||||
key- list of values
|
||||
- Data is shuffled and sent to each reducer
|
||||
- Sort:
|
||||
- Reducer copy data from mapper
|
||||
|
|
110
3-4-map-reduce-reliability-perf.md
Normal file
110
3-4-map-reduce-reliability-perf.md
Normal file
|
@ -0,0 +1,110 @@
|
|||
# Map Reduce Reliability and Performance
|
||||
|
||||
## Hadoop Performance
|
||||
|
||||
### The concept of speedup:
|
||||
|
||||
- $$S(n) = \frac{time_taken_with_1_processor}{time_taken_with_n_processor}$$
|
||||
- Speedup is problem dependent, as well as architecture dependent
|
||||
- If a job is fully parallelized, it the speed up is linear:
|
||||
[Amdahl's law](/1-4-scalability.md#amdahls-law-important)
|
||||
|
||||
### Calculating speedup
|
||||
|
||||
- Calculating the maximum possible theoretical speedup(ignore IPC and workload
|
||||
imbalance):
|
||||
- According to Amdahl's law, we set $n$ to infinity, thus the speedup
|
||||
becomes: $$S = \frac{1}{\alpha}$$
|
||||
- Which means, if most of the parts are serialized, the speedup is low
|
||||
- The real speedup is lower, because of the communication overhead and workload
|
||||
imbalance
|
||||
|
||||
### Hadoop performane metrics
|
||||
|
||||
- Latency: time between starting a job and delivering output: execution time
|
||||
- Job setup
|
||||
- Reading from HDFS
|
||||
- Lock contention from concurrent tasks
|
||||
- Throughput: the amount of data per second (byte per second)
|
||||
- HDFS throughput(IO or network bound, usually not CPU)
|
||||
- Disk throughput
|
||||
|
||||
### Optimizing for the performance
|
||||
|
||||
- Focus on bottlenecks
|
||||
- Reduce number of key-value pairs, emitted by mapper
|
||||
- Reduce network traffic
|
||||
- Simplify [shuffle and sort](/3-1-map-reduce.md#running)
|
||||
- Sorting in the _shuffle and sort_ stage
|
||||
- Avoid unnecessary java object creation, reuse Writable when possible
|
||||
|
||||
### Problem in load balancing
|
||||
|
||||
- Data skew
|
||||
- Not every task proceess the same amount of data
|
||||
- In general, mappers are less likely to have skew, the splits should be
|
||||
balanced
|
||||
- Reducer are more likely to have skew, since the number of values for a key
|
||||
is hard to predict.
|
||||
- The partition can be trained to provide a balanced speed
|
||||
- For example use a initial sampling of data to train it
|
||||
|
||||
### Summary:
|
||||
|
||||
- Input dataset: size, and number of records
|
||||
- Mapper: number of records, effect of combiner
|
||||
- Reducer: data skew: key with too many records
|
||||
|
||||
## Hadoop Reliability
|
||||
|
||||
### High Availability (HA)
|
||||
|
||||
- A characteristic of a system
|
||||
- To ensure agreed level of operational performance for a higher than normal
|
||||
period
|
||||
- Features
|
||||
- Fault tolerance: System continue to operate correctly on the event of
|
||||
failure
|
||||
- No single point of failure
|
||||
- Graceful degradation: When some component fail, the system temporarily
|
||||
works with worse performance
|
||||
- Calculating availability:
|
||||
[Calculation](/1-4-scalability.md#availaility-important)
|
||||
- Testing: Chaos monkey: open source tool to simulate real-world failure, to
|
||||
test the resilience of it infra
|
||||
- To help testing the potential problems before they become actual problems
|
||||
|
||||
### Error Management
|
||||
|
||||
- Errors are part of a job execution
|
||||
- Examples:
|
||||
- Data integrity
|
||||
- `DataNode` verifies the checksum before writing
|
||||
- clients verify checksum of read blocks
|
||||
- When error found, report it to `NameNode`,
|
||||
- mark the block as corrupt
|
||||
- Clients redirected to other replicas
|
||||
- Replication scheduled
|
||||
- Task failure
|
||||
- Environment problem: java version
|
||||
- Error and hanging occur, mark task as failed, report back to
|
||||
`NodeManager`
|
||||
- `ApplicationMaster` or `ResourceManager` try to re-schedule the task
|
||||
on a different node
|
||||
- Reach maximum number of retries before declaring job failure
|
||||
- `NodeManager` failure
|
||||
- `NodeManager` check the health of node, and report to
|
||||
`ResourceManager`
|
||||
- When it fail, RM can't detect any heartbeat from it
|
||||
- mark as killed, report back to AM
|
||||
- AM try to re-run all the hosted containers in other nodes, after
|
||||
negotiating with RM
|
||||
- Completed map tasks are also rescheduled, since map results are
|
||||
not stored in HDFS
|
||||
- `ResourceManager` or `NameNode` failures
|
||||
- Single point of failure
|
||||
- No way of automatic recovery
|
||||
- Need manual intervention:
|
||||
- Secondary `NameNode` has a backup copy of index table
|
||||
- `ResourceManager` Stores state of jobs, and can re-launch when
|
||||
restarted
|
Loading…
Reference in a new issue