# Map Reduce Reliability and Performance ## Hadoop Performance ### The concept of speedup: - $$S(n) = \frac{time_taken_with_1_processor}{time_taken_with_n_processor}$$ - Speedup is problem dependent, as well as architecture dependent - If a job is fully parallelized, it the speed up is linear: [Amdahl's law](/1-4-scalability.md#amdahls-law-important) ### Calculating speedup - Calculating the maximum possible theoretical speedup(ignore IPC and workload imbalance): - According to Amdahl's law, we set $n$ to infinity, thus the speedup becomes: $$S = \frac{1}{\alpha}$$ - Which means, if most of the parts are serialized, the speedup is low - The real speedup is lower, because of the communication overhead and workload imbalance ### Hadoop performane metrics - Latency: time between starting a job and delivering output: execution time - Job setup - Reading from HDFS - Lock contention from concurrent tasks - Throughput: the amount of data per second (byte per second) - HDFS throughput(IO or network bound, usually not CPU) - Disk throughput ### Optimizing for the performance - Focus on bottlenecks - Reduce number of key-value pairs, emitted by mapper - Reduce network traffic - Simplify [shuffle and sort](/3-1-map-reduce.md#running) - Sorting in the _shuffle and sort_ stage - Avoid unnecessary java object creation, reuse Writable when possible ### Problem in load balancing - Data skew - Not every task proceess the same amount of data - In general, mappers are less likely to have skew, the splits should be balanced - Reducer are more likely to have skew, since the number of values for a key is hard to predict. - The partition can be trained to provide a balanced speed - For example use a initial sampling of data to train it ### Summary: - Input dataset: size, and number of records - Mapper: number of records, effect of combiner - Reducer: data skew: key with too many records ## Hadoop Reliability ### High Availability (HA) - A characteristic of a system - To ensure agreed level of operational performance for a higher than normal period - Features - Fault tolerance: System continue to operate correctly on the event of failure - No single point of failure - Graceful degradation: When some component fail, the system temporarily works with worse performance - Calculating availability: [Calculation](/1-4-scalability.md#availaility-important) - Testing: Chaos monkey: open source tool to simulate real-world failure, to test the resilience of it infra - To help testing the potential problems before they become actual problems ### Error Management - Errors are part of a job execution - Examples: - Data integrity - `DataNode` verifies the checksum before writing - clients verify checksum of read blocks - When error found, report it to `NameNode`, - mark the block as corrupt - Clients redirected to other replicas - Replication scheduled - Task failure - Environment problem: java version - Error and hanging occur, mark task as failed, report back to `NodeManager` - `ApplicationMaster` or `ResourceManager` try to re-schedule the task on a different node - Reach maximum number of retries before declaring job failure - `NodeManager` failure - `NodeManager` check the health of node, and report to `ResourceManager` - When it fail, RM can't detect any heartbeat from it - mark as killed, report back to AM - AM try to re-run all the hosted containers in other nodes, after negotiating with RM - Completed map tasks are also rescheduled, since map results are not stored in HDFS - `ResourceManager` or `NameNode` failures - Single point of failure - No way of automatic recovery - Need manual intervention: - Secondary `NameNode` has a backup copy of index table - `ResourceManager` Stores state of jobs, and can re-launch when restarted