Optimizing Hadoop Block Placement Policy and Cluster Blocks Distribution
Authors: Nchimbi Edward Pius, Liu Qin, Fion Yang, Zhu Hong Ming
Abstract:
The current Hadoop block placement policy does not fairly and evenly distribute replicas of the blocks written to datanodes in a Hadoop cluster.
This paper presents a new solution that helps keep the cluster in a balanced state while an HDFS client is writing data to a file in a Hadoop cluster. The solution has been implemented, and tests have been conducted to evaluate its contribution to the Hadoop distributed file system.
The solution was found to lower the global execution time taken by the Hadoop balancer to 22 percent. The Hadoop balancer was also found to over-replicate 1.75 percent and 3.3 percent of all redistributed blocks in the modified and original Hadoop clusters, respectively.
The feature that keeps the cluster in a balanced state works as a core part of the Hadoop system, not just as a utility like the traditional balancer. This is one of the significant achievements and distinguishing features of the solution developed during this research.
Keywords: Balancer, Datanode, Distributed file system, Hadoop, Replicas.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1335698