Abstract
Big data are very large volumes of information stored in special structures. Processing such data requires distributed methods such as MapReduce. In software such as Hadoop, MapReduce runs on a distributed system and consists of two phases, map and reduce. Implementing operations efficiently in this system reduces both running time and resource consumption on the nodes of the distributed system. The purpose of this study was to understand how MapReduce can be used efficiently to implement hash join in this programming model. SQL is a high-level declarative language for processing queries. This study therefore implemented the hash join algorithm, one of the most important SQL operations, in Hadoop, together with its query algorithms, and investigated their efficiency. The goal was to obtain an algorithm that minimizes running time when processing large volumes of data, compared with SQL, and thereby improves hash join and query speed.
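To make the two-phase structure concrete, the following is a minimal, in-process sketch of how a hash join can be expressed as map and reduce steps. It is an illustration under stated assumptions, not the implementation evaluated in this study and not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_partition`, `hash_join`), the `"L"`/`"R"` source tags, and the choice of the left relation as the build side are all hypothetical. The map phase tags each record with its source relation, the shuffle hash-partitions records by join key, and each reducer builds an in-memory hash table on one relation and probes it with the other.

```python
from collections import defaultdict


def map_phase(left, right):
    """Map: emit (join_key, (source_tag, value)) for every input record.

    Hypothetical convention: "L" marks the build-side relation,
    "R" marks the probe-side relation.
    """
    for key, value in left:
        yield key, ("L", value)
    for key, value in right:
        yield key, ("R", value)


def shuffle(pairs, num_reducers):
    """Shuffle: hash-partition records by join key so that all records
    sharing a key land on the same reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for key, tagged in pairs:
        partitions[hash(key) % num_reducers].append((key, tagged))
    return partitions


def reduce_partition(partition):
    """Reduce: build a hash table on the "L" records of this partition,
    then probe it with the "R" records and emit joined tuples."""
    table = defaultdict(list)  # join_key -> list of build-side values
    probe = []                 # probe-side (key, value) pairs
    for key, (tag, value) in partition:
        if tag == "L":
            table[key].append(value)
        else:
            probe.append((key, value))
    for key, right_value in probe:
        # .get() avoids inserting empty entries for unmatched keys
        for left_value in table.get(key, []):
            yield key, left_value, right_value


def hash_join(left, right, num_reducers=3):
    """Run map -> shuffle -> reduce sequentially and collect the join result."""
    results = []
    for partition in shuffle(map_phase(left, right), num_reducers):
        results.extend(reduce_partition(partition))
    return results


if __name__ == "__main__":
    # Hypothetical sample relations: (customer_id, name) and (customer_id, order_id)
    customers = [("c1", "Alice"), ("c2", "Bob")]
    orders = [("c1", "o100"), ("c1", "o101"), ("c3", "o102")]
    print(sorted(hash_join(customers, orders)))
```

Running the script prints the two orders of customer `c1` joined with that customer's name; `c2` (no orders) and `o102` (no matching customer) are dropped, as in an SQL inner join. In a real Hadoop job the build side would normally be the smaller relation so that each reducer's hash table fits in memory.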