Big Data Analysis of Households Income and Expenditure by Applying Hadoop Distributed File System

Alipour, Reza; Entezari Maleki, Reza

Ijoss Iranian Journal of Official Statistics Studies

Volume 32, Issue 1 (8-2021)

Iranian Journal of Official Statistics Studies 2021, 32(1): 97-123

Reza Alipour*, Reza Entezari Maleki

Iran University of Science and Technology

Abstract:

Big data is one of the most important resources in today's world: valuable information and knowledge can be obtained from it through various analyses. Over the last two decades, the volume of this data has grown steadily. The Hadoop framework, written in the Java programming language, is one of the most widely used tools for distributing and processing big data. Hadoop allows large data sets to be processed across a cluster of machines and facilitates the management of semi-structured and unstructured data.

In Iran, as in other countries, household data is collected every year as part of official statistics. These data contain valuable information, but the results are published only at the national and provincial levels, and so far no results have been extracted at the city level. The purpose of this study is to use the Hadoop framework to distribute and process household data for the cities of a province, and then to use the extracted information for analysis.

Based on the proposed model, the data of the country's 31 provinces were grouped into 4 clusters, and 4 virtual machine servers with 4 nodes were considered. The raw data was converted from SQL to CSV and uploaded to HDFS, and then Map/Reduce operations were performed. In line with the objectives of this research, outputs such as the average household communication expenditure and the Internet indicator were extracted at the level of the cities of province 01, and comparisons between the cities were shown.
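To illustrate the kind of Map/Reduce aggregation described here, the following is a minimal sketch in plain Python that imitates the map, shuffle and reduce phases on a small in-memory CSV. It is not the authors' implementation and does not run on Hadoop. The column names (`city_code`, `household_id`, `communication_expenditure`), the city codes and the sample values are all hypothetical and are used only to show how a per-city average can be computed from (sum, count) pairs.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows (not real survey data): one line per household.
SAMPLE_CSV = """city_code,household_id,communication_expenditure
0101,1,120.0
0101,2,80.0
0102,3,200.0
0102,4,100.0
0102,5,150.0
"""


def map_phase(rows):
    """Emit (city_code, (expenditure, 1)) for each household record."""
    for row in rows:
        yield row["city_code"], (float(row["communication_expenditure"]), 1)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum expenditures and counts per city, then divide to get the mean."""
    averages = {}
    for city, values in grouped.items():
        total = sum(amount for amount, _ in values)
        count = sum(n for _, n in values)
        averages[city] = total / count
    return averages


def average_by_city(csv_text):
    """Run the three phases over CSV text and return {city_code: mean}."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for city, avg in sorted(average_by_city(SAMPLE_CSV).items()):
        print(city, avg)
```

Emitting a (sum, count) pair rather than a running mean keeps the reduce step associative, so partial results from different nodes could be combined before the final division.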

The same information and indicators can be extracted and analyzed more widely, for the cities of other provinces and even at the village level. Household data is currently collected centrally, offline and with a delay; the results of this research suggest that it can be prepared faster by using the Hadoop distributed file system. With timely outputs and information, analyses can be performed faster and better than in the past. It is also suggested that the Hadoop distributed system makes it possible to link the annual household information extracted at the city level with the country's population census information, and thereby to fill the statistical gap in household access indicators.

Keywords: Hadoop framework, Hadoop Distributed File System, MapReduce, Big Data, Household Data.

Type of Study: Research | Subject: Special
Received: 2024/06/12 | Accepted: 2021/08/24 | Published: 2024/12/8

Alipour R, Entezari Maleki R. Big Data Analysis of Households Income and Expenditure by Applying Hadoop Distributed File System. Iranian Journal of Official Statistics Studies 2021; 32(1): 97-123
URL: http://ijoss.srtc.ac.ir/article-1-469-en.html

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
