Search results for: imbalanced data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24135

24105 The Analysis of Changes in Urban Hierarchy of Isfahan Province in the Fifty-Year Period (1956-2006)

Authors: Hamidreza Joudaki, Yousefali Ziari

Abstract:

The emergence of cities and urbanism is one of the important processes that have affected social communities. Industrialization and urbanism have developed alongside each other throughout history. They have also had a simple relationship for more than six thousand years, that is, since the appearance of the first cities. In the 18th century, with the rise of industrial capitalism, urbanism developed progressively around the world. In Iran, the city of each region made its own decisions, and the regional capital (downtown) was the only central place; as the regional city, it controlled its realm without any hierarchy. However, this mode of governance has changed over the last three decades because of political, social and economic shifts that have altered the rural-urban relationship. These shifts have also changed the range of functions performed by cities and the systematic urban network in Iran. Today, the urban system is highly imbalanced in both space and function. In Isfahan, the trend of urbanism resembles that of other parts of Iran, and the systematic urban hierarchy is neither suitable nor normal. This article is quantitative and analytical. The statistical population is the cities of Isfahan Province, and the changes in the urban network and its hierarchy over a fifty-year period (1956-2006) are surveyed. The data are analyzed with the rank-size model and the entropy index. The article introduces the cities of Iran, the entropy factor of the primate city, and the urban hierarchy of Isfahan Province. The share of urban residents in the province rose from 55% to 83% (2006). The analysis shows mismatch and imbalance between cities: the entropy index was 0.91 in 1956 and decreased to 0.63 in 2006. Isfahan has been the primate city throughout these periods. Moreover, the second and third cities show a population gap relative to the other cities, and the cities do not follow the rank-size rule.
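
The two measures named in this abstract can be illustrated with a minimal Python sketch. The abstract does not give its formulas, so two common textbook forms are assumed here: the rank-size rule with exponent 1 (expected population of the city at rank r is the largest city's population divided by r) and a normalized Shannon entropy of population shares (1.0 means perfectly even, lower values mean more concentration). The function names and the sample figures are hypothetical.

```python
import math

def rank_size_table(populations):
    """Return (rank, observed, expected) rows under the assumed rule P_r = P_1 / r."""
    ranked = sorted(populations, reverse=True)
    largest = ranked[0]
    return [(rank, pop, largest / rank) for rank, pop in enumerate(ranked, start=1)]

def relative_entropy(populations):
    """Normalized Shannon entropy of population shares (assumed form of the index).

    Needs at least two cities with non-zero population.
    """
    total = sum(populations)
    shares = [p / total for p in populations if p > 0]
    h = -sum(s * math.log(s) for s in shares)
    return h / math.log(len(shares))

# Hypothetical populations, for illustration only.
for rank, observed, expected in rank_size_table([100, 25, 50]):
    print(rank, observed, round(expected, 1))
print(round(relative_entropy([97, 1, 1, 1]), 3))
```

Under these assumptions, a falling index such as the reported move from 0.91 to 0.63 would correspond to population concentrating in fewer cities.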

Keywords: urban network, urban hierarchy, primate city, Isfahan province, urbanism, first cities

Procedia PDF Downloads 221
24104 Unbalanced Mean-Time and Buffer Effects in Lines Suffering Breakdown

Authors: Sabry Shaaban, Tom McNamara, Sarah Hudson

Abstract:

This article studies the performance of unpaced serial production lines that are subject to breakdown and are imbalanced in terms of both of their processing time means (MTs) and buffer storage capacities (BCs). Simulation results show that the best pattern in terms of throughput is a balanced line with respect to average buffer level; the best configuration is a monotone decreasing MT order, together with an ascending BC arrangement. Statistical analysis shows that BC, patterns of MT and BC imbalance, line length and degree of imbalance all contribute significantly to performance. Results show that unbalanced lines cope well with unreliability.

Keywords: unreliable unpaced serial lines, simulation, unequal mean operation times, uneven buffer capacities, patterns of imbalance, throughput, average buffer level

Procedia PDF Downloads 440
24103 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation

Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen

Abstract:

Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.

Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning

Procedia PDF Downloads 35
24102 Association Between Malnutrition and Dental Caries in Children

Authors: Mohammed Khalid Mahmood, Delphine Tardivo, Romain Lan

Abstract:

Dental caries is one of the most common diseases in the world, affecting billions of people and significantly lowering the quality of life. Malnutrition, on the other hand, is defined as inadequate, imbalanced, or excessive consumption of macronutrients, micronutrients, or both, which is characterized as an abnormal physiological condition. Oral health is impacted by malnutrition, and malnutrition can result from poor oral health. The objective of this paper was to study the association of serum Vitamin D level and body mass index as representatives of malnutrition at micro and macro levels, respectively, on dental caries. Results showed that: 1. The majority of the population studied (70%) are Vitamin D deficient. 2. Having a normal and even a sufficient level of serum Vitamin D and having a normal body mass index increase the chances of children being caries-free and having a lower caries index.

Keywords: children, dental Caries, malnutrition, vitamin D

Procedia PDF Downloads 46
24101 Using Autoencoder as Feature Extractor for Malware Detection

Authors: Umm-E-Hani, Faiza Babar, Hanif Durad

Abstract:

Malware-detecting approaches suffer many limitations, due to which all anti-malware solutions have failed to be reliable enough for detecting zero-day malware. Signature-based solutions depend upon the signatures that can be generated only when malware surfaces at least once in the cyber world. Another approach that works by detecting the anomalies caused in the environment can easily be defeated by diligently and intelligently written malware. Solutions that have been trained to observe the behavior for detecting malicious files have failed to cater to the malware capable of detecting the sandboxed or protected environment. Machine learning and deep learning-based approaches greatly suffer in training their models with either an imbalanced dataset or an inadequate number of samples. AI-based anti-malware solutions that have been trained with enough samples targeted a selected feature vector, thus ignoring the input of leftover features in the maliciousness of malware just to cope with the lack of underlying hardware processing power. Our research focuses on producing an anti-malware solution for detecting malicious PE files by circumventing the earlier-mentioned shortcomings. Our proposed framework, which is based on automated feature engineering through autoencoders, trains the model over a fairly large dataset. It focuses on the visual patterns of malware samples to automatically extract the meaningful part of the visual pattern. Our experiment has successfully produced a state-of-the-art accuracy of 99.54 % over test data.

Keywords: malware, auto encoders, automated feature engineering, classification

Procedia PDF Downloads 43
24100 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve performance over multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called the “curse of correlation”, which manifests as strong interference among the predictions produced by the base classifiers. In addition, existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis of our experimental results, we conclude that the two problems are caused by some inherent deficiencies in the consensus approach. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. To evaluate the proposed ensemble algorithm, we employ the well-known benchmark data set NSL-KDD (the improved version of the KDDCup99 dataset produced by the University of New Brunswick) to compare the proposed algorithm with 8 common ensemble algorithms. In particular, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in the improvements in accuracy and reliability over the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus proves to be a more effective ensemble solution than the traditional consensus approach, outperforming the 8 ensemble algorithms by 20% on almost all compared metrics, which include accuracy, precision, recall, F1-score and area under the receiver operating characteristic curve.

Keywords: consensus, curse of correlation, imbalance classification, rank-based chain-mode ensemble

Procedia PDF Downloads 106
24099 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technologies; it collects data from various sensors, and those data are used in many fields. Data retrieval is one of the major issues, since there is a need to extract exactly the data required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what helps to pinpoint the exact data available in the storage space. Here, the available data is streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 304
24098 Farmers' Perspective on Soil Health in the Indian Punjab: A Quantitative Analysis of Major Soil Parameters

Authors: Sukhwinder Singh, Julian Park, Dinesh Kumar Benbi

Abstract:

Although soil health, which is recognized as one of the key determinants of sustainable agricultural development, can be measured by a range of physical, chemical and biological parameters, the widely used parameters include pH, electrical conductivity (EC), organic carbon (OC), plant available phosphorus (P) and potassium (K). Soil health is largely affected by the occurrence of natural events or human activities and can be improved by various land management practices. A database of 120 soil samples collected from farmers’ fields spread across three major agro-climatic zones of Punjab suggested that the average pH, EC, OC, P and K was 8.2 (SD = 0.75, Min = 5.5, Max = 9.1), 0.27 dS/m (SD = 0.17, Min = 0.072 dS/m, Max = 1.22 dS/m), 0.49% (SD = 0.20, Min = 0.06%, Max = 1.2%), 19 mg/kg soil (SD = 22.07, Min = 3 mg/kg soil, Max = 207 mg/kg soil) and 171 mg/kg soil (SD = 47.57, Min = 54 mg/kg soil, Max = 288 mg/kg soil), respectively. Region-wise, pH, EC and K were the highest in south-western district of Ferozpur whereas farmers in north-eastern district of Gurdaspur had the best soils in terms of OC and P. The soils in the central district of Barnala had lower OC, P and K than the respective overall averages while its soils were normal but skewed towards alkalinity. Besides agro-climatic conditions, the size of landholding and farmer education showed a significant association with Soil Fertility Index (SFI), a composite index calculated using the aforementioned parameters’ normalized weightage. All the four stakeholder groups cited the current cropping patterns, burning of rice crop residue, and imbalanced use of chemical fertilizers for change in soil health. However, the current state of soil health in Punjab is unclear, which needs further investigation based on temporal data collected from the same field to see the short and long-term impacts of various crop combinations and varied cropping intensity levels on soil health.

Keywords: soil health, punjab agriculture, sustainability, soil fertility index

Procedia PDF Downloads 333
24097 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

Procedia PDF Downloads 380
24096 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

Given user demand and the growth trends of large volumes of free data, it is becoming more challenging for storage solutions to protect, store and retrieve data. The day is not far off when storage companies and organizations will start saying 'no' to storing our valuable data, or will start charging a huge amount for its storage and protection. At the same time, environmental conditions make it challenging to establish and maintain new data warehouses and data centers while guarding against global warming threats. The challenge of small data is over; the big challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 510
24095 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of a typical cleaning process. It puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it designs several violation data discovery algorithms expressed as formal formulas; these can find the data that are inconsistent in all target columns that depend on condition attributes, whether the data are structured (SQL) or unstructured (NoSQL). Based on these algorithms, it gives 6 data cleaning methods.
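
As an illustration of what dependency-based violation discovery can look like, the following Python sketch checks one dependency rule of the form "condition attributes determine a target column". It is not the paper's algorithm, which the abstract does not specify; the function name `find_violations` and the sample rows are hypothetical. Rows are grouped by their condition-attribute values, and any group that holds more than one distinct target value is reported as inconsistent.

```python
from collections import defaultdict

def find_violations(rows, condition_attrs, target_attr):
    """Return groups of rows that agree on condition_attrs but disagree on target_attr."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in condition_attrs)
        groups[key].append(row)
    # A group violates the rule when it contains more than one distinct target value.
    return {
        key: group
        for key, group in groups.items()
        if len({row[target_attr] for row in group}) > 1
    }

# Hypothetical records: the rule says "zip" should determine "city".
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
violations = find_violations(rows, ["zip"], "city")
print(violations)
```

Because the rows are plain dictionaries, the same check applies to records read from a SQL table or from a document store, provided each record carries the attributes named in the rule.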

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 532
24094 A Systematic Review on Measuring the Physical Activity Level and Pattern in Persons with Chronic Fatigue Syndrome

Authors: Kuni Vergauwen, Ivan P. J. Huijnen, Astrid Depuydt, Jasmine Van Regenmortel, Mira Meeus

Abstract:

A lower activity level and imbalanced activity pattern are frequently observed in persons with chronic fatigue syndrome (CFS) / myalgic encephalomyelitis (ME) due to debilitating fatigue and post-exertional malaise (PEM). Identification of measurement instruments to evaluate the activity level and pattern is therefore important. The objective is to identify measurement instruments suited to evaluate the activity level and/or pattern in patients with CFS/ME and review their psychometric properties. A systematic literature search was performed in the electronic databases PubMed and Web of Science until 12 October 2016. Articles including relevant measurement instruments were identified and included for further analysis. The psychometric properties of relevant measurement instruments were extracted from the included articles and rated based on the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. The review was performed and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. A total of 49 articles and 15 unique measurement instruments were found, but only three instruments were evaluated in patients with CFS/ME: the Chronic Fatigue Syndrome-Activity Questionnaire (CFS-AQ), Activity Pattern Interview (API) and International Physical Activity Questionnaire-Short Form (IPAQ-SF), three self-report instruments measuring the physical activity level. The IPAQ-SF, CFS-AQ and API are all equally capable of evaluating the physical activity level, but none of the three measurement instruments are optimal to use. No studies about the psychometric properties of activity monitors in patients with CFS/ME were found, although they are often used as the gold standard to measure the physical activity pattern. More research is needed to evaluate the psychometric properties of existing instruments, including the use of activity monitors.

Keywords: chronic fatigue syndrome, data collection, physical activity, psychometrics

Procedia PDF Downloads 196
24093 Rotor Dynamic Analysis for a Shaft Train by Using Finite Element Method

Authors: M. Najafi

Abstract:

In the present paper, a large turbo-generator shaft train including a heavy-duty gas turbine engine, a coupling, and a generator is established. The method of analysis is based on a simplified finite element model for lateral and torsional vibration calculation. The basic elements of the rotor are the shafts and the disks, which are represented as flexible beams of circular cross section and rigid body elements, respectively. For more accurate results, the gyroscopic effect and bearing dynamics coefficients and function of rotation are taken into account, and to capture the influence of the shear effect, the rotor has been modeled as a Timoshenko beam. Lateral critical speeds, critical speed map, damped mode shapes, Campbell diagram, zones of instability, amplitudes, phase angle response due to synchronous forces of excitation and amplification factor are calculated. The effect of rotor imbalance and the effects of changes in internal force and temperature are also studied.

Keywords: rotor dynamic analysis, finite element method, shaft train, Campbell diagram

Procedia PDF Downloads 114
24092 A Machine Learning Approach for Detecting and Locating Hardware Trojans

Authors: Kaiwen Zheng, Wanting Zhou, Nan Tang, Lei Li, Yuanhang He

Abstract:

The integrated circuit industry has become a cornerstone of the information society, finding widespread application in areas such as industry, communication, medicine, and aerospace. However, with the increasing complexity of integrated circuits, Hardware Trojans (HTs) implanted by attackers have become a significant threat to their security. In this paper, we propose a hardware trojan detection method for large-scale circuits. As HTs introduce changes in physical characteristics such as structure, area, and power consumption in the form of additional redundant circuits, we propose a machine-learning-based hardware trojan detection method based on the physical characteristics of gate-level netlists. This method transforms the hardware trojan detection problem into a machine-learning binary classification problem based on physical characteristics, greatly improving detection speed. To address the problem of imbalanced data, where the number of pure circuit samples is far less than that of HTs circuit samples, we used the SMOTETomek algorithm to expand the dataset and further improve the performance of the classifier. We used three machine learning algorithms, K-Nearest Neighbors, Random Forest, and Support Vector Machine, to train and validate benchmark circuits on Trust-Hub, and all achieved good results. In our case studies based on AES encryption circuits provided by Trust-Hub, the test results showed the effectiveness of the proposed method. To further validate the method’s effectiveness for detecting variant HTs, we designed variant HTs using open-source HTs. The proposed method can guarantee robust detection accuracy with millisecond-level detection time for IC and FPGA design flows, and has good detection performance for library variant HTs.
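
The resampling step mentioned in this abstract combines two ideas: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, and Tomek-link cleaning removes pairs of mutual nearest neighbours that carry different labels. The sketch below is a simplified, standard-library-only illustration of that combination, not the authors' pipeline and not the imbalanced-learn implementation. The function names, the choice to drop both members of each link, and the toy data are assumptions, and it expects at least two minority samples.

```python
import math
import random

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic

def tomek_links(X, y):
    """Return index pairs that are mutual nearest neighbours with different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, nn[i]) for i in range(len(X))
            if nn[nn[i]] == i and y[i] != y[nn[i]] and i < nn[i]]

def smote_tomek(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class to parity, then drop both ends of each Tomek link."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, n_new, k, rng)
    X2 = list(X) + new_points
    y2 = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X2, y2) for i in pair}
    keep = [i for i in range(len(X2)) if i not in drop]
    return [X2[i] for i in keep], [y2[i] for i in keep]

# Toy two-feature data (hypothetical): six majority points, three minority points.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
y = [0] * 6 + [1] * 3
X_res, y_res = smote_tomek(X, y, minority_label=1)
print(y_res.count(0), y_res.count(1))
```

In practice a maintained library implementation is usually preferable; this sketch only shows the mechanics of the two steps.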

Keywords: hardware trojans, physical properties, machine learning, hardware security

Procedia PDF Downloads 104
24091 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from it.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 366
24090 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data and XML data in both directions. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.
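
One way such a check might look in Python, using only the standard `json` and `xml.etree.ElementTree` modules, is sketched below. It is an illustration under stated assumptions, not the paper's procedure: XML attributes are ignored, repeated child tags are treated as a list, and all leaf values are compared as strings because XML text carries no type information. The helper names (`xml_to_dict`, `normalise`, `diff`, `compare_json_to_xml`) and the sample documents are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

def xml_to_dict(elem):
    """Convert an element to nested dicts/lists; leaf elements become stripped text."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    out = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in out:
            # A repeated tag is assumed to represent a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out

def normalise(value):
    """Render JSON leaves as strings so they can be compared with XML text."""
    if isinstance(value, dict):
        return {key: normalise(val) for key, val in value.items()}
    if isinstance(value, list):
        return [normalise(val) for val in value]
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return ""
    return str(value)

def diff(a, b, path="$"):
    """Return the paths at which two normalised structures differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        found = []
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                found.append(f"{path}.{key}")
            else:
                found.extend(diff(a[key], b[key], f"{path}.{key}"))
        return found
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        found = []
        for index, (left, right) in enumerate(zip(a, b)):
            found.extend(diff(left, right, f"{path}[{index}]"))
        return found
    return [] if a == b else [path]

def compare_json_to_xml(json_text, xml_text):
    """List the paths where the JSON document and the XML document disagree."""
    return diff(normalise(json.loads(json_text)), xml_to_dict(ET.fromstring(xml_text)))

json_doc = '{"id": 7, "name": "Ada", "tags": ["a", "b"]}'
xml_doc = "<user><id>7</id><name>Ada</name><tags>a</tags><tags>b</tags></user>"
print(compare_json_to_xml(json_doc, xml_doc))
```

Known limits of this sketch: a one-element JSON list cannot be distinguished from a single XML child, and numeric formatting differences such as 7 versus 7.0 are reported as mismatches.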

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 83
24089 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by various systems differ in form, and their sources differ because they may come from different sectors. To reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data from different sources, collected in different ways, raise several issues that need to be resolved. Problems of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and space data. Metadata are needed so that they can be referred to and read when an application needs to access, manipulate and display the data. Uniform metadata management ensures the effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 347
24088 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given the human role in the rapid growth of data, methods such as data mining for extracting knowledge are unavoidable. One of the issues in data mining is the inherent distribution of the data: usually the databases that create or receive such data belong to corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. Sending data and then gathering them, whether partitioned vertically or horizontally, depends on the type of privacy preservation used and is carried out to improve data privacy. This study attempts a comprehensive comparison of privacy-preserving data methods; general methods such as data randomization and encoding are also examined, along with the strong and weak points of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 487
24087 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 439
24086 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 366
24085 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their adversaries. Among the many areas of Big Data usage, logistics has a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 608
24084 Assessment of Soil Quality Indicators in Rice Soil of Tamil Nadu

Authors: Kaleeswari R. K., Seevagan L.

Abstract:

Soil quality in an agroecosystem is influenced by the cropping system, water and soil fertility management. A valid soil quality index would help to assess the soil and crop management practices for desired productivity and soil health. The soil quality indices also provide an early indication of soil degradation and of the remedial and rehabilitation measures needed. Imbalanced fertilization and inadequate organic carbon dynamics deteriorate soil quality in an intensive cropping system. The rice soil ecosystem is different from other arable systems since rice is grown under submergence, which requires a different set of key soil attributes for enhancing soil quality and productivity. Assessment of the soil quality index involves indicator selection, indicator scoring and integration of the scores into one index. The most appropriate indicator to evaluate soil quality can be selected by establishing the minimum data set, which can be screened by linear and multiple regression factor analysis and score function. This investigation was carried out in intensive rice cultivating regions (having >1.0 lakh hectares) of Tamil Nadu viz., Thanjavur, Thiruvarur, Nagapattinam, Villupuram, Thiruvannamalai, Cuddalore and Ramanathapuram districts. In each district, an intensive rice growing block was identified. In each block, two sampling grids (10 x 10 sq.km) were used with a sampling depth of 10 – 15 cm. Using GIS coordinates, soil sampling was carried out at various locations in the study area. The number of soil sampling points was 41, 28, 28, 32, 37, 29 and 29 in Thanjavur, Thiruvarur, Nagapattinam, Cuddalore, Villupuram, Thiruvannamalai and Ramanathapuram districts, respectively. Principal Component Analysis is a data reduction tool used to select some of the potential indicators. A Principal Component is a linear combination of different variables that represents the maximum variance of the dataset. Principal Components with eigenvalues equal to or higher than 1.0 were taken as the minimum data set. Principal Component Analysis was used to select the representative soil quality indicators in rice soils based on factor loading values and contribution percent values. Variables having significant differences within the production system were used for the preparation of the minimum data set. Each Principal Component explained a certain amount of variation (%) in the total dataset. This percentage provided the weight for variables. The final Principal Component Analysis based soil quality equation is $\mathrm{SQI} = \sum_{i=1}^{n} W_i \times S_i$, where $S_i$ is the score for the subscripted variable, $W_i$ is the weighting factor derived from PCA, and $n$ is the number of indicators in the minimum data set. Higher index scores meant better soil quality. Soil respiration, Soil available Nitrogen and Potentially Mineralizable Nitrogen were assessed as soil quality indicators in rice soil of the Cauvery Delta zone covering Thanjavur, Thiruvarur and Nagapattinam districts. Soil available phosphorus could be used as a soil quality indicator of rice soils in the Cuddalore district. In rain-fed rice ecosystems of coastal sandy soil, DTPA – Zn could be used as an effective soil quality indicator. Among the soil parameters selected from Principal Component Analysis, Microbial Biomass Nitrogen could be used as a quality indicator for rice soils of the Villupuram district. The Cauvery Delta zone has a better SQI compared with the other intensive rice growing zones of Tamil Nadu.
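
The weighting and summation steps described in this abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' computation: it assumes that each retained Principal Component contributes one indicator, that components are kept when their eigenvalue is at least 1.0, and that weights are the explained-variance percentages rescaled to sum to 1 (the rescaling is an assumption). All eigenvalues, percentages and scores below are hypothetical.

```python
def retained_weights(components, min_eigenvalue=1.0):
    """Map indicator -> weight for components meeting the eigenvalue cut-off.

    components: list of (eigenvalue, variance_percent, indicator_name) tuples.
    Weights are variance percentages rescaled to sum to 1 (an assumption).
    """
    kept = [(pct, name) for eig, pct, name in components if eig >= min_eigenvalue]
    total = sum(pct for pct, _ in kept)
    return {name: pct / total for pct, name in kept}

def soil_quality_index(weights, scores):
    """SQI as the sum over indicators of weight times score."""
    return sum(weight * scores[name] for name, weight in weights.items())

# Hypothetical PCA output for eight variables and hypothetical 0-1 indicator scores.
components = [
    (3.2, 40.0, "soil_respiration"),
    (2.0, 25.0, "available_N"),
    (1.2, 15.0, "PMN"),
    (0.6, 7.5, "pH"),  # eigenvalue below 1.0, so this component is dropped
]
scores = {"soil_respiration": 0.8, "available_N": 0.6, "PMN": 0.4, "pH": 0.9}

weights = retained_weights(components)
print(weights)
print(round(soil_quality_index(weights, scores), 4))
```

With weights that sum to 1 and scores on a 0 to 1 scale, the index also stays between 0 and 1, which makes zones easier to compare.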

Keywords: soil quality index, soil attributes, soil mapping, rice soil

Procedia PDF Downloads 51
24083 Energy Efficiency Analysis of Discharge Modes of an Adiabatic Compressed Air Energy Storage System

Authors: Shane D. Inder, Mehrdad Khamooshi

Abstract:

Efficient energy storage is a crucial factor in facilitating the uptake of renewable energy resources. Among the many options available for energy storage systems required to balance imbalanced supply and demand cycles, compressed air energy storage (CAES) is a proven technology in grid-scale applications. This paper reviews the current state of micro-scale CAES technology and describes a micro-scale advanced adiabatic CAES (A-CAES) system, in which heat generated during compression is stored for use in the discharge phase. It also describes a thermodynamic model, developed in EES (Engineering Equation Solver), to evaluate the performance and critical parameters of the discharge phase of the proposed system. Three configurations are explained: a single turbine without a preheater, two turbines with preheaters, and three turbines with preheaters. It is shown that the performance of the micro-scale A-CAES is highly dependent upon key parameters, including regulator pressure, air pressure and volume, thermal energy storage temperature and flow rate, and the number of turbines. It was found that a micro-scale A-CAES, when optimized with an appropriate configuration, could deliver an input-to-output energy efficiency of up to 70%.

Keywords: CAES, adiabatic compressed air energy storage, expansion phase, micro generation, thermodynamic

Procedia PDF Downloads 284
24082 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, a wide variety of data is being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it easy to test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various kinds of research and can be useful as data for data mining. However, collecting data with such a board is difficult, particularly when the user is not a computer programmer or is using the board for the first time. Even if data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 343
24081 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity and come from a variety of sources. Public administrations are using (big) data, implementing base registries, and enforcing data sharing across the entire government to deliver integrated (big) data related services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 127
24080 World on the Edge: Migration and Cross Border Crimes in West Africa

Authors: Adeyemi Kamil Hamzah

Abstract:

The contiguity of nations in the international system suggests that the world is a composite of socio-economic units, with people exploring and exploiting its potential through migration. Cross-border migration has thus made positive contributions to the social and economic development of individuals and nations by increasing the household incomes of the host countries. However, cross-border migration in West Africa is becoming part of a dynamic and unstable world migration system. This is due to the nature and consequences of trans-border crimes in West Africa, which have both short- and long-term effects on the socio-economic viability of developing countries such as the West African states. The paper identified that migration influenced cross-border crimes as well as the high spate of insurgencies in the sub-region. Furthermore, the consequential effect of a global village has imbalanced population flows, making some countries hosts and parasites to others. Also, stern and deft cross-border rules and regulations, as well as territorial security and protection, ameliorate cross-border crimes and migration in the West African sub-region. Therefore, the study concluded that cross-border migration is the linchpin of all kinds of criminal activities that affect the security of states in the sub-region.

Keywords: cross-border migration, border crimes, security, West Africa, development, globalisation

Procedia PDF Downloads 187
24079 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, and veracity and comes from a variety of sources is generated in all sectors, including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of the government (big) data ecosystem. The essential aspects of the government (big) data ecosystem include its definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on various relevant areas such as humanitarian data, open government data, scientific research data, and industry data.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 179
24078 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, only about 1% of generated data is ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and of the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which are a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 136
24077 Prediction of Coronary Artery Stenosis Severity Based on Machine Learning Algorithms

Authors: Yu-Jia Jian, Emily Chia-Yu Su, Hui-Ling Hsu, Jian-Jhih Chen

Abstract:

The coronary arteries are the major suppliers of myocardial blood flow. When fat and cholesterol are deposited in the coronary arterial wall, narrowing and stenosis of the artery occur, which may lead to myocardial ischemia and eventually infarction. According to the World Health Organization (WHO), an estimated 7.4 million people died of coronary heart disease in 2015. According to statistics from the Ministry of Health and Welfare in Taiwan, heart disease (except for hypertensive diseases) ranked second among the top 10 causes of death from 2013 to 2016, and it still shows a growing trend. According to the American Heart Association (AHA), the risk factors for coronary heart disease include age (> 65 years), sex (a 2:1 ratio of men to women), obesity, diabetes, hypertension, hyperlipidemia, smoking, family history, lack of exercise and more. We collected a dataset of 421 patients from a hospital located in northern Taiwan who received coronary computed tomography (CT) angiography. There were 300 males (71.26%) and 121 females (28.74%), with ages ranging from 24 to 92 years and a mean age of 56.3 years. Prior to coronary CT angiography, basic data of the patients, including age, gender, obesity index (BMI), diastolic blood pressure, systolic blood pressure, diabetes, hypertension, hyperlipidemia, smoking, family history of coronary heart disease and exercise habits, were collected and used as input variables. The output variable of the prediction module is the degree of coronary artery stenosis. In this study, the dataset was randomly divided into a training set (80%) and a test set (20%). Four machine learning algorithms, namely logistic regression, stepwise logistic regression, neural network and decision tree, were used to generate prediction results. We compared the four models by area under the curve (AUC) and accuracy (Acc.). The best model was the neural network, followed by stepwise logistic regression, decision tree, and logistic regression, with AUC / Acc. of 0.68 / 79%, 0.68 / 74%, 0.65 / 78%, and 0.65 / 74%, respectively. Sensitivity and specificity were 27.3% and 90.8% for the neural network, 18.2% and 92.3% for stepwise logistic regression, 13.6% and 100% for the decision tree, and 27.3% and 89.2% for logistic regression. In future work, we hope to improve the accuracy by tuning the module parameters or by other methods, and to address the low sensitivity by adjusting the imbalanced proportion of positive and negative data.
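
The gap between accuracy and sensitivity reported above is typical of imbalanced test sets, and a small sketch makes the arithmetic visible. The function name and the counts below are hypothetical, chosen only to resemble the magnitudes in the abstract. They are not the study's confusion matrix.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of positive cases that were detected
    specificity = tn / (tn + fp)  # share of negative cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical test set with 22 positive and 65 negative cases.
sens, spec, acc = confusion_metrics(tp=3, fn=19, tn=65, fp=0)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

With these illustrative counts the classifier misses most positive cases yet still reaches an accuracy of about 78%, because the negative class dominates the test set. This is why rebalancing the classes is a plausible route to higher sensitivity.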

Keywords: decision support, computed tomography, coronary artery, machine learning

Procedia PDF Downloads 202
24076 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In our present world, we are generating a lot of data, and we need a specific device to store all these data. Generally, we store data on pen drives, hard drives, etc. Sometimes we may lose the data due to the corruption of these devices. To overcome these issues, we implemented a cloud space for storing the data, which provides more security to the data. The data can be accessed from anywhere in the world using only an internet connection. We implemented all of this in Java using the NetBeans IDE. Once a user uploads data, the user does not have any rights to change it. Users' uploaded files are stored in the cloud with the system time as the file name, and a directory is created with a name made of random words. The cloud accepts a file only if its size is less than 2MB.
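
The upload rules described above (size limit, system-time file name, random-word directory) can be sketched as follows. The authors implemented their system in Java, so this Python fragment is only an illustration. The word list, the function names, the `directory/timestamp` layout, and the reading of "2MB" as 2 × 1024 × 1024 bytes are all assumptions.

```python
import secrets

MAX_UPLOAD_BYTES = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB
WORDS = ["amber", "birch", "cedar", "delta", "ember", "fjord"]  # hypothetical word list


def random_directory(word_count=2):
    """Build a directory name from randomly chosen words."""
    return "-".join(secrets.choice(WORDS) for _ in range(word_count))


def storage_path(size_bytes, now_ms, directory):
    """Return 'directory/timestamp' for an accepted upload, or raise ValueError."""
    # Only files strictly smaller than the limit are accepted.
    if size_bytes >= MAX_UPLOAD_BYTES:
        raise ValueError("file must be smaller than 2 MB")
    # The stored file is named after the system time, not the user's file name.
    return f"{directory}/{now_ms}"


path = storage_path(
    size_bytes=512_000,
    now_ms=1_700_000_000_000,
    directory=random_directory(),
)
```

In a running service the timestamp would come from the system clock, for example `time.time_ns() // 1_000_000`. It is passed in here so that the sketch stays deterministic.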

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 172