Search results for: imbalanced data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24722

24692 The Analysis of Changes in Urban Hierarchy of Isfahan Province in the Fifty-Year Period (1956-2006)

Authors: Hamidreza Joudaki, Yousefali Ziari

Abstract:

The emergence of cities and urbanism is one of the important processes that have affected social communities. Industrialization and urbanism have developed alongside each other throughout history, and they have had a simple relationship for more than six thousand years, that is, since the appearance of the first cities. In the 18th century, with the rise of industrial capitalism, urbanism developed progressively around the world. In Iran, the city of each region made its own decisions, and the regional capital (downtown) was the only central place; as the regional city, it controlled its realm without any hierarchy. However, this method of ruling has changed during the last three decades because of political, social and economic changes that have altered rural and urban relationships. Moreover, these changes have altered the variety of functions of cities and the systematic urban network in Iran. Today, the urban system is highly imbalanced in space and function. In Isfahan, the trend of urbanism is like that in other parts of Iran, and the systematic urban hierarchy is not suitable or normal. This article is quantitative and analytical. The statistical population is the cities of Isfahan Province, and the changes in the urban network and its hierarchy during the fifty-year period (1956-2006) have been surveyed. In addition, the data have been analyzed using the rank-size model and the entropy index. In this article, Iranian cities, the entropy factor of the primate city, and the urban hierarchy of Isfahan Province are introduced. The share of urban residents in this province rose from 55 percent to 83 percent (2006). The analytical data reflect a mismatch and imbalance between cities: the entropy index was 0.91 in 1956 and decreased to 0.63 in 2006. Isfahan city is the primate city throughout these periods. Moreover, the second and third cities show a population gap with regard to the other cities, and, finally, the cities do not follow the rank-size system.
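
The abstract does not state the exact formulas used, so the following is only a minimal sketch of the two measures it names, under stated assumptions: the rank-size model is taken to be the classic Zipf form (the city of rank r is expected to hold P1 / r inhabitants, where P1 is the population of the largest city), and the entropy index is taken to be Shannon entropy of population shares, normalized to the range 0 to 1. The function names and the sample populations are hypothetical.

```python
import math


def rank_size_expected(primate_population, n_cities):
    """Expected populations under the classic rank-size rule: P_r = P_1 / r.

    Assumption: the abstract's "rank and size" model is the Zipf form.
    """
    return [primate_population / rank for rank in range(1, n_cities + 1)]


def normalized_entropy(populations):
    """Shannon entropy of population shares, divided by its maximum ln(n).

    A value of 1.0 means population is spread evenly across cities;
    values closer to 0 mean it is concentrated in a few cities.
    Assumption: this normalization matches the abstract's entropy index.
    """
    total = sum(populations)
    shares = [p / total for p in populations if p > 0]
    if len(shares) < 2:
        return 0.0
    entropy = -sum(s * math.log(s) for s in shares)
    return entropy / math.log(len(shares))


# Hypothetical city populations, for illustration only.
even_system = [100_000, 100_000, 100_000, 100_000]
primate_system = [1_000_000, 10_000, 10_000, 10_000]

print(rank_size_expected(1200, 3))
print(normalized_entropy(even_system))
print(normalized_entropy(primate_system))
```

Under this reading, a falling index such as the one reported (0.91 to 0.63) would indicate population concentrating in fewer cities, which is consistent with the abstract's conclusion about a dominant primate city.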

Keywords: urban network, urban hierarchy, primate city, Isfahan province, urbanism, first cities

Procedia PDF Downloads 248
24691 Unbalanced Mean-Time and Buffer Effects in Lines Suffering Breakdown

Authors: Sabry Shaaban, Tom McNamara, Sarah Hudson

Abstract:

This article studies the performance of unpaced serial production lines that are subject to breakdown and are imbalanced in terms of both of their processing time means (MTs) and buffer storage capacities (BCs). Simulation results show that the best pattern in terms of throughput is a balanced line with respect to average buffer level; the best configuration is a monotone decreasing MT order, together with an ascending BC arrangement. Statistical analysis shows that BC, patterns of MT and BC imbalance, line length and degree of imbalance all contribute significantly to performance. Results show that unbalanced lines cope well with unreliability.

Keywords: unreliable unpaced serial lines, simulation, unequal mean operation times, uneven buffer capacities, patterns of imbalance, throughput, average buffer level

Procedia PDF Downloads 467
24690 Association Between Malnutrition and Dental Caries in Children

Authors: Mohammed Khalid Mahmood, Delphine Tardivo, Romain Lan

Abstract:

Dental caries is one of the most common diseases in the world, affecting billions of people and significantly lowering the quality of life. Malnutrition, on the other hand, is defined as inadequate, imbalanced, or excessive consumption of macronutrients, micronutrients, or both, and is characterized as an abnormal physiological condition. Oral health is impacted by malnutrition, and malnutrition can result from poor oral health. The objective of this paper was to study the association of serum Vitamin D level and body mass index, as representatives of malnutrition at the micro and macro levels, respectively, with dental caries. Results showed that: 1. The majority of the population studied (70%) are Vitamin D deficient. 2. Having a normal or even a sufficient level of serum Vitamin D and having a normal body mass index increase the chances of children being caries-free and having a lower caries index.

Keywords: children, dental Caries, malnutrition, vitamin D

Procedia PDF Downloads 72
24689 A Monte Carlo Fuzzy Logistic Regression Framework against Imbalance and Separation

Authors: Georgios Charizanos, Haydar Demirhan, Duygu Icen

Abstract:

Two of the most impactful issues in classical logistic regression are class imbalance and complete separation. These can result in model predictions heavily leaning towards the imbalanced class on the binary response variable or over-fitting issues. Fuzzy methodology offers key solutions for handling these problems. However, most studies propose the transformation of the binary responses into a continuous format limited within [0,1]. This is called the possibilistic approach within fuzzy logistic regression. Following this approach is more aligned with straightforward regression since a logit-link function is not utilized, and fuzzy probabilities are not generated. In contrast, we propose a method of fuzzifying binary response variables that allows for the use of the logit-link function; hence, a probabilistic fuzzy logistic regression model with the Monte Carlo method. The fuzzy probabilities are then classified by selecting a fuzzy threshold. Different combinations of fuzzy and crisp input, output, and coefficients are explored, aiming to understand which of these perform better under different conditions of imbalance and separation. We conduct numerical experiments using both synthetic and real datasets to demonstrate the performance of the fuzzy logistic regression framework against seven crisp machine learning methods. The proposed framework shows better performance irrespective of the degree of imbalance and presence of separation in the data, while the considered machine learning methods are significantly impacted.

Keywords: fuzzy logistic regression, fuzzy, logistic, machine learning

Procedia PDF Downloads 62
24688 Using Autoencoder as Feature Extractor for Malware Detection

Authors: Umm-E-Hani, Faiza Babar, Hanif Durad

Abstract:

Malware-detecting approaches suffer many limitations, due to which all anti-malware solutions have failed to be reliable enough for detecting zero-day malware. Signature-based solutions depend upon the signatures that can be generated only when malware surfaces at least once in the cyber world. Another approach that works by detecting the anomalies caused in the environment can easily be defeated by diligently and intelligently written malware. Solutions that have been trained to observe the behavior for detecting malicious files have failed to cater to the malware capable of detecting the sandboxed or protected environment. Machine learning and deep learning-based approaches greatly suffer in training their models with either an imbalanced dataset or an inadequate number of samples. AI-based anti-malware solutions that have been trained with enough samples targeted a selected feature vector, thus ignoring the input of leftover features in the maliciousness of malware just to cope with the lack of underlying hardware processing power. Our research focuses on producing an anti-malware solution for detecting malicious PE files by circumventing the earlier-mentioned shortcomings. Our proposed framework, which is based on automated feature engineering through autoencoders, trains the model over a fairly large dataset. It focuses on the visual patterns of malware samples to automatically extract the meaningful part of the visual pattern. Our experiment has successfully produced a state-of-the-art accuracy of 99.54 % over test data.

Keywords: malware, auto encoders, automated feature engineering, classification

Procedia PDF Downloads 64
24687 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve performance over multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called the “curse of correlation”, which manifests as strong interference among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis of our experimental results, we conclude that the two problems are caused by some inherent deficiencies in the consensus approach. Therefore, we create an enhanced ensemble algorithm which adopts a purpose-designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ the well-known benchmark data set NSL-KDD (the improved version of the KDDCup99 dataset produced by the University of New Brunswick) to make comparisons between the proposed algorithm and 8 common ensemble algorithms. In particular, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in the improvements in accuracy and reliability over the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is shown to be a more effective ensemble solution than the traditional consensus approach: it outperforms the 8 ensemble algorithms by 20% on almost all compared metrics, which include accuracy, precision, recall, F1-score and area under the receiver operating characteristic curve.

Keywords: consensus, curse of correlation, imbalance classification, rank-based chain-mode ensemble

Procedia PDF Downloads 130
24686 Farmers' Perspective on Soil Health in the Indian Punjab: A Quantitative Analysis of Major Soil Parameters

Authors: Sukhwinder Singh, Julian Park, Dinesh Kumar Benbi

Abstract:

Although soil health, which is recognized as one of the key determinants of sustainable agricultural development, can be measured by a range of physical, chemical and biological parameters, the widely used parameters include pH, electrical conductivity (EC), organic carbon (OC), plant available phosphorus (P) and potassium (K). Soil health is largely affected by the occurrence of natural events or human activities and can be improved by various land management practices. A database of 120 soil samples collected from farmers’ fields spread across three major agro-climatic zones of Punjab suggested that the average pH, EC, OC, P and K was 8.2 (SD = 0.75, Min = 5.5, Max = 9.1), 0.27 dS/m (SD = 0.17, Min = 0.072 dS/m, Max = 1.22 dS/m), 0.49% (SD = 0.20, Min = 0.06%, Max = 1.2%), 19 mg/kg soil (SD = 22.07, Min = 3 mg/kg soil, Max = 207 mg/kg soil) and 171 mg/kg soil (SD = 47.57, Min = 54 mg/kg soil, Max = 288 mg/kg soil), respectively. Region-wise, pH, EC and K were the highest in south-western district of Ferozpur whereas farmers in north-eastern district of Gurdaspur had the best soils in terms of OC and P. The soils in the central district of Barnala had lower OC, P and K than the respective overall averages while its soils were normal but skewed towards alkalinity. Besides agro-climatic conditions, the size of landholding and farmer education showed a significant association with Soil Fertility Index (SFI), a composite index calculated using the aforementioned parameters’ normalized weightage. All the four stakeholder groups cited the current cropping patterns, burning of rice crop residue, and imbalanced use of chemical fertilizers for change in soil health. However, the current state of soil health in Punjab is unclear, which needs further investigation based on temporal data collected from the same field to see the short and long-term impacts of various crop combinations and varied cropping intensity levels on soil health.

Keywords: soil health, punjab agriculture, sustainability, soil fertility index

Procedia PDF Downloads 354
24685 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technologies; it collects data from various sensors, and those data are used in many fields. Data retrieval is one of the major issues, since there is a need to extract exactly the data required. In this paper, a large data set is processed using feature selection. Feature selection helps to choose the data that are actually needed to process and execute the task. The key value is what helps to locate the exact data available in the storage space. Here, the available data are streamed, and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 334
24684 Rotor Dynamic Analysis for a Shaft Train by Using Finite Element Method

Authors: M. Najafi

Abstract:

In the present paper, a model of a large turbo-generator shaft train, including a heavy-duty gas turbine engine, a coupling, and a generator, is established. The method of analysis is based on a simplified finite element model for lateral and torsional vibration calculation. The basic elements of the rotor are the shafts and the disks, which are represented as flexible beams of circular cross section and rigid body elements, respectively. For more accurate results, the gyroscopic effect and the bearing dynamic coefficients as a function of rotation are taken into account, and, to capture the influence of shear, the rotor has been modeled as a Timoshenko beam. Lateral critical speeds, the critical speed map, damped mode shapes, the Campbell diagram, zones of instability, amplitudes, the phase angle response due to synchronous excitation forces, and the amplification factor are calculated. The effect of an imbalanced rotor and the effects of changes in internal force and temperature are also studied.

Keywords: rotor dynamic analysis, finite element method, shaft train, Campbell diagram

Procedia PDF Downloads 133
24683 A Systematic Review on Measuring the Physical Activity Level and Pattern in Persons with Chronic Fatigue Syndrome

Authors: Kuni Vergauwen, Ivan P. J. Huijnen, Astrid Depuydt, Jasmine Van Regenmortel, Mira Meeus

Abstract:

A lower activity level and imbalanced activity pattern are frequently observed in persons with chronic fatigue syndrome (CFS) / myalgic encephalomyelitis (ME) due to debilitating fatigue and post-exertional malaise (PEM). Identification of measurement instruments to evaluate the activity level and pattern is therefore important. The objective is to identify measurement instruments suited to evaluate the activity level and/or pattern in patients with CFS/ME and review their psychometric properties. A systematic literature search was performed in the electronic databases PubMed and Web of Science until 12 October 2016. Articles including relevant measurement instruments were identified and included for further analysis. The psychometric properties of relevant measurement instruments were extracted from the included articles and rated based on the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. The review was performed and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. A total of 49 articles and 15 unique measurement instruments were found, but only three instruments were evaluated in patients with CFS/ME: the Chronic Fatigue Syndrome-Activity Questionnaire (CFS-AQ), Activity Pattern Interview (API) and International Physical Activity Questionnaire-Short Form (IPAQ-SF), three self-report instruments measuring the physical activity level. The IPAQ-SF, CFS-AQ and API are all equally capable of evaluating the physical activity level, but none of the three measurement instruments are optimal to use. No studies about the psychometric properties of activity monitors in patients with CFS/ME were found, although they are often used as the gold standard to measure the physical activity pattern. More research is needed to evaluate the psychometric properties of existing instruments, including the use of activity monitors.

Keywords: chronic fatigue syndrome, data collection, physical activity, psychometrics

Procedia PDF Downloads 223
24682 A Machine Learning Approach for Detecting and Locating Hardware Trojans

Authors: Kaiwen Zheng, Wanting Zhou, Nan Tang, Lei Li, Yuanhang He

Abstract:

The integrated circuit industry has become a cornerstone of the information society, finding widespread application in areas such as industry, communication, medicine, and aerospace. However, with the increasing complexity of integrated circuits, Hardware Trojans (HTs) implanted by attackers have become a significant threat to their security. In this paper, we propose a hardware trojan detection method for large-scale circuits. As HTs are additional redundant circuits that introduce changes in physical characteristics such as structure, area, and power consumption, we propose a machine-learning-based hardware trojan detection method based on the physical characteristics of gate-level netlists. This method transforms the hardware trojan detection problem into a machine-learning binary classification problem based on physical characteristics, greatly improving detection speed. To address the problem of imbalanced data, where the number of pure circuit samples is far less than that of HTs circuit samples, we used the SMOTETomek algorithm to expand the dataset and further improve the performance of the classifier. We used three machine learning algorithms, K-Nearest Neighbors, Random Forest, and Support Vector Machine, to train and validate on benchmark circuits from Trust-Hub, and all achieved good results. In our case studies based on AES encryption circuits provided by Trust-Hub, the test results showed the effectiveness of the proposed method. To further validate the method’s effectiveness for detecting variant HTs, we designed variant HTs using open-source HTs. The proposed method can guarantee robust detection accuracy with millisecond-level detection time for IC and FPGA design flows and has good detection performance for library variant HTs.
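
The resampling step named in this abstract combines SMOTE oversampling with Tomek-link cleaning. The authors' implementation and features are not given, so the following is only a simplified, standard-library sketch of the general idea: synthesize minority samples by interpolating between a minority point and one of its nearest minority neighbours, then drop both members of every Tomek link (a pair of mutual nearest neighbours with different labels). The two-feature data, the label encoding (0 for the minority class, 1 for the majority class) and all function names are hypothetical.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: two points that are each
    other's nearest neighbour but carry different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = {i for i in range(len(X)) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority count, then clean."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)
    new_points = smote(minority, n_new, k, rng)
    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_res, y_res)


# Hypothetical two-feature samples: 3 minority (label 0), 8 majority (label 1).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.2), (1.2, 1.1),
     (1.3, 1.0), (1.1, 1.3), (1.2, 1.2), (1.0, 1.1)]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

X_res, y_res = smote_tomek(X, y, minority_label=0)
print(y_res.count(0), y_res.count(1))
```

In practice a maintained library implementation would normally be preferred over a hand-written one; the sketch is meant only to make the two stages of the technique concrete.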

Keywords: hardware trojans, physical properties, machine learning, hardware security

Procedia PDF Downloads 139
24681 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

Procedia PDF Downloads 411
24680 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

With rising user demand and the growth of large volumes of free data, it is becoming more challenging for storage solutions to protect, store and retrieve data. The day is not far off when storage companies and organizations will start saying 'no' to storing our valuable data, or will start charging a huge amount for its storage and protection. On the other hand, given environmental conditions, it is becoming challenging to establish and maintain new data warehouses and data centers while guarding against global warming threats. The challenge of small data is over; the challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 540
24679 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principles of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of a typical cleaning process. It puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it designs several violation data discovery algorithms, expressed as formal formulas, which can find the inconsistent data in all target columns that depend on condition attributes, whether the data are structured (SQL) or unstructured (NoSQL). It also gives six data cleaning methods based on these algorithms.
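
The abstract does not spell out its discovery algorithms, so the following is only a minimal sketch of one common form of dependency rule, the functional dependency: rows that agree on the condition attributes must also agree on the target attribute. The function name, the record layout (a list of dictionaries) and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_violations(rows, condition_cols, target_col):
    """Return groups of rows that break the rule condition_cols -> target_col.

    The result maps each offending condition-value tuple to a dictionary
    of {target value: [row indices]}, so a repair step can see which
    conflicting values exist and where they occur.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for index, row in enumerate(rows):
        key = tuple(row.get(col) for col in condition_cols)
        groups[key][row.get(target_col)].append(index)
    # A group violates the rule when it holds more than one target value.
    return {key: dict(values) for key, values in groups.items() if len(values) > 1}


# Hypothetical records; the rule under test is zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]

print(find_violations(rows, ["zip"], "city"))
```

Because the rows are plain dictionaries, the same check applies to records read from a SQL table or from a document store, which loosely mirrors the structured and unstructured cases mentioned in the abstract.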

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 558
24678 Energy Efficiency Analysis of Discharge Modes of an Adiabatic Compressed Air Energy Storage System

Authors: Shane D. Inder, Mehrdad Khamooshi

Abstract:

Efficient energy storage is a crucial factor in facilitating the uptake of renewable energy resources. Among the many options available for energy storage systems required to balance imbalanced supply and demand cycles, compressed air energy storage (CAES) is a proven technology in grid-scale applications. This paper reviews the current state of micro-scale CAES technology and describes a micro-scale advanced adiabatic CAES (A-CAES) system, where heat generated during compression is stored for use in the discharge phase. It also describes a thermodynamic model, developed in EES (Engineering Equation Solver), to evaluate the performance and critical parameters of the discharge phase of the proposed system. Three configurations are explained: a single turbine without preheater, two turbines with preheaters, and three turbines with preheaters. It is shown that the micro-scale A-CAES is highly dependent upon key parameters including regulator pressure, air pressure and volume, thermal energy storage temperature and flow rate, and the number of turbines. It was found that a micro-scale A-CAES, when optimized with an appropriate configuration, could deliver an input-to-output energy efficiency of up to 70%.

Keywords: CAES, adiabatic compressed air energy storage, expansion phase, micro generation, thermodynamic

Procedia PDF Downloads 304
24677 World on the Edge: Migration and Cross Border Crimes in West Africa

Authors: Adeyemi Kamil Hamzah

Abstract:

The contiguity of nations in international system suggests that world is a composite of socio-economic unit with people exploring and exploiting the potentials in the world via migrations. Thus, cross border migration has made positive contributions to social and economic development of individuals and nations by increasing the household incomes of the host countries. However, the cross border migrations in West Africa are becoming part of a dynamic and unstable world migration system. This is due to the nature and consequences of trans-border crimes in West Africa, with both short and long term effects on the socio-economic viability of developing countries like West African States. The paper identified that migration influenced cross-border crimes as well as the high spate of insurgencies in the sub-region. Furthermore, the consequential effect of a global village has imbalanced population flows, making some countries host and parasites to others. Also, stern and deft cross-border rules and regulations, as well as territorial security and protections, ameliorate cross border crimes and migration in West African sub-regions. Therefore, the study concluded that cross border migration is the linchpin of all kinds of criminal activities which affect the security of states in the sub-region.

Keywords: cross-border migration, border crimes, security, West Africa, development, globalisation

Procedia PDF Downloads 219
24676 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 398
24675 Assessment of Soil Quality Indicators in Rice Soil of Tamil Nadu

Authors: Kaleeswari R. K., Seevagan L .

Abstract:

Soil quality in an agroecosystem is influenced by the cropping system and by water and soil fertility management. A valid soil quality index would help to assess the soil and crop management practices for desired productivity and soil health. Soil quality indices also provide an early indication of soil degradation and of the remedial and rehabilitation measures needed. Imbalanced fertilization and inadequate organic carbon dynamics deteriorate soil quality in an intensive cropping system. The rice soil ecosystem is different from other arable systems since rice is grown under submergence, which requires a different set of key soil attributes for enhancing soil quality and productivity. Assessment of the soil quality index involves indicator selection, indicator scoring and integration of the scores into one comprehensive index. The most appropriate indicators to evaluate soil quality can be selected by establishing the minimum data set, which can be screened by linear and multiple regression, factor analysis and score functions. This investigation was carried out in intensive rice cultivating regions (having >1.0 lakh hectares) of Tamil Nadu, viz., Thanjavur, Thiruvarur, Nagapattinam, Villupuram, Thiruvannamalai, Cuddalore and Ramanathapuram districts. In each district, an intensive rice growing block was identified. In each block, two sampling grids (10 x 10 sq.km) were used with a sampling depth of 10 – 15 cm. Using GIS coordinates, soil sampling was carried out at various locations in the study area. The numbers of soil sampling points were 41, 28, 28, 32, 37, 29 and 29 in Thanjavur, Thiruvarur, Nagapattinam, Cuddalore, Villupuram, Thiruvannamalai and Ramanathapuram districts, respectively. Principal Component Analysis is a data reduction tool used to select some of the potential indicators. A Principal Component is a linear combination of different variables that represents the maximum variance of the dataset. Principal Components with eigenvalues equal to or higher than 1.0 were taken as the minimum data set. Principal Component Analysis was used to select the representative soil quality indicators in rice soils based on factor loading values and contribution percent values. Variables having significant differences within the production system were used for the preparation of the minimum data set. Each Principal Component explained a certain amount of variation (%) in the total dataset. This percentage provided the weight for variables. The final Principal Component Analysis based soil quality equation is $\mathrm{SQI} = \sum_{i=1}^{n} (W_i \times S_i)$, where $S_i$ is the score for the subscripted variable, $W_i$ is the weighting factor derived from PCA, and $n$ is the number of variables in the minimum data set. Higher index scores meant better soil quality. Soil respiration, soil available nitrogen and potentially mineralizable nitrogen were assessed as soil quality indicators in rice soil of the Cauvery Delta zone covering Thanjavur, Thiruvarur and Nagapattinam districts. Soil available phosphorus could be used as a soil quality indicator of rice soils in the Cuddalore district. In rain-fed rice ecosystems of coastal sandy soil, DTPA – Zn could be used as an effective soil quality indicator. Among the soil parameters selected from Principal Component Analysis, Microbial Biomass Nitrogen could be used as a quality indicator for rice soils of the Villupuram district. The Cauvery Delta zone has a better SQI as compared with other intensive rice growing zones of Tamil Nadu.
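
The two numerical steps described in this abstract, retaining principal components with eigenvalues of at least 1.0 and combining weighted indicator scores into one index, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names are hypothetical, and the eigenvalues, weights and scores are invented values, with the weights assumed to have been rescaled so that they sum to 1.

```python
def select_components(eigenvalues, threshold=1.0):
    """Indices of principal components whose eigenvalue meets the threshold."""
    return [i for i, value in enumerate(eigenvalues) if value >= threshold]


def soil_quality_index(weights, scores):
    """Weighted sum of indicator scores: SQI = sum of W_i * S_i."""
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical values, for illustration only.
eigenvalues = [3.2, 1.0, 0.7]
weights = [0.5, 0.3, 0.2]   # assumed: share of variance, rescaled to sum to 1
scores = [0.8, 0.6, 1.0]    # assumed: indicator scores normalized to 0..1

print(select_components(eigenvalues))
print(soil_quality_index(weights, scores))
```

With scores and weights scaled this way the index also falls between 0 and 1, which makes sites directly comparable; other scalings are possible and the abstract does not say which was used.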

Keywords: soil quality index, soil attributes, soil mapping, and rice soil

Procedia PDF Downloads 74
24674 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.
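
One way to check JSON data against XML data is to convert both into a common in-memory form and compare the results. The sketch below uses only Python's standard `json` and `xml.etree.ElementTree` modules; it is not taken from the paper. The mapping rules are assumptions chosen for illustration: XML attributes are ignored, repeated child tags become lists, and JSON scalars are compared as text because XML element content carries no types. The helper names and sample payloads are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_obj(element):
    """Convert an XML element into nested dicts, lists and strings.
    Attributes are ignored; repeated child tags are collected in a list."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_obj(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def stringify(value):
    """Render JSON scalars as text so they compare equal to XML content."""
    if isinstance(value, dict):
        return {key: stringify(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify(item) for item in value]
    if isinstance(value, bool):  # must precede the generic case: bool is an int
        return "true" if value else "false"
    if value is None:
        return ""
    return str(value)


def json_matches_xml(json_text, xml_text):
    """True when the JSON document and the XML document carry the same data."""
    root = ET.fromstring(xml_text)
    return stringify(json.loads(json_text)) == {root.tag: xml_to_obj(root)}


# Hypothetical payloads, for illustration only.
json_text = '{"order": {"id": 42, "item": ["pen", "ink"], "paid": true}}'
xml_text = ("<order><id>42</id><item>pen</item><item>ink</item>"
            "<paid>true</paid></order>")

print(json_matches_xml(json_text, xml_text))
```

A known limitation of this mapping is that a one-element JSON array and a single XML child element are treated as different, so a real test suite would need an explicit rule, or a schema, for such cases.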

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 128
24673 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by various systems differ in form, and their sources differ because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data from different sources, collected in different ways, raise several issues that need to be resolved. Problems of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and space data. Metadata are needed so that they can be referred to and read when an application needs to access, manipulate and display the data. Uniform metadata management ensures the effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 384
24672 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given the human role in the rapid growth of data, methods such as data mining for extracting knowledge are unavoidable. One of the issues in data mining is the inherent distribution of the data: usually the parties that create or receive such data are corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. Sending data and then gathering them, under either vertical or horizontal partitioning, depends on the type of privacy preservation used and is carried out so as to improve data privacy. In this study, we attempt a comprehensive comparison of privacy-preserving methods; general methods such as data randomization and coding are also examined, along with the strengths and weaknesses of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 517
24671 Prediction of Coronary Artery Stenosis Severity Based on Machine Learning Algorithms

Authors: Yu-Jia Jian, Emily Chia-Yu Su, Hui-Ling Hsu, Jian-Jhih Chen

Abstract:

The coronary arteries are the major suppliers of myocardial blood flow. When fat and cholesterol are deposited in the coronary arterial wall, narrowing and stenosis of the artery occur, which may lead to myocardial ischemia and eventually infarction. According to the World Health Organization (WHO), an estimated 7.4 million people died of coronary heart disease in 2015. According to statistics from the Ministry of Health and Welfare in Taiwan, heart disease (except for hypertensive diseases) ranked second among the top 10 causes of death from 2013 to 2016, and it still shows a growing trend. According to the American Heart Association (AHA), the risk factors for coronary heart disease include age (> 65 years), sex (men to women with a 2:1 ratio), obesity, diabetes, hypertension, hyperlipidemia, smoking, family history, lack of exercise and more. We collected a dataset of 421 patients from a hospital in northern Taiwan who received coronary computed tomography (CT) angiography. There were 300 males (71.26%) and 121 females (28.74%), with ages ranging from 24 to 92 years and a mean age of 56.3 years. Prior to coronary CT angiography, basic data of the patients, including age, gender, obesity index (BMI), diastolic blood pressure, systolic blood pressure, diabetes, hypertension, hyperlipidemia, smoking, family history of coronary heart disease and exercise habits, were collected and used as input variables. The output variable of the prediction module is the degree of coronary artery stenosis, that is, the narrowing of the coronary artery. The dataset was randomly divided into a training set (80%) and a test set (20%). Four machine learning algorithms, namely logistic regression, stepwise regression, neural network and decision tree, were used to generate prediction results. We used area under the curve (AUC) and accuracy (Acc.) to compare the four models. The best model was the neural network, followed by stepwise logistic regression, decision tree, and logistic regression, with AUC / Acc. of 0.68 / 79%, 0.68 / 74%, 0.65 / 78%, and 0.65 / 74%, respectively. The neural network had a sensitivity of 27.3% and a specificity of 90.8%; stepwise logistic regression, 18.2% and 92.3%; the decision tree, 13.6% and 100%; and logistic regression, 27.3% and 89.2%. In future work, we hope to improve accuracy by tuning the module parameters or by other methods, and to address the low sensitivity by adjusting the imbalanced proportion of positive and negative data.
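The gap between accuracy and sensitivity reported above is typical of imbalanced test sets, and a short calculation shows why. The sketch below uses hypothetical confusion-matrix counts chosen only for illustration; they are not the study's data, and the helper name `rates` is an assumption.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute standard binary-classification rates from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true positives that were detected
    specificity = tn / (tn + fp)   # share of true negatives that were detected
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # Balanced accuracy weights both classes equally, whatever their sizes.
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts: few positive cases, most of them missed by the model.
m = rates(tp=6, fn=16, fp=6, tn=59)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the negative class dominates, accuracy stays high even though most positive cases are missed; balanced accuracy makes the weakness visible.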

Keywords: decision support, computed tomography, coronary artery, machine learning

Procedia PDF Downloads 224
24670 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018, creating a new legal framework for the protection of personal data in the European Union. Article 20 of the GDPR introduces a right to data portability. This right allows data subjects to receive the personal data which they have provided to a data controller in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating the transfer of personal data between IT environments (e.g. applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 468
24669 Analysis of Urban Slum: Case Study of Korail Slum, Dhaka

Authors: Sanjida Ahmed Sinthia

Abstract:

Bangladesh is one of the poorest countries in the world. There are several reasons for this poverty, and uncontrolled population growth is one of the prime ones. Others include low economic progress, imbalanced resource management, unemployment and underemployment, urban migration and natural catastrophes. As a result, the rate of urban poverty is inevitably increasing in every sphere of the cities of Bangladesh, and Dhaka is the most affected. In addition, the scarcity of urban land, housing, urban infrastructure and amenities creates pressure on cities, and development mostly encroaches on open space and wetlands, causing environmental degradation. The government has little or no control over this because of poor policy and management, political pressure and lack of resource management. Unfortunately, over-centralization and bureaucracy create unnecessary delays and interruptions in any government initiative. There is also no coordination between the government and private-sector developers to solve the problems of the urban poor. To understand the problems of this large population, this paper analyzes Korail Slum, one of the largest single slum areas in Dhaka. The study focuses on socio-demographic analysis, morphological patterns and the roles of the different actors responsible for improving the area, and recommends some possible steps for determining the potential outcomes.

Keywords: demographic analysis, environmental degradation, government policy, housing and land management policy

Procedia PDF Downloads 171
24668 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 395
24667 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their rivals. Among the many areas of Big Data usage, logistics has a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 637
24666 Energy-Aware Scheduling in Real-Time Systems: An Analysis of Fair Share Scheduling and Priority-Driven Preemptive Scheduling

Authors: Su Xiaohan, Jin Chicheng, Liu Yijing, Burra Venkata Durga Kumar

Abstract:

Energy-aware scheduling in real-time systems aims to minimize energy consumption, but issues related to resource reservation and timing constraints remain challenges. This study analyzes two scheduling algorithms, Fair-Share Scheduling (FSS) and Priority-Driven Preemptive Scheduling (PDPS), with respect to these issues and to energy-aware scheduling in real-time systems. The analysis of both algorithms and of how each handles the two problems shows that Fair-Share Scheduling ensures fair allocation of resources but needs improvement under an imbalanced system load, while Priority-Driven Preemptive Scheduling prioritizes tasks by criticality and meets timing constraints through preemption, but relies heavily on task prioritization and may not be energy efficient. Therefore, improvements to both algorithms with energy-aware features are proposed. Future work should focus on developing hybrid scheduling techniques that minimize energy consumption through intelligent task prioritization and resource allocation while meeting timing constraints.
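The preemption behaviour attributed to priority-driven scheduling above can be illustrated with a small tick-based simulation. This is a sketch under assumptions, not the algorithm analyzed in the paper: the `Task` fields, the convention that a lower number means higher criticality, and the three example tasks are hypothetical, and energy consumption is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    arrival: int   # tick at which the task becomes ready
    burst: int     # ticks of CPU time required (assumed >= 1)
    priority: int  # lower number = more critical (assumption)


def simulate(tasks: List[Task]) -> Dict[str, int]:
    """Return the completion tick of each task under preemptive priority scheduling."""
    remaining = {t.name: t.burst for t in tasks}
    finished: Dict[str, int] = {}
    tick = 0
    while len(finished) < len(tasks):
        ready = [t for t in tasks if t.arrival <= tick and remaining[t.name] > 0]
        if ready:
            # Re-selecting on every tick is what makes the policy preemptive:
            # a newly arrived critical task displaces the running one.
            current = min(ready, key=lambda t: (t.priority, t.arrival))
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                finished[current.name] = tick + 1
        tick += 1
    return finished


tasks = [
    Task("logging", arrival=0, burst=4, priority=3),
    Task("control", arrival=1, burst=2, priority=1),
    Task("telemetry", arrival=2, burst=1, priority=2),
]
done = simulate(tasks)
print(done)
```

Here the low-priority `logging` task starts first but is preempted as soon as `control` arrives, and it finishes last. This shows both the strength of the policy (critical tasks meet their timing) and its dependence on how priorities are assigned.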

Keywords: energy-aware scheduling, fair-share scheduling, priority-driven preemptive scheduling, real-time systems, optimization, resource reservation, timing constraints

Procedia PDF Downloads 113
24665 A Comprehensive Framework for Fraud Prevention and Customer Feedback Classification in E-Commerce

Authors: Samhita Mummadi, Sree Divya Nagalli, Harshini Vemuri, Saketh Charan Nakka, Sumesh K. J.

Abstract:

One of the most significant challenges faced by people in today’s digital era is an alarming increase in fraudulent activities on online platforms. The appeal of online shopping, which avoids long queues in shopping malls and offers a wide variety of products and home delivery, has led to rapid growth in online shopping platforms. This growth has also made it easier for fraudulent users to commit fraud. For instance, consider a store that orders thousands of products all at once: the massive number of items purchased is suspicious, and the transactions turn out to be fraudulent, leading to a huge loss for the seller. Scenarios like these underscore the urgent need to introduce machine learning approaches to combat fraud in online shopping. By leveraging the KNN, Decision Tree, and Random Forest algorithms, this research seeks to discern patterns indicative of fraudulent behavior within transactional data. The primary aim is a comprehensive solution that empowers e-commerce administrators to detect and prevent fraud in a timely manner. In addition, sentiment analysis is incorporated into the model so that the e-commerce admin can respond to customers’ concerns, feedback, and comments and thereby improve the user experience. The ultimate objective of this study is to harden online shopping platforms against fraud and ensure a safer shopping experience. This paper reports a model accuracy of 84%. The findings and observations from this work lay the groundwork for future advances in more resilient and adaptive fraud detection systems, which will become crucial as technologies continue to evolve.
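Of the classifiers named above, k-nearest neighbours (KNN) is the simplest to show in a few lines. The sketch below is a plain-Python illustration and not the authors' model: the two features (items per order, order value in thousands), the toy training set and the choice of `k = 3` are all hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort by Euclidean distance to the query and keep the k closest samples.
    nearest = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical transactions: (items per order, order value in thousands).
train = [
    ((1.0, 0.2), "legit"),
    ((2.0, 0.5), "legit"),
    ((3.0, 0.4), "legit"),
    ((40.0, 9.0), "fraud"),
    ((55.0, 12.0), "fraud"),
    ((60.0, 11.0), "fraud"),
]

bulk_order = knn_predict(train, (50.0, 10.0))
small_order = knn_predict(train, (2.0, 0.3))
print(bulk_order, small_order)
```

In real transaction data, fraud cases are usually far rarer than legitimate ones, so a plain majority vote tends to favour the legitimate class; features would also need scaling before distances are meaningful.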

Keywords: behavior analysis, feature selection, fraudulent pattern recognition, imbalanced classification, transactional anomalies

Procedia PDF Downloads 16
24664 The Influence of English Learning on Ethnic Kazakh Minority Students’ Identity (Re)Construction at Chinese Universities

Authors: Sharapat Sharapat

Abstract:

The English language is perceived as cultural capital in many non-native English-speaking countries, and minority groups in these social contexts seem to invest in the language to be empowered and to reposition themselves within the imbalanced power relation with the dominant group. This study explores how English learning influences minority Kazakh students’ identity (re)construction at Chinese universities through the lens of Norton’s (2013) theory of ‘imagined community, investment, and identity’. To this end, three research questions were designed, addressing: 1) Kazakh minority students’ English learning experiences at Chinese universities; 2) Kazakh minority students’ views about the benefits and opportunities of English learning; 3) the influence of English learning on Kazakh minority students’ identity (re)construction. The study employs an interview-based qualitative research method, interviewing nine Kazakh minority students at universities in Xinjiang and other inland cities in China. The findings suggest that through English learning, some students have reconstructed multiple identities as multicultural and global identities, which created ‘a third space’ to break the limits of their ethnic and national identities and their confused identity as someone in-between. Meanwhile, most minority students were empowered by the English language to resist inferior or marginalized positions and reconstruct an imagined elite identity. However, English learning disempowered students who had little previous English education in school and placed them on an unequal footing with other students, which further escalated educational inequities.

Keywords: minority in China, identity construction, multilingual education, language empowerment

Procedia PDF Downloads 220
24663 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to users. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and spread of boards such as Arduino and Raspberry Pi have made it easy to test various sensors, and sensor data can be collected directly using database tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, collecting data with such boards is difficult, especially for users who are not programmers or who are using a board for the first time. Even once data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 370