Search results for: data imbalance
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7367

7367 Investigation of I/Q Imbalance in Coherent Optical OFDM System

Authors: R. S. Fyath, Mustafa A. B. Al-Qadi

Abstract:

The effects of in-phase/quadrature (I/Q) amplitude and phase imbalance are studied in coherent optical orthogonal frequency division multiplexing (CO-OFDM) systems. An analytical model for the I/Q imbalance is developed and supported by simulation results. The results indicate that I/Q imbalance degrades the BER performance considerably.
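
A common baseband description of receiver I/Q imbalance, which may differ from the analytical model developed in the paper, leaves the I branch ideal and gives the Q branch a relative amplitude and a phase error. The sketch below applies that assumed model to complex samples; the names `apply_iq_imbalance`, `gain`, and `phase_rad` are hypothetical.

```python
import cmath
import math

def apply_iq_imbalance(samples, gain=1.0, phase_rad=0.0):
    """Distort complex baseband samples with receiver I/Q imbalance.

    Assumed model (not taken from the paper): the I branch is ideal and the
    Q branch has relative amplitude `gain` and phase error `phase_rad`.
    """
    out = []
    for x in samples:
        i = x.real
        q = gain * (x.imag * math.cos(phase_rad) - x.real * math.sin(phase_rad))
        out.append(complex(i, q))
    return out

def image_rejection_ratio_db(gain, phase_rad):
    """IRR = |K1|^2 / |K2|^2 for the equivalent form y = K1*x + K2*conj(x).

    Undefined (division by zero) for a perfectly balanced receiver.
    """
    k1 = (1 + gain * cmath.exp(-1j * phase_rad)) / 2
    k2 = (1 - gain * cmath.exp(1j * phase_rad)) / 2
    return 10 * math.log10(abs(k1) ** 2 / abs(k2) ** 2)

# A QPSK-like constellation: with no imbalance the samples pass unchanged.
symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
ideal = apply_iq_imbalance(symbols)
distorted = apply_iq_imbalance(symbols, gain=1.1, phase_rad=math.radians(5))
```

The conjugate term `K2*conj(x)` is what creates mirror-subcarrier interference in an OFDM signal, which is one way such an imbalance can raise the BER.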

Keywords: Coherent detection, I/Q imbalance, OFDM, optical communications

Downloads: 2507
7366 Investigation of Organizational Work-Life Imbalance of Thai Software Developers in a Multinational Software Development Firm using Fishbone Diagram for Knowledge Management

Authors: N. Mantalay, N. Chakpitak, W. Janchai, P. Sureepong

Abstract:

Work stress causes organizational work-life imbalance among employees. Because of this imbalance, workers put less effort into finishing assignments, and the organization thus experiences reduced productivity. To investigate this problem, this qualitative case study focuses on organizational work-life imbalance among Thai software developers in a German-owned company in Chiang Mai, Thailand. In terms of knowledge management, the fishbone diagram is a useful analysis tool for systematically investigating the root causes of organizational work-life imbalance in focus-group discussions. Furthermore, the fishbone diagram clearly shows the relationship between causes and effects. It was found that organizational work-life imbalance among Thai software developers is influenced over time by the management team, the work environment, and the information tools used in the company.

Keywords: knowledge management, knowledge worker, worklife imbalance, fishbone diagram

Downloads: 2810
7365 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing domain presents interesting challenges, as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable, as it can be used to mark progress while training and can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The available dataset was composed of dynamic force data recorded during climbing; however, it came with challenges such as data scarcity, imbalance, and temporal heterogeneity. The solutions investigated for these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross-validation strategies. The solutions investigated for the classification problem included the lightweight machine learning classifiers KNN and SVM as well as deep learning with a CNN. The best-performing model achieved 80% accuracy. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.
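
Temporal normalization followed by conversion to the spectral domain can be illustrated with a minimal sketch. It assumes linear-interpolation resampling and a naive DFT, neither of which is confirmed as the authors' method; the function names and the two force recordings are hypothetical.

```python
import cmath

def resample_linear(series, target_len):
    """Temporal normalization: stretch or shrink a series to `target_len` samples."""
    if target_len < 2 or len(series) < 2:
        raise ValueError("need at least two samples in and out")
    step = (len(series) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * step
        lo = min(int(pos), len(series) - 2)  # index of the left neighbour
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[lo + 1] * frac)
    return out

def magnitude_spectrum(series):
    """Naive DFT magnitudes for the non-negative frequency bins."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series)))
        for k in range(n // 2 + 1)
    ]

# Two hypothetical force recordings of different duration.
short_climb = [0.0, 2.0, 4.0, 2.0, 0.0]
long_climb = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
features = [magnitude_spectrum(resample_linear(c, 8)) for c in (short_climb, long_climb)]
```

After this step every recording yields a feature vector of the same length, so classifiers such as KNN or SVM can consume recordings of differing duration.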

Keywords: Classification, climbing, data imbalance, data scarcity, machine learning, time sequence.

Downloads: 459
7364 Reducing the Imbalance Penalty through Artificial Intelligence Methods Geothermal Production Forecasting: A Case Study for Turkey

Authors: H. Anıl, G. Kar

Abstract:

In addition to being rich in renewable energy resources, Turkey is one of the countries with promising potential in geothermal energy production, owing to its high installed power, low cost, and sustainability. Increasing imbalance penalties become an economic burden for organizations, since geothermal generation plants cannot maintain the balance of supply and demand when the production forecasts submitted to the day-ahead market are inadequate. A better production forecast reduces the imbalance penalties of market participants and provides a better balance in the day-ahead market. In this study, using machine learning, deep learning and time series methods, the total generation of the power plants belonging to Zorlu Doğal Electricity Generation, which has a high installed geothermal capacity, was predicted for the first week and the first two weeks of March; the imbalance penalties were then calculated with these estimates and compared with the real values. These modeling operations were carried out on two datasets: the basic dataset and a dataset created by extracting new features from it through feature engineering. According to the results, Support Vector Regression, a traditional machine learning model, outperformed the other models. In addition, the estimation results on the feature engineering dataset showed lower error rates than those on the basic dataset. It was concluded that the imbalance penalty calculated from the estimates for the selected organization is lower than the actual imbalance penalty, which is the more favorable and profitable outcome.

Keywords: Machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting.

Downloads: 115
7363 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, a common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real-world applications such as fraud detection, network intrusion detection, and medical diagnostics. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may struggle, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies imbalanced data, is the overlap of instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. First, a cluster analysis is applied to group similar instances from the minority class into several behavior clusters. Second, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.
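
The two OSBNR steps can be sketched as follows. The abstract names neither the clustering algorithm nor the overlap criterion, so this illustration assumes a tiny k-means and treats a majority instance as overlapping when it lies within a cluster's largest centroid-to-member distance; all function names and data points are hypothetical.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=20):
    """Tiny k-means; centroids start at the first k points (an arbitrary choice)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

def remove_overlapping_majority(minority, majority, k=2):
    """Step 1: cluster the minority class. Step 2: drop overlapping majority points."""
    centroids, labels = kmeans(minority, k)
    # Assumed cluster extent: the largest distance from the centroid to a member.
    radii = [
        max((math.dist(p, centroids[c]) for p, l in zip(minority, labels) if l == c),
            default=0.0)
        for c in range(k)
    ]
    return [m for m in majority
            if all(math.dist(m, centroids[c]) > radii[c] for c in range(k))]

minority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
majority = [(0.3, 0.3), (5, 5), (10.3, 10.3), (20, 20), (5, 0)]
cleaned_majority = remove_overlapping_majority(minority, majority, k=2)
```

Only the majority class is filtered, which matches the "one side" idea: minority instances are never discarded.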

Keywords: Machine learning, Imbalanced data, Data mining, Big data.

Downloads: 1048
7362 The Operating Behaviour of Unbalanced Unpaced Merging Assembly Lines

Authors: S. Shaaban, T. McNamara, S. Hudson

Abstract:

This paper reports on the performance of deliberately unbalanced, reliable, non-automated merging assembly lines whose workstations differ in terms of their mean operation times. Simulations are carried out on 5- and 8-station lines with buffer capacities of 1, 2 and 4 units, degrees of line imbalance of 2%, 5% and 12%, and 24 different patterns of means imbalance. Data on two performance measures, namely throughput and average buffer level, were gathered, statistically analysed and compared to a merging balanced line counterpart. It was found that the best configurations are a balanced line arrangement and a monotone decreasing order for each of the parallel merging lines, with the first generally resulting in a lower throughput and the second leading to a lower average buffer level than those of a balanced line.

Keywords: Average buffer level, merging lines, simulation, throughput, unbalanced.

Downloads: 1489
7361 Performance Evaluation of Task Scheduling Algorithm on LCQ Network

Authors: Zaki Ahmad Khan, Jamshed Siddiqui, Abdus Samad

Abstract:

The scheduling and mapping of tasks on a set of processors is considered a critical problem in parallel and distributed computing systems. This paper deals with the problem of dynamic scheduling on a special type of multiprocessor architecture known as the Linear Crossed Cube (LCQ) network. This proposed multiprocessor is a hybrid network which combines the features of both linear and cube-based architectures. Two standard dynamic scheduling schemes, namely Minimum Distance Scheduling (MDS) and Two Round Scheduling (TRS), are implemented on the LCQ network. Parallel tasks are mapped and the load imbalance is evaluated on different sets of processors in the LCQ network. The simulation results are evaluated, and an effort is made through thorough analysis of the results to obtain the best solution for the given network in terms of remaining load imbalance and execution time. Other performance metrics such as speedup and efficiency are also evaluated with the given dynamic algorithms.
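
The metrics named above can be written down in a few lines. Speedup and efficiency follow their standard definitions; the load imbalance measure is an assumption, since the abstract does not define it, and the per-processor loads are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Standard speedup: serial execution time over parallel execution time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_processors):
    """Standard efficiency: speedup per processor."""
    return speedup(t_serial, t_parallel) / n_processors

def load_imbalance_percent(loads):
    """Assumed definition: how far the busiest processor exceeds the mean load."""
    mean = sum(loads) / len(loads)
    return 100.0 * (max(loads) - mean) / mean

loads = [12, 9, 10, 9]  # hypothetical tasks per processor after scheduling
imbalance = load_imbalance_percent(loads)
```

A perfectly balanced mapping gives an imbalance of zero under this definition, so lower values indicate a better schedule.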

Keywords: Dynamic algorithm, Load imbalance, Mapping, Task scheduling.

Downloads: 1959
7360 Application of H2-based Sliding Mode Control for an Active Magnetic Bearing System

Authors: Abdul Rashid Husain, Mohamad Noh Ahmad, Abdul Halim Mohd. Yatim

Abstract:

In this paper, the application of the Sliding Mode Control (SMC) technique to an Active Magnetic Bearing (AMB) system with varying rotor speed is considered. The gyroscopic effect and mass imbalance inherent in the system are proportional to rotor speed, and this nonlinearity causes high system instability as the rotor speed increases. Transformation of the AMB dynamic model into regular form shows that the gyroscopic effect and imbalance lie in the mismatched part of the system. An H2-based sliding surface is designed which bounds the mismatched parts. The solution of the surface parameter is obtained using a Linear Matrix Inequality (LMI). The performance of the controller applied to the AMB model is demonstrated through simulation under various system conditions.

Keywords: Active magnetic bearing, sliding mode control, linear matrix inequality, mismatched uncertainty and imbalance.

Downloads: 1550
7359 Unreliable Production Lines with Simultaneously Unbalanced Operation Time Means, Breakdown, and Repair Rates

Authors: S. Shaaban, T. McNamara, S. Hudson

Abstract:

This paper investigates the benefits of deliberately unbalancing both operation time means (MTs) and unreliability (failure and repair rates) for non-automated production lines. The lines were simulated with various line lengths, buffer capacities, degrees of imbalance and patterns of MT and unreliability imbalance. Data on two performance measures, namely throughput (TR) and average buffer level (ABL) were gathered, analyzed and compared to a balanced line counterpart. A number of conclusions were made with respect to the ranking of configurations, as well as to the relationships among the independent design parameters and the dependent variables. It was found that the best configurations are a balanced line arrangement and a monotone decreasing MT order, coupled with either a decreasing or a bowl unreliability configuration, with the first generally resulting in a reduced TR and the second leading to a lower ABL than those of a balanced line.

Keywords: Average buffer level, throughput, unbalanced failure and repair rates, unequal mean operation times, unreliable production lines.

Downloads: 2194
7358 Improved Rare Species Identification Using Focal Loss Based Deep Learning Models

Authors: Chad Goldsworthy, B. Rajeswari Matam

Abstract:

The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.
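
The contrast between the two loss functions can be shown for a single sample. This sketch uses the standard focal loss form, -alpha * (1 - p_t)^gamma * log(p_t); the values `gamma=2.0` and `alpha=1.0` are illustrative assumptions, not the settings used in the paper.

```python
import math

def cross_entropy(p_true):
    """Cross-entropy loss for one sample, given the predicted probability of its true class."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t)."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

easy, hard = 0.95, 0.10  # predicted probability of the correct class
for p in (easy, hard):
    print(f"p_t={p:.2f}  CE={cross_entropy(p):.4f}  FL={focal_loss(p):.4f}")
```

The factor (1 - p_t)^gamma shrinks the loss of well-classified samples far more than that of misclassified ones, so frequent, easy majority-class images contribute less to the gradient. With `gamma=0` the expression reduces to cross-entropy.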

Keywords: Convolutional neural networks, data imbalance, deep learning, focal loss, species classification, wildlife conservation.

Downloads: 1317
7357 The Virtual Container Yard: Identifying the Persuasive Factors in Container Interchange

Authors: L. Edirisinghe, Zhihong Jin, A. W. Wijeratne, R. Mudunkotuwa

Abstract:

The virtual container yard is an effective solution to the container inventory imbalance problem, which is a global issue. It causes substantial cost to carriers, which inadvertently adds to the prices of consumer goods. The virtual container yard is rooted in the fundamentals of container interchange between carriers. If carriers opt to interchange their excess containers with those in deficit, a substantial part of the empty repositioning cost could be eliminated. Unlike in other types of ships, cargo cannot be directly loaded onto a container ship. Slots and containers are supplementary components; thus, a carrier cannot ship cargo if containers are not available, and vice versa. A few decades ago, carriers recognized slot (the unit of space in a container ship) interchange as a viable solution for the imbalance of shipping space. Carriers interchange slots among themselves, which also increases the economies of scale in container shipping. Some of these service agreements between mega carriers have provisions to interchange containers too. However, the interchange mechanism is still not popular among carriers for containers. This is the paradox that prevails in the liner shipping industry. At present, carriers reposition their excess empty containers to areas where they are in demand. This research applied the factor analysis statistical method. The paper reveals that five major components may influence the virtual container yard, namely organisation, practice and culture, legal and environment, international nature, and marketing. There are 12 variables that may impact the virtual container yard, and these are explained in the paper.

Keywords: Virtual container yard, imbalance, management, inventory.

Downloads: 778
7356 A Study on the Application of Machine Learning and Deep Learning Techniques for Skin Cancer Detection

Authors: Hritwik Ghosh, Irfan Sadiq Rahat, Sachi Nandan Mohanty, J. V. R. Ravindra, Abdus Sobur

Abstract:

In the rapidly evolving landscape of medical diagnostics, the early detection and accurate classification of skin cancer remain paramount for effective treatment outcomes. This research delves into the transformative potential of artificial intelligence (AI), specifically deep learning (DL), as a tool for discerning and categorizing various skin conditions. Utilizing a diverse dataset of 3,000 images, representing nine distinct skin conditions, we confront the inherent challenge of class imbalance. This imbalance, where conditions like melanomas are over-represented, is addressed by incorporating class weights during the model training phase, ensuring an equitable representation of all conditions in the learning process. Our approach presents a hybrid model, amalgamating the strengths of two renowned convolutional neural networks (CNNs), VGG16 and ResNet50. These networks, pre-trained on the ImageNet dataset, are adept at extracting intricate features from images. By synergizing these models, our research aims to capture a holistic set of features, thereby bolstering classification performance. Preliminary findings underscore the hybrid model's superiority over individual models, showcasing its prowess in feature extraction and classification. Moreover, the research emphasizes the significance of rigorous data pre-processing, including image resizing, color normalization, and segmentation, in ensuring data quality and model reliability. In essence, this study illuminates the promising role of AI and DL in revolutionizing skin cancer diagnostics, offering insights into its potential applications in broader medical domains.
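
Class weighting of the kind mentioned above is often computed with the "balanced" heuristic, n_samples / (n_classes * class_count). Whether the authors used this exact formula is not stated, so the sketch below is an assumption, and the label list is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical label distribution with one over-represented class.
labels = ["melanoma"] * 6 + ["nevus"] * 3 + ["dermatofibroma"] * 1
weights = balanced_class_weights(labels)
```

During training, each sample's loss would be multiplied by the weight of its class, so errors on rare conditions cost more than errors on common ones while the total weight across the dataset stays equal to the number of samples.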

Keywords: Artificial intelligence, machine learning, deep learning, skin cancer, dermatology, convolutional neural networks, image classification, computer vision, healthcare technology, cancer detection, medical imaging.

Downloads: 445
7355 An Assessment of the Hip Muscular Imbalance for Patients with Rheumatism

Authors: Anthony Bawa, Konstantinos Banitsas

Abstract:

Rheumatism is a muscular disorder that affects the muscles of the upper and lower limbs. This condition could potentially progress to impair the movement of patients. This study aims to investigate the hip muscular imbalance in patients with chronic rheumatism. A clinical trial involving a total of 15 participants, made up of 10 patients and five control subjects, took place in KATH Hospital between August and September. Participants recruited for the study were of age 54 ± 8 years, weight 65 ± 8 kg, and height 176 ± 8 cm. Muscle signals were recorded from the rectus femoris, and vastus lateralis on the right and left hip of participants. The parameters used in determining the hip muscular imbalances were the maximum voluntary contraction (MVC%), the mean difference, and hip muscle fatigue levels. The mean signals were compared using a t-test, and the metrics for muscle fatigue assessment were based on the root mean square (RMS), mean absolute value (MAV) and mean frequency (MEF), which were computed between the hip muscles of participants. The results indicated that there were significant imbalances in the muscle coactivity between the right and left hip muscles of patients. The patients’ MVC values were observed to be above 10% when compared with control subjects. Furthermore, the mean difference was seen to be higher with p > 0.002 among patients, which indicated clear differences in the hip muscle contraction activities. The findings indicate significant hip muscular imbalances for patients with rheumatism compared with control subjects. Information about the imbalances among patients will be useful for clinicians in designing therapeutic muscle-strengthening exercises.
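
The three fatigue metrics can be computed from a sampled muscle signal as in the sketch below. It assumes the usual definitions (RMS and MAV in the time domain, MEF as the power-weighted mean of the spectrum) and a naive DFT; the test tone and sampling rate are hypothetical and stand in for a recorded signal.

```python
import cmath
import math

def rms(signal):
    """Root mean square amplitude."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mav(signal):
    """Mean absolute value."""
    return sum(abs(x) for x in signal) / len(signal)

def mean_frequency(signal, fs):
    """Power-weighted mean frequency from a naive DFT (bins 0..N/2)."""
    n = len(signal)
    num = den = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        power = abs(coeff) ** 2
        num += (k * fs / n) * power
        den += power
    return num / den

fs = 100.0  # hypothetical sampling rate in Hz
tone = [math.sin(2 * math.pi * 10.0 * t / fs) for t in range(100)]
metrics = {"RMS": rms(tone), "MAV": mav(tone), "MEF": mean_frequency(tone, fs)}
```

Computing the same metrics for the left and right muscles and comparing them is one simple way to quantify an imbalance; a downward shift in mean frequency over time is commonly read as a sign of fatigue.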

Keywords: Muscular, imbalances, rheumatism, hip.

Downloads: 82
7354 Spiral Cuff for Fiber-Diameter Selective VNS

Authors: P. Pečlin, J. Rozman

Abstract:

In this paper we present the modeling, design, and experimental testing of a nerve cuff multi-electrode system for diameter-selective vagus nerve stimulation. The multi-electrode system contained ninety-nine platinum electrodes embedded within a self-curling spiral silicone sheet. The electrodes were organized in a matrix having nine parallel groups, each containing eleven electrodes. Preliminary testing of the nerve cuff was performed in an isolated segment of a porcine left cervical vagus nerve. For selective vagus nerve stimulation, precisely defined quasitrapezoidal, asymmetric and biphasic current stimulating pulses were applied to preselected locations along the left vagus segment via an appointed group of three electrodes within the cuff. Selective stimulation was obtained by anodal block. However, these pulses may not be safe for long-term application because of the frequently used high imbalance between the cathodic and anodic parts of the stimulating pulse. Preliminary results show that the cuff was capable of exciting A- and B-fibres, and that, for a certain range of parameters used in the stimulating pulses, the contribution of A-fibres to the CAP was slightly reduced and the contribution of B-fibres was slightly larger. Results also showed that measured CAPs are not greatly influenced by the imbalance between the charge Qc injected in the cathodic phase and Qa in the anodic phase of quasitrapezoidal, asymmetric and biphasic pulses.

Keywords: Vagus nerve stimulation, multi-electrode nerve cuff.

Downloads: 1614
7353 Big Data: Big Challenges to Privacy and Data Protection

Authors: Abu Bakar Munir, Siti Hajar Mohd Yasin, Firdaus Muhammad-Sukki

Abstract:

This paper seeks to analyse the benefits of big data and, more importantly, the challenges it poses to privacy and data protection. First, the nature of big data is briefly discussed before presenting the potential of big data today. Afterwards, the issue of privacy and data protection is highlighted before discussing the challenges of implementing it in big data. In conclusion, the paper puts forward the debate on the adequacy of the existing legal framework in protecting personal data in the era of big data.

Keywords: Big data, data protection, information, privacy.

Downloads: 3845
7352 Bee Parameter Determination via Weighted Centriod Modified Simplex and Constrained Response Surface Optimisation Methods

Authors: P. Luangpaiboon

Abstract:

Various intelligences and inspirations have been adopted into iterative search processes known as meta-heuristics. They intelligently perform exploration and exploitation in the solution domain space, aiming to efficiently seek near-optimal solutions. In this work, the bee algorithm, inspired by the natural foraging behaviour of honey bees, was adapted to find near-optimal solutions for a transportation management system, dynamic multi-zone dispatching. This problem prepares for uncertain and changing customer demand. In striving to remain competitive, a transportation system should therefore be flexible in order to cope with changes in customer demand in terms of inbound and outbound goods and technological innovations. To maintain a higher service level at a lower management cost via the minimal imbalance scenario, the rearrangement penalty of the area in each zone, including time periods, is also included. However, the performance of the algorithm depends on appropriate parameter settings, which need to be determined and analysed before its implementation. BEE parameters are determined through the linear constrained response surface optimisation method (LCRSOM) and the weighted centroid modified simplex method (WCMSM). Experimental results were analysed in terms of the best solutions found so far, the mean and standard deviation of the imbalance values, and the convergence of the solutions obtained. It was found that the results obtained from the LCRSOM were better than those using the WCMSM. However, the average execution time of an experimental run using the LCRSOM was longer than that using the WCMSM. Finally, a recommendation of proper level settings of BEE parameters for some selected problem sizes is given as a guideline for future applications.

Keywords: Meta-heuristic, Bee Algorithm, Dynamic Multi-Zone Dispatching, Linear Constrained Response Surface Optimisation Method, Weighted Centroid Modified Simplex Method

Downloads: 1325
7351 The Cooperation among Insulin, Cortisol and Thyroid Hormones in Morbid Obese Children and Metabolic Syndrome

Authors: Orkide Donma, Mustafa M. Donma

Abstract:

Obesity, a disease associated with low-grade inflammation, is a risk factor for the development of metabolic syndrome (MetS). So far, MetS risk factors such as parameters related to glucose and lipid metabolism as well as blood pressure have been considered for the evaluation of this disease. There are still some ambiguities related to the characteristic features of MetS, particularly in the pediatric population. Hormonal imbalance is also important, and quite a lot of information exists about the behaviour of some hormones in adults. However, the hormonal profiles in pediatric metabolism have not yet been clarified. The aim of this study is to investigate the profiles of cortisol, insulin, and thyroid hormones in children with MetS. The study population was composed of morbid obese (MO) children without (Group 1) and with (Group 2) MetS components. WHO BMI-for-age and sex percentiles were used for the classification of obesity. Values above the 99th percentile were defined as morbid obesity. Components of MetS (central obesity, glucose intolerance, high blood pressure, high triacylglycerol levels, low levels of high density lipoprotein cholesterol) were determined. Anthropometric measurements were performed. Ratios as well as obesity indices were calculated. Insulin, cortisol, thyroid stimulating hormone (TSH), free T3 and free T4 analyses were performed by electrochemiluminescence immunoassay. Data were evaluated using the Statistical Package for the Social Sciences. p<0.05 was accepted as the threshold for statistical significance. The mean±SD ages of Group 1 and Group 2 were 9.9±3.1 years and 10.8±3.2 years, respectively. Body mass index (BMI) values were calculated as 27.4±5.9 kg/m2 and 30.6±8.1 kg/m2, respectively. There were no statistically significant differences between the ages and BMI values of the groups. Insulin levels were statistically significantly increased in MetS in comparison with the levels measured in MO children. There was no difference between MO children and those with MetS in terms of cortisol, T3, T4 and TSH. However, T4 levels were positively correlated with cortisol and negatively correlated with insulin. None of these correlations were observed in MO children. Cortisol levels in both MO as well as MetS group were significantly correlated. Cortisol, insulin, and thyroid hormones are essential for life. Cortisol, called the control system for hormones, orchestrates the performance of other key hormones. It seems to establish a connection between hormone imbalance and inflammation. During an inflammatory state, more cortisol is produced to fight inflammation. High cortisol levels prevent the conversion of the inactive form of the thyroid hormone, T4, into the active form, T3. Insulin is reduced due to low thyroid hormone. T3, which is essential for blood sugar control, requires cortisol levels within the normal range. The positive association of T4 with cortisol and its negative association with insulin are indicators of such a delicate balance among these hormones also in children with MetS.

Keywords: Children, cortisol, insulin, metabolic syndrome, thyroid hormones.

Downloads: 742
7350 Spatial Distribution of Socio-Economic Factors in Kogi State, Nigeria: Development Issues and Implication(s)

Authors: Yahya A. Sadiq, Grace F. Balogun, Olufemi J. Anjorin

Abstract:

This study analyzed the spatial distribution of socio-economic factors in Kogi state with a view to examining its implications for the development of the state. Questionnaires were administered to both the selected individual respondents (784) in the state and the administrative offices (21 local council offices) to solicit relevant information on the spatial distribution of socio-economic factors in their areas. The collected data were tabulated and analyzed using percentages. The study revealed commerce/trade, education, and health care, among others, as the major socio-economic factors in the state, but with marked variation/imbalance in their spatial distribution across the study area. The rural-based local government areas have far fewer of such important facilities. In conclusion, it was recommended that the living conditions of people in the study area be transformed socio-economically, especially by positively redistributing local political power, so that the resources that abound in the state are felt by everybody, including the commoners.

Keywords: Development, local government areas, socio-economic factors, spatial distribution.

Downloads: 1736
7349 Data Preprocessing for Supervised Leaning

Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas

Abstract:

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If much irrelevant and redundant information is present, or the data is noisy and unreliable, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take a considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set, but this is not the case. Thus, we present the most well-known algorithms for each step of data pre-processing so that one achieves the best performance for their data set.

Keywords: Data mining, feature selection, data cleaning.

Downloads: 5896
7348 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA), which may allow academic institutions to better understand learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitations before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Downloads: 4806
7347 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, proposes several key steps of a typical cleaning process, and puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it designs several violation-data discovery algorithms expressed as formal formulas, which can find data inconsistent with conditional attribute dependencies across all target columns, whether the data is structured (SQL) or unstructured (NoSQL), and gives six data cleaning methods based on these algorithms.
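
Violation discovery for a dependency rule can be sketched as follows. The example assumes the simplest kind of rule, a functional dependency in which the `lhs` columns determine the `rhs` columns; the paper's own algorithms are not given in the abstract, and the function name, column names, and records are hypothetical.

```python
from collections import defaultdict

def find_fd_violations(rows, lhs, rhs):
    """Return groups of rows that agree on `lhs` columns but disagree on `rhs`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in lhs)].append(row)
    return {
        key: members
        for key, members in groups.items()
        if len({tuple(r[c] for c in rhs) for r in members}) > 1
    }

# Rule under test: zip -> city (rows sharing a zip code must share a city).
records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newyork"},
    {"zip": "94105", "city": "San Francisco"},
]
violations = find_fd_violations(records, lhs=["zip"], rhs=["city"])
```

Because the rows are plain dictionaries, the same check applies to records fetched from a relational table or from a document store, and each returned group is a candidate for a repair step.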

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Downloads 2555
7346 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures to provide access to data for analysis. Traditionally, OLAP operations focus on retrieving data from a single data mart. An exception is the drill-across operator. This, however, is restricted to retrieving facts on the common dimensions of multiple data marts. Our concern is to define further operations for retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. In doing so, we consider the common as well as the non-common dimensions of the data marts.
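The drill-across restriction mentioned above can be sketched as follows: each data mart is rolled up to the dimensions the marts share, and the resulting facts are joined on those dimensions. The mart contents, the measure names, and the inner-join behavior are assumptions for illustration; the six coalescing operations defined in the paper are not shown.

```python
from collections import defaultdict

def roll_up(facts, dims, measure):
    """Sum a measure over the given dimensions, discarding all other dimensions."""
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return totals

def drill_across(mart_a, measure_a, mart_b, measure_b, common_dims):
    """Join two marts on their common dimensions (assumed inner-join semantics)."""
    a = roll_up(mart_a, common_dims, measure_a)
    b = roll_up(mart_b, common_dims, measure_b)
    return {
        key: {measure_a: a[key], measure_b: b[key]}
        for key in a.keys() & b.keys()
    }

# Hypothetical marts: "store" and "carrier" are non-common dimensions.
sales = [
    {"product": "cable", "region": "north", "store": "s1", "amount": 100.0},
    {"product": "cable", "region": "north", "store": "s2", "amount": 50.0},
    {"product": "wire", "region": "south", "store": "s3", "amount": 70.0},
]
shipments = [
    {"product": "cable", "region": "north", "carrier": "c1", "units": 8.0},
    {"product": "cable", "region": "north", "carrier": "c2", "units": 4.0},
    {"product": "wire", "region": "north", "carrier": "c1", "units": 3.0},
]
result = drill_across(sales, "amount", shipments, "units", ["product", "region"])
```

The non-common dimensions are lost in the roll-up, which is the limitation that motivates defining further operations across data marts.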

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Downloads 1509
7345 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Much research is concerned with mining massive amounts of data and big data streams. Mining big data faces many challenges, including scalability, speed, heterogeneity, accuracy, provenance and privacy. In the telecommunication industry, mining big data is like mining for gold; it represents a big opportunity for maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication, and the benefits and opportunities gained from it.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Downloads 2410
7344 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, considerable effort and many studies have gone into developing proficient tools for performing various tasks in big data. Recently, big data have received a lot of publicity, and for good reasons. Because the collections of datasets are large and complex, they are difficult to process with traditional data processing applications. This concern makes it all the more necessary to produce various tools for big data. Moreover, the main aim of big data analytics is to apply advanced analytic techniques to very large, diverse datasets that range in size from terabytes to zettabytes and in type from structured to unstructured and from batch to streaming. Big data is useful for data sets whose size or type is beyond the capability of traditional relational databases to capture, manage and process with low latency. The resulting challenges have led to the emergence of powerful big data tools. In this survey, a varied collection of big data tools is illustrated and compared in terms of their salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Downloads 3720
7343 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is an efficient approach to organizing data. When data is classified into multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels, and shows that there are four types of orders, which are characterized by whether the labels of the expressed data include every label of the given set of labels within the range of the set. Two desirable properties of the orders are discussed: that data is also expressed by the higher set of labels, and that different sets of labels express different data.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classification, Orders of Sets of Labels

Downloads 1251
7342 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of the ever-increasing use of distributed data in computer networks is obvious to all. One technique that is applied to distributed data to increase efficiency and reliability is data replication. In this paper, after introducing this technique and its advantages, we will examine some dynamic data replication strategies. We will examine their characteristics for some overus scenario and then we will propose some suggestions for their improvement.

Keywords: data replication, data hiding, consistency, dynamic data replication strategy

Downloads 1576
7341 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and sensor data can be collected directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, collecting data with such boards is difficult in many respects, especially when the user is not a computer programmer or is using the board for the first time. Even if data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.

Downloads 1947
7340 Multi-Stage Multi-Period Production Planning in Wire and Cable Industry

Authors: Mahnaz Hosseinzadeh, Shaghayegh Rezaee Amiri

Abstract:

This paper presents a methodology for the serial production planning problem in the wire and cable manufacturing process. It addresses the problem of input-output imbalance between consecutive stations, with the aim of minimizing machine halts in each stage. To this end, a linear Goal Programming (GP) model is developed, in which four main categories of constraints are considered: the number of runs per machine, the machines’ sequences, the acceptable inventories of machines at the end of each period, and the necessity of fulfilling the customers’ orders. The model is formulated based on real data obtained from IKO TAK Company, an important supplier of wire and cable for the oil and gas and automotive industries in Iran. By solving the model in GAMS software, the optimal number of runs, end-of-period inventories, and the minimum possible idle time for each machine are calculated. The application of the numerical results in the target company has shown the efficiency of the proposed model and solution in decreasing the lead time of end product delivery to customers by 20%. Accordingly, the developed model could be easily applied in wire and cable companies for optimal production planning to reduce machine halts in manufacturing stages.

Keywords: Serial manufacturing process, production planning, wire and cable industry, goal programming approach.

Downloads 860
7339 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, and veracity, and that come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, to provide insights to users, and to support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services such as data storage and hosting to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data, and industry data.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Downloads 1960
7338 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge is doubled when the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on a microarray data set is feature selection. This process finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
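The abstract does not describe the imputation technique itself, so the sketch below swaps in a deliberately simple baseline to show how imputation and feature selection fit together: per-gene mean imputation followed by ranking genes on the absolute difference between class means. The function names, the scoring rule, and the toy expression matrix are assumptions and are not the authors' method.

```python
from statistics import mean

def impute_gene_means(matrix):
    """Replace None in each gene column with that column's observed mean.

    Rows are samples and columns are genes; a gene with no observed
    values is filled with 0.0 (an arbitrary assumed fallback).
    """
    n_genes = len(matrix[0])
    filled = [list(row) for row in matrix]
    for g in range(n_genes):
        observed = [row[g] for row in matrix if row[g] is not None]
        fill = mean(observed) if observed else 0.0
        for row in filled:
            if row[g] is None:
                row[g] = fill
    return filled

def rank_genes(matrix, labels):
    """Rank gene indices by |mean(class 1) - mean(class 0)|, largest first.

    Assumes both classes are present in labels.
    """
    scores = []
    for g in range(len(matrix[0])):
        pos = [row[g] for row, y in zip(matrix, labels) if y == 1]
        neg = [row[g] for row, y in zip(matrix, labels) if y == 0]
        scores.append((abs(mean(pos) - mean(neg)), g))
    return [g for _, g in sorted(scores, reverse=True)]

# Hypothetical expression matrix: 4 samples x 3 genes, two missing entries.
expr = [
    [2.0, None, 1.0],
    [4.0, 5.0, 1.0],
    [8.0, 5.0, None],
    [10.0, 7.0, 1.0],
]
labels = [0, 0, 1, 1]
imputed = impute_gene_means(expr)
ranking = rank_genes(imputed, labels)
```

Here imputation runs strictly before selection. A technique that imputes while selecting features, as the abstract describes, would presumably interleave the two steps.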

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Downloads 2715