Search results for: Missing Data Techniques.
8830 Design of an Ensemble Learning Behavior Anomaly Detection Framework
Authors: Abdoulaye Diop, Nahid Emad, Thierry Winter, Mohamed Hilia
Abstract:
Data assets protection is a crucial issue in the cybersecurity field. Companies use logical access control tools to vault their information assets and protect them against external threats, but they lack solutions to counter insider threats. Nowadays, insider threats are the most significant concern of security analysts. They are mainly individuals with legitimate access to companies information systems, which use their rights with malicious intents. In several fields, behavior anomaly detection is the method used by cyber specialists to counter the threats of user malicious activities effectively. In this paper, we present the step toward the construction of a user and entity behavior analysis framework by proposing a behavior anomaly detection model. This model combines machine learning classification techniques and graph-based methods, relying on linear algebra and parallel computing techniques. We show the utility of an ensemble learning approach in this context. We present some detection methods tests results on an representative access control dataset. The use of some explored classifiers gives results up to 99% of accuracy.Keywords: Cybersecurity, data protection, access control, insider threat, user behavior analysis, ensemble learning, high performance computing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11518829 Analysis of the Ambient Media Approach of Advertisement Samples from the Adman Awards and Symposium under the Category of Outdoor and Ambience
Authors: Chanthana Poninthawong
Abstract:
This research is to study the types of products and services that employs 'ambient media and respective techniques in its advertisement materials. Data collection has been done via analyses of a total of 62 advertisements that employed ambient media approach in Thailand during the years 2004 to 2011. The 62 advertisement were qualifying advertisements of the Adman Awards & Symposium under the category of Outdoor & Ambience. Analysis results reveal that there is a total of 14 products and services that chooses to utilize ambient media in its advertisement. Amongst all ambient media techniques, 'intrusion' uses the value of a medium in its representation of content most often. Following intrusion is 'interaction', where consumers are invited to participate and interact with the advertising materials. 'Illusion' ranks third in its ability to subject the viewers to distortions of reality that makes the division between reality and fantasy less clear.Keywords: Ambient media, Adman Awards, advertising, Out of Home media.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24288828 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
Abstract:
Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.
Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26038827 Node Insertion in Coalescence Hidden-Variable Fractal Interpolation Surface
Authors: Srijanani Anurag Prasad
Abstract:
The Coalescence Hidden-variable Fractal Interpolation Surface (CHFIS) was built by combining interpolation data from the Iterated Function System (IFS). The interpolation data in a CHFIS comprise a row and/or column of uncertain values when a single point is entered. Alternatively, a row and/or column of additional points are placed in the given interpolation data to demonstrate the node added CHFIS. There are three techniques for inserting new points that correspond to the row and/or column of nodes inserted, and each method is further classified into four types based on the values of the inserted nodes. As a result, numerous forms of node insertion can be found in a CHFIS.
Keywords: Fractal, interpolation, iterated function system, coalescence, node insertion, knot insertion.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3398826 Fault Detection of Drinking Water Treatment Process Using PCA and Hotelling's T2 Chart
Authors: Joval P George, Dr. Zheng Chen, Philip Shaw
Abstract:
This paper deals with the application of Principal Component Analysis (PCA) and the Hotelling-s T2 Chart, using data collected from a drinking water treatment process. PCA is applied primarily for the dimensional reduction of the collected data. The Hotelling-s T2 control chart was used for the fault detection of the process. The data was taken from a United Utilities Multistage Water Treatment Works downloaded from an Integrated Program Management (IPM) dashboard system. The analysis of the results show that Multivariate Statistical Process Control (MSPC) techniques such as PCA, and control charts such as Hotelling-s T2, can be effectively applied for the early fault detection of continuous multivariable processes such as Drinking Water Treatment. The software package SIMCA-P was used to develop the MSPC models and Hotelling-s T2 Chart from the collected data.
Keywords: Principal component analysis, hotelling's t2 chart, multivariate statistical process control, drinking water treatment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27858825 Data Preprocessing for Supervised Leaning
Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas
Abstract:
Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set but this is not happened. Thus, we present the most well know algorithms for each step of data pre-processing so that one achieves the best performance for their data set.Keywords: Data mining, feature selection, data cleaning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 60918824 An Educational Data Mining System for Advising Higher Education Students
Authors: Heba Mohammed Nagy, Walid Mohamed Aly, Osama Fathy Hegazy
Abstract:
Educational data mining is a specific data mining field applied to data originating from educational environments, it relies on different approaches to discover hidden knowledge from the available data. Among these approaches are machine learning techniques which are used to build a system that acquires learning from previous data. Machine learning can be applied to solve different regression, classification, clustering and optimization problems.
In our research, we propose a “Student Advisory Framework” that utilizes classification and clustering to build an intelligent system. This system can be used to provide pieces of consultations to a first year university student to pursue a certain education track where he/she will likely succeed in, aiming to decrease the high rate of academic failure among these students. A real case study in Cairo Higher Institute for Engineering, Computer Science and Management is presented using real dataset collected from 2000−2012.The dataset has two main components: pre-higher education dataset and first year courses results dataset. Results have proved the efficiency of the suggested framework.
Keywords: Classification, Clustering, Educational Data Mining (EDM), Machine Learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 52138823 Locating Center Points for Radial Basis Function Networks Using Instance Reduction Techniques
Authors: Rana Yousef, Khalil el Hindi
Abstract:
The behavior of Radial Basis Function (RBF) Networks greatly depends on how the center points of the basis functions are selected. In this work we investigate the use of instance reduction techniques, originally developed to reduce the storage requirements of instance based learners, for this purpose. Five Instance-Based Reduction Techniques were used to determine the set of center points, and RBF networks were trained using these sets of centers. The performance of the RBF networks is studied in terms of classification accuracy and training time. The results obtained were compared with two Radial Basis Function Networks: RBF networks that use all instances of the training set as center points (RBF-ALL) and Probabilistic Neural Networks (PNN). The former achieves high classification accuracies and the latter requires smaller training time. Results showed that RBF networks trained using sets of centers located by noise-filtering techniques (ALLKNN and ENN) rather than pure reduction techniques produce the best results in terms of classification accuracy. The results show that these networks require smaller training time than that of RBF-ALL and higher classification accuracy than that of PNN. Thus, using ALLKNN and ENN to select center points gives better combination of classification accuracy and training time. Our experiments also show that using the reduced sets to train the networks is beneficial especially in the presence of noise in the original training sets.
Keywords: Radial basis function networks, Instance-based reduction, PNN.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16878822 Modern Pedagogy Techniques for DC Motor Speed Control
Authors: Rajesh Kumar, Roopali Dogra, Puneet Aggarwal
Abstract:
Based on a survey conducted for second and third year students of the electrical engineering department at Maharishi Markandeshwar University, India, it was found that around 92% of students felt that it would be better to introduce a virtual environment for laboratory experiments. Hence, a need was felt to perform modern pedagogy techniques for students which consist of a virtual environment using MATLAB/Simulink. In this paper, a virtual environment for the speed control of a DC motor is performed using MATLAB/Simulink. The various speed control methods for the DC motor include the field resistance control method and armature voltage control method. The performance analysis of the DC motor is hence analyzed.
Keywords: Pedagogy techniques, speed control, virtual environment, DC motor, field control, voltage control.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18788821 Energy Management Techniques in Mobile Robots
Authors: G. Gurguze, I. Turkoglu
Abstract:
Today, the developing features of technological tools with limited energy resources have made it necessary to use energy efficiently. Energy management techniques have emerged for this purpose. As with every field, energy management is vital for robots that are being used in many areas from industry to daily life and that are thought to take up more spaces in the future. Particularly, effective power management in autonomous and multi robots, which are getting more complicated and increasing day by day, will improve the performance and success. In this study, robot management algorithms, usage of renewable and hybrid energy sources, robot motion patterns, robot designs, sharing strategies of workloads in multiple robots, road and mission planning algorithms are discussed for efficient use of energy resources by mobile robots. These techniques have been evaluated in terms of efficient use of existing energy resources and energy management in robots.
Keywords: Energy management, mobile robot, robot administration, robot management, robot planning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15708820 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods
Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila
Abstract:
An extensive amount of work has been done in data clustering research under the unsupervised learning technique in Data Mining during the past two decades. Moreover, several approaches and methods have been emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, none of the single clustering algorithm exemplifies its best nature in extracting efficient clusters. Consequently, in order to rectify this issue, a new challenging technique called Cluster Ensemble method was bloomed. This new approach tends to be the alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate the diverse clustering solutions in such a way to attain accuracy and also to improve the eminence the individual clustering algorithms. Due to the massive and rapid development of new methods in the globe of data mining, it is highly mandatory to scrutinize a vital analysis of existing techniques and the future novelty. This paper shows the comparative analysis of different cluster ensemble methods along with their methodologies and salient features. Henceforth this unambiguous analysis will be very useful for the society of clustering experts and also helps in deciding the most appropriate one to resolve the problem in hand.
Keywords: Clustering, Cluster Ensemble Methods, Coassociation matrix, Consensus Function, Median Partition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21048819 An Overview of Handoff Techniques in Cellular Networks
Authors: Nasıf Ekiz, Tara Salih, Sibel Küçüköner, Kemal Fidanboylu
Abstract:
Continuation of an active call is one of the most important quality measurements in the cellular systems. Handoff process enables a cellular system to provide such a facility by transferring an active call from one cell to another. Different approaches are proposed and applied in order to achieve better handoff service. The principal parameters used to evaluate handoff techniques are: forced termination probability and call blocking probability. The mechanisms such as guard channels and queuing handoff calls decrease the forced termination probability while increasing the call blocking probability. In this paper we present an overview about the issues related to handoff initiation and decision and discuss about different types of handoff techniques available in the literature.
Keywords: Handoff, Forced Termination Probability, Blocking probability, Handoff Initiation, Handoff Decision, Handoff Prioritization Schemes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 66158818 Applications of Big Data in Education
Authors: Faisal Kalota
Abstract:
Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 48748817 Evolutionary Techniques Based Combined Artificial Neural Networks for Peak Load Forecasting
Authors: P. Subbaraj, V. Rajasekaran
Abstract:
This paper presents a new approach using Combined Artificial Neural Network (CANN) module for daily peak load forecasting. Five different computational techniques –Constrained method, Unconstrained method, Evolutionary Programming (EP), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA) – have been used to identify the CANN module for peak load forecasting. In this paper, a set of neural networks has been trained with different architecture and training parameters. The networks are trained and tested for the actual load data of Chennai city (India). A set of better trained conventional ANNs are selected to develop a CANN module using different algorithms instead of using one best conventional ANN. Obtained results using CANN module confirm its validity.
Keywords: Combined ANN, Evolutionary Programming, Particle Swarm Optimization, Genetic Algorithm and Peak load forecasting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16808816 Enhanced Imperialist Competitive Algorithm for the Cell Formation Problem Using Sequence Data
Authors: S. H. Borghei, E. Teymourian, M. Mobin, G. M. Komaki, S. Sheikh
Abstract:
Imperialist Competitive Algorithm (ICA) is a recent meta-heuristic method that is inspired by the social evolutions for solving NP-Hard problems. The ICA is a population-based algorithm which has achieved a great performance in comparison to other metaheuristics. This study is about developing enhanced ICA approach to solve the Cell Formation Problem (CFP) using sequence data. In addition to the conventional ICA, an enhanced version of ICA, namely EICA, applies local search techniques to add more intensification aptitude and embed the features of exploration and intensification more successfully. Suitable performance measures are used to compare the proposed algorithms with some other powerful solution approaches in the literature. In the same way, for checking the proficiency of algorithms, forty test problems are presented. Five benchmark problems have sequence data, and other ones are based on 0-1 matrices modified to sequence based problems. Computational results elucidate the efficiency of the EICA in solving CFP problems.Keywords: Cell formation problem, Group technology, Imperialist competitive algorithm, Sequence data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15888815 Blockchain’s Feasibility in Military Data Networks
Authors: Brenden M. Shutt, Lubjana Beshaj, Paul L. Goethals, Ambrose Kam
Abstract:
Communication security is of particular interest to military data networks. A relatively novel approach to network security is blockchain, a cryptographically secured distribution ledger with a decentralized consensus mechanism for data transaction processing. Recent advances in blockchain technology have proposed new techniques for both data validation and trust management, as well as different frameworks for managing dataflow. The purpose of this work is to test the feasibility of different blockchain architectures as applied to military command and control networks. Various architectures are tested through discrete-event simulation and the feasibility is determined based upon a blockchain design’s ability to maintain long-term stable performance at industry standards of throughput, network latency, and security. This work proposes a consortium blockchain architecture with a computationally inexpensive consensus mechanism, one that leverages a Proof-of-Identity (PoI) concept and a reputation management mechanism.Keywords: Blockchain, command & control network, discrete-event simulation, reputation management.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8478814 What the Future Holds for Social Media Data Analysis
Authors: P. Wlodarczak, J. Soar, M. Ally
Abstract:
The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.
Keywords: Social Media, text mining, knowledge discovery, predictive analysis, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 38498813 Research of Data Cleaning Methods Based on Dependency Rules
Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin
Abstract:
This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.Keywords: Data cleaning, dependency rules, violation data discovery, data repair.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 26128812 Context-Aware Querying in Multimedia Databases – A Futuristic Approach
Authors: Nadeem Iftikhar, Zouhaib Zafar, Shaukat Ali
Abstract:
Efficient retrieval of multimedia objects has gained enormous focus in recent years. A number of techniques have been suggested for retrieval of textual information; however, relatively little has been suggested for efficient retrieval of multimedia objects. In this paper we have proposed a generic architecture for contextaware retrieval of multimedia objects. The proposed framework combines the well-known approaches of text-based retrieval and context-aware retrieval to formulate architecture for accurate retrieval of multimedia data.
Keywords: Context-aware retrieval, information retrieval, multimedia databases, multimedia data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15368811 Identification of Wideband Sources Using Higher Order Statistics in Noisy Environment
Authors: S. Bourennane, A. Bendjama
Abstract:
This paper deals with the localization of the wideband sources. We develop a new approach for estimating the wide band sources parameters. This method is based on the high order statistics of the recorded data in order to eliminate the Gaussian components from the signals received on the various hydrophones.In fact the noise of sea bottom is regarded as being Gaussian. Thanks to the coherent signal subspace algorithm based on the cumulant matrix of the received data instead of the cross-spectral matrix the wideband correlated sources are perfectly located in the very noisy environment. We demonstrate the performance of the proposed algorithm on the real data recorded during an underwater acoustics experiments.
Keywords: Higher-order statistics, high resolution array processing techniques, localization of acoustics sources, wide band sources.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15998810 Extended Low Power Bus Binding Combined with Data Sequence Reordering
Authors: Jihyung Kim, Taejin Kim, Sungho Park, Jun-Dong Cho
Abstract:
In this paper, we address the problem of reducing the switching activity (SA) in on-chip buses through the use of a bus binding technique in high-level synthesis. While many binding techniques to reduce the SA exist, we present yet another technique for further reducing the switching activity. Our proposed method combines bus binding and data sequence reordering to explore a wider solution space. The problem is formulated as a multiple traveling salesman problem and solved using simulated annealing technique. The experimental results revealed that a binding solution obtained with the proposed method reduces 5.6-27.2% (18.0% on average) and 2.6-12.7% (6.8% on average) of the switching activity when compared with conventional binding-only and hybrid binding-encoding methods, respectively.Keywords: low power, bus binding, switching activity, multiple traveling salesman problem, data sequence reordering
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13338809 Derivation of Monotone Likelihood Ratio Using Two Sided Uniformly Normal Distribution Techniques
Authors: D. A. Farinde
Abstract:
In this paper, two-sided uniformly normal distribution techniques were used in the derivation of monotone likelihood ratio. The approach mainly employed the parameters of the distribution for a class of all size a. The derivation technique is fast, direct and less burdensome when compared to some existing methods.
Keywords: Neyman-Pearson Lemma, Normal distribution
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32028808 Investigating the Demand for Short-shelf Life Food Products for SME Wholesalers
Authors: Yamini Raju, Parminder S. Kang, Adam Moroz, Ross Clement, Ashley Hopwell, Alistair Duffy
Abstract:
Accurate forecasting of fresh produce demand is one the challenges faced by Small Medium Enterprise (SME) wholesalers. This paper is an attempt to understand the cause for the high level of variability such as weather, holidays etc., in demand of SME wholesalers. Therefore, understanding the significance of unidentified factors may improve the forecasting accuracy. This paper presents the current literature on the factors used to predict demand and the existing forecasting techniques of short shelf life products. It then investigates a variety of internal and external possible factors, some of which is not used by other researchers in the demand prediction process. The results presented in this paper are further analysed using a number of techniques to minimize noise in the data. For the analysis past sales data (January 2009 to May 2014) from a UK based SME wholesaler is used and the results presented are limited to product ‘Milk’ focused on café’s in derby. The correlation analysis is done to check the dependencies of variability factor on the actual demand. Further PCA analysis is done to understand the significance of factors identified using correlation. The PCA results suggest that the cloud cover, weather summary and temperature are the most significant factors that can be used in forecasting the demand. The correlation of the above three factors increased relative to monthly and becomes more stable compared to the weekly and daily demand.Keywords: Demand Forecasting, Deteriorating Products, Food Wholesalers, Principal Component Analysis and Variability Factors.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 33688807 Adequacy of Object-Oriented Framework System-Based Testing Techniques
Authors: Jehad Al Dallal
Abstract:
An application framework provides a reusable design and implementation for a family of software systems. If the framework contains defects, the defects will be passed on to the applications developed from the framework. Framework defects are hard to discover at the time the framework is instantiated. Therefore, it is important to remove all defects before instantiating the framework. In this paper, two measures for the adequacy of an object-oriented system-based testing technique are introduced. The measures assess the usefulness and uniqueness of the testing technique. The two measures are applied to experimentally compare the adequacy of two testing techniques introduced to test objectoriented frameworks at the system level. The two considered testing techniques are the New Framework Test Approach and Testing Frameworks Through Hooks (TFTH). The techniques are also compared analytically in terms of their coverage power of objectoriented aspects. The comparison study results show that the TFTH technique is better than the New Framework Test Approach in terms of usefulness degree, uniqueness degree, and coverage power.Keywords: Object-oriented framework, object-oriented framework testing, test case generation, testing adequacy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14298806 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling
Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal
Abstract:
Datasets or collections are becoming important assets by themselves and now they can be accepted as a primary intellectual output of a research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student’s outcomes collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time consuming process. In addition, it requires experts in the area, which are mostly not available. It has been shown the operational settings under which the collection has been produced. The collection has been cleansed, preprocessed, some features have been selected and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. At the end, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to other experimented multi-label classification methods. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
Keywords: Benchmark collection, program educational objectives, student outcomes, ABET, Accreditation, machine learning, supervised multiclass classification, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8378805 Literature-Based Discoveries in Lupus Treatment
Authors: Oluwaseyi Jaiyeoba, Vetria Byrd
Abstract:
Systemic lupus erythematosus (aka lupus) is a chronic disease known for its chameleon-like ability to mimic symptoms of other diseases rendering it hard to detect, diagnose and treat. The heterogeneous nature of the disease generates disparate data that are often multifaceted and multi-dimensional. Musculoskeletal manifestation of lupus is one of the most common clinical manifestations of lupus. This research links disparate literature on the treatment of lupus as it affects the musculoskeletal system using the discoveries from literature-based research articles available on the PubMed database. Several Natural Language Processing (NPL) tools exist to connect disjointed but related literature, such as Connected Papers, Bitola, and Gopalakrishnan. Literature-based discovery (LBD) has been used to bridge unconnected disciplines based on text mining procedures. The technical/medical literature consists of many technical/medical concepts, each having its sub-literature. This approach has been used to link Parkinson’s, Raynaud, and Multiple Sclerosis treatment within works of literature. Literature-based discovery methods can connect two or more related but disjointed literature concepts to produce a novel and plausible approach to solving a research problem. Data visualization techniques with the help of natural language processing tools are used to visually represent the result of literature-based discoveries. Literature search results can be voluminous, but Data visualization processes can provide insight and detect subtle patterns in large data. These insights and patterns can lead to discoveries that would have otherwise been hidden from disjointed literature. In this research, literature data are mined and combined with visualization techniques for heterogeneous data to discover viable treatments reported in the literature for lupus expression in the musculoskeletal system. This research answers the question of using literature-based discovery to identify potential treatments for a multifaceted disease like lupus. A three-pronged methodology is used in this research: text mining, natural language processing, and data visualization. These three research-related fields are employed to identify patterns in lupus-related data that, when visually represented, could aid research in the treatment of lupus. This work introduces a method for visually representing interconnections of various lupus-related literature. The methodology outlined in this work is the first step toward literature-based research and treatment planning for the musculoskeletal manifestation of lupus. The results also outline the interconnection of complex, disparate data associated with the manifestation of lupus in the musculoskeletal system. The societal impact of this work is broad. Advances in this work will improve the quality of life for millions of persons in the workforce currently diagnosed and silently living with a musculoskeletal disease associated with lupus.
Keywords: Systemic lupus erythematosus, LBD, Data Visualization, musculoskeletal system, treatment.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5068804 Establishing Pairwise Keys Using Key Predistribution Schemes for Sensor Networks
Authors: Y. Harold Robinson, M. Rajaram
Abstract:
Designing cost-efficient, secure network protocols for Wireless Sensor Networks (WSNs) is a challenging problem because sensors are resource-limited wireless devices. Security services such as authentication and improved pairwise key establishment are critical to high efficient networks with sensor nodes. For sensor nodes to correspond securely with each other efficiently, usage of cryptographic techniques is necessary. In this paper, two key predistribution schemes that enable a mobile sink to establish a secure data-communication link, on the fly, with any sensor nodes. The intermediate nodes along the path to the sink are able to verify the authenticity and integrity of the incoming packets using a predicted value of the key generated by the sender’s essential power. The proposed schemes are based on the pairwise key with the mobile sink, our analytical results clearly show that our schemes perform better in terms of network resilience to node capture than existing schemes if used in wireless sensor networks with mobile sinks.Keywords: Wireless Sensor Networks, predistribution scheme, cryptographic techniques.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15898803 Robust Image Registration Based on an Adaptive Normalized Mutual Information Metric
Authors: Huda Algharib, Amal Algharib, Hanan Algharib, Ali Mohammad Alqudah
Abstract:
Image registration is an important topic for many imaging systems and computer vision applications. The standard image registration techniques such as Mutual information/ Normalized mutual information -based methods have a limited performance because they do not consider the spatial information or the relationships between the neighbouring pixels or voxels. In addition, the amount of image noise may significantly affect the registration accuracy. Therefore, this paper proposes an efficient method that explicitly considers the relationships between the adjacent pixels, where the gradient information of the reference and scene images is extracted first, and then the cosine similarity of the extracted gradient information is computed and used to improve the accuracy of the standard normalized mutual information measure. Our experimental results on different data types (i.e. CT, MRI and thermal images) show that the proposed method outperforms a number of image registration techniques in terms of the accuracy.
Keywords: Image registration, mutual information, image gradients, Image transformations.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8968802 Coalescing Data Marts
Authors: N. Parimala, P. Pahwa
Abstract:
OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.Keywords: Data warehouse, Dimension, OLAP, Star Schema.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15598801 Integration of Educational Data Mining Models to a Web-Based Support System for Predicting High School Student Performance
Authors: Sokkhey Phauk, Takeo Okazaki
Abstract:
The challenging task in educational institutions is to maximize the high performance of students and minimize the failure rate of poor-performing students. An effective method to leverage this task is to know student learning patterns with highly influencing factors and get an early prediction of student learning outcomes at the timely stage for setting up policies for improvement. Educational data mining (EDM) is an emerging disciplinary field of data mining, statistics, and machine learning concerned with extracting useful knowledge and information for the sake of improvement and development in the education environment. The study is of this work is to propose techniques in EDM and integrate it into a web-based system for predicting poor-performing students. A comparative study of prediction models is conducted. Subsequently, high performing models are developed to get higher performance. The hybrid random forest (Hybrid RF) produces the most successful classification. For the context of intervention and improving the learning outcomes, a feature selection method MICHI, which is the combination of mutual information (MI) and chi-square (CHI) algorithms based on the ranked feature scores, is introduced to select a dominant feature set that improves the performance of prediction and uses the obtained dominant set as information for intervention. By using the proposed techniques of EDM, an academic performance prediction system (APPS) is subsequently developed for educational stockholders to get an early prediction of student learning outcomes for timely intervention. Experimental outcomes and evaluation surveys report the effectiveness and usefulness of the developed system. The system is used to help educational stakeholders and related individuals for intervening and improving student performance.
Keywords: Academic performance prediction system, prediction model, educational data mining, dominant factors, feature selection methods, student performance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 975