Search results for: Data Mining Techniques
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9218

Search results for: Data Mining Techniques

8588 Noise Optimization Techniques for 1V 1GHz CMOS Low-Noise Amplifiers Design

Authors: M. Zamin Khan, Yanjie Wang, R. Raut

Abstract:

A 1V, 1GHz low noise amplifier (LNA) has been designed and simulated using Spectre simulator in a standard TSMC 0.18um CMOS technology.With low power and noise optimization techniques, the amplifier provides a gain of 24 dB, a noise figure of only 1.2 dB, power dissipation of 14 mW from a 1 V power supply.

Keywords:

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2440
8587 Harnessing Replication in Object Allocation

Authors: H. T. Barney, G. C. Low

Abstract:

The design of distributed systems involves the partitioning of the system into components or partitions and the allocation of these components to physical nodes. Techniques have been proposed for both the partitioning and allocation process. However these techniques suffer from a number of limitations. For instance object replication has the potential to greatly improve the performance of an object orientated distributed system but can be difficult to use effectively and there are few techniques that support the developer in harnessing object replication. This paper presents a methodological technique that helps developers decide how objects should be allocated in order to improve performance in a distributed system that supports replication. The performance of the proposed technique is demonstrated and tested on an example system.

Keywords: Allocation, Distributed Systems, Replication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1427
8586 Educase – Intelligent System for Pedagogical Advising Using Case-Based Reasoning

Authors: Elionai Moura, José A. da Cunha, César Analide

Abstract:

This paper introduces a proposal scheme for an Intelligent System applied to Pedagogical Advising using Case-Based Reasoning, to find consolidated solutions before used for the new problems, making easier the task of advising students to the pedagogical staff. We do intend, through this work, introduce the motivation behind the choices for this system structure, justifying the development of an incremental and smart web system who learns bests solutions for new cases when it’s used, showing technics and technology.

Keywords: Case-based Reasoning, Pedagogical Advising, Educational Data-Mining (EDM), Machine Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2068
8585 Comparison of S-transform and Wavelet Transform in Power Quality Analysis

Authors: Mohammad Javad Dehghani

Abstract:

In the power quality analysis non-stationary nature of voltage distortions require some precise and powerful analytical techniques. The time-frequency representation (TFR) provides a powerful method for identification of the non-stationary of the signals. This paper investigates a comparative study on two techniques for analysis and visualization of voltage distortions with time-varying amplitudes. The techniques include the Discrete Wavelet Transform (DWT), and the S-Transform. Several power quality problems are analyzed using both the discrete wavelet transform and S–transform, showing clearly the advantage of the S– transform in detecting, localizing, and classifying the power quality problems.

Keywords: Power quality, S-Transform, Short Time FourierTransform , Wavelet Transform, instantaneous sag, swell.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2797
8584 Improving Spatiotemporal Change Detection: A High Level Fusion Approach for Discovering Uncertain Knowledge from Satellite Image Database

Authors: Wadii Boulila, Imed Riadh Farah, Karim Saheb Ettabaa, Basel Solaiman, Henda Ben Ghezala

Abstract:

This paper investigates the problem of tracking spa¬tiotemporal changes of a satellite image through the use of Knowledge Discovery in Database (KDD). The purpose of this study is to help a given user effectively discover interesting knowledge and then build prediction and decision models. Unfortunately, the KDD process for spatiotemporal data is always marked by several types of imperfections. In our paper, we take these imperfections into consideration in order to provide more accurate decisions. To achieve this objective, different KDD methods are used to discover knowledge in satellite image databases. Each method presents a different point of view of spatiotemporal evolution of a query model (which represents an extracted object from a satellite image). In order to combine these methods, we use the evidence fusion theory which considerably improves the spatiotemporal knowledge discovery process and increases our belief in the spatiotemporal model change. Experimental results of satellite images representing the region of Auckland in New Zealand depict the improvement in the overall change detection as compared to using classical methods.

Keywords: Knowledge discovery in satellite databases, knowledge fusion, data imperfection, data mining, spatiotemporal change detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1533
8583 On the outlier Detection in Nonlinear Regression

Authors: Hossein Riazoshams, Midi Habshah, Jr., Mohamad Bakri Adam

Abstract:

The detection of outliers is very essential because of their responsibility for producing huge interpretative problem in linear as well as in nonlinear regression analysis. Much work has been accomplished on the identification of outlier in linear regression, but not in nonlinear regression. In this article we propose several outlier detection techniques for nonlinear regression. The main idea is to use the linear approximation of a nonlinear model and consider the gradient as the design matrix. Subsequently, the detection techniques are formulated. Six detection measures are developed that combined with three estimation techniques such as the Least-Squares, M and MM-estimators. The study shows that among the six measures, only the studentized residual and Cook Distance which combined with the MM estimator, consistently capable of identifying the correct outliers.

Keywords: Nonlinear Regression, outliers, Gradient, LeastSquare, M-estimate, MM-estimate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3161
8582 A New Approach to Annotate the Text's of the Websites and Documents with a Quite Comprehensive Knowledge Base

Authors: Mohammad Yasrebi, Mehran Mohsenzadeh, Mashalla Abbasi-Dezfuli

Abstract:

Machine-understandable data when strongly interlinked constitutes the basis for the SemanticWeb. Annotating web documents is one of the major techniques for creating metadata on the Web. Annotating websites defines the containing data in a form which is suitable for interpretation by machines. In this paper, we present a new approach to annotate websites and documents by promoting the abstraction level of the annotation process to a conceptual level. By this means, we hope to solve some of the problems of the current annotation solutions.

Keywords: Knowledge base, ontology, semantic annotation, semantic web.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1338
8581 Image-Based (RBG) Technique for Estimating Phosphorus Levels of Crops

Authors: M. M. Ali, Ahmed Al-Ani, Derek Eamus, Daniel K. Y. Tan

Abstract:

In this glasshouse study, we developed a new imagebased non-destructive technique for detecting leaf P status of different crops such as cotton, tomato and lettuce. The plants were grown on a nutrient solution containing different P concentrations, e.g. 0%, 50% and 100% of recommended P concentration (P0 = no P, L; P1 = 2.5 mL 10 L-1 of P and P2 = 5 mL 10 L-1 of P). After 7 weeks of treatment, the plants were harvested and data on leaf P contents were collected using the standard destructive laboratory method and at the same time leaf images were collected by a handheld crop image sensor. We calculated leaf area, leaf perimeter and RGB (red, green and blue) values of these images. These data were further used in linear discriminant analysis (LDA) to estimate leaf P contents, which successfully classified these plants on the basis of leaf P contents. The data indicated that P deficiency in crop plants can be predicted using leaf image and morphological data. Our proposed nondestructive imaging method is precise in estimating P requirements of different crop species.

Keywords: Image-based techniques, leaf area, leaf P contents, linear discriminant analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1639
8580 Designing a Framework for Network Security Protection

Authors: Eric P. Jiang

Abstract:

As the Internet continues to grow at a rapid pace as the primary medium for communications and commerce and as telecommunication networks and systems continue to expand their global reach, digital information has become the most popular and important information resource and our dependence upon the underlying cyber infrastructure has been increasing significantly. Unfortunately, as our dependency has grown, so has the threat to the cyber infrastructure from spammers, attackers and criminal enterprises. In this paper, we propose a new machine learning based network intrusion detection framework for cyber security. The detection process of the framework consists of two stages: model construction and intrusion detection. In the model construction stage, a semi-supervised machine learning algorithm is applied to a collected set of network audit data to generate a profile of normal network behavior and in the intrusion detection stage, input network events are analyzed and compared with the patterns gathered in the profile, and some of them are then flagged as anomalies should these events are sufficiently far from the expected normal behavior. The proposed framework is particularly applicable to the situations where there is only a small amount of labeled network training data available, which is very typical in real world network environments.

Keywords: classification, data analysis and mining, network intrusion detection, semi-supervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1784
8579 Feature Selection and Predictive Modeling of Housing Data Using Random Forest

Authors: Bharatendra Rai

Abstract:

Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).

Keywords: Housing data, feature selection, random forest, Boruta algorithm, root mean square error.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1684
8578 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets

Authors: Najmeh Abedzadeh, Matthew Jacobs

Abstract:

An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.

Keywords: IDS, intrusion detection system, imbalanced datasets, sampling algorithms, big data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1084
8577 Reuse of Huge Industrial Areas

Authors: Martina Perinkova, Lenka Kolarcikova, Marketa Twrda

Abstract:

Brownfields are one of the most important problems that must be solved by today's cities. The topic of this article is description of developing a comprehensive transformation of postindustrial area of the former iron factory national cultural heritage lower Vítkovice. City of Ostrava used to be industrial superpower of the Czechoslovak Republic, especially in the area of coal mining and iron production, after declining industrial production and mining in the 80s left many unused areas of former factories generally brownfields and backfields. Since the late 90s we are observing how the city officials or private entities seeking to remedy this situation. Regeneration of brownfields is a very expensive and long-term process. The area is now rebuilt for tourists and residents of the city in the entertainment, cultural, and social center. It was necessary do the reconstruction of the industrial monuments. Equally important was the construction of new buildings, which helped reusing of the entire complex. This is a unique example of transformation of technical monuments and completion of necessary new objects, so that the area could start working again and reintegrate back into the urban system.

Keywords: Brownfields, conversion, historical and industrial buildings, reconstruction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1564
8576 A Data Hiding Model with High Security Features Combining Finite State Machines and PMM method

Authors: Souvik Bhattacharyya, Gautam Sanyal

Abstract:

Recent years have witnessed the rapid development of the Internet and telecommunication techniques. Information security is becoming more and more important. Applications such as covert communication, copyright protection, etc, stimulate the research of information hiding techniques. Traditionally, encryption is used to realize the communication security. However, important information is not protected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is firstly hidden in a host data, such as digital image, video or audio, etc, and then transmitted secretly to the receiver.In this paper a data hiding model with high security features combining both cryptography using finite state sequential machine and image based steganography technique for communicating information more securely between two locations is proposed. The authors incorporated the idea of secret key for authentication at both ends in order to achieve high level of security. Before the embedding operation the secret information has been encrypted with the help of finite-state sequential machine and segmented in different parts. The cover image is also segmented in different objects through normalized cut.Each part of the encoded secret information has been embedded with the help of a novel image steganographic method (PMM) on different cuts of the cover image to form different stego objects. Finally stego image is formed by combining different stego objects and transmit to the receiver side. At the receiving end different opposite processes should run to get the back the original secret message.

Keywords: Cover Image, Finite state sequential machine, Melaymachine, Pixel Mapping Method (PMM), Stego Image, NCUT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2248
8575 Real Time Data Communication with FlightGear Using Simulink over a UDP Protocol

Authors: Adil Loya, Ali Haider, Arslan A. Ghaffor, Abubaker Siddique

Abstract:

Simulation and modelling of Unmanned Aerial Vehicle (UAV) has gained wide popularity in front of aerospace community. The demand of designing and modelling optimized control system for UAV has increased ten folds since last decade, as next generation warfare is dependent on unmanned technologies. Therefore, this research focuses on the simulation of nonlinear UAV dynamics on Simulink and its integration with Flightgear. There has been lots of research on implementation of optimizing control using Simulink, however, there are fewer known techniques to simulate these dynamics over Flightgear and a tedious technique of acquiring data has been tackled in this research horizon. Sending data to Flightgear is easy but receiving it from Simulink is not that straight forward, i.e. we can only receive control data on the output. However, in this research we have managed to get the data out from the Flightgear by implementation of level 2 s-function block within Simulink. Moreover, the results captured from Flightgear over a Universal Datagram Protocol (UDP) communication are then compared with the attitude signal that were sent previously. This provide useful information regarding the difference in outputs attained from Simulink to Flightgear. It was found that values received on Simulink were in high agreement with that of the Flightgear output. And complete study has been conducted in a discrete way.

Keywords: aerospace, flight control, FlightGear, communication, Simulink

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1110
8574 A Computational Fluid Dynamic Model of Human Sniffing

Authors: M.V. Shyla, K.B. Naidu

Abstract:

The objective of this paper is to develop a computational model of human nasal cavity from computed tomography (CT) scans using MIMICS software. Computational fluid dynamic techniques were employed to understand nasal airflow. Gambit and Fluent software was used to perform CFD simulation. Velocity profiles, iteration plots, pressure distribution, streamline and pathline patterns for steady, laminar airflow inside the human nasal cavity of healthy and also infected persons are presented in detail. The implications for olfaction are visualized. Results are validated with the available numerical and experimental data. The graphs reveal that airflow varies with different anatomical nasal structures and only fraction of the inspired air reaches the olfactory region. The Deviations in the results suggest that the treatment of infected volunteers will improve the olfactory function.

Keywords: CFD techniques, Finite Volume Method, Fluid dynamic sniffing, Human nasal cavity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2043
8573 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4855
8572 Functional and Efficient Query Interpreters: Principle, Application and Performances’ Comparison

Authors: Laurent Thiry, Michel Hassenforder

Abstract:

This paper presents a general approach to implement efficient queries’ interpreters in a functional programming language. Indeed, most of the standard tools actually available use an imperative and/or object-oriented language for the implementation (e.g. Java for Jena-Fuseki) but other paradigms are possible with, maybe, better performances. To proceed, the paper first explains how to model data structures and queries in a functional point of view. Then, it proposes a general methodology to get performances (i.e. number of computation steps to answer a query) then it explains how to integrate some optimization techniques (short-cut fusion and, more important, data transformations). It then compares the functional server proposed to a standard tool (Fuseki) demonstrating that the first one can be twice to ten times faster to answer queries.

Keywords: Data transformation, functional programming, information server, optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 738
8571 Discovery of Production Rules with Fuzzy Hierarchy

Authors: Fadl M. Ba-Alwi, Kamal K. Bharadwaj

Abstract:

In this paper a novel algorithm is proposed that integrates the process of fuzzy hierarchy generation and rule discovery for automated discovery of Production Rules with Fuzzy Hierarchy (PRFH) in large databases.A concept of frequency matrix (Freq) introduced to summarize large database that helps in minimizing the number of database accesses, identification and removal of irrelevant attribute values and weak classes during the fuzzy hierarchy generation.Experimental results have established the effectiveness of the proposed algorithm.

Keywords: Data Mining, Degree of subsumption, Freq matrix, Fuzzy hierarchy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1293
8570 Evaluation of the Urban Regeneration Project: Land Use Transformation and SNS Big Data Analysis

Authors: Ju-Young Kim, Tae-Heon Moon, Jung-Hun Cho

Abstract:

Urban regeneration projects have been actively promoted in Korea. In particular, Jeonju Hanok Village is evaluated as one of representative cases in terms of utilizing local cultural heritage sits in the urban regeneration project. However, recently, there has been a growing concern in this area, due to the ‘gentrification’, caused by the excessive commercialization and surging tourists. This trend was changing land and building use and resulted in the loss of identity of the region. In this regard, this study analyzed the land use transformation between 2010 and 2016 to identify the commercialization trend in Jeonju Hanok Village. In addition, it conducted SNS big data analysis on Jeonju Hanok Village from February 14th, 2016 to March 31st, 2016 to identify visitors’ awareness of the village. The study results demonstrate that rapid commercialization was underway, unlikely the initial intention, so that planners and officials in city government should reconsider the project direction and rebuild deliberate management strategies. This study is meaningful in that it analyzed the land use transformation and SNS big data to identify the current situation in urban regeneration area. Furthermore, it is expected that the study results will contribute to the vitalization of regeneration area.

Keywords: Land use, SNS, text mining, urban regeneration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1198
8569 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1232
8568 Analysis of Codebook Based Channel Feedback Techniques for MIMO-OFDM Systems

Authors: Muhammad Rehan Khalid, Ahmed Farhan Hanif, Adnan Ahmed Khan

Abstract:

This paper investigates the performance of Multiple- Input Multiple-Output (MIMO) feedback system combined with Orthogonal Frequency Division Multiplexing (OFDM). Two types of codebook based channel feedback techniques are used in this work. The first feedback technique uses a combination of both the long-term and short-term channel state information (CSI) at the transmitter, whereas the second technique uses only the short term CSI. The long-term and short-term CSI at the transmitter is used for efficient channel utilization. OFDM is a powerful technique employed in communication systems suffering from frequency selectivity. Combined with multiple antennas at the transmitter and receiver, OFDM proves to be robust against delay spread. Moreover, it leads to significant data rates with improved bit error performance over links having only a single antenna at both the transmitter and receiver. The effectiveness of these techniques has been demonstrated through the simulation of a MIMO-OFDM feedback system. The results have been evaluated for 4x4 MIMO channels. Simulation results indicate the benefits of the MIMO-OFDM channel feedback system over the one without incorporating OFDM. Performance gain of about 3 dB is observed for MIMO-OFDM feedback system as compared to the one without employing OFDM. Hence MIMO-OFDM becomes an attractive approach for future high speed wireless communication systems.

Keywords: MIMO systems, OFDM, Codebooks, Channel Feedback

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1663
8567 Media and Information Literacy (MIL) for Thai Youths

Authors: Waralak Vongdoiwang Siricharoen, Nattanun Siricharoen

Abstract:

The objectives of this study are to determine the role of media that influence the values, attitudes and behaviors of Thai youths. Analytical qualitative research techniques were used for this purpose. Data collection based techniques was used which were individual interviews and focus group discussions with journalists, sample of high school and university students, and parents. The results show that “Social Media" is still the most popular media for Thai youths. It is also still in the hands of the marketing business and it can motivate Thai youths to do so many things. The main reasons of media exposure are to find quality information that they want quickly, get satisfaction and can use social media to get more exciting and to build communities. They believe that the need for media and information literacy skills is defined as making judgments, personal integrity, training of family and the behavior of close friends.

Keywords: Media and Information Literacy, Making Judgments, Personal integrity, Behavior of close friends

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2778
8566 An Analysis of Compression Methods and Implementation of Medical Images in Wireless Network

Authors: C. Rajan, K. Geetha, S. Geetha

Abstract:

The motivation of image compression technique is to reduce the irrelevance and redundancy of the image data in order to store or pass data in an efficient way from one place to another place. There are several types of compression methods available. Without the help of compression technique, the file size is knowingly larger, usually several megabytes, but by doing the compression technique, it is possible to reduce file size up to 10% as of the original without noticeable loss in quality. Image compression can be lossless or lossy. The compression technique can be applied to images, audio, video and text data. This research work mainly concentrates on methods of encoding, DCT, compression methods, security, etc. Different methodologies and network simulations have been analyzed here. Various methods of compression methodologies and its performance metrics has been investigated and presented in a table manner.

Keywords: Image compression techniques, encoding, DCT, lossy compression, lossless compression, JPEG.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1179
8565 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2595
8564 An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia

Authors: Carol Anne Hargreaves

Abstract:

A key issue in stock investment is how to select representative features for stock selection. The objective of this paper is to firstly determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price – portfolio 1. Next, the principal component analysis technique was used to select stocks that were rated high on component one and component two – portfolio 2. Thirdly, a supervised machine learning technique, the logistic regression method, was used to select stocks with a high probability of their price going up – portfolio 3. The predictive models were validated with metrics such as, sensitivity (recall), specificity and overall accuracy for all models. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month the return for each stock portfolio was computed and compared with the stock market index returns. The returns for all three stock portfolios was 23.87% for the principal component analysis stock portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio while the stock market performance was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top performing stock portfolios that outperform the stock market.

Keywords: Machine learning, stock market trading, logistic principal component analysis, automated stock investment system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1074
8563 Construction of Large Scale UAVs Using Homebuilt Composite Techniques

Authors: Brian J. Kozak, Joshua D. Shipman, Peng Hao Wang, Blake Shipp

Abstract:

The unmanned aerial system (UAS) industry is growing at a rapid pace. This growth has increased the demand for low cost, custom made and high strength unmanned aerial vehicles (UAV). The area of most growth is in the area of 25 kg to 200 kg vehicles. Vehicles this size are beyond the size and scope of simple wood and fabric designs commonly found in hobbyist aircraft. These high end vehicles require stronger materials to complete their mission. Traditional aircraft construction materials such as aluminum are difficult to use without machining or advanced computer controlled tooling. However, by using general aviation composite aircraft homebuilding techniques and materials, a large scale UAV can be constructed cheaply and easily. Furthermore, these techniques could be used to easily manufacture cost made composite shapes and airfoils that would be cost prohibitive when using metals. These homebuilt aircraft techniques are being demonstrated by the researchers in the construction of a 75 kg aircraft.

Keywords: Composite aircraft, homebuilding, unmanned aerial system, unmanned aerial vehicles.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 796
8562 Fault Detection of Drinking Water Treatment Process Using PCA and Hotelling's T2 Chart

Authors: Joval P George, Dr. Zheng Chen, Philip Shaw

Abstract:

This paper deals with the application of Principal Component Analysis (PCA) and the Hotelling-s T2 Chart, using data collected from a drinking water treatment process. PCA is applied primarily for the dimensional reduction of the collected data. The Hotelling-s T2 control chart was used for the fault detection of the process. The data was taken from a United Utilities Multistage Water Treatment Works downloaded from an Integrated Program Management (IPM) dashboard system. The analysis of the results show that Multivariate Statistical Process Control (MSPC) techniques such as PCA, and control charts such as Hotelling-s T2, can be effectively applied for the early fault detection of continuous multivariable processes such as Drinking Water Treatment. The software package SIMCA-P was used to develop the MSPC models and Hotelling-s T2 Chart from the collected data.

Keywords: Principal component analysis, hotelling's t2 chart, multivariate statistical process control, drinking water treatment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2762
8561 Adaptation of State/Transition-Based Methods for Embedded System Testing

Authors: Abdelaziz Guerrouat, Harald Richter

Abstract:

In this paper test generation methods and appropriate fault models for testing and analysis of embedded systems described as (extended) finite state machines ((E)FSMs) are presented. Compared to simple FSMs, EFSMs specify not only the control flow but also the data flow. Thus, we define a two-level fault model to cover both aspects. The goal of this paper is to reuse well-known FSM-based test generation methods for automation of embedded system testing. These methods have been widely used in testing and validation of protocols and communicating systems. In particular, (E)FSMs-based specification and testing is more advantageous because (E)FSMs support the formal semantic of already standardised formal description techniques (FDTs) despite of their popularity in the design of hardware and software systems.

Keywords: Formal methods, testing and validation, finite state machines, formal description techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2081
8560 Node Insertion in Coalescence Hidden-Variable Fractal Interpolation Surface

Authors: Srijanani Anurag Prasad

Abstract:

The Coalescence Hidden-variable Fractal Interpolation Surface (CHFIS) was built by combining interpolation data from the Iterated Function System (IFS). The interpolation data in a CHFIS comprise a row and/or column of uncertain values when a single point is entered. Alternatively, a row and/or column of additional points are placed in the given interpolation data to demonstrate the node added CHFIS. There are three techniques for inserting new points that correspond to the row and/or column of nodes inserted, and each method is further classified into four types based on the values of the inserted nodes. As a result, numerous forms of node insertion can be found in a CHFIS.

Keywords: Fractal, interpolation, iterated function system, coalescence, node insertion, knot insertion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 304
8559 A New Type of Integration Error and its Influence on Integration Testing Techniques

Authors: P. Prema, B. Ramadoss

Abstract:

Testing is an activity that is required both in the development and maintenance of the software development life cycle in which Integration Testing is an important activity. Integration testing is based on the specification and functionality of the software and thus could be called black-box testing technique. The purpose of integration testing is testing integration between software components. In function or system testing, the concern is with overall behavior and whether the software meets its functional specifications or performance characteristics or how well the software and hardware work together. This explains the importance and necessity of IT for which the emphasis is on interactions between modules and their interfaces. Software errors should be discovered early during IT to reduce the costs of correction. This paper introduces a new type of integration error, presenting an overview of Integration Testing techniques with comparison of each technique and also identifying which technique detects what type of error.

Keywords: Integration Error, Integration Error Types, Integration Testing Techniques, Software Testing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2197