Search results for: image data.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8581

Search results for: image data.

7231 A Distributed Approach to Extract High Utility Itemsets from XML Data

Authors: S. Kannimuthu, K. Premalatha

Abstract:

This paper investigates a new data mining capability that entails mining of High Utility Itemsets (HUI) in a distributed environment. Existing research in data mining deals with only presence or absence of an items and do not consider the semantic measures like weight or cost of the items. Thus, HUI mining algorithm has evolved. HUI mining is the one kind of utility mining concept, aims to identify itemsets whose utility satisfies a given threshold. Although, the approach of mining HUIs in a distributed environment and mining of the same from XML data have not explored yet. In this work, a novel approach is proposed to mine HUIs from the XML based data in a distributed environment. This work utilizes Service Oriented Computing (SOC) paradigm which provides Knowledge as a Service (KaaS). The interesting patterns are provided via the web services with the help of knowledge server to answer the queries of the consumers. The performance of the approach is evaluated on various databases using execution time and memory consumption.

Keywords: Data mining, Knowledge as a Service, service oriented computing, utility mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2454
7230 Person Identification using Gait by Combined Features of Width and Shape of the Binary Silhouette

Authors: M.K. Bhuyan, Aragala Jagan.

Abstract:

Current image-based individual human recognition methods, such as fingerprints, face, or iris biometric modalities generally require a cooperative subject, views from certain aspects, and physical contact or close proximity. These methods cannot reliably recognize non-cooperating individuals at a distance in the real world under changing environmental conditions. Gait, which concerns recognizing individuals by the way they walk, is a relatively new biometric without these disadvantages. The inherent gait characteristic of an individual makes it irreplaceable and useful in visual surveillance. In this paper, an efficient gait recognition system for human identification by extracting two features namely width vector of the binary silhouette and the MPEG-7-based region-based shape descriptors is proposed. In the proposed method, foreground objects i.e., human and other moving objects are extracted by estimating background information by a Gaussian Mixture Model (GMM) and subsequently, median filtering operation is performed for removing noises in the background subtracted image. A moving target classification algorithm is used to separate human being (i.e., pedestrian) from other foreground objects (viz., vehicles). Shape and boundary information is used in the moving target classification algorithm. Subsequently, width vector of the outer contour of binary silhouette and the MPEG-7 Angular Radial Transform coefficients are taken as the feature vector. Next, the Principal Component Analysis (PCA) is applied to the selected feature vector to reduce its dimensionality. These extracted feature vectors are used to train an Hidden Markov Model (HMM) for identification of some individuals. The proposed system is evaluated using some gait sequences and the experimental results show the efficacy of the proposed algorithm.

Keywords: Gait Recognition, Gaussian Mixture Model, PrincipalComponent Analysis, MPEG-7 Angular Radial Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1911
7229 On the Network Packet Loss Tolerance of SVM Based Activity Recognition

Authors: Gamze Uslu, Sebnem Baydere, Alper K. Demir

Abstract:

In this study, data loss tolerance of Support Vector Machines (SVM) based activity recognition model and multi activity classification performance when data are received over a lossy wireless sensor network is examined. Initially, the classification algorithm we use is evaluated in terms of resilience to random data loss with 3D acceleration sensor data for sitting, lying, walking and standing actions. The results show that the proposed classification method can recognize these activities successfully despite high data loss. Secondly, the effect of differentiated quality of service performance on activity recognition success is measured with activity data acquired from a multi hop wireless sensor network, which introduces  high data loss. The effect of number of nodes on the reliability and multi activity classification success is demonstrated in simulation environment. To the best of our knowledge, the effect of data loss in a wireless sensor network on activity detection success rate of an SVM based classification algorithm has not been studied before.

Keywords: Activity recognition, support vector machines, acceleration sensor, wireless sensor networks, packet loss.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2871
7228 Performance and Availability Analyses of PV Generation Systems in Taiwan

Authors: H. S. Huang, J. C. Jao, K. L. Yen, C. T. Tsai

Abstract:

The purpose of this article applies the monthly final energy yield and failure data of 202 PV systems installed in Taiwan to analyze the PV operational performance and system availability. This data is collected by Industrial Technology Research Institute through manual records. Bad data detection and failure data estimation approaches are proposed to guarantee the quality of the received information. The performance ratio value and system availability are then calculated and compared with those of other countries. It is indicated that the average performance ratio of Taiwan-s PV systems is 0.74 and the availability is 95.7%. These results are similar with those of Germany, Switzerland, Italy and Japan.

Keywords: availability, performance ratio, PV system, Taiwan

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4439
7227 Stealthy Network Transfer of Data

Authors: N. Veerasamy, C. J. Cheyne

Abstract:

Users of computer systems may often require the private transfer of messages/communications between parties across a network. Information warfare and the protection and dominance of information in the military context is a prime example of an application area in which the confidentiality of data needs to be maintained. The safe transportation of critical data is therefore often a vital requirement for many private communications. However, unwanted interception/sniffing of communications is also a possibility. An elementary stealthy transfer scheme is therefore proposed by the authors. This scheme makes use of encoding, splitting of a message and the use of a hashing algorithm to verify the correctness of the reconstructed message. For this proof-of-concept purpose, the authors have experimented with the random sending of encoded parts of a message and the construction thereof to demonstrate how data can stealthily be transferred across a network so as to prevent the obvious retrieval of data.

Keywords: Construction, encode, interception, stealthy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1196
7226 Survey on Arabic Sentiment Analysis in Twitter

Authors: Sarah O. Alhumoud, Mawaheb I. Altuwaijri, Tarfa M. Albuhairi, Wejdan M. Alohaideb

Abstract:

Large-scale data stream analysis has become one of the important business and research priorities lately. Social networks like Twitter and other micro-blogging platforms hold an enormous amount of data that is large in volume, velocity and variety. Extracting valuable information and trends out of these data would aid in a better understanding and decision-making. Multiple analysis techniques are deployed for English content. Moreover, one of the languages that produce a large amount of data over social networks and is least analyzed is the Arabic language. The proposed paper is a survey on the research efforts to analyze the Arabic content in Twitter focusing on the tools and methods used to extract the sentiments for the Arabic content on Twitter.

Keywords: Big Data, Social Networks, Sentiment Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4348
7225 A Selective Markovianity Approach for Image Segmentation

Authors: A. Melouah, H. Merouani

Abstract:

A new Markovianity approach is introduced in this paper. This approach reduces the response time of classic Markov Random Fields approach. First, one region is determinated by a clustering technique. Then, this region is excluded from the study. The remaining pixel form the study zone and they are selected for a Markovianity segmentation task. With Selective Markovianity approach, segmentation process is faster than classic one.

Keywords: Markovianity, response time, segmentation, study zone.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1458
7224 An Efficient Segmentation Method Based on Local Entropy Characteristics of Iris Biometrics

Authors: Ali Shojaee Bakhtiari, Ali Asghar Beheshti Shirazi, Amir Sepasi Zahmati

Abstract:

An efficient iris segmentation method based on analyzing the local entropy characteristic of the iris image, is proposed in this paper and the strength and weaknesses of the method are analyzed for practical purposes. The method shows special strength in providing designers with an adequate degree of freedom in choosing the proper sections of the iris for their application purposes.

Keywords: Iris segmentation, entropy, biocryptosystem, biometric identification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1428
7223 Relevance Feedback within CBIR Systems

Authors: Mawloud Mosbah, Bachir Boucheham

Abstract:

We present here the results for a comparative study of some techniques, available in the literature, related to the relevance feedback mechanism in the case of a short-term learning. Only one method among those considered here is belonging to the data mining field which is the K-nearest neighbors algorithm (KNN) while the rest of the methods is related purely to the information retrieval field and they fall under the purview of the following three major axes: Shifting query, Feature Weighting and the optimization of the parameters of similarity metric. As a contribution, and in addition to the comparative purpose, we propose a new version of the KNN algorithm referred to as an incremental KNN which is distinct from the original version in the sense that besides the influence of the seeds, the rate of the actual target image is influenced also by the images already rated. The results presented here have been obtained after experiments conducted on the Wang database for one iteration and utilizing color moments on the RGB space. This compact descriptor, Color Moments, is adequate for the efficiency purposes needed in the case of interactive systems. The results obtained allow us to claim that the proposed algorithm proves good results; it even outperforms a wide range of techniques available in the literature.

Keywords: CBIR, Category Search, Relevance Feedback (RFB), Query Point Movement, Standard Rocchio’s Formula, Adaptive Shifting Query, Feature Weighting, Optimization of the Parameters of Similarity Metric, Original KNN, Incremental KNN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2342
7222 Wildfires Assessed by Remote Sense Images and Burned Land Monitoring

Authors: M. C. Proença

Abstract:

The tools described in this paper enable the location of burned areas where took place the annihilation of natural habitats and establishes a baseline for major changes in forest ecosystems during recovery. Moreover, the result allows the follow up of the surface fuel loading, allowing the evaluation and guidance of restoration measures to remote areas by phased time planning. This case study implements the evaluation of burned areas that suffered successive wildfires in Portugal mainland during the summer of 2017, killing more than 60 people. The goal is to show that this evaluation can be done with remote sense data free of charges in a simple laptop, with open-source software, describing the not-so-simple methodology step by step, to make it accessible for local workers in the areas attained, where the availability of information is essential for the immediate planning of mitigation measures, such as restoring road access, allocate funds for the recovery of human dwellings and assess further needs for restoration of the ecological system. Wildfires also devastate forest ecosystems having a direct impact on vegetation cover and killing or driving away the animal population, besides loss of all crops in rural areas that are essential as local resources. The economic interests are also attained, as the pinewood burned becomes useless for the noblest applications, so its value decreases, and resin extraction ends for several years.

Keywords: Image processing, remote sensing, wildfires, burned areas, SENTINEL-2.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1583
7221 Comparative Analysis of the Third Generation of Research Data for Evaluation of Solar Energy Potential

Authors: Claudineia Brazil, Elison Eduardo Jardim Bierhals, Luciane Teresa Salvi, Rafael Haag

Abstract:

Renewable energy sources are dependent on climatic variability, so for adequate energy planning, observations of the meteorological variables are required, preferably representing long-period series. Despite the scientific and technological advances that meteorological measurement systems have undergone in the last decades, there is still a considerable lack of meteorological observations that form series of long periods. The reanalysis is a system of assimilation of data prepared using general atmospheric circulation models, based on the combination of data collected at surface stations, ocean buoys, satellites and radiosondes, allowing the production of long period data, for a wide gamma. The third generation of reanalysis data emerged in 2010, among them is the Climate Forecast System Reanalysis (CFSR) developed by the National Centers for Environmental Prediction (NCEP), these data have a spatial resolution of 0.50 x 0.50. In order to overcome these difficulties, it aims to evaluate the performance of solar radiation estimation through alternative data bases, such as data from Reanalysis and from meteorological satellites that satisfactorily meet the absence of observations of solar radiation at global and/or regional level. The results of the analysis of the solar radiation data indicated that the reanalysis data of the CFSR model presented a good performance in relation to the observed data, with determination coefficient around 0.90. Therefore, it is concluded that these data have the potential to be used as an alternative source in locations with no seasons or long series of solar radiation, important for the evaluation of solar energy potential.

Keywords: Climate, reanalysis, renewable energy, solar radiation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 906
7220 Explorative Data Mining of Constructivist Learning Experiences and Activities with Multiple Dimensions

Authors: Patrick Wessa, Bart Baesens

Abstract:

This paper discusses the use of explorative data mining tools that allow the educator to explore new relationships between reported learning experiences and actual activities, even if there are multiple dimensions with a large number of measured items. The underlying technology is based on the so-called Compendium Platform for Reproducible Computing (http://www.freestatistics.org) which was built on top the computational R Framework (http://www.wessa.net).

Keywords: Reproducible computing, data mining, explorative data analysis, compendium technology, computer assisted education

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1253
7219 New Approach for Constructing a Secure Biometric Database

Authors: A. Kebbeb, M. Mostefai, F. Benmerzoug, Y. Chahir

Abstract:

The multimodal biometric identification is the combination of several biometric systems; the challenge of this combination is to reduce some limitations of systems based on a single modality while significantly improving performance. In this paper, we propose a new approach to the construction and the protection of a multimodal biometric database dedicated to an identification system. We use a topological watermarking to hide the relation between face image and the registered descriptors extracted from other modalities of the same person for more secure user identification.

Keywords: Biometric databases, Multimodal biometrics, security authentication, Digital watermarking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2090
7218 Analysis of Textual Data Based On Multiple 2-Class Classification Models

Authors: Shigeaki Sakurai, Ryohei Orihara

Abstract:

This paper proposes a new method for analyzing textual data. The method deals with items of textual data, where each item is described based on various viewpoints. The method acquires 2- class classification models of the viewpoints by applying an inductive learning method to items with multiple viewpoints. The method infers whether the viewpoints are assigned to the new items or not by using the models. The method extracts expressions from the new items classified into the viewpoints and extracts characteristic expressions corresponding to the viewpoints by comparing the frequency of expressions among the viewpoints. This paper also applies the method to questionnaire data given by guests at a hotel and verifies its effect through numerical experiments.

Keywords: Text mining, Multiple viewpoints, Differential analysis, Questionnaire data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1290
7217 A New Implementation of PCA for Fast Face Detection

Authors: Hazem M. El-Bakry

Abstract:

Principal Component Analysis (PCA) has many different important applications especially in pattern detection such as face detection / recognition. Therefore, for real time applications, the response time is required to be as small as possible. In this paper, new implementation of PCA for fast face detection is presented. Such new implementation is designed based on cross correlation in the frequency domain between the input image and eigenvectors (weights). Simulation results show that the proposed implementation of PCA is faster than conventional one.

Keywords: Fast Face Detection, PCA, Cross Correlation, Frequency Domain

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1797
7216 Using Automated Database Reverse Engineering for Database Integration

Authors: M. R. Abbasifard, M. Rahgozar, A. Bayati, P. Pournemati

Abstract:

One important problem in today organizations is the existence of non-integrated information systems, inconsistency and lack of suitable correlations between legacy and modern systems. One main solution is to transfer the local databases into a global one. In this regards we need to extract the data structures from the legacy systems and integrate them with the new technology systems. In legacy systems, huge amounts of a data are stored in legacy databases. They require particular attention since they need more efforts to be normalized, reformatted and moved to the modern database environments. Designing the new integrated (global) database architecture and applying the reverse engineering requires data normalization. This paper proposes the use of database reverse engineering in order to integrate legacy and modern databases in organizations. The suggested approach consists of methods and techniques for generating data transformation rules needed for the data structure normalization.

Keywords: Reverse Engineering, Database Integration, System Integration, Data Structure Normalization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1852
7215 Analysis of Cooperative Learning Behavior Based on the Data of Students' Movement

Authors: Wang Lin, Li Zhiqiang

Abstract:

The purpose of this paper is to analyze the cooperative learning behavior pattern based on the data of students' movement. The study firstly reviewed the cooperative learning theory and its research status, and briefly introduced the k-means clustering algorithm. Then, it used clustering algorithm and mathematical statistics theory to analyze the activity rhythm of individual student and groups in different functional areas, according to the movement data provided by 10 first-year graduate students. It also focused on the analysis of students' behavior in the learning area and explored the law of cooperative learning behavior. The research result showed that the cooperative learning behavior analysis method based on movement data proposed in this paper is feasible. From the results of data analysis, the characteristics of behavior of students and their cooperative learning behavior patterns could be found.

Keywords: Behavior pattern, cooperative learning, data analyze, K-means clustering algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 814
7214 Implementing a Visual Servoing System for Robot Controlling

Authors: Maryam Vafadar, Alireza Behrad, Saeed Akbari

Abstract:

Nowadays, with the emerging of the new applications like robot control in image processing, artificial vision for visual servoing is a rapidly growing discipline and Human-machine interaction plays a significant role for controlling the robot. This paper presents a new algorithm based on spatio-temporal volumes for visual servoing aims to control robots. In this algorithm, after applying necessary pre-processing on video frames, a spatio-temporal volume is constructed for each gesture and feature vector is extracted. These volumes are then analyzed for matching in two consecutive stages. For hand gesture recognition and classification we tested different classifiers including k-Nearest neighbor, learning vector quantization and back propagation neural networks. We tested the proposed algorithm with the collected data set and results showed the correct gesture recognition rate of 99.58 percent. We also tested the algorithm with noisy images and algorithm showed the correct recognition rate of 97.92 percent in noisy images.

Keywords: Back propagation neural network, Feature vector, Hand gesture recognition, k-Nearest Neighbor, Learning vector quantization neural network, Robot control, Spatio-temporal volume, Visual servoing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1670
7213 A Security Cloud Storage Scheme Based Accountable Key-Policy Attribute-Based Encryption without Key Escrow

Authors: Ming Lun Wang, Yan Wang, Ning Ruo Sun

Abstract:

With the development of cloud computing, more and more users start to utilize the cloud storage service. However, there exist some issues: 1) cloud server steals the shared data, 2) sharers collude with the cloud server to steal the shared data, 3) cloud server tampers the shared data, 4) sharers and key generation center (KGC) conspire to steal the shared data. In this paper, we use advanced encryption standard (AES), hash algorithms, and accountable key-policy attribute-based encryption without key escrow (WOKE-AKP-ABE) to build a security cloud storage scheme. Moreover, the data are encrypted to protect the privacy. We use hash algorithms to prevent the cloud server from tampering the data uploaded to the cloud. Analysis results show that this scheme can resist conspired attacks.

Keywords: Cloud storage security, sharing storage, attributes, Hash algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1037
7212 Multimethod Approach to Research in Interlanguage Pragmatics

Authors: Saad Al-Gahtani, Ghassan H Al Shatter

Abstract:

Argument over the use of particular method in interlanguage pragmatics has increased recently. Researchers argued the advantages and disadvantages of each method either natural or elicited. Findings of different studies indicated that the use of one method may not provide enough data to answer all its questions. The current study investigated the validity of using multimethod approach in interlanguage pragmatics to understand the development of requests in Arabic as a second language (Arabic L2). To this end, the study adopted two methods belong to two types of data sources: the institutional discourse (natural data), and the role play (elicited data). Participants were 117 learners of Arabic L2 at the university level, representing four levels (beginners, low-intermediate, highintermediate, and advanced). Results showed that using two or more methods in interlanguage pragmatics affect the size and nature of data.

Keywords: Arabic L2, Development of requests, Interlanguage Pragmatics, Multimethod approach.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1830
7211 Hand Gesture Detection via EmguCV Canny Pruning

Authors: N. N. Mosola, S. J. Molete, L. S. Masoebe, M. Letsae

Abstract:

Hand gesture recognition is a technique used to locate, detect, and recognize a hand gesture. Detection and recognition are concepts of Artificial Intelligence (AI). AI concepts are applicable in Human Computer Interaction (HCI), Expert systems (ES), etc. Hand gesture recognition can be used in sign language interpretation. Sign language is a visual communication tool. This tool is used mostly by deaf societies and those with speech disorder. Communication barriers exist when societies with speech disorder interact with others. This research aims to build a hand recognition system for Lesotho’s Sesotho and English language interpretation. The system will help to bridge the communication problems encountered by the mentioned societies. The system has various processing modules. The modules consist of a hand detection engine, image processing engine, feature extraction, and sign recognition. Detection is a process of identifying an object. The proposed system uses Canny pruning Haar and Haarcascade detection algorithms. Canny pruning implements the Canny edge detection. This is an optimal image processing algorithm. It is used to detect edges of an object. The system employs a skin detection algorithm. The skin detection performs background subtraction, computes the convex hull, and the centroid to assist in the detection process. Recognition is a process of gesture classification. Template matching classifies each hand gesture in real-time. The system was tested using various experiments. The results obtained show that time, distance, and light are factors that affect the rate of detection and ultimately recognition. Detection rate is directly proportional to the distance of the hand from the camera. Different lighting conditions were considered. The more the light intensity, the faster the detection rate. Based on the results obtained from this research, the applied methodologies are efficient and provide a plausible solution towards a light-weight, inexpensive system which can be used for sign language interpretation.

Keywords: Canny pruning, hand recognition, machine learning, skin tracking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1309
7210 Design of Integration Security System using XML Security

Authors: Juhan Kim, Soohyung Kim, Kiyoung Moon

Abstract:

In this paper, we design an integration security system that provides authentication service, authorization service, and management service of security data and a unified interface for the management service. The interface is originated from XKMS protocol and is used to manage security data such as XACML policies, SAML assertions and other authentication security data including public keys. The system includes security services such as authentication, authorization and delegation of authentication by employing SAML and XACML based on security data such as authentication data, attributes information, assertions and polices managed with the interface in the system. It also has SAML producer that issues assertions related on the result of the authentication and the authorization services.

Keywords: XML, XML Security, XACML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1429
7209 An Evaluation Model for Semantic Enablement of Virtual Research Environments

Authors: Tristan O'Neill, Trina Myers, Jarrod Trevathan

Abstract:

The Tropical Data Hub (TDH) is a virtual research environment that provides researchers with an e-research infrastructure to congregate significant tropical data sets for data reuse, integration, searching, and correlation. However, researchers often require data and metadata synthesis across disciplines for crossdomain analyses and knowledge discovery. A triplestore offers a semantic layer to achieve a more intelligent method of search to support the synthesis requirements by automating latent linkages in the data and metadata. Presently, the benchmarks to aid the decision of which triplestore is best suited for use in an application environment like the TDH are limited to performance. This paper describes a new evaluation tool developed to analyze both features and performance. The tool comprises a weighted decision matrix to evaluate the interoperability, functionality, performance, and support availability of a range of integrated and native triplestores to rank them according to requirements of the TDH.

Keywords: Virtual research environment, Semantic Web, performance analysis, tropical data hub.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1784
7208 Dimension Reduction of Microarray Data Based on Local Principal Component

Authors: Ali Anaissi, Paul J. Kennedy, Madhu Goyal

Abstract:

Analysis and visualization of microarraydata is veryassistantfor biologists and clinicians in the field of diagnosis and treatment of patients. It allows Clinicians to better understand the structure of microarray and facilitates understanding gene expression in cells. However, microarray dataset is a complex data set and has thousands of features and a very small number of observations. This very high dimensional data set often contains some noise, non-useful information and a small number of relevant features for disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm Local Principal Component (LPC) which aims to maps high dimensional data to a lower dimensional space. The reduced data represents the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments also show how this algorithm reduces high dimensional data whilst preserving the neighbourhoods of the points in the low dimensional space as in the high dimensional space.

Keywords: Linear Dimension Reduction, Non-Linear Dimension Reduction, Principal Component Analysis, Biologists.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1574
7207 Normal and Peaberry Coffee Beans Classification from Green Coffee Bean Images Using Convolutional Neural Networks and Support Vector Machine

Authors: Hira Lal Gope, Hidekazu Fukai

Abstract:

The aim of this study is to develop a system which can identify and sort peaberries automatically at low cost for coffee producers in developing countries. In this paper, the focus is on the classification of peaberries and normal coffee beans using image processing and machine learning techniques. The peaberry is not bad and not a normal bean. The peaberry is born in an only single seed, relatively round seed from a coffee cherry instead of the usual flat-sided pair of beans. It has another value and flavor. To make the taste of the coffee better, it is necessary to separate the peaberry and normal bean before green coffee beans roasting. Otherwise, the taste of total beans will be mixed, and it will be bad. In roaster procedure time, all the beans shape, size, and weight must be unique; otherwise, the larger bean will take more time for roasting inside. The peaberry has a different size and different shape even though they have the same weight as normal beans. The peaberry roasts slower than other normal beans. Therefore, neither technique provides a good option to select the peaberries. Defect beans, e.g., sour, broken, black, and fade bean, are easy to check and pick up manually by hand. On the other hand, the peaberry pick up is very difficult even for trained specialists because the shape and color of the peaberry are similar to normal beans. In this study, we use image processing and machine learning techniques to discriminate the normal and peaberry bean as a part of the sorting system. As the first step, we applied Deep Convolutional Neural Networks (CNN) and Support Vector Machine (SVM) as machine learning techniques to discriminate the peaberry and normal bean. As a result, better performance was obtained with CNN than with SVM for the discrimination of the peaberry. The trained artificial neural network with high performance CPU and GPU in this work will be simply installed into the inexpensive and low in calculation Raspberry Pi system. We assume that this system will be used in under developed countries. The study evaluates and compares the feasibility of the methods in terms of accuracy of classification and processing speed.

Keywords: Convolutional neural networks, coffee bean, peaberry, sorting, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1553
7206 Heterogeneous Attribute Reduction in Noisy System based on a Generalized Neighborhood Rough Sets Model

Authors: Siyuan Jing, Kun She

Abstract:

Neighborhood Rough Sets (NRS) has been proven to be an efficient tool for heterogeneous attribute reduction. However, most of researches are focused on dealing with complete and noiseless data. Factually, most of the information systems are noisy, namely, filled with incomplete data and inconsistent data. In this paper, we introduce a generalized neighborhood rough sets model, called VPTNRS, to deal with the problem of heterogeneous attribute reduction in noisy system. We generalize classical NRS model with tolerance neighborhood relation and the probabilistic theory. Furthermore, we use the neighborhood dependency to evaluate the significance of a subset of heterogeneous attributes and construct a forward greedy algorithm for attribute reduction based on it. Experimental results show that the model is efficient to deal with noisy data.

Keywords: attribute reduction, incomplete data, inconsistent data, tolerance neighborhood relation, rough sets

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1588
7205 A Mobile Agent-based Clustering Data Fusion Algorithm in WSN

Authors: Xiangbin Zhu, Wenjuan Zhang

Abstract:

In wireless sensor networks,the mobile agent technology is used in data fusion. According to the node residual energy and the results of partial integration,we design the node clustering algorithm. Optimization of mobile agent in the routing within the cluster strategy for wireless sensor networks to further reduce the amount of data transfer. Through the experiments, using mobile agents in the integration process within the cluster can be reduced the path loss in some extent.

Keywords: wireless sensor networks, data fusion, mobile agent

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1511
7204 Collision Detection Algorithm Based on Data Parallelism

Authors: Zhen Peng, Baifeng Wu

Abstract:

Modern computing technology enters the era of parallel computing with the trend of sustainable and scalable parallelism. Single Instruction Multiple Data (SIMD) is an important way to go along with the trend. It is able to gather more and more computing ability by increasing the number of processor cores without the need of modifying the program. Meanwhile, in the field of scientific computing and engineering design, many computation intensive applications are facing the challenge of increasingly large amount of data. Data parallel computing will be an important way to further improve the performance of these applications. In this paper, we take the accurate collision detection in building information modeling as an example. We demonstrate a model for constructing a data parallel algorithm. According to the model, a complex object is decomposed into the sets of simple objects; collision detection among complex objects is converted into those among simple objects. The resulting algorithm is a typical SIMD algorithm, and its advantages in parallelism and scalability is unparalleled in respect to the traditional algorithms.

Keywords: Data parallelism, collision detection, single instruction multiple data, building information modeling, continuous scalability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1235
7203 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences

Authors: C. Xavier Mendieta, J. J McArthur

Abstract:

Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.

Keywords: Building archetypes, data analysis, energy benchmarks, GHG emissions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1024
7202 Making Data Structures and Algorithms more Understandable by Programming Sudoku the Human Way

Authors: Roelien Goede

Abstract:

Data Structures and Algorithms is a module in most Computer Science or Information Technology curricula. It is one of the modules most students identify as being difficult. This paper demonstrates how programming a solution for Sudoku can make abstract concepts more concrete. The paper relates concepts of a typical Data Structures and Algorithms module to a step by step solution for Sudoku in a human type as opposed to a computer oriented solution.

Keywords: Data Structures, Algorithms, Sudoku, ObjectOriented Programming, Programming Teaching, Education.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3097