Search results for: Unsupervised Clustering
69 Semi-automatic Construction of Ontology-based CBR System for Knowledge Integration
Authors: Junjie Gao, Guishi Deng
Abstract:
In order to integrate knowledge in heterogeneous case-based reasoning (CBR) systems, ontology-based CBR system has become a hot topic. To solve the facing problems of ontology-based CBR system, for example, its architecture is nonstandard, reusing knowledge in legacy CBR is deficient, ontology construction is difficult, etc, we propose a novel approach for semi-automatically construct ontology-based CBR system whose architecture is based on two-layer ontology. Domain knowledge implied in legacy case bases can be mapped from relational database schema and knowledge items to relevant OWL local ontology automatically by a mapping algorithm with low time-complexity. By concept clustering based on formal concept analysis, computing concept equation measure and concept inclusion measure, some suggestions about enriching or amending concept hierarchy of OWL local ontologies are made automatically that can aid designers to achieve semi-automatic construction of OWL domain ontology. Validation of the approach is done by an application example.Keywords: OWL ontology, Case-based Reasoning, FormalConcept Analysis, Knowledge Integration
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 201168 Evaluation of the ANN Based Nonlinear System Models in the MSE and CRLB Senses
Authors: M.V Rajesh, Archana R, A Unnikrishnan, R Gopikakumari, Jeevamma Jacob
Abstract:
The System Identification problem looks for a suitably parameterized model, representing a given process. The parameters of the model are adjusted to optimize a performance function based on error between the given process output and identified process output. The linear system identification field is well established with many classical approaches whereas most of those methods cannot be applied for nonlinear systems. The problem becomes tougher if the system is completely unknown with only the output time series is available. It has been reported that the capability of Artificial Neural Network to approximate all linear and nonlinear input-output maps makes it predominantly suitable for the identification of nonlinear systems, where only the output time series is available. [1][2][4][5]. The work reported here is an attempt to implement few of the well known algorithms in the context of modeling of nonlinear systems, and to make a performance comparison to establish the relative merits and demerits.Keywords: Multilayer neural networks, Radial Basis Functions, Clustering algorithm, Back Propagation training, Extended Kalmanfiltering, Mean Square Error, Nonlinear Modeling, Cramer RaoLower Bound.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 164667 A Geospatial Consumer Marketing Campaign Optimization Strategy: Case of Fuzzy Approach in Nigeria Mobile Market
Authors: Adeolu O. Dairo
Abstract:
Getting the consumer marketing strategy right is a crucial and complex task for firms with a large customer base such as mobile operators in a competitive mobile market. While empirical studies have made efforts to identify key constructs, no geospatial model has been developed to comprehensively assess the viability and interdependency of ground realities regarding the customer, competition, channel and the network quality of mobile operators. With this research, a geo-analytic framework is proposed for strategy formulation and allocation for mobile operators. Firstly, a fuzzy analytic network using a self-organizing feature map clustering technique based on inputs from managers and literature, which depicts the interrelationships amongst ground realities is developed. The model is tested with a mobile operator in the Nigeria mobile market. As a result, a customer-centric geospatial and visualization solution is developed. This provides a consolidated and integrated insight that serves as a transparent, logical and practical guide for strategic, tactical and operational decision making.
Keywords: Geospatial, geo-analytics, self-organizing map, customer-centric.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 81366 A Study on Early Prediction of Fault Proneness in Software Modules using Genetic Algorithm
Authors: Parvinder S. Sandhu, Sunil Khullar, Satpreet Singh, Simranjit K. Bains, Manpreet Kaur, Gurvinder Singh
Abstract:
Fault-proneness of a software module is the probability that the module contains faults. To predict faultproneness of modules different techniques have been proposed which includes statistical methods, machine learning techniques, neural network techniques and clustering techniques. The aim of proposed study is to explore whether metrics available in the early lifecycle (i.e. requirement metrics), metrics available in the late lifecycle (i.e. code metrics) and metrics available in the early lifecycle (i.e. requirement metrics) combined with metrics available in the late lifecycle (i.e. code metrics) can be used to identify fault prone modules using Genetic Algorithm technique. This approach has been tested with real time defect C Programming language datasets of NASA software projects. The results show that the fusion of requirement and code metric is the best prediction model for detecting the faults as compared with commonly used code based model.Keywords: Genetic Algorithm, Fault Proneness, Software Faultand Software Quality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 198465 Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Authors: Asif Ekbal, Sivaji Bandyopadhyay
Abstract:
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.
Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 340364 On the Performance of Information Criteria in Latent Segment Models
Authors: Jaime R. S. Fonseca
Abstract:
Nevertheless the widespread application of finite mixture models in segmentation, finite mixture model selection is still an important issue. In fact, the selection of an adequate number of segments is a key issue in deriving latent segments structures and it is desirable that the selection criteria used for this end are effective. In order to select among several information criteria, which may support the selection of the correct number of segments we conduct a simulation study. In particular, this study is intended to determine which information criteria are more appropriate for mixture model selection when considering data sets with only categorical segmentation base variables. The generation of mixtures of multinomial data supports the proposed analysis. As a result, we establish a relationship between the level of measurement of segmentation variables and some (eleven) information criteria-s performance. The criterion AIC3 shows better performance (it indicates the correct number of the simulated segments- structure more often) when referring to mixtures of multinomial segmentation base variables.Keywords: Quantitative Methods, Multivariate Data Analysis, Clustering, Finite Mixture Models, Information Theoretical Criteria, Simulation experiments.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 151963 A Fuzzy Approach to Liver Tumor Segmentation with Zernike Moments
Authors: Abder-Rahman Ali, Antoine Vacavant, Manuel Grand-Brochier, Adélaïde Albouy-Kissi, Jean-Yves Boire
Abstract:
In this paper, we present a new segmentation approach for liver lesions in regions of interest within MRI (Magnetic Resonance Imaging). This approach, based on a two-cluster Fuzzy CMeans methodology, considers the parameter variable compactness to handle uncertainty. Fine boundaries are detected by a local recursive merging of ambiguous pixels with a sequential forward floating selection with Zernike moments. The method has been tested on both synthetic and real images. When applied on synthetic images, the proposed approach provides good performance, segmentations obtained are accurate, their shape is consistent with the ground truth, and the extracted information is reliable. The results obtained on MR images confirm such observations. Our approach allows, even for difficult cases of MR images, to extract a segmentation with good performance in terms of accuracy and shape, which implies that the geometry of the tumor is preserved for further clinical activities (such as automatic extraction of pharmaco-kinetics properties, lesion characterization, etc.).Keywords: Defuzzification, floating search, fuzzy clustering, Zernike moments.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 205062 A Novel Modified Adaptive Fuzzy Inference Engine and Its Application to Pattern Classification
Authors: J. Hossen, A. Rahman, K. Samsudin, F. Rokhani, S. Sayeed, R. Hasan
Abstract:
The Neuro-Fuzzy hybridization scheme has become of research interest in pattern classification over the past decade. The present paper proposes a novel Modified Adaptive Fuzzy Inference Engine (MAFIE) for pattern classification. A modified Apriori algorithm technique is utilized to reduce a minimal set of decision rules based on input output data sets. A TSK type fuzzy inference system is constructed by the automatic generation of membership functions and rules by the fuzzy c-means clustering and Apriori algorithm technique, respectively. The generated adaptive fuzzy inference engine is adjusted by the least-squares fit and a conjugate gradient descent algorithm towards better performance with a minimal set of rules. The proposed MAFIE is able to reduce the number of rules which increases exponentially when more input variables are involved. The performance of the proposed MAFIE is compared with other existing applications of pattern classification schemes using Fisher-s Iris and Wisconsin breast cancer data sets and shown to be very competitive.Keywords: Apriori algorithm, Fuzzy C-means, MAFIE, TSK
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 193161 Automatic Reusability Appraisal of Software Components using Neuro-fuzzy Approach
Authors: Parvinder S. Sandhu, Hardeep Singh
Abstract:
Automatic reusability appraisal could be helpful in evaluating the quality of developed or developing reusable software components and in identification of reusable components from existing legacy systems; that can save cost of developing the software from scratch. But the issue of how to identify reusable components from existing systems has remained relatively unexplored. In this paper, we have mentioned two-tier approach by studying the structural attributes as well as usability or relevancy of the component to a particular domain. Latent semantic analysis is used for the feature vector representation of various software domains. It exploits the fact that FeatureVector codes can be seen as documents containing terms -the idenifiers present in the components- and so text modeling methods that capture co-occurrence information in low-dimensional spaces can be used. Further, we devised Neuro- Fuzzy hybrid Inference System, which takes structural metric values as input and calculates the reusability of the software component. Decision tree algorithm is used to decide initial set of fuzzy rules for the Neuro-fuzzy system. The results obtained are convincing enough to propose the system for economical identification and retrieval of reusable software components.Keywords: Clustering, ID3, LSA, Neuro-fuzzy System, SVD
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166260 Advanced Neural Network Learning Applied to Pulping Modeling
Authors: Z. Zainuddin, W. D. Wan Rosli, R. Lanouette, S. Sathasivam
Abstract:
This paper reports work done to improve the modeling of complex processes when only small experimental data sets are available. Neural networks are used to capture the nonlinear underlying phenomena contained in the data set and to partly eliminate the burden of having to specify completely the structure of the model. Two different types of neural networks were used for the application of pulping problem. A three layer feed forward neural networks, using the Preconditioned Conjugate Gradient (PCG) methods were used in this investigation. Preconditioning is a method to improve convergence by lowering the condition number and increasing the eigenvalues clustering. The idea is to solve the modified odified problem M-1 Ax= M-1b where M is a positive-definite preconditioner that is closely related to A. We mainly focused on Preconditioned Conjugate Gradient- based training methods which originated from optimization theory, namely Preconditioned Conjugate Gradient with Fletcher-Reeves Update (PCGF), Preconditioned Conjugate Gradient with Polak-Ribiere Update (PCGP) and Preconditioned Conjugate Gradient with Powell-Beale Restarts (PCGB). The behavior of the PCG methods in the simulations proved to be robust against phenomenon such as oscillations due to large step size.
Keywords: Convergence, pulping modeling, neural networks, preconditioned conjugate gradient.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 140859 Secure and Efficient Transmission of Aggregated Data for Mobile Wireless Sensor Networks
Authors: A. Krishna Veni, R.Geetha
Abstract:
Wireless Sensor Networks (WSNs) are suitable for many scenarios in the real world. The retrieval of data is made efficient by the data aggregation techniques. Many techniques for the data aggregation are offered and most of the existing schemes are not energy efficient and secure. However, the existing techniques use the traditional clustering approach where there is a delay during the packet transmission since there is no proper scheduling. The presented system uses the Velocity Energy-efficient and Link-aware Cluster-Tree (VELCT) scheme in which there is a Data Collection Tree (DCT) which improves the lifetime of the network. The VELCT scheme and the construction of DCT reduce the delay and traffic. The network lifetime can be increased by avoiding the frequent change in cluster topology. Secure and Efficient Transmission of Aggregated data (SETA) improves the security of the data transmission via the trust value of the nodes prior the aggregation of data. Since SETA considers the data only from the trustworthy nodes for aggregation, it is more secure in transmitting the data thereby improving the accuracy of aggregated data.
Keywords: Aggregation, lifetime, network security, wireless sensor network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 121758 Objective Assessment of Psoriasis Lesion Thickness for PASI Scoring using 3D Digital Imaging
Authors: M.H. Ahmad Fadzil, Hurriyatul Fitriyah, Esa Prakasa, Hermawan Nugroho, S.H. Hussein, Azura Mohd. Affandi
Abstract:
Psoriasis is a chronic inflammatory skin condition which affects 2-3% of population around the world. Psoriasis Area and Severity Index (PASI) is a gold standard to assess psoriasis severity as well as the treatment efficacy. Although a gold standard, PASI is rarely used because it is tedious and complex. In practice, PASI score is determined subjectively by dermatologists, therefore inter and intra variations of assessment are possible to happen even among expert dermatologists. This research develops an algorithm to assess psoriasis lesion for PASI scoring objectively. Focus of this research is thickness assessment as one of PASI four parameters beside area, erythema and scaliness. Psoriasis lesion thickness is measured by averaging the total elevation from lesion base to lesion surface. Thickness values of 122 3D images taken from 39 patients are grouped into 4 PASI thickness score using K-means clustering. Validation on lesion base construction is performed using twelve body curvature models and show good result with coefficient of determinant (R2) is equal to 1.Keywords: 3D digital imaging, base construction, PASI, psoriasis lesion thickness.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 245457 Gene Expression Signature for Classification of Metastasis Positive and Negative Oral Cancer in Homosapiens
Authors: A. Shukla, A. Tarsauliya, R. Tiwari, S. Sharma
Abstract:
Cancer classification to their corresponding cohorts has been key area of research in bioinformatics aiming better prognosis of the disease. High dimensionality of gene data has been makes it a complex task and requires significance data identification technique in order to reducing the dimensionality and identification of significant information. In this paper, we have proposed a novel approach for classification of oral cancer into metastasis positive and negative patients. We have used significance analysis of microarrays (SAM) for identifying significant genes which constitutes gene signature. 3 different gene signatures were identified using SAM from 3 different combination of training datasets and their classification accuracy was calculated on corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from above 3 individual gene signatures. 9 gene signature-s classification capability was compared using same classifiers on same testing datasets. Results obtained from experimentation shows that 9 gene signature classified all samples in testing dataset accurately while individual genes could not classify all accurately.
Keywords: Cancer, Gene Signature, SAM, Classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 207656 Modeling of Pulping of Sugar Maple Using Advanced Neural Network Learning
Authors: W. D. Wan Rosli, Z. Zainuddin, R. Lanouette, S. Sathasivam
Abstract:
This paper reports work done to improve the modeling of complex processes when only small experimental data sets are available. Neural networks are used to capture the nonlinear underlying phenomena contained in the data set and to partly eliminate the burden of having to specify completely the structure of the model. Two different types of neural networks were used for the application of Pulping of Sugar Maple problem. A three layer feed forward neural networks, using the Preconditioned Conjugate Gradient (PCG) methods were used in this investigation. Preconditioning is a method to improve convergence by lowering the condition number and increasing the eigenvalues clustering. The idea is to solve the modified problem where M is a positive-definite preconditioner that is closely related to A. We mainly focused on Preconditioned Conjugate Gradient- based training methods which originated from optimization theory, namely Preconditioned Conjugate Gradient with Fletcher-Reeves Update (PCGF), Preconditioned Conjugate Gradient with Polak-Ribiere Update (PCGP) and Preconditioned Conjugate Gradient with Powell-Beale Restarts (PCGB). The behavior of the PCG methods in the simulations proved to be robust against phenomenon such as oscillations due to large step size.
Keywords: Convergence, Modeling, Neural Networks, Preconditioned Conjugate Gradient.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 168555 A Design Framework for Event Recommendation in Novice Low-Literacy Communities
Authors: Yimeng Deng, Klarissa T.T. Chang
Abstract:
The proliferation of user-generated content (UGC) results in huge opportunities to explore event patterns. However, existing event recommendation systems primarily focus on advanced information technology users. Little work has been done to address novice and low-literacy users. The next billion users providing and consuming UGC are likely to include communities from developing countries who are ready to use affordable technologies for subsistence goals. Therefore, we propose a design framework for providing event recommendations to address the needs of such users. Grounded in information integration theory (IIT), our framework advocates that effective event recommendation is supported by systems capable of (1) reliable information gathering through structured user input, (2) accurate sense making through spatial-temporal analytics, and (3) intuitive information dissemination through interactive visualization techniques. A mobile pest management application is developed as an instantiation of the design framework. Our preliminary study suggests a set of design principles for novice and low-literacy users.
Keywords: Event recommendation, iconic interface, information integration, spatial-temporal clustering, user-generated content, visualization techniques
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 165654 Logistic Model Tree and Expectation-Maximization for Pollen Recognition and Grouping
Authors: Endrick Barnacin, Jean-Luc Henry, Jack Molinié, Jimmy Nagau, Hélène Delatte, Gérard Lebreton
Abstract:
Palynology is a field of interest for many disciplines. It has multiple applications such as chronological dating, climatology, allergy treatment, and even honey characterization. Unfortunately, the analysis of a pollen slide is a complicated and time-consuming task that requires the intervention of experts in the field, which is becoming increasingly rare due to economic and social conditions. So, the automation of this task is a necessity. Pollen slides analysis is mainly a visual process as it is carried out with the naked eye. That is the reason why a primary method to automate palynology is the use of digital image processing. This method presents the lowest cost and has relatively good accuracy in pollen retrieval. In this work, we propose a system combining recognition and grouping of pollen. It consists of using a Logistic Model Tree to classify pollen already known by the proposed system while detecting any unknown species. Then, the unknown pollen species are divided using a cluster-based approach. Success rates for the recognition of known species have been achieved, and automated clustering seems to be a promising approach.
Keywords: Pollen recognition, logistic model tree, expectation-maximization, local binary pattern.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 77053 Plant Varieties Selection System
Authors: Kitti Koonsanit, Chuleerat Jaruskulchai, Poonsak Miphokasap, Apisit Eiumnoh
Abstract:
In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.
Keywords: Plant varieties selection system, decision tree, expert recommendation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 179352 Extraction of Symbolic Rules from Artificial Neural Networks
Authors: S. M. Kamruzzaman, Md. Monirul Islam
Abstract:
Although backpropagation ANNs generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions cannot be explained as those of decision trees. In many applications, it is desirable to extract knowledge from trained ANNs for the users to gain a better understanding of how the networks solve the problems. A new rule extraction algorithm, called rule extraction from artificial neural networks (REANN) is proposed and implemented to extract symbolic rules from ANNs. A standard three-layer feedforward ANN is the basis of the algorithm. A four-phase training algorithm is proposed for backpropagation learning. Explicitness of the extracted rules is supported by comparing them to the symbolic rules generated by other methods. Extracted rules are comparable with other methods in terms of number of rules, average number of conditions for a rule, and predictive accuracy. Extensive experimental studies on several benchmarks classification problems, such as breast cancer, iris, diabetes, and season classification problems, demonstrate the effectiveness of the proposed approach with good generalization ability.Keywords: Backpropagation, clustering algorithm, constructivealgorithm, continuous activation function, pruning algorithm, ruleextraction algorithm, symbolic rules.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 161651 Mining Correlated Bicluster from Web Usage Data Using Discrete Firefly Algorithm Based Biclustering Approach
Authors: K. Thangavel, R. Rathipriya
Abstract:
For the past one decade, biclustering has become popular data mining technique not only in the field of biological data analysis but also in other applications like text mining, market data analysis with high-dimensional two-way datasets. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. Firefly Algorithm (FA) is a recently-proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of discrete version of FA (DFA) while coping with the task of mining coherent and large volume bicluster from web usage dataset. The experiments were conducted on two web usage datasets from public dataset repository whereby the performance of FA was compared with that exhibited by other population-based metaheuristic called binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA while tackling the biclustering problem.
Keywords: Biclustering, Binary Particle Swarm Optimization, Discrete Firefly Algorithm, Firefly Algorithm, Usage profile Web usage mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 213350 A Monte Carlo Method to Data Stream Analysis
Authors: Kittisak Kerdprasop, Nittaya Kerdprasop, Pairote Sattayatham
Abstract:
Data stream analysis is the process of computing various summaries and derived values from large amounts of data which are continuously generated at a rapid rate. The nature of a stream does not allow a revisit on each data element. Furthermore, data processing must be fast to produce timely analysis results. These requirements impose constraints on the design of the algorithms to balance correctness against timely responses. Several techniques have been proposed over the past few years to address these challenges. These techniques can be categorized as either dataoriented or task-oriented. The data-oriented approach analyzes a subset of data or a smaller transformed representation, whereas taskoriented scheme solves the problem directly via approximation techniques. We propose a hybrid approach to tackle the data stream analysis problem. The data stream has been both statistically transformed to a smaller size and computationally approximated its characteristics. We adopt a Monte Carlo method in the approximation step. The data reduction has been performed horizontally and vertically through our EMR sampling method. The proposed method is analyzed by a series of experiments. We apply our algorithm on clustering and classification tasks to evaluate the utility of our approach.Keywords: Data Stream, Monte Carlo, Sampling, DensityEstimation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 141749 Grouping and Indexing Color Features for Efficient Image Retrieval
Authors: M. V. Sudhamani, C. R. Venugopal
Abstract:
Content-based Image Retrieval (CBIR) aims at searching image databases for specific images that are similar to a given query image based on matching of features derived from the image content. This paper focuses on a low-dimensional color based indexing technique for achieving efficient and effective retrieval performance. In our approach, the color features are extracted using the mean shift algorithm, a robust clustering technique. Then the cluster (region) mode is used as representative of the image in 3-D color space. The feature descriptor consists of the representative color of a region and is indexed using a spatial indexing method that uses *R -tree thus avoiding the high-dimensional indexing problems associated with the traditional color histogram. Alternatively, the images in the database are clustered based on region feature similarity using Euclidian distance. Only representative (centroids) features of these clusters are indexed using *R -tree thus improving the efficiency. For similarity retrieval, each representative color in the query image or region is used independently to find regions containing that color. The results of these methods are compared. A JAVA based query engine supporting query-by- example is built to retrieve images by color.
Keywords: Content-based, indexing, cluster, region.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 181148 Performance Evaluation of Energy Efficient Communication Protocol for Mobile Ad Hoc Networks
Authors: Toshihiko Sasama, Kentaro Kishida, Kazunori Sugahara, Hiroshi Masuyama
Abstract:
A mobile ad hoc network is a network of mobile nodes without any notion of centralized administration. In such a network, each mobile node behaves not only as a host which runs applications but also as a router to forward packets on behalf of others. Clustering has been applied to routing protocols to achieve efficient communications. A CH network expresses the connected relationship among cluster-heads. This paper discusses the methods for constructing a CH network, and produces the following results: (1) The required running costs of 3 traditional methods for constructing a CH network are not so different from each other in the static circumstance, or in the dynamic circumstance. Their running costs in the static circumstance do not differ from their costs in the dynamic circumstance. Meanwhile, although the routing costs required for the above 3 methods are not so different in the static circumstance, the costs are considerably different from each other in the dynamic circumstance. Their routing costs in the static circumstance are also very different from their costs in the dynamic circumstance, and the former is one tenths of the latter. The routing cost in the dynamic circumstance is mostly the cost for re-routing. (2) On the strength of the above results, we discuss new 2 methods regarding whether they are tolerable or not in the dynamic circumstance, that is, whether the times of re-routing are small or not. These new methods are revised methods that are based on the traditional methods. We recommended the method which produces the smallest routing cost in the dynamic circumstance, therefore producing the smallest total cost.Keywords: cluster, mobile ad hoc network, re-routing cost, simulation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 135047 Discovery of Human HMG-Coa Reductase Inhibitors Using Structure-Based Pharmacophore Modeling Combined with Molecular Dynamics Simulation Methodologies
Authors: Minky Son, Chanin Park, Ayoung Baek, Shalini John, Keun Woo Lee
Abstract:
3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR) catalyzes the conversion of HMG-CoA to mevalonate using NADPH and the enzyme is involved in rate-controlling step of mevalonate. Inhibition of HMGR is considered as effective way to lower cholesterol levels so it is drug target to treat hypercholesterolemia, major risk factor of cardiovascular disease. To discover novel HMGR inhibitor, we performed structure-based pharmacophore modeling combined with molecular dynamics (MD) simulation. Four HMGR inhibitors were used for MD simulation and representative structure of each simulation were selected by clustering analysis. Four structure-based pharmacophore models were generated using the representative structure. The generated models were validated used in virtual screening to find novel scaffolds for inhibiting HMGR. The screened compounds were filtered by applying drug-like properties and used in molecular docking. Finally, four hit compounds were obtained and these complexes were refined using energy minimization. These compounds might be potential leads to design novel HMGR inhibitor.
Keywords: Anti-hypercholesterolemia drug, HMGR inhibitor, Molecular dynamics simulation, Structure-based pharmacophore modeling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 194846 A Novel Nucleus-Based Classifier for Discrimination of Osteoclasts and Mesenchymal Precursor Cells in Mouse Bone Marrow Cultures
Authors: Andreas Heindl, Alexander K. Seewald, Martin Schepelmann, Radu Rogojanu, Giovanna Bises, Theresia Thalhammer, Isabella Ellinger
Abstract:
Bone remodeling occurs by the balanced action of bone resorbing osteoclasts (OC) and bone-building osteoblasts. Increased bone resorption by excessive OC activity contributes to malignant and non-malignant diseases including osteoporosis. To study OC differentiation and function, OC formed in in vitro cultures are currently counted manually, a tedious procedure which is prone to inter-observer differences. Aiming for an automated OC-quantification system, classification of OC and precursor cells was done on fluorescence microscope images based on the distinct appearance of fluorescent nuclei. Following ellipse fitting to nuclei, a combination of eight features enabled clustering of OC and precursor cell nuclei. After evaluating different machine-learning techniques, LOGREG achieved 74% correctly classified OC and precursor cell nuclei, outperforming human experts (best expert: 55%). In combination with the automated detection of total cell areas, this system allows to measure various cell parameters and most importantly to quantify proteins involved in osteoclastogenesis.Keywords: osteoclasts, machine learning, ellipse fitting.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 191345 A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms
Authors: S. Nandagopalan, N. Pradeep
Abstract:
The aim of this paper is to propose a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers namely physical storage, object identification, knowledge discovery, user level. Techniques such as active contour model to identify the cardiac chambers, pixel classification to segment the color Doppler echo image, universal model for image retrieval, Bayesian method for classification, parallel algorithms for image segmentation, etc., were employed. Using the feature vector database that have been efficiently constructed, one can perform various data mining tasks like clustering, classification, etc. with efficient algorithms along with image mining given a query image. All these facilities are included in the framework that is supported by state-of-the-art user interface (UI). The algorithms were tested with actual patient data and Coral image database and the results show that their performance is better than the results reported already.Keywords: Active Contour, Bayesian, Echocardiographic image, Feature vector.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 171344 Improved Computational Efficiency of Machine Learning Algorithms Based on Evaluation Metrics to Control the Spread of Coronavirus in the UK
Authors: Swathi Ganesan, Nalinda Somasiri, Rebecca Jeyavadhanam, Gayathri Karthick
Abstract:
The COVID-19 crisis presents a substantial and critical hazard to worldwide health. Since the occurrence of the disease in late January 2020 in the UK, the number of infected people confirmed to acquire the illness has increased tremendously across the country, and the number of individuals affected is undoubtedly considerably high. The purpose of this research is to figure out a predictive machine learning (ML) archetypal that could forecast the COVID-19 cases within the UK. This study concentrates on the statistical data collected from 31st January 2020 to 31st March 2021 in the United Kingdom. Information on total COVID-19 cases registered, new cases encountered on a daily basis, total death registered, and patients’ death per day due to Coronavirus is collected from World Health Organization (WHO). Data preprocessing is carried out to identify any missing values, outliers, or anomalies in the dataset. The data are split into 8:2 ratio for training and testing purposes to forecast future new COVID-19 cases. Support Vector Machine (SVM), Random Forest (RF), and linear regression (LR) algorithms are chosen to study the model performance in the prediction of new COVID-19 cases. From the evaluation metrics such as r-squared value and mean squared error, the statistical performance of the model in predicting the new COVID-19 cases is evaluated. RF outperformed the other two ML algorithms with a training accuracy of 99.47% and testing accuracy of 98.26% when n = 30. The mean square error obtained for RF is 4.05e11, which is lesser compared to the other predictive models used for this study. From the experimental analysis, RF algorithm can perform more effectively and efficiently in predicting the new COVID-19 cases, which could help the health sector to take relevant control measures for the spread of the virus.
Keywords: COVID-19, machine learning, supervised learning, unsupervised learning, linear regression, support vector machine, random forest.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17243 Dominating Set Algorithm and Trust Evaluation Scheme for Secured Cluster Formation and Data Transferring
Authors: Y. Harold Robinson, M. Rajaram, E. Golden Julie, S. Balaji
Abstract:
This paper describes the proficient way of choosing the cluster head based on dominating set algorithm in a wireless sensor network (WSN). The algorithm overcomes the energy deterioration problems by this selection process of cluster heads. Clustering algorithms such as LEACH, EEHC and HEED enhance scalability in WSNs. Dominating set algorithm keeps the first node alive longer than the other protocols previously used. As the dominating set of cluster heads are directly connected to each node, the energy of the network is saved by eliminating the intermediate nodes in WSN. Security and trust is pivotal in network messaging. Cluster head is secured with a unique key. The member can only connect with the cluster head if and only if they are secured too. The secured trust model provides security for data transmission in the dominated set network with the group key. The concept can be extended to add a mobile sink for each or for no of clusters to transmit data or messages between cluster heads and to base station. Data security id preferably high and data loss can be prevented. The simulation demonstrates the concept of choosing cluster heads by dominating set algorithm and trust evaluation using DSTE. The research done is rationalized.
Keywords: Wireless Sensor Networks, LEECH, EEHC, HEED, DSTE.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 140542 Validation and Selection between Machine Learning Technique and Traditional Methods to Reduce Bullwhip Effects: a Data Mining Approach
Authors: Hamid R. S. Mojaveri, Seyed S. Mousavi, Mojtaba Heydar, Ahmad Aminian
Abstract:
The aim of this paper is to present a methodology in three steps to forecast supply chain demand. In first step, various data mining techniques are applied in order to prepare data for entering into forecasting models. In second step, the modeling step, an artificial neural network and support vector machine is presented after defining Mean Absolute Percentage Error index for measuring error. The structure of artificial neural network is selected based on previous researchers' results and in this article the accuracy of network is increased by using sensitivity analysis. The best forecast for classical forecasting methods (Moving Average, Exponential Smoothing, and Exponential Smoothing with Trend) is resulted based on prepared data and this forecast is compared with result of support vector machine and proposed artificial neural network. The results show that artificial neural network can forecast more precisely in comparison with other methods. Finally, forecasting methods' stability is analyzed by using raw data and even the effectiveness of clustering analysis is measured.Keywords: Artificial Neural Networks (ANN), bullwhip effect, demand forecasting, Support Vector Machine (SVM).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 201041 Diversity Analysis of a Quinoa (Chenopodium quinoa Willd.) Germplasm during Two Seasons
Authors: M. Mhada, E. N. Jellen, S. E. Jacobsen, O. Benlhabib
Abstract:
The present work has been carried out to evaluate the diversity of a collection of 78 quinoa accessions developed through recurrent selection from Andean germplasm introduced to Morocco in the winter of 2000. Twenty-three quantitative and qualitative characters were used for the evaluation of genetic diversity and the relationship between the accessions, and also for the establishment of a core collection in Morocco. Important variation was found among the accessions in terms of plant morphology and growth behavior. Data analysis showed positive correlation of the plant height, the plant fresh and the dry weight with the grain yield, while days to flowering was found to be negatively correlated with grain yield. The first four PCs contributed 74.76% of the variability; the first PC showed significant variation with 42.86% of the total variation, PC2 with 15.37%, PC3 with 9.05% and PC4 contributed 7.49% of the total variation. Plant size, days to grain filling and days to maturity are correlated to the PC1; and seed size, inflorescence density and mildew resistance are correlated to the PC2. Hierarchical cluster analysis rearranged the 78 quinoa accessions into four main groups and ten sub-clusters. Clustering was found in associations with days to maturity and also with plant size and seed-size traits.
Keywords: Character association, Chenopodium quinoa, Diversity analysis, Morphotypic cluster, Multivariate analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 258640 Using the Combined Model of PROMETHEE and Fuzzy Analytic Network Process for Determining Question Weights in Scientific Exams through Data Mining Approach
Authors: Hassan Haleh, Amin Ghaffari, Parisa Farahpour
Abstract:
Need for an appropriate system of evaluating students- educational developments is a key problem to achieve the predefined educational goals. Intensity of the related papers in the last years; that tries to proof or disproof the necessity and adequacy of the students assessment; is the corroborator of this matter. Some of these studies tried to increase the precision of determining question weights in scientific examinations. But in all of them there has been an attempt to adjust the initial question weights while the accuracy and precision of those initial question weights are still under question. Thus In order to increase the precision of the assessment process of students- educational development, the present study tries to propose a new method for determining the initial question weights by considering the factors of questions like: difficulty, importance and complexity; and implementing a combined method of PROMETHEE and fuzzy analytic network process using a data mining approach to improve the model-s inputs. The result of the implemented case study proves the development of performance and precision of the proposed model.Keywords: Assessing students, Analytic network process, Clustering, Data mining, Fuzzy sets, Multi-criteria decision making, and Preference function.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1581