Search results for: sequence mining
608 Cluster Algorithm for Genetic Diversity
Authors: Manpreet Singh, Keerat Kaur, Bhavdeep Singh
Abstract:
With advancing hardware technology, the cost of storage is decreasing and ever larger volumes of data are being stored. Thus there is an urgent need for new techniques and tools that can intelligently and automatically assist us in transforming this data into useful knowledge. Different data mining techniques have been developed to handle these large databases [7]. Data mining is also finding its role in the field of biotechnology. Pedigree means the associated ancestry of a crop variety. Genetic diversity is the variation in the genetic composition of individuals within or among species, and it depends upon the pedigree information of the varieties. Parents at lower hierarchic levels carry more weight in predicting genetic diversity than those at upper hierarchic levels; the weight decreases as the level increases. For crossbreeding, the two varieties should be as genetically diverse as possible so that the useful characters of both can be incorporated into the newly developed variety. This paper discusses searching and analyzing the possible pairs of varieties, selected on the basis of morphological characters, climatic conditions and nutrients, to obtain the optimal pair for producing the required crossbreed variety. An algorithm was developed to determine the genetic diversity between the selected wheat varieties. Cluster analysis is used to retrieve the results.
Keywords: Genetic diversity, pedigree, nutrients.
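The abstract states that ancestors at lower hierarchic levels weigh more and that the weight decreases with level, but it gives no formula. A minimal sketch, assuming a hypothetical weight of 1/2^level per ancestor and a weighted-overlap (Ruzicka-style) distance between pedigrees; this is an illustration of the weighting idea, not the authors' algorithm:

```python
from itertools import combinations


def ancestor_weights(pedigree):
    """Weight each ancestor by 1 / 2**level (level 1 = parent).

    `pedigree` is a hypothetical dict {ancestor_name: level}; closer
    ancestors (lower levels) get larger weights, as the abstract describes.
    """
    return {name: 1.0 / 2 ** level for name, level in pedigree.items()}


def diversity(ped_a, ped_b):
    """1 minus the weighted share of ancestry the two varieties have in common."""
    wa, wb = ancestor_weights(ped_a), ancestor_weights(ped_b)
    shared = sum(min(wa[n], wb[n]) for n in wa.keys() & wb.keys())
    total = sum(max(wa.get(n, 0.0), wb.get(n, 0.0)) for n in wa.keys() | wb.keys())
    return 1.0 - shared / total if total else 0.0


def most_diverse_pair(pedigrees):
    """Return the pair of variety names with the highest diversity score."""
    return max(
        combinations(pedigrees, 2),
        key=lambda pair: diversity(pedigrees[pair[0]], pedigrees[pair[1]]),
    )


# Hypothetical pedigrees: ancestor name -> hierarchic level.
pedigrees = {
    "V1": {"A": 1, "B": 1, "C": 2},
    "V2": {"A": 1, "D": 1},
    "V3": {"E": 1, "A": 3},
}
best = most_diverse_pair(pedigrees)
```

A shared ancestor only three levels back (A in V3) barely reduces diversity, which is the behavior the level-based weighting is meant to capture. The resulting pairwise scores could also serve as a distance matrix for cluster analysis.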
PDF Downloads: 1802
607 Increasing the Capacity of Plant Bottlenecks by Using of Improving the Ratio of Mean Time between Failures to Mean Time to Repair
Authors: Jalal Soleimannejad, Mohammad Asadizeidabadi, Mahmoud Koorki, Mojtaba Azarpira
Abstract:
Maintenance accounts for a significant percentage of production costs, and analyzing maintenance costs can help achieve greater productivity and competitiveness. With this in mind, the maintenance of machines and installations is considered an essential organizational function, and applying effective strategies adds significant value to manufacturing activities. Organizations are trying to achieve global performance levels, with emphasis on creating competitive advantage through methods such as RCM (Reliability-Centered Maintenance) and TPM (Total Productive Maintenance). In this study, increasing the capacity of the Concentration Plant of Golgohar Iron Ore Mining & Industrial Company (GEG) was examined using reliability and maintainability analyses. The results showed that, instead of increasing the number of machines to solve the bottleneck problems, improving reliability and maintainability solves them most effectively. The data set of the GEG Concentration Plant was used and analyzed as the case study.
Keywords: Bottleneck, Golgohar Iron Ore Mining and Industrial Company, maintainability, maintenance costs, reliability.
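Why raising the MTBF/MTTR ratio adds capacity can be seen from the standard inherent-availability relation A = MTBF / (MTBF + MTTR) = r / (1 + r), where r = MTBF / MTTR. A minimal sketch with hypothetical figures (the abstract reports none), treating expected throughput as nominal rate times availability:

```python
def availability(mtbf_hours, mttr_hours):
    """Inherent availability A = MTBF / (MTBF + MTTR) = r / (1 + r), with r = MTBF / MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def effective_capacity(nominal_tph, mtbf_hours, mttr_hours, machines=1):
    """Expected throughput (t/h) of identical parallel machines, each up a fraction A of the time.

    Ignores buffers, shared repair crews and correlated failures.
    """
    return machines * nominal_tph * availability(mtbf_hours, mttr_hours)


# Hypothetical bottleneck: 100 t/h nominal, MTBF 90 h, MTTR 10 h.
baseline = effective_capacity(100, 90, 10)        # A = 0.90
faster_repair = effective_capacity(100, 90, 5)    # halving MTTR gives A of about 0.947
```

Halving the repair time here raises expected throughput from 90 to about 94.7 t/h without buying another machine, which is the trade-off the study examines on real plant data.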
PDF Downloads: 956
606 Antibody Reactivity of Synthetic Peptides Belonging to Proteins Encoded by Genes Located in Mycobacterium tuberculosis-Specific Genomic Regions of Differences
Authors: Abu Salim Mustafa
Abstract:
Comparisons of mycobacterial genomes have identified several Mycobacterium tuberculosis-specific genomic regions, known as regions of differences, that are absent in other mycobacteria. Because of this M. tuberculosis specificity, the peptides encoded by these regions could be useful in the specific diagnosis of tuberculosis. To explore this possibility, overlapping synthetic peptides corresponding to 39 proteins predicted to be encoded by genes in regions of differences were tested for antibody reactivity with sera from tuberculosis patients and healthy subjects. The results identified four immunodominant peptides from four different proteins; three of them showed significantly stronger antibody reactivity and a higher rate of positivity with sera from tuberculosis patients than with sera from healthy subjects. The fourth peptide was recognized equally well by sera from tuberculosis patients and healthy subjects. Prediction of antibody epitopes by bioinformatics analysis using the ABCpred server identified multiple linear epitopes in each peptide. Furthermore, sequence identity analysis using BLAST suggested M. tuberculosis specificity for the three peptides with preferential reactivity to sera from tuberculosis patients, whereas the peptide with equal reactivity to sera from TB patients and healthy subjects showed significant identity with sequences present in non-tuberculous mycobacteria. The three identified M. tuberculosis-specific immunodominant peptides may be useful in the serological diagnosis of tuberculosis.
Keywords: Genomic regions of differences, Mycobacterium tuberculosis, peptides, serodiagnosis.
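The study tests overlapping synthetic peptides that span each protein. The abstract does not state peptide length or overlap, so the values below are illustrative assumptions. A minimal sketch of how a protein sequence can be tiled so that every residue, and every short linear epitope crossing a peptide boundary, is covered:

```python
def overlapping_peptides(protein, length=25, overlap=10):
    """Tile a protein sequence into peptides that share `overlap` residues.

    The last peptide is shifted to end at the C-terminus so that every
    residue is covered. `length` and `overlap` are illustrative defaults.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    if len(protein) <= length:
        return [protein]
    step = length - overlap
    starts = list(range(0, len(protein) - length + 1, step))
    if starts[-1] + length < len(protein):
        starts.append(len(protein) - length)
    return [protein[s:s + length] for s in starts]


seq = "ACDEFGHIKLMNPQRSTVWY"  # toy 20-residue sequence, not an actual RD protein
peptides = overlapping_peptides(seq, length=8, overlap=3)
```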
PDF Downloads: 930
605 Educational Data Mining: The Case of Department of Mathematics and Computing in the Period 2009-2018
Authors: M. Sitoe, O. Zacarias
Abstract:
University education is influenced by several factors, ranging from the strategies adopted to strengthen the whole process to the academic performance of the students themselves. This work uses data mining techniques to develop a predictive model that identifies students with a tendency toward evasion (dropout) or retention. To this end, a database of real student data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted from 2009 to 2014. The Weka tool was used to build models with three techniques: k-nearest neighbors, random forest and logistic regression. To allow training on multiple train-test splits, cross-validation was employed with a varying number of folds. To reduce bias and variance and improve model performance, the ensemble methods Bagging and Stacking were used. Comparing the three classifiers, logistic regression with Bagging and seven folds obtained the best performance, with results above 90% in all evaluated metrics: accuracy, true positive rate and precision. Retention is the most common tendency.
Keywords: Evasion and retention, cross validation, bagging, stacking.
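Weka performs fold splitting and metric computation internally; the plain-Python sketch below only illustrates what "seven folds" and the three reported metrics mean. It is not Weka's API, and encoding the class of interest as 1 is an assumption:

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices, then split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, true positive rate (recall) and precision for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


folds = k_fold_indices(388, 7)  # 388 students, seven folds as in the best model
```

Each fold serves once as the test set while the other six train the model; the metrics are then averaged over the seven runs.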
PDF Downloads: 119
604 Effect of Scene Changing on Image Sequences Compression Using Zero Tree Coding
Authors: Mbainaibeye Jérôme, Noureddine Ellouze
Abstract:
In this paper, we study the effect of scene changes on an image sequence coding system using the Embedded Zerotree Wavelet (EZW). The scene change considered here is full motion, which may occur at any time. A special image sequence is generated in which scene changes occur randomly. Two scenarios are considered. In the first, the system must provide the best possible reconstruction quality by managing the bit rate (BR) when a scene change occurs. In the second, the system must keep the bit rate as constant as possible by managing the reconstruction quality. The first scenario is motivated by a wide-bandwidth transmission channel, where the bit rate can be increased to keep the reconstruction quality at or above a given threshold. The second scenario concerns a narrow-bandwidth channel, where an increase in bit rate is not possible; in this case, applications for which reconstruction quality is not a constraint may be considered. The simulations use a five-scale wavelet decomposition with the 9/7-tap biorthogonal wavelet filter bank. Entropy coding is performed using a specifically defined binary code book and the EZW algorithm. Experimental results are presented and compared with LEAD H263 EVAL. If reconstruction quality is the constraint, the system increases the bit rate to obtain the required quality. If the bit rate must be constant, the system cannot provide the required quality when a scene change occurs; however, it is able to improve the quality once the scene change has passed.
Keywords: Image Sequence Compression, Wavelet Transform, Scene Changing, Zero Tree, Bit Rate, Quality.
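EZW encodes coefficients in successive passes against a halving threshold, which is what lets an encoder trade bit rate for quality: stopping after fewer passes (the constant-bit-rate scenario) leaves fewer coefficients coded. A minimal sketch of that threshold schedule on toy coefficients; it omits the wavelet transform, zerotree symbols, refinement passes and entropy coding:

```python
import math


def ezw_thresholds(coeffs, passes):
    """Successive-approximation thresholds of EZW: T0 = 2**floor(log2(max|c|)), then halved."""
    peak = max(abs(c) for c in coeffs)
    if peak == 0:
        raise ValueError("all coefficients are zero")
    t0 = 2 ** math.floor(math.log2(peak))
    return [t0 / 2 ** i for i in range(passes)]


def newly_significant(coeffs, passes):
    """Number of coefficients that first become significant (|c| >= T) at each pass."""
    counts, found = [], set()
    for t in ezw_thresholds(coeffs, passes):
        new = {i for i, c in enumerate(coeffs) if abs(c) >= t and i not in found}
        found |= new
        counts.append(len(new))
    return counts


coeffs = [63, -34, 49, 10, 7, 13, -12, 7]  # toy wavelet coefficients
```

After a scene change, more large coefficients typically appear, so a fixed number of passes costs more bits; keeping the bit rate fixed instead forces fewer passes and lower quality.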
PDF Downloads: 1356
603 An Improved K-Means Algorithm for Gene Expression Data Clustering
Authors: Billel Kenidra, Mohamed Benmohammed
Abstract:
Clustering is a data mining technique that is a subject of active research; it assists in biological pattern recognition and in extracting new knowledge from raw data. Clustering partitions an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects in other groups. Several clustering methods are partitional: they directly decompose the dataset into a set of disjoint clusters that optimizes a given criterion function. The criterion function may emphasize local or global structure in the data, and it is optimized by an iterative relocation procedure. K-Means is one of the most widely used partitional clustering algorithms. Because K-Means is extremely sensitive to the initial choice of centers, and a poor choice may lead to a local optimum far inferior to the global optimum, we propose a strategy for initializing the K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results show that efficiency is significantly improved.
Keywords: Microarray data mining, biological pattern recognition, partitional clustering, k-means algorithm, centroid initialization.
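The abstract does not describe the authors' initialization strategy. To show why seeding matters, the sketch below swaps in the well-known k-means++ seeding (each new center is drawn with probability proportional to its squared distance from the nearest existing center) followed by standard Lloyd iterations; it is not the method proposed in the paper:

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding (not the paper's initialisation): D(x)^2-weighted sampling."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
            continue
        r, acc = rng.uniform(0, total), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:  # zero-weight points (existing centers) are never picked
                centers.append(p)
                break
    return centers


def kmeans(points, k, iters=100, seed=0):
    """Lloyd iterations started from k-means++ centers; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2)
```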
PDF Downloads: 1284
602 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency
Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami
Abstract:
Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The dataset to be clustered can contain either categorical or numeric data, and each type has its own clustering algorithm: k-means for numeric datasets and k-modes for categorical datasets. A major problem in data mining applications is clustering categorical data, which is common in real datasets. One way to cluster categorical values is to transform the categorical attributes into numeric measures and apply the k-means algorithm directly instead of k-modes. In this paper, we experiment with such an approach, transforming categorical values into numeric ones using the relative frequency of each modality within its attribute. The proposed approach is compared with a previous method that transforms categorical datasets into binary values. The scalability and accuracy of the two methods are evaluated experimentally. The results show that our proposed method outperforms the binary method in all cases.
Keywords: Clustering, k-means, categorical datasets, pattern recognition, unsupervised learning, knowledge discovery.
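The two transformations compared in the paper can be sketched in a few lines: relative-frequency encoding replaces each value by the share of rows in which that modality occurs, while the binary baseline creates one indicator column per modality. The data below is hypothetical:

```python
from collections import Counter


def relative_frequency_encode(rows):
    """Replace each categorical value by the relative frequency of that modality in its attribute."""
    n = len(rows)
    counts = [Counter(column) for column in zip(*rows)]
    return [tuple(counts[j][v] / n for j, v in enumerate(row)) for row in rows]


def one_hot_encode(rows):
    """Binary baseline: one 0/1 column per (attribute, modality) pair."""
    modalities = [sorted(set(column)) for column in zip(*rows)]
    return [
        tuple(1 if v == m else 0 for j, v in enumerate(row) for m in modalities[j])
        for row in rows
    ]


# Hypothetical categorical data: (colour, size).
rows = [("red", "S"), ("red", "M"), ("blue", "M"), ("red", "M")]
numeric = relative_frequency_encode(rows)   # two numeric columns, ready for k-means
binary = one_hot_encode(rows)               # four binary columns
```

The frequency encoding keeps one column per attribute, whereas the binary width grows with the number of modalities. One trade-off is that two modalities with the same frequency map to the same number.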
PDF Downloads: 3545
601 Exploring the Correlation between Population Distribution and Urban Heat Island under Urban Data: Taking Shenzhen Urban Heat Island as an Example
Authors: Wang Yang
Abstract:
Shenzhen is a modern city born of China's reform and opening-up policy, and its urban morphology has developed under the administration of the Chinese government. The city's planning paradigm is primarily affected by spatial structure and human behavior. The urban agglomeration center was subjectively divided into several groups and centers, and compared with this effect, the natural laws of urban development tend to be neglected. With the continuous development of the Internet, big data technology has been introduced in China, and data mining and data analysis have become important tools in municipal research. Data mining has been used to collect and clean data such as business data, traffic data and population data. Before data mining, government data were collected by traditional means and then analyzed through city-relationship research, which delayed their use in urban development, especially for the contemporary city. Internet-based data, by contrast, are updated very quickly. Points of interest (POIs) mined from the city serve as a data source for city design, while satellite remote sensing is used as a reference; analyzing the city from both directions breaks the government's administrative paradigm and restores urban research. The use of data mining in urban analysis is therefore very important. Satellite remote sensing data of Shenzhen for July 2018, measured by the MODIS sensor, were used to perform land surface temperature inversion and analyze the distribution of the city's heat island. This article acquired and classified data on Shenzhen using web crawler technology. The Shenzhen heat island and POI data were simulated and analyzed on a GIS platform to identify the main features of the distribution of functional areas and their influence. Shenzhen extends in an east-west direction.
The city’s main streets also follow the direction of urban development, so the functional areas of the city are likewise distributed east-west. The urban heat island map reflects the functional urban areas and corresponds to the regional POI distribution. The results show that the distribution of the urban heat island corresponds one-to-one with the distribution of urban POIs. The urban heat island is primarily influenced by the properties of the underlying surface, once the influence of the urban climate is excluded. Taking urban POIs as the object of analysis, the distribution of POIs is closely connected with population aggregation, so the distribution of the population corresponds to the distribution of the urban heat island.
Keywords: POI, satellite remote sensing, the population distribution, urban heat island thermal map.
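The abstract does not say which statistic was used to establish the correspondence between POIs and the heat island. One simple way to quantify it, sketched here under that assumption, is to count POIs per grid cell and compute the Pearson correlation with mean land surface temperature per cell. The MODIS temperature inversion is not shown; per-cell temperatures and coordinates are hypothetical inputs:

```python
import math
from collections import Counter


def poi_counts_per_cell(pois, cell_size):
    """Count POIs (projected x, y in metres) in each square grid cell."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in pois)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical inputs: POI coordinates and mean land surface temperature (deg C) per cell.
pois = [(100, 100), (200, 150), (1500, 100), (1600, 200), (1700, 300), (2500, 100)]
lst_by_cell = {(0, 0): 30.0, (1, 0): 33.0, (2, 0): 28.0}

counts = poi_counts_per_cell(pois, cell_size=1000)
cells = sorted(lst_by_cell)
r = pearson([counts.get(c, 0) for c in cells], [lst_by_cell[c] for c in cells])
```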
PDF Downloads: 928
600 A Preference-Based Multi-Agent Data Mining Framework for Social Network Service Users' Decision Making
Authors: Ileladewa Adeoye Abiodun, Cheng Wai Khuen
Abstract:
Multi-Agent Systems (MAS) emerged in the pursuit of improving our standard of living and can model complex human behaviors such as communication, decision making, negotiation and self-organization. Social Network Services (SNSs) have attracted millions of users, many of whom have integrated these sites into their daily practices. The domains of MAS and SNS share many similarities in architecture, features and functions. Our research therefore focuses on exploring social network users' behavior through a multi-agent model, in order to generate more accurate and meaningful information for SNS users. One application of MAS is the e-Auction and e-Rental services of Universiti Cyber AgenT (UniCAT), a social network for students of Universiti Tunku Abdul Rahman (UTAR), Kampar, Malaysia, built around the Belief-Desire-Intention (BDI) model. Despite its various advantages, the BDI model has been found to have some shortcomings. This paper therefore proposes a multi-agent framework based on a modified BDI model, Belief-Desire-Intention in Dynamic and Uncertain Situations (BDIDUS), using the UniCAT system as a case study.
Keywords: Distributed Data Mining, Multi-Agent Systems, Preference-Based, SNS.
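The abstract names the BDI model but not BDIDUS's internals. As a rough illustration of the generic BDI cycle (perceive to revise beliefs, deliberate to commit to an intention, act), here is a hypothetical e-Auction bidding agent; the class, its fields and its decision rule are invented for illustration and do not describe UniCAT or BDIDUS:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BidderAgent:
    """Toy Belief-Desire-Intention agent for an e-Auction (illustrative, not BDIDUS)."""
    budget: float
    valuation: float                      # desire: win the item without paying more than this
    beliefs: dict = field(default_factory=dict)
    intention: Optional[str] = None

    def perceive(self, current_price, increment):
        """Belief revision from the latest auction state."""
        self.beliefs.update(price=current_price, increment=increment)

    def deliberate(self):
        """Commit to the intention that serves the desire under current beliefs."""
        next_bid = self.beliefs["price"] + self.beliefs["increment"]
        limit = min(self.budget, self.valuation)
        self.intention = "bid" if next_bid <= limit else "withdraw"
        return self.intention

    def act(self):
        """Execute the current intention; returns the bid amount or None."""
        if self.intention == "bid":
            return self.beliefs["price"] + self.beliefs["increment"]
        return None


agent = BidderAgent(budget=50, valuation=40)
agent.perceive(current_price=35, increment=5)
agent.deliberate()
first_bid = agent.act()
```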
PDF Downloads: 1502
599 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
Abstract:
In recent years, a great deal of work has been done on data clustering, an unsupervised learning technique in data mining. Many algorithms and methods have been proposed, focusing on clustering different data types, representing cluster models and improving cluster accuracy. However, no single clustering algorithm gives the best results in all cases. To address this issue, a technique called the cluster ensemble method emerged, which offers a good alternative approach to the cluster analysis problem. The main goal of a cluster ensemble is to merge different clustering solutions so as to achieve accuracy and improve on the quality of the individual clusterings. The rapid and continuing development of new methods in data mining, and the steady interest in inventing new algorithms, make a critical analysis of existing techniques and future directions necessary. This paper presents a comparative study of different cluster ensemble methods, covering their features, their working process, and the average accuracy and error rates of each. This comprehensive analysis should be useful to clustering practitioners and help them choose the most suitable method for the problem at hand.
Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.
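The co-association matrix listed in the keywords records, for every pair of objects, the fraction of base clusterings that place them together. A minimal sketch, using one simple consensus function (link pairs whose co-association exceeds a threshold and take connected components) as an example rather than any specific method from the review; the base clusterings are hypothetical:

```python
def co_association(partitions):
    """Entry [i][j] = fraction of base clusterings that put objects i and j together."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def threshold_consensus(partitions, threshold=0.5):
    """Consensus labels: connected components of the graph linking pairs with CA > threshold."""
    ca = co_association(partitions)
    n = len(ca)
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and ca[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical base clusterings of five objects (label values are arbitrary per clustering).
partitions = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
consensus = threshold_consensus(partitions)
```

Because only co-membership is counted, the base clusterings need not agree on label names or even on the number of clusters.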
PDF Downloads: 2603
598 Validation and Selection between Machine Learning Technique and Traditional Methods to Reduce Bullwhip Effects: a Data Mining Approach
Authors: Hamid R. S. Mojaveri, Seyed S. Mousavi, Mojtaba Heydar, Ahmad Aminian
Abstract:
The aim of this paper is to present a three-step methodology for forecasting supply chain demand. In the first step, various data mining techniques are applied to prepare the data for the forecasting models. In the second step, the modeling step, an artificial neural network and a support vector machine are presented, after defining the Mean Absolute Percentage Error (MAPE) index for measuring error. The structure of the artificial neural network is selected based on previous researchers' results, and its accuracy is increased by sensitivity analysis. The best forecast from the classical forecasting methods (Moving Average, Exponential Smoothing, and Exponential Smoothing with Trend) is obtained from the prepared data and compared with the results of the support vector machine and the proposed artificial neural network. The results show that the artificial neural network forecasts more precisely than the other methods. Finally, the stability of the forecasting methods is analyzed using raw data, and the effectiveness of clustering analysis is also measured.
Keywords: Artificial Neural Networks (ANN), bullwhip effect, demand forecasting, Support Vector Machine (SVM).
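The classical baselines and the MAPE index named in the abstract can be sketched as follows. The ANN and SVM models are not shown; the demand series and smoothing parameters are hypothetical. The bullwhip ratio, Var(orders) / Var(demand), is the usual measure of the effect in the title and is added here for reference:

```python
from statistics import pvariance


def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def moving_average(series, window):
    """One-step-ahead forecasts for series[window:] from the previous `window` values."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def exp_smoothing(series, alpha):
    """Simple exponential smoothing; forecasts for series[1:]."""
    level, forecasts = series[0], []
    for x in series[1:]:
        forecasts.append(level)
        level = alpha * x + (1 - alpha) * level
    return forecasts


def holt(series, alpha, beta):
    """Exponential smoothing with trend (Holt); forecasts for series[2:].

    Initialised with level = series[1] and trend = series[1] - series[0].
    """
    level, trend, forecasts = series[1], series[1] - series[0], []
    for x in series[2:]:
        forecasts.append(level + trend)
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return forecasts


def bullwhip_ratio(orders, demand):
    """Var(orders) / Var(demand); values above 1 indicate variance amplification upstream."""
    return pvariance(orders) / pvariance(demand)


demand = [10, 12, 14, 16, 18]  # hypothetical trending demand
errors = {
    "MA(2)": mape(demand[2:], moving_average(demand, 2)),
    "SES": mape(demand[1:], exp_smoothing(demand, 0.5)),
    "Holt": mape(demand[2:], holt(demand, 0.5, 0.3)),
}
```

On a trending series, the trend-aware method tracks demand while the level-only methods lag, which is why the best classical method has to be chosen per data set before comparing it with the learned models.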
PDF Downloads: 2010
597 Overloading Scheme for Cellular DS-CDMA using Quasi-Orthogonal Sequences and Iterative Interference Cancellation Receiver
Authors: Preetam Kumar, Saswat Chakrabarti
Abstract:
Overloading is a technique to accommodate more users than the spreading factor N. It is a bandwidth-efficient way to increase the number of users in a fixed bandwidth. One efficient way to overload a CDMA system is to use two sets of orthogonal signal waveforms (O/O): the first set is assigned to N users and the second set to M additional users. An iterative interference cancellation technique is used to cancel the interference between the two sets of users. In this paper, we evaluate the performance of an overloading scheme in which the first N users are assigned Walsh-Hadamard (WH) orthogonal codes and the extra users are assigned the same WH codes overlaid by a fixed (quasi) bent sequence [11]. This scheme, called the Quasi-Orthogonal Sequence (QOS) O/O scheme, is part of the cdma2000 standard [12], where it provides overloading in the downlink using a single-user detector. The QOS scheme is a balanced O/O scheme, in which the correlation between any set-1 user and any set-2 user is equalized. The allowable overload of this scheme is investigated in the uplink on AWGN and Rayleigh fading channels, such that the uncoded performance with an iterative multistage interference cancellation detector remains close to the single-user bound. It is shown that this scheme provides 19% and 11% overloading with the SDIC technique for N = 16 and N = 64 respectively, with an SNR degradation of less than 0.35 dB relative to the single-user bound at a BER of 0.00001. On a Rayleigh fading channel, the overloading is 45% (29 extra users) for N = 64 at a BER of 0.0005, with an SNR degradation of about 1 dB relative to single-user performance. This is a significant amount of overloading on a Rayleigh fading channel.
Keywords: DS-CDMA, Iterative Interference Cancellation, Orthogonal codes, Overloading.
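The "balanced" property of the QOS O/O scheme can be checked numerically. The sketch below builds N = 16 Walsh-Hadamard codes, overlays them with a ±1 mask derived from the 4-variable bent function f = b0·b1 XOR b2·b3, and confirms that each set stays internally orthogonal while every set-1/set-2 pair has the same cross-correlation magnitude, sqrt(N) = 4. The mask is an illustrative choice, not the cdma2000 mask, and interference cancellation is not modeled:

```python
def walsh_hadamard(n):
    """Sylvester Walsh-Hadamard codes: H[i][j] = (-1)**popcount(i & j); n a power of 2."""
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)] for i in range(n)]


def bent_mask_16():
    """±1 chips of the bent function f(b) = b0*b1 XOR b2*b3 (illustrative, not the cdma2000 mask)."""
    mask = []
    for j in range(16):
        b0, b1, b2, b3 = (j >> 0) & 1, (j >> 1) & 1, (j >> 2) & 1, (j >> 3) & 1
        mask.append(-1 if (b0 & b1) ^ (b2 & b3) else 1)
    return mask


def correlation(a, b):
    """Inner product of two chip sequences (zero means orthogonal)."""
    return sum(x * y for x, y in zip(a, b))


N = 16
set1 = walsh_hadamard(N)                                     # codes of the first N users
mask = bent_mask_16()
set2 = [[m * c for m, c in zip(mask, row)] for row in set1]  # masked codes of extra users

cross = {abs(correlation(a, b)) for a in set1 for b in set2}
```

Masking preserves orthogonality within set 2 because each mask chip squares to 1; the bent property makes every cross-correlation equal in magnitude, so no set-1 user suffers more interference from set 2 than any other.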
PDF Downloads: 1716
596 Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction
Authors: Jérôme Azé, Mathieu Roche, Yves Kodratoff, Michèle Sebag
Abstract:
Term extraction, a key data preparation step in text mining, extracts terms, i.e. relevant collocations of words attached to specific concepts (e.g. genetic-algorithms and decision-trees are terms associated with the concept "Machine Learning"). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm that exploits a few collocations manually labelled as interesting or not interesting. From these examples, the ROGER algorithm learns a numerical function that induces a ranking on the collocations. This ranking is optimized with genetic algorithms, maximizing the trade-off between the false positive and true positive rates (the Area Under the ROC Curve). The approach uses a particular representation for word collocations, namely the vector of values of the standard statistical interestingness measures attached to each collocation. As this representation is general (across corpora and natural languages), generality tests were performed by applying the ranking function learned from an English corpus in biology to a French corpus of curricula vitae, and vice versa; the results show good robustness of the approach compared with the state-of-the-art Support Vector Machine (SVM).
Keywords: Text-mining, Terminology Extraction, Evolutionary algorithm, ROC Curve.
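The quantity ROGER maximizes, the area under the ROC curve of the induced ranking, equals the probability that a randomly chosen interesting collocation outranks a randomly chosen uninteresting one. A minimal sketch of that fitness, assuming a linear ranking function over the measure vector (the abstract does not specify the function's form) and hypothetical measure values; the genetic algorithm itself is omitted:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def linear_score(measures, weights):
    """Hypothetical ranking function: weighted sum of a collocation's interestingness measures."""
    return sum(w * m for w, m in zip(weights, measures))


# Each collocation is described by a vector of (hypothetical) interestingness measures.
collocations = [
    ([0.9, 0.2], 1),   # labelled interesting
    ([0.7, 0.6], 1),
    ([0.3, 0.9], 0),   # labelled not interesting
    ([0.1, 0.4], 0),
]


def fitness(weights):
    """AUC of the ranking induced by `weights`: the quantity a GA would maximise."""
    scores = [linear_score(m, weights) for m, _ in collocations]
    return auc(scores, [y for _, y in collocations])
```

Because AUC depends only on the ordering of scores, the learned function transfers across corpora as long as the relative ordering of the measures is meaningful, which is the generality the cross-corpus tests probe.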
PDF Downloads: 1659
595 Data Mining to Capture User-Experience: A Case Study in Notebook Product Appearance Design
Authors: Rhoann Kerh, Chen-Fu Chien, Kuo-Yi Lin
Abstract:
In a rapidly growing notebook market, consumer electronics manufacturers face a highly dynamic and competitive environment. In particular, product appearance is the first thing users rely on to distinguish a product from those of other brands, so notebook products should differ in appearance to engage users and contribute to the user experience (UX). UX evaluation assesses various product concepts to find designs that meet user needs and helps designers better understand the appearance preferences of different market segments. However, few studies have explored the relationship between consumer background and reactions to product appearance. This study proposes a data mining framework to capture user information and the important relations among product appearance factors. The framework consists of problem definition and structuring, data preparation, rule generation, and result evaluation and interpretation. An empirical study was conducted in Taiwan with 168 subjects from different backgrounds, who experienced the appearance of 11 different portable computers. The results help designers develop product strategies based on consumer characteristics and on product concepts related to UX, which helps launch products to the right customers and increase market share. The results demonstrate the practical feasibility of the proposed framework.
Keywords: Consumers Decision Making, Product Design, Rough Set Theory, User Experience.
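Rough set theory, listed in the keywords, generates rules from lower and upper approximations: subjects indiscernible on their background attributes form equivalence classes, and a class that lies entirely inside the target concept supports a certain rule. A minimal sketch with hypothetical subjects and attributes, not the study's data:

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids that are indiscernible on the chosen condition attributes."""
    classes = defaultdict(set)
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(oid)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Lower (certainly in target) and upper (possibly in target) approximations."""
    lower, upper = set(), set()
    for eq in equivalence_classes(objects, attributes):
        if eq <= target:
            lower |= eq
        if eq & target:
            upper |= eq
    return lower, upper


# Hypothetical subjects: background attributes and whether they liked a design.
subjects = {
    1: {"age": "young", "gender": "F"},
    2: {"age": "young", "gender": "F"},
    3: {"age": "young", "gender": "M"},
    4: {"age": "older", "gender": "M"},
}
liked = {1, 3}
lower, upper = approximations(subjects, ["age", "gender"], liked)
```

Here the lower approximation {3} supports the certain rule "young and male implies likes the design", while subjects 1 and 2 fall in the boundary region and support only a possible rule.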
PDF Downloads: 3512
594 Exploring Social Impact of Emerging Technologies from Futuristic Data
Authors: Heeyeul Kwon, Yongtae Park
Abstract:
Despite their highly touted benefits, emerging technologies have raised pervasive concerns about unintended and unforeseen social impacts. Those wishing to create safe and socially acceptable products therefore need to identify such side effects and mitigate them before the products proliferate in the market. Various technology assessment (TA) methodologies, such as Delphi, impact assessment and scenario planning, have been widely used for this purpose. However, the literature faces a major limitation: its sole reliance on participatory workshop activities. It has overlooked a massive untapped source of futuristic information flooding the Internet. This research therefore seeks to gain insight into using futuristic data, i.e. future-oriented documents from the Internet, as a supplementary method for generating social impact scenarios while capturing the perspectives of experts from a wide variety of disciplines. To this end, network analysis is conducted on social keywords extracted from the futuristic documents by text mining, and the resulting network is used as a guide to produce a comprehensive set of detailed scenarios. The proposed approach facilitates harmonized depictions of the possible hazardous consequences of emerging technologies and thereby makes decision makers more aware of, and responsive to, broad qualitative uncertainties.
Keywords: Emerging technologies, futuristic data, scenario, text mining.
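The network-analysis step can be illustrated with a keyword co-occurrence network: keywords become nodes, an edge's weight counts the documents mentioning both keywords, and degree centrality highlights the most connected social issues. A minimal sketch with hypothetical documents and keywords; the study's actual extraction and network measures may differ:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs, keywords):
    """Edge weight = number of documents in which both keywords appear.

    Uses naive lower-case substring matching; `keywords` must be lower case.
    """
    edges = Counter()
    for doc in docs:
        text = doc.lower()
        present = sorted(k for k in keywords if k in text)
        edges.update(combinations(present, 2))
    return edges


def degree_centrality(edges, keywords):
    """Distinct neighbours of each keyword, normalised by (number of keywords - 1)."""
    neighbours = {k: set() for k in keywords}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {k: len(v) / (len(keywords) - 1) for k, v in neighbours.items()}


# Hypothetical futuristic documents and social keywords.
docs = [
    "Privacy and job loss from automation",
    "Automation raises safety concerns",
    "Privacy and safety of drones",
]
keywords = ["privacy", "automation", "safety", "job loss"]
edges = cooccurrence_network(docs, keywords)
centrality = degree_centrality(edges, keywords)
```

Highly central keywords and their strongest edges are natural seeds for scenario writing, which is how the network serves as a guide rather than as the scenarios themselves.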
PDF Downloads: 2391
593 The Investigation of Enzymatic Activity in the Soils under the Impact of Metallurgical Industrial Activity in Lori Marz, Armenia
Authors: T. H. Derdzyan, K. A. Ghazaryan, G. A. Gevorgyan
Abstract:
Beta-glucosidase, chitinase, leucine-aminopeptidase, acid phosphomonoesterase and acetate-esterase activities were investigated in soils under the impact of metallurgical industrial activity in Lori marz (district). The activities of the investigated enzymes decreased with decreasing distance from the Shamlugh copper mine, the Chochkan tailings storage facility and the ore transportation road. Statistical analysis revealed significant positive correlations among the enzyme activities across the observation sites, indicating that they were affected by the same anthropogenic factor. The soils were polluted with heavy metals (Cu, Pb, As, Co, Ni, Zn) due to copper mining activity in this territory. Pearson correlation analysis revealed a significant negative correlation between the degree of heavy metal pollution (Nemerow integrated pollution index) and soil enzyme activity. Together, these results indicate that copper mining in this territory, by causing heavy metal pollution of the soils, inhibited the activities of enzymes that act as biological catalysts, decomposing organic materials and facilitating nutrient cycling.
Keywords: Armenia, metallurgical industrial activity, heavy metal pollution, soil enzyme activity.
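The Nemerow integrated pollution index used in the study combines single-factor indices P_i = C_i / S_i (measured concentration over a reference value) as P_N = sqrt((mean(P)^2 + max(P)^2) / 2), so the most polluting metal weighs heavily. A minimal sketch; the abstract does not give the reference values, so the numbers below are hypothetical:

```python
import math


def nemerow_index(concentrations, standards):
    """Nemerow integrated pollution index.

    Single-factor indices P_i = C_i / S_i; P_N = sqrt((mean(P)^2 + max(P)^2) / 2).
    """
    p = [concentrations[metal] / standards[metal] for metal in concentrations]
    p_mean, p_max = sum(p) / len(p), max(p)
    return math.sqrt((p_mean ** 2 + p_max ** 2) / 2)


# Hypothetical values in mg/kg; real studies use local background or guideline values for S_i.
measured = {"Cu": 200.0, "Pb": 50.0}
reference = {"Cu": 100.0, "Pb": 50.0}
pn = nemerow_index(measured, reference)
```

Computing P_N for each observation site gives the pollution variable that can then be correlated (e.g. with Pearson's r) against the enzyme activities measured at the same sites.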
PDF Downloads: 2566
592 A Class of Formal Operators for Combinatorial Identities and its Application
Authors: Ruigang Zhang, Wuyungaowa, Xingchen Ma
Abstract:
In this paper, we present some symbolic operator summation formulas involving generalizations of well-known number sequences and polynomial sequences, and we obtain identities for these sequences by employing M-R's substitution rule.
Keywords: Generating functions, operators sequence group, Riordan arrays, R. G operator group, combinatorial identities.
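Riordan arrays, listed in the keywords, are a standard generating-function tool for such identities: the array (d(t), h(t)) has entry [t^n] d(t) h(t)^k in row n, column k. The sketch below builds Pascal's triangle as the Riordan array (1/(1-t), t/(1-t)) from truncated power series; the paper's own operators and M-R's substitution rule are not reproduced:

```python
from math import comb


def series_mul(a, b, n):
    """Product of two power series (coefficient lists) truncated to n coefficients."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


def riordan_array(d, h, n):
    """Rows 0..n-1 of the Riordan array (d(t), h(t)): entry [r][k] = [t^r] d(t) h(t)^k."""
    columns, col = [], d[:n]
    for _ in range(n):
        columns.append(col)
        col = series_mul(col, h, n)
    return [[columns[k][row] for k in range(row + 1)] for row in range(n)]


N = 8
d = [1] * N              # 1 / (1 - t) = 1 + t + t^2 + ...
h = [0] + [1] * (N - 1)  # t / (1 - t) = t + t^2 + ...
pascal = riordan_array(d, h, N)
matches_binomial = all(pascal[r][k] == comb(r, k) for r in range(N) for k in range(r + 1))
```

Summing each row is an instance of the fundamental theorem of Riordan arrays, sum over k of d_{n,k} g_k = [t^n] d(t) g(h(t)): with g(t) = 1/(1-t) it yields 1/(1-2t), i.e. the identity that the binomial coefficients in row n sum to 2^n.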
PDF Downloads: 1805
591 Latent Semantic Inference for Agriculture FAQ Retrieval
Authors: Dawei Wang, Rujing Wang, Ying Li, Baozi Wei
Abstract:
An FAQ system helps users find answers to problems that puzzle them, but research on Chinese FAQ systems is still at the theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To improve efficiency, a small pool of candidate question-answer pairs is first retrieved from the system for follow-up processing, according to the agricultural-domain concepts extracted from the user input. Input queries or questions are converted into four parts: the question word segment (QWS), the verb segment (VS), the agricultural concept segment (CS) and the auxiliary segment (AS). A semantic matching method is presented to estimate the similarity between the semantic segments of the query and those of the questions in the candidate pool. A thesaurus constructed from HowNet, a Chinese knowledge base, is adopted for measuring word similarity in the matcher. The questions are classified into eleven intention categories using predefined question stem keywords. For FAQ mining, given a query, the question part and the answer part of each FAQ question-answer pair are matched with the input query separately. Finally, the probabilities estimated from these two parts are integrated and used to choose the most likely answer for the input query. The approach was evaluated on an agriculture FAQ system, and experimental results indicate that it outperforms the FAQ-Finder system in agriculture FAQ retrieval.
Keywords: FAQ, Semantic Inference, Ontology.
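The matching pipeline can be sketched as segment-wise similarity between query and candidate question, combined with a query-answer similarity. This sketch makes several stand-in assumptions: Jaccard overlap replaces the HowNet-based word similarity, a linear interpolation with a hypothetical weight `lam` replaces the paper's probability integration, the segment weights are invented, and English tokens are used for readability (Chinese segmentation is assumed already done):

```python
def jaccard(a, b):
    """Set overlap in [0, 1]; a stand-in for the HowNet-based word similarity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical weights for the four segments named in the abstract.
SEGMENT_WEIGHTS = {"QWS": 0.2, "VS": 0.3, "CS": 0.4, "AS": 0.1}


def segment_similarity(query, question):
    """Weighted similarity over the QWS, VS, CS and AS segments."""
    return sum(
        w * jaccard(query.get(seg, []), question.get(seg, []))
        for seg, w in SEGMENT_WEIGHTS.items()
    )


def best_answer(query, faq, lam=0.7):
    """Pick the FAQ pair maximising lam * question score + (1 - lam) * answer score."""
    query_words = [w for words in query.values() for w in words]

    def score(pair):
        q_score = segment_similarity(query, pair["question"])
        a_score = jaccard(query_words, pair["answer"])
        return lam * q_score + (1 - lam) * a_score

    return max(faq, key=score)


query = {"QWS": ["how"], "VS": ["prevent"], "CS": ["wheat", "rust"], "AS": []}
faq = [
    {"id": 1, "question": {"QWS": ["how"], "VS": ["prevent"], "CS": ["wheat", "rust"]},
     "answer": ["spray", "fungicide", "wheat", "rust"]},
    {"id": 2, "question": {"QWS": ["when"], "VS": ["plant"], "CS": ["corn"]},
     "answer": ["april", "corn"]},
]
chosen = best_answer(query, faq)
```

Giving the concept segment the largest weight reflects that, in a domain-specific FAQ, the agricultural concept usually decides relevance more than the question word does.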
PDF Downloads: 1379
590 Extraction of Data from Web Pages: A Vision Based Approach
Authors: P. S. Hiremath, Siddu P. Algur
Abstract:
With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content such as advertisements, navigation panels and copyright notices surrounding the main content. Hence, tools for mining data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques for mining data regions from web pages are still unsatisfactory because of their poor performance and tag dependence. In this paper, a novel method for automatically extracting data items from web pages is proposed. It comprises two steps: (1) identification and extraction of the data regions based on visual clue information, and (2) identification of data records and extraction of data items from a data region. For step 1, a novel and more effective method is proposed that uses visual clues to find data regions formed by all types of tags. For step 2, a more effective method, Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. EDIP is a list-based technique, in which the list is a linear data structure. The proposed technique can mine non-contiguous data records and correctly identify data regions irrespective of the type of tag in which they are bound. Our experimental results show that the proposed technique performs better than existing techniques.
Keywords: Web data records, web data regions, web mining.
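The idea of finding data regions from visual clues rather than from tag structure can be illustrated with a deliberately simplified sketch. It assumes each rendered block is available as a bounding box (left, top, width, height) and treats a run of at least two vertically consecutive boxes with nearly the same left edge and width as one data region, whatever their tags. The `Box` type, the alignment tolerance `tol` and the `min_records` threshold are hypothetical; this is not the paper's algorithm, and unlike the proposed technique it only handles contiguous records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    left: int
    top: int
    width: int
    height: int
    tag: str  # kept only to show that grouping ignores the tag type


def visually_aligned(a: Box, b: Box, tol: int) -> bool:
    """Two boxes look like sibling records if their left edges and widths match."""
    return abs(a.left - b.left) <= tol and abs(a.width - b.width) <= tol


def find_data_regions(boxes: List[Box], tol: int = 5,
                      min_records: int = 2) -> List[List[Box]]:
    """Group vertically consecutive, visually aligned boxes into data regions."""
    ordered = sorted(boxes, key=lambda b: b.top)
    regions: List[List[Box]] = []
    current: List[Box] = []
    for box in ordered:
        if current and visually_aligned(current[-1], box, tol):
            current.append(box)
        else:
            if len(current) >= min_records:
                regions.append(current)
            current = [box]
    if len(current) >= min_records:
        regions.append(current)
    return regions
```

Because alignment is judged only from geometry, records rendered from `div`, `table` and `li` elements end up in the same region, which is the tag independence the abstract emphasizes.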
589 Biodegradation of Malathion by Acinetobacter baumannii Strain AFA Isolated from Domestic Sewage in Egypt
Authors: Ahmed F. Azmy, Amal E. Saafan, Tamer M. Essam, Magdy A. Amin, Shaban H. Ahmed
Abstract:
Bacterial strains capable of degrading malathion were isolated from domestic sewage by an enrichment culture technique. Three bacterial strains were screened and identified as Acinetobacter baumannii (AFA), Pseudomonas aeruginosa (PS1) and Pseudomonas mendocina (PS2) on the basis of morphological and biochemical identification and 16S rRNA sequence analysis. Acinetobacter baumannii AFA was the most efficient malathion-degrading bacterium and was therefore used for further biodegradation studies. AFA was able to grow in mineral salt medium (MSM) supplemented with malathion (100 mg/l) as the sole carbon source, and within 14 days it degraded 84% of the initial dose, as measured by high-performance liquid chromatography. Strain AFA could also degrade other organophosphorus compounds, including diazinon, chlorpyrifos and fenitrothion. The effects of different culture conditions on malathion degradation, such as inoculum density, additional carbon or nitrogen sources, temperature and shaking, were examined. Malathion degradation and bacterial cell growth were accelerated when the culture media were supplemented with yeast extract, glucose and citrate. The optimum conditions for malathion degradation by strain AFA were an inoculum density of 1.5 × 10^12 CFU/ml at 30 °C with shaking. Specific polymerase chain reaction (PCR) primers were designed manually using a multiple sequence alignment of the corresponding carboxylesterase enzymes of Acinetobacter species. Sequencing of the amplified PCR product and phylogenetic analysis showed a low degree of homology with other carboxylesterase enzymes of Acinetobacter strains, suggesting that this enzyme is a novel esterase. The isolated bacterial strains may have a potential role in the bioremediation of malathion-contaminated environments.
Keywords: Acinetobacter baumannii, biodegradation, Malathion, organophosphate pesticides.
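The reported degradation figure corresponds to the usual percentage-removal calculation, with C_0 the initial malathion concentration and C_t the residual concentration measured by HPLC after t = 14 days. The residual value of 16 mg/l below is implied by the reported 84% and the 100 mg/l starting dose; it is not a separately reported measurement.

```latex
D = \frac{C_0 - C_t}{C_0} \times 100\%
  = \frac{100\ \text{mg/l} - 16\ \text{mg/l}}{100\ \text{mg/l}} \times 100\% = 84\%
```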
588 Certain Subordination Results For A Class Of Analytic Functions Defined By The Generalized Integral Operator
Authors: C. Selvaraj, K. R. Karthikeyan
Abstract:
We obtain several subordination results for a class of analytic functions defined by means of a generalized integral operator.
Keywords: Analytic functions, Hadamard product, Subordinating factor sequence
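For reference, the standard notions named in the keywords are given below, stated for the open unit disk U = {z : |z| < 1}. These are the usual textbook definitions, with the subordinating factor sequence in the sense of Wilf. The generalized integral operator and the function class studied in the paper are not reproduced here.

```latex
% Subordination: f \prec g in U if there is a Schwarz function w, analytic in U,
% with w(0) = 0 and |w(z)| < 1, such that
f(z) = g\big(w(z)\big), \qquad z \in U.

% Hadamard (convolution) product of f(z) = z + \sum_{n \ge 2} a_n z^n
% and g(z) = z + \sum_{n \ge 2} b_n z^n:
(f * g)(z) = z + \sum_{n=2}^{\infty} a_n b_n z^n.

% Subordinating factor sequence: a sequence \{b_k\}_{k \ge 1} of complex numbers such that,
% for every f(z) = \sum_{k \ge 1} a_k z^k (a_1 = 1) analytic, univalent and convex in U,
\sum_{k=1}^{\infty} a_k b_k z^k \prec f(z), \qquad z \in U.
```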
587 A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment
Authors: Isaac K. E. Ampomah, Seong-Bae Park, Sang-Jo Lee
Abstract:
Over the past decade, there have been promising developments in Natural Language Processing (NLP), with several investigations of approaches to Recognizing Textual Entailment (RTE). These include models based on lexical similarity, models based on formal reasoning and, most recently, deep neural models. In this paper, we present a sentence encoding model that exploits sentence-to-sentence relation information for RTE. For sentence modeling, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) take different approaches: RNNs are known to be well suited to sequence modeling, whilst CNNs are suited to extracting n-gram features through their filters and can learn ranges of relations via the pooling mechanism. We combine these strengths of RNNs and CNNs in a unified model for the RTE task. Our model combines relation vectors computed from the phrasal representation of each sentence with relation vectors computed from the final encoded sentence representations. First, each sentence is passed through a convolutional layer to extract a sequence of higher-level phrase representations, from which the first relation vector is computed. Second, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network to obtain the final sentence representations, from which a second relation vector is computed. The relation vectors are combined and then used, in the same fashion as an attention mechanism, over the Bi-LSTM outputs to yield the final sentence representations for classification. Experiments on the Stanford Natural Language Inference (SNLI) corpus suggest that this is a promising technique for RTE.
Keywords: Deep neural models, natural language inference, recognizing textual entailment, sentence-to-sentence relation.
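The data flow from the two relation vectors to the attention step can be sketched in plain Python. Several parts are assumptions, since the abstract does not specify them: the relation function uses the common sentence-pair heuristic [u; v; |u - v|; u * v], sentences are summarized by max pooling, the two relation vectors are combined by element-wise sum, and a hypothetical `projection` matrix maps the result to an attention query. The convolution, the Bi-LSTM and the final classifier are treated as given inputs or omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def max_pool(states: List[Vector]) -> Vector:
    """Element-wise max over time steps, giving one vector per sentence."""
    return [max(column) for column in zip(*states)]


def relation_vector(u: Vector, v: Vector) -> Vector:
    """Assumed relation function: [u; v; |u - v|; u * v]."""
    return u + v + [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]


def matvec(weights: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, states: List[Vector]) -> Vector:
    """Dot-product attention: softmax-weighted sum of the states."""
    weights = softmax([sum(q * h for q, h in zip(query, state)) for state in states])
    return [sum(w * state[i] for w, state in zip(weights, states))
            for i in range(len(states[0]))]


def encode_pair(phrases_p: List[Vector], phrases_h: List[Vector],
                lstm_p: List[Vector], lstm_h: List[Vector],
                projection: Matrix) -> Tuple[Vector, Vector]:
    """Phrase-level and sentence-level relations drive attention over Bi-LSTM outputs.
    Assumes conv and Bi-LSTM outputs share dimension d; projection is d x 4d."""
    r1 = relation_vector(max_pool(phrases_p), max_pool(phrases_h))
    r2 = relation_vector(max_pool(lstm_p), max_pool(lstm_h))
    query = matvec(projection, [a + b for a, b in zip(r1, r2)])
    return attend(query, lstm_p), attend(query, lstm_h)
```

In a trained model, `projection` would be learned jointly with the encoder. It is passed in explicitly here so that the sketch stays dependency-free.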
586 GIS-Based Spatial Distribution and Evaluation of Selected Heavy Metals Contamination in Topsoil around Ecton Mining Area, Derbyshire, UK
Authors: Zahid O. Alibrahim, Craig D. Williams, Clive L. Roberts
Abstract:
The study area (the Ecton mining area) is located in the southern part of the Peak District in Derbyshire, England, and is bounded by the River Manifold on the west. The area has been mined for a long period. As a result, large amounts of potentially toxic metals were released into the surrounding area, and these are most likely a significant source of heavy metal contamination of the local soil, water and vegetation. To appraise the potential heavy metal pollution in this area, 37 topsoil samples (5-20 cm depth) were collected and analysed for their total content of Cu, Pb, Zn, Mn, Cr, Ni and V using inductively coupled plasma optical emission spectroscopy (ICP-OES). Multivariate geospatial analyses using GIS were used to draw geochemical maps of the metals of interest over the study area. A few hotspots, i.e. areas of elevated metal concentrations, were identified; these are presumed to result from anthropogenic activities. In addition, the environmental quality of the soil was evaluated by calculating Müller's geoaccumulation index (I_geo), which suggests that the degree of contamination by the investigated heavy metals follows the trend Pb > Zn > Cu > Mn > Ni = Cr = V. Furthermore, the potential ecological risk was assessed using the enrichment factor (EF). On the basis of the calculated EF values, the pollution levels of the studied metals in the study area follow the order Pb > Zn > Cu > Cr > V > Ni > Mn.
Keywords: Heavy metals, GIS, multivariate analysis, geoaccumulation index, enrichment factor.
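The two contamination indices mentioned in the abstract have standard forms. Müller's index is I_geo = log2(C_n / (1.5 B_n)), where C_n is the measured concentration, B_n the geochemical background and 1.5 a factor for natural background variation. The enrichment factor normalizes a metal to a conservative reference element in both the sample and the background. The sketch below implements these forms. The choice of Fe as reference element and all numeric values are hypothetical, because the abstract reports neither the background values nor the reference element used.

```python
import math
from typing import Dict


def geoaccumulation_index(concentration: float, background: float) -> float:
    """Müller's I_geo = log2(C_n / (1.5 * B_n))."""
    return math.log2(concentration / (1.5 * background))


def igeo_class(igeo: float) -> int:
    """Müller's classes: 0 (I_geo <= 0, uncontaminated) to 6 (I_geo > 5, extremely contaminated)."""
    if igeo <= 0:
        return 0
    return min(math.ceil(igeo), 6)


def enrichment_factor(sample: Dict[str, float], background: Dict[str, float],
                      metal: str, reference: str) -> float:
    """EF = (C_metal / C_ref)_sample / (C_metal / C_ref)_background."""
    return ((sample[metal] / sample[reference])
            / (background[metal] / background[reference]))


if __name__ == "__main__":
    # Hypothetical concentrations in mg/kg, for illustration only.
    sample = {"Pb": 200.0, "Fe": 20000.0}
    background = {"Pb": 20.0, "Fe": 40000.0}
    igeo = geoaccumulation_index(sample["Pb"], background["Pb"])
    print(f"I_geo(Pb) = {igeo:.2f}, class {igeo_class(igeo)}")
    print(f"EF(Pb) = {enrichment_factor(sample, background, 'Pb', 'Fe'):.1f}")
```

Computing both indices per sample and metal would give the per-metal rankings reported in the abstract, once the background dataset actually used by the authors is substituted.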
585 From Electroencephalogram to Epileptic Seizures Detection by Using Artificial Neural Networks
Authors: Gaetano Zazzaro, Angelo Martone, Roberto V. Montaquila, Luigi Pavone
Abstract:
Seizures are the main factor affecting the quality of life of epileptic patients. The diagnosis of epilepsy, and hence the identification of the epileptogenic zone, is commonly made through continuous electroencephalogram (EEG) signal monitoring. Seizure identification in EEG signals is performed manually by epileptologists, and this process is usually very long and error-prone. The aim of this paper is to describe an automated method that detects seizures in EEG signals using the knowledge discovery in databases (KDD) process and data mining methods and algorithms, and that can support physicians during the seizure detection process. Our detection method is based on an artificial neural network classifier trained with the multilayer perceptron algorithm, and on a software application called Training Builder, which was developed for the massive extraction of features from EEG signals. This tool covers all the data preparation steps, from signal processing to data analysis techniques, including the sliding window paradigm, dimensionality reduction algorithms, information theory and feature selection measures. The final model shows excellent performance, reaching an accuracy of over 99% in tests on data from a single patient, retrieved from a publicly available EEG dataset.
Keywords: Artificial Neural Network, Data Mining, Electroencephalogram, Epilepsy, Feature Extraction, Seizure Detection, Signal Processing.
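The sliding window paradigm for EEG feature extraction can be illustrated with a minimal sketch. The window size, the step and the feature set (mean, variance, line length and a histogram-based Shannon entropy as an information-theoretic feature) are illustrative assumptions. The abstract does not list the features Training Builder actually computes, and the multilayer perceptron that would consume these feature vectors is omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def sliding_windows(signal: List[float], size: int, step: int) -> List[List[float]]:
    """Split a single-channel signal into (possibly overlapping) windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def shannon_entropy(window: List[float], bins: int = 8) -> float:
    """Entropy (bits) of the amplitude histogram of one window."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return 0.0  # a flat window carries no amplitude uncertainty
    width = (hi - lo) / bins
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_features(window: List[float]) -> Dict[str, float]:
    n = len(window)
    mean = sum(window) / n
    variance = sum((x - mean) ** 2 for x in window) / n
    line_length = sum(abs(b - a) for a, b in zip(window, window[1:]))
    return {"mean": mean, "variance": variance,
            "line_length": line_length, "entropy": shannon_entropy(window)}


def extract_features(signal: List[float], size: int, step: int) -> List[Dict[str, float]]:
    """One feature vector per window: the rows of a training table for a classifier."""
    return [window_features(w) for w in sliding_windows(signal, size, step)]
```

In practice, each feature row would also be labelled (seizure or non-seizure) from the annotated recording before it is fed to the classifier.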
584 Identification of Promiscuous Epitopes for Cellular Immune Responses in the Major Antigenic Protein Rv3873 Encoded by Region of Difference 1 of Mycobacterium tuberculosis
Authors: Abu Salim Mustafa
Abstract:
Rv3873 is a relatively large protein (371 amino acids), and its gene is located in the immunodominant genomic region of difference 1 (RD1), which is present in the genome of Mycobacterium tuberculosis but deleted from the genomes of all the vaccine strains of Bacillus Calmette-Guérin (BCG) and most other mycobacteria. However, when tested for cellular immune responses using peripheral blood mononuclear cells from tuberculosis patients and BCG-vaccinated healthy subjects, this protein was found to be a major stimulator of cell-mediated immune responses in both groups of subjects. To further identify the sequences of immunodominant epitopes and explore their Human Leukocyte Antigen (HLA) restriction, 24 peptides (25-mers overlapping neighboring peptides by 10 residues) covering the sequence of Rv3873 were synthesized chemically using fluorenylmethyloxycarbonyl (Fmoc) chemistry and tested for cell-mediated immune responses. These experiments identified an immunodominant peptide, P9, that was recognized by people expressing varying HLA-DR types. P9 was also predicted to be a promiscuous binder, with multiple epitopes for binding to the HLA-DR, HLA-DP and HLA-DQ alleles of HLA class II molecules, which present antigens to T helper cells, and to HLA class I molecules, which present antigens to cytotoxic T cells. In addition, evaluation of P9 with an immunogenicity predictor server yielded a high score (0.94), indicating a greater probability that this peptide elicits a protective cellular immune response. In conclusion, P9, a peptide with multiple epitopes and the ability to bind several HLA class I and class II molecules for presentation to cells of the cellular immune response, may be useful as a peptide-based vaccine against tuberculosis.
Keywords: Mycobacterium tuberculosis, Rv3873, peptides, vaccine
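An overlapping peptide set like the one described (25-mers, each overlapping its neighbor by 10 residues, i.e. a step of 15) can be generated as sketched below. The abstract does not say how the C-terminal residues were handled. As an assumption, this sketch appends one extra peptide anchored at the end of the sequence whenever the regular tiling falls short, so the count it produces for a given length need not match the authors' set. The example uses a dummy sequence, not Rv3873.

```python
from typing import List


def overlapping_peptides(sequence: str, length: int = 25, overlap: int = 10) -> List[str]:
    """Tile a protein sequence with fixed-length peptides that overlap by `overlap` residues."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be in [0, length)")
    if len(sequence) <= length:
        return [sequence]
    step = length - overlap
    starts = list(range(0, len(sequence) - length + 1, step))
    if starts[-1] + length < len(sequence):
        # Assumption: add one end-anchored peptide so the C-terminus is covered.
        starts.append(len(sequence) - length)
    return [sequence[s:s + length] for s in starts]


if __name__ == "__main__":
    dummy = "ACDEFGHIKLMNPQRSTVWY" * 5  # 100 residues, illustration only
    for i, pep in enumerate(overlapping_peptides(dummy), start=1):
        print(f"P{i}: {pep}")
```

The 10-residue overlap means that any epitope of up to 11 residues lies entirely within at least one peptide, which is why overlapping sets are used for epitope mapping.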
583 Disparity Estimation for Objects of Interest
Authors: Yen San Yong, Hock Woon Hon
Abstract:
An algorithm for estimating the disparity of objects of interest is proposed. The algorithm uses image shifting and the overlapping area to estimate the disparity value, from which the depth of the objects of interest can be obtained. The algorithm can operate at different levels of accuracy; however, processing speed decreases as accuracy increases. The algorithm was tested with static stereo images and with sequences of stereo images, and the experimental results are presented in this paper.
Keywords: stereo vision, binocular parallax
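A minimal sketch of shift-and-overlap disparity estimation follows, assuming rectified stereo crops around one object of interest. The right crop is shifted against the left one, and the shift with the lowest mean absolute difference over the overlapping area is kept. Depth then follows from the standard pinhole relation Z = f B / d. The cost function, the `step` parameter as a way to trade accuracy for speed, and the camera values in the example are assumptions, not details taken from the paper.

```python
from typing import List

Image = List[List[float]]  # rows of grey-level intensities (rectified crops)


def overlap_cost(left: Image, right: Image, shift: int) -> float:
    """Mean absolute difference over the area where left[x] overlaps right[x - shift]."""
    total, count = 0.0, 0
    for left_row, right_row in zip(left, right):
        for x in range(shift, len(left_row)):
            total += abs(left_row[x] - right_row[x - shift])
            count += 1
    return total / count if count else float("inf")


def estimate_disparity(left: Image, right: Image, max_shift: int, step: int = 1) -> int:
    """Try shifts 0, step, 2*step, ... and keep the one with the lowest overlap cost.
    A larger step is faster but can only return multiples of step (coarser accuracy)."""
    shifts = range(0, max_shift + 1, step)
    return min(shifts, key=lambda d: overlap_cost(left, right, d))


def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo relation Z = f * B / d (rectified cameras assumed)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    row = [float(i * i) for i in range(12)]
    right_row = row[3:] + [0.0, 0.0, 0.0]  # object appears 3 px further left
    d = estimate_disparity([row], [right_row], max_shift=6)
    print(d, depth_from_disparity(700.0, 0.12, d))  # hypothetical f = 700 px, B = 0.12 m
```

Restricting the search to a crop around each object, rather than matching every pixel, is what keeps this kind of approach cheap compared with dense stereo matching.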
582 Almost Periodic Sequence Solutions of a Discrete Cooperation System with Feedback Controls
Authors: Ziping Li, Yongkun Li
Abstract:
In this paper, we study almost periodic solutions of a discrete cooperation system with feedback controls. Assuming that the coefficients in the system are almost periodic sequences, we establish the existence and uniqueness of an almost periodic solution that is uniformly asymptotically stable.
Keywords: Discrete cooperation model, almost periodic solution, feedback control, Lyapunov function.
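The key hypothesis, that the coefficients are almost periodic sequences, refers to the standard definition recalled below. The specific cooperation system and feedback-control equations of the paper are not reproduced. For example, x(n) = cos n is almost periodic on the integers but not periodic.

```latex
% A sequence x : \mathbb{Z} \to \mathbb{R}^k is almost periodic if, for every \varepsilon > 0,
% the set of \varepsilon-almost periods
E\{\varepsilon, x\} = \{\, \tau \in \mathbb{Z} : |x(n + \tau) - x(n)| < \varepsilon \ \text{for all } n \in \mathbb{Z} \,\}
% is relatively dense in \mathbb{Z}: there exists l(\varepsilon) > 0 such that every
% interval of length l(\varepsilon) contains at least one \tau \in E\{\varepsilon, x\}.
```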
581 Simulation of Sample Paths of Non Gaussian Stationary Random Fields
Authors: Fabrice Poirion, Benedicte Puig
Abstract:
Mathematical justifications are given for a simulation technique for multivariate non-Gaussian random processes and fields based on Rosenblatt's transformation of Gaussian processes. Different types of convergence are established for the approximating sequence. Moreover, an original numerical method is proposed for solving the functional equation that yields the autocorrelation function of the underlying Gaussian process.
Keywords: Simulation, nonGaussian, random field, multivariate, stochastic process.
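In the univariate case, the Rosenblatt transformation reduces to the memoryless map X(t) = F^{-1}(Φ(G(t))), where G is a standard Gaussian process, Φ its marginal CDF and F the target marginal CDF. The sketch below applies this map to a stationary Gaussian AR(1) sequence to obtain exponentially distributed samples. The AR(1) model, the exponential target marginal and all parameter values are illustrative assumptions. The sketch also skips the step addressed by the paper's numerical method: choosing the Gaussian autocorrelation so that the transformed process has a prescribed target autocorrelation.

```python
import math
import random
import sys
from typing import List


def gaussian_ar1(n: int, rho: float, rng: random.Random) -> List[float]:
    """Stationary zero-mean, unit-variance Gaussian AR(1) sequence with lag-1 correlation rho."""
    if n <= 0:
        return []
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1)")
    scale = math.sqrt(1.0 - rho * rho)
    g = [rng.gauss(0.0, 1.0)]  # first value drawn from the stationary law N(0, 1)
    for _ in range(n - 1):
        g.append(rho * g[-1] + scale * rng.gauss(0.0, 1.0))
    return g


def to_exponential(g: float, rate: float) -> float:
    """F^{-1}(Phi(g)) for an Exp(rate) target, written with the survival 1 - Phi(g)
    (via erfc) to avoid log(0) for large g."""
    survival = 0.5 * math.erfc(g / math.sqrt(2.0))
    return -math.log(max(survival, sys.float_info.min)) / rate


def simulate_translation_process(n: int, rho: float, rate: float, seed: int = 0) -> List[float]:
    """Non-Gaussian sample path obtained by transforming a Gaussian AR(1) path pointwise."""
    rng = random.Random(seed)
    return [to_exponential(g, rate) for g in gaussian_ar1(n, rho, rng)]


if __name__ == "__main__":
    path = simulate_translation_process(5000, rho=0.7, rate=2.0, seed=1)
    print(f"sample mean {sum(path) / len(path):.3f} (target 1/rate = 0.5)")
```

Because the map is applied pointwise, the marginal distribution is exactly the target one. The correlation structure, however, is distorted by the nonlinearity, which is why a functional equation for the underlying Gaussian autocorrelation has to be solved.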
580 Influence of Organic Modifier Loading on Particle Dispersion of Biodegradable Polycaprolactone/Montmorillonite Nanocomposites
Authors: O. I. H. Dimitry, N. A. Mansour, A. L. G. Saad
Abstract:
Natural sodium montmorillonite (NaMMT, Cloisite Na+) and two organophilic montmorillonites (OMMTs, Cloisite 20A and Cloisite 15A) were used. Polycaprolactone (PCL)/MMT composites containing 1, 3, 5 and 10 wt% of Cloisite Na+, and PCL/OMMT nanocomposites containing 5 and 10 wt% of Cloisite 20A or 15A, were prepared via a solution intercalation technique to study the influence of organic modifier loading on particle dispersion in PCL/NaMMT composites. The thermal stability of the composites was characterized by thermogravimetric analysis (TGA), which showed that under nitrogen flow the incorporation of 5 and 10 wt% of filler somewhat decreases the thermal stability of PCL, in the order Cloisite Na+ > Cloisite 15A > Cloisite 20A, whereas under air flow these fillers scarcely influenced the thermo-oxidative stability of PCL, only slightly accelerating the process. The interaction between PCL and the silicate layers was studied by Fourier transform infrared (FTIR) spectroscopy, which confirmed moderate interactions between the nanometric silicate layers and PCL segments. The electrical conductivity (σ), which describes the ionic mobility of the systems, was studied as a function of temperature. At a filler content of 5 wt%, the σ of PCL increased with increasing modifier loading, especially at higher temperatures, in the order Cloisite Na+ < Cloisite 20A < Cloisite 15A, and then decreased somewhat when the filler content was further increased to 10 wt%. The activation energy Eσ, obtained from the temperature dependence of σ using the Arrhenius equation, was lowest for the nanocomposite containing 5 wt% of Cloisite 15A.
The dispersion of the clay in the PCL matrix was evaluated by X-ray diffraction (XRD) and scanning electron microscopy (SEM), which revealed partially intercalated structures in PCL/NaMMT composites and semi-intercalated/semi-exfoliated structures in PCL/OMMT nanocomposites containing 5 wt% of Cloisite 20A or Cloisite 15A.
Keywords: Polycaprolactone, organoclay, nanocomposite, montmorillonite, electrical conductivity, activation energy, exfoliation, intercalation.
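Extracting an activation energy from conductivity data usually means fitting the Arrhenius form σ = σ0 exp(-Eσ / (k_B T)), i.e. a straight line of ln σ against 1/T whose slope is -Eσ / k_B. The least-squares sketch below reports Eσ in eV. The use of the Boltzmann constant rather than the gas constant (J/mol units), and the synthetic data in the example, are assumptions; the abstract reports neither the units nor the measured values.

```python
import math
from typing import List, Tuple

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_fit(temperatures_k: List[float],
                  conductivities: List[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(sigma) = ln(sigma0) - E / (k_B * T).
    Returns (E in eV, sigma0 in the units of the input conductivities)."""
    if len(set(temperatures_k)) < 2:
        raise ValueError("need at least two distinct temperatures")
    xs = [1.0 / t for t in temperatures_k]
    ys = [math.log(s) for s in conductivities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # equals -E / k_B
    intercept = mean_y - slope * mean_x  # equals ln(sigma0)
    return -slope * BOLTZMANN_EV_PER_K, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic data generated with E = 0.5 eV and sigma0 = 1e-3 S/cm (hypothetical).
    temps = [300.0, 320.0, 340.0, 360.0]
    sigma = [1e-3 * math.exp(-0.5 / (BOLTZMANN_EV_PER_K * t)) for t in temps]
    energy, sigma0 = arrhenius_fit(temps, sigma)
    print(f"E = {energy:.3f} eV, sigma0 = {sigma0:.2e}")
```

A lower fitted Eσ means that conductivity depends less steeply on temperature, which is the sense in which the 5 wt% Cloisite 15A nanocomposite is singled out in the abstract.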
579 The Process of Crisis: Model of Its Development in the Organization
Authors: M. Mikušová
Abstract:
The main aim of this paper is to present a clear and comprehensive picture of the process of a crisis in an organization, which will help readers better understand its possible development. To describe the sequence of individual steps, indicate their causal relationships and show possible variants of development, a detailed flow diagram with verbal commentary is used. For simplicity, the crisis process is divided into four basic phases: symptoms of the crisis, diagnosis, action and prevention. The model highlights the complexity of crisis as a phenomenon and shows that its various phases are interwoven.
Keywords: Crisis, management, model, organization.