Search results for: Sequence Similarity
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 856

Search results for: Sequence Similarity

706 A Systems Approach to Gene Ranking from DNA Microarray Data of Cervical Cancer

Authors: Frank Emmert Streib, Matthias Dehmer, Jing Liu, Max Mühlhauser

Abstract:

In this paper we present a method for gene ranking from DNA microarray data. More precisely, we calculate the correlation networks, which are unweighted and undirected graphs, from microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to progression of the tumor. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth and, hence, indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, DNA microarray data, cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1755
705 Image Indexing Using a Color Similarity Metric based on the Human Visual System

Authors: Angelo Nodari, Ignazio Gallo

Abstract:

The novelty proposed in this study is twofold and consists in the developing of a new color similarity metric based on the human visual system and a new color indexing based on a textual approach. The new color similarity metric proposed is based on the color perception of the human visual system. Consequently the results returned by the indexing system can fulfill as much as possibile the user expectations. We developed a web application to collect the users judgments about the similarities between colors, whose results are used to estimate the metric proposed in this study. In order to index the image's colors, we used a text indexing engine to facilitate the integration of visual features in a database of text documents. The textual signature is build by weighting the image's colors in according to their occurrence in the image. The use of a textual indexing engine, provide us a simple, fast and robust solution to index images. A typical usage of the system proposed in this study, is the development of applications whose data type is both visual and textual. In order to evaluate the proposed method we chose a price comparison engine as a case of study, collecting a series of commercial offers containing the textual description and the image representing a specific commercial offer.

Keywords: Color Extraction, Content-Based Image Retrieval, Indexing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3026
704 An Optimization Algorithm Based on Dynamic Schema with Dissimilarities and Similarities of Chromosomes

Authors: Radhwan Yousif Sedik Al-Jawadi

Abstract:

Optimization is necessary for finding appropriate solutions to a range of real-life problems. In particular, genetic (or more generally, evolutionary) algorithms have proved very useful in solving many problems for which analytical solutions are not available. In this paper, we present an optimization algorithm called Dynamic Schema with Dissimilarity and Similarity of Chromosomes (DSDSC) which is a variant of the classical genetic algorithm. This approach constructs new chromosomes from a schema and pairs of existing ones by exploring their dissimilarities and similarities. To show the effectiveness of the algorithm, it is tested and compared with the classical GA, on 15 two-dimensional optimization problems taken from literature. We have found that, in most cases, our method is better than the classical genetic algorithm.

Keywords: Genetic algorithm, similarity and dissimilarity, chromosome injection, dynamic schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1296
703 Effect of Implementation of Nonlinear Sequence Transformations on Power Series Expansion for a Class of Non-Linear Abel Equations

Authors: Javad Abdalkhani

Abstract:

Convergence of power series solutions for a class of non-linear Abel type equations, including an equation that arises in nonlinear cooling of semi-infinite rods, is very slow inside their small radius of convergence. Beyond that the corresponding power series are wildly divergent. Implementation of nonlinear sequence transformation allow effortless evaluation of these power series on very large intervals..

Keywords: Nonlinear transformation, Abel Volterra Equations, Mathematica

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1304
702 Using the PGAS Programming Paradigm for Biological Sequence Alignment on a Chip Multi-Threading Architecture

Authors: M. Bakhouya, S. A. Bahra, T. El-Ghazawi

Abstract:

The Partitioned Global Address Space (PGAS) programming paradigm offers ease-of-use in expressing parallelism through a global shared address space while emphasizing performance by providing locality awareness through the partitioning of this address space. Therefore, the interest in PGAS programming languages is growing and many new languages have emerged and are becoming ubiquitously available on nearly all modern parallel architectures. Recently, new parallel machines with multiple cores are designed for targeting high performance applications. Most of the efforts have gone into benchmarking but there are a few examples of real high performance applications running on multicore machines. In this paper, we present and evaluate a parallelization technique for implementing a local DNA sequence alignment algorithm using a PGAS based language, UPC (Unified Parallel C) on a chip multithreading architecture, the UltraSPARC T1.

Keywords: Partitioned Global Address Space, Unified Parallel C, Multicore machines, Multi-threading Architecture, Sequence alignment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1389
701 Construction of cDNALibrary and EST Analysis of Tenebriomolitorlarvae

Authors: JiEun Jeong, Se-Won Kang, Hee-Ju Hwang, Sung-Hwa Chae, Sang-Haeng Choi, Hong-SeogPark, YeonSoo Han, Bok-Reul Lee, Dae-Hyun Seog, Yong Seok Lee

Abstract:

Tofurther advance research on immune-related genes from T. molitor, we constructed acDNA library and analyzed expressed sequence taq (EST) sequences from 1,056 clones. After removing vector sequence and quality checkingthrough thePhred program (trim_alt 0.05 (P-score>20), 1039 sequences were generated. The average length of insert was 792 bp. In addition, we identified 162 clusters, 167 contigs and 391 contigs after clustering and assembling process using a TGICL package. EST sequences were searchedagainst NCBI nr database by local BLAST (blastx, EKeywords: EST, Innate immunity, Tenebriomolitor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1536
700 A Class of Recurrent Sequences Exhibiting Some Exciting Properties of Balancing Numbers

Authors: G.K.Panda, S.S.Rout

Abstract:

The balancing numbers are natural numbers n satisfying the Diophantine equation 1 + 2 + 3 + · · · + (n - 1) = (n + 1) + (n + 2) + · · · + (n + r); r is the balancer corresponding to the balancing number n.The nth balancing number is denoted by Bn and the sequence {Bn}1 n=1 satisfies the recurrence relation Bn+1 = 6Bn-Bn-1. The balancing numbers posses some curious properties, some like Fibonacci numbers and some others are more interesting. This paper is a study of recurrent sequence {xn}1 n=1 satisfying the recurrence relation xn+1 = Axn - Bxn-1 and possessing some curious properties like the balancing numbers.

Keywords: Recurrent sequences, Balancing numbers, Lucas balancing numbers, Binet form.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1517
699 Assembly Process Algorithms of Flexible Cell

Authors: M. Kusá, M. Matúšová, A. Javorová, K. Velí

Abstract:

This paper deals about four items assembly process of linear drive. This assembly will be realized in flexible assembly cell on Institute of Manufacturing Systems and Applied Mechanics. There is defined manufacturing cell, individual actuators created our flexible cell. Next chapter is about control type, detailed describe a sequence control type, which will be used in mentioned flexible assembly cell. All cell control is divided in individual steps instructions. There instructions illustrate table number III.

Keywords: assembly, flexible cell, sequence control

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1309
698 Hidden Markov Model for the Simulation Study of Neural States and Intentionality

Authors: R. B. Mishra

Abstract:

Hidden Markov Model (HMM) has been used in prediction and determination of states that generate different neural activations as well as mental working conditions. This paper addresses two applications of HMM; one to determine the optimal sequence of states for two neural states: Active (AC) and Inactive (IA) for the three emission (observations) which are for No Working (NW), Waiting (WT) and Working (W) conditions of human beings. Another is for the determination of optimal sequence of intentionality i.e. Believe (B), Desire (D), and Intention (I) as the states and three observational sequences: NW, WT and W. The computational results are encouraging and useful.

Keywords: BDI, HMM, neural activation, optimal states, working conditions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 870
697 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk

Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour

Abstract:

The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer still remains a controversial issue. Determining the amount of variation explained by these factors needs experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate that not only individual genome sequencing is uninformative in determining cancer risk, but also assigning a unique genome sequence to any given individual (healthy or affected) is not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since genome sequence differs from cell to cell and changes over time, it seems that determining the risk factor of complex diseases based on genome sequence is somewhat unrealistic, and therefore, the resulting data are likely to be inherently uninformative.

Keywords: Cancer risk, extrinsic factors, genome sequencing, intrinsic factors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1117
696 A Similarity Metric for Assessment of Image Fusion Algorithms

Authors: Nedeljko Cvejic, Artur Łoza, David Bull, Nishan Canagarajah

Abstract:

In this paper, we present a novel objective nonreference performance assessment algorithm for image fusion. It takes into account local measurements to estimate how well the important information in the source images is represented by the fused image. The metric is based on the Universal Image Quality Index and uses the similarity between blocks of pixels in the input images and the fused image as the weighting factors for the metrics. Experimental results confirm that the values of the proposed metrics correlate well with the subjective quality of the fused images, giving a significant improvement over standard measures based on mean squared error and mutual information.

Keywords: Fusion performance measures, image fusion, nonreferencequality measures, objective quality measures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2490
695 Generation of Sets of Synthetic Classifiers for the Evaluation of Abstract-Level Combination Methods

Authors: N. Greco, S. Impedovo, R.Modugno, G. Pirlo

Abstract:

This paper presents a new technique for generating sets of synthetic classifiers to evaluate abstract-level combination methods. The sets differ in terms of both recognition rates of the individual classifiers and degree of similarity. For this purpose, each abstract-level classifier is considered as a random variable producing one class label as the output for an input pattern. From the initial set of classifiers, new slightly different sets are generated by applying specific operators, which are defined at the purpose. Finally, the sets of synthetic classifiers have been used to estimate the performance of combination methods for abstract-level classifiers. The experimental results demonstrate the effectiveness of the proposed approach.

Keywords: Abstract-level Classifier, Dempster-Shafer Rule, Multi-expert Systems, Similarity Index, System Evaluation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1486
694 Molecular Characterization of Free Radicals Decomposing Genes on Plant Developmental Stages

Authors: R. Haddad, K. Morris, V. Buchanan-Wollaston

Abstract:

Biochemical and molecular analysis of some antioxidant enzyme genes revealed different level of gene expression on oilseed (Brassica napus). For molecular and biochemical analysis, leaf tissues were harvested from plants at eight different developmental stages, from young to senescence. The levels of total protein and chlorophyll were increased during maturity stages of plant, while these were decreased during the last stages of plant growth. Structural analysis (nucleotide and deduced amino acid sequence, and phylogenic tree) of a complementary DNA revealed a high level of similarity for a family of Catalase genes. The expression of the gene encoded by different Catalase isoforms was assessed during different plant growth phase. No significant difference between samples was observed, when Catalase activity was statistically analyzed at different developmental stages. EST analysis exhibited different transcripts levels for a number of other relevant antioxidant genes (different isoforms of SOD and glutathione). The high level of transcription of these genes at senescence stages was indicated that these genes are senescenceinduced genes.

Keywords: Biochemical analysis, Oilseed, Expression pattern, Growth phases

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1549
693 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Authors: Loai AbdAllah, Mahmoud Kaiyal

Abstract:

Missing values in real-world datasets are a common problem. Many algorithms were developed to deal with this problem, most of them replace the missing values with a fixed value that was computed based on the observed values. In our work, we used a distance function based on Bhattacharyya distance to measure the distance between objects with missing values. Bhattacharyya distance, which measures the similarity of two probability distributions. The proposed distance distinguishes between known and unknown values. Where the distance between two known values is the Mahalanobis distance. When, on the other hand, one of them is missing the distance is computed based on the distribution of the known values, for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps to improve prevention of chronic diseases such as diabetes and cancer. In order for Wikaya’s recommendation system to work distance between users need to be measured. Since there are missing values in the collected data, there is a need to develop a distance function distances between incomplete users profiles. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between different objects, when some of them contain missing values, we integrated it within the framework of k nearest neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that kNN classifier using our proposed distance function outperforms the kNN using other existing methods.

Keywords: Missing values, distance metric, Bhattacharyya distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 781
692 Evaluating Generative Neural Attention Weights-Based Chatbot on Customer Support Twitter Dataset

Authors: Sinarwati Mohamad Suhaili, Naomie Salim, Mohamad Nazim Jambli

Abstract:

Sequence-to-sequence (seq2seq) models augmented with attention mechanisms are increasingly important in automated customer service. These models, adept at recognizing complex relationships between input and output sequences, are essential for optimizing chatbot responses. Central to these mechanisms are neural attention weights that determine the model’s focus during sequence generation. Despite their widespread use, there remains a gap in the comparative analysis of different attention weighting functions within seq2seq models, particularly in the context of chatbots utilizing the Customer Support Twitter (CST) dataset. This study addresses this gap by evaluating four distinct attention-scoring functions—dot, multiplicative/general, additive, and an extended multiplicative function with a tanh activation parameter — in neural generative seq2seq models. Using the CST dataset, these models were trained and evaluated over 10 epochs with the AdamW optimizer. Evaluation criteria included validation loss and BLEU scores implemented under both greedy and beam search strategies with a beam size of k = 3. Results indicate that the model with the tanh-augmented multiplicative function significantly outperforms its counterparts, achieving the lowest validation loss (1.136484) and the highest BLEU scores (0.438926 under greedy search, 0.443000 under beam search, k = 3). These findings emphasize the crucial influence of selecting an appropriate attention-scoring function to enhance the performance of seq2seq models for chatbots, particularly highlighting the model integrating tanh activation as a promising approach to improving chatbot quality in customer support contexts.

Keywords: Attention weight, chatbot, encoder-decoder, neural generative attention, score function, sequence-to-sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 89
691 Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm

Authors: Jehad A. H. Hammad, Nur'Aini binti Abdul Rashid

Abstract:

With the rapid development in the field of life sciences and the flooding of genomic information, the need for faster and scalable searching methods has become urgent. One of the approaches that were investigated is indexing. The indexing methods have been categorized into three categories which are the lengthbased index algorithms, transformation-based algorithms and mixed techniques-based algorithms. In this research, we focused on the transformation based methods. We embedded the N-gram method into the transformation-based method to build an inverted index table. We then applied the parallel methods to speed up the index building time and to reduce the overall retrieval time when querying the genomic database. Our experiments show that the use of N-Gram transformation algorithm is an economical solution; it saves time and space too. The result shows that the size of the index is smaller than the size of the dataset when the size of N-Gram is 5 and 6. The parallel N-Gram transformation algorithm-s results indicate that the uses of parallel programming with large dataset are promising which can be improved further.

Keywords: Biological sequence, Database index, N-gram indexing, Parallel computing, Sequence retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2135
690 Codon-optimized Carbonic Anhydrase from Dunaliella species: Expression and Characterization

Authors: Seung Pil Pack

Abstract:

Carbonic anhydrases (CAs) has been focused as biological catalysis for CO2 sequestration process because it can catalyze the conversion of CO2 to bicarbonate. Here, codon-optimized sequence of α type-CA cloned from Duneliala species. (DsCAopt) was constructed, expressed, and characterized. The expression level in E. coli BL21(DE3) was better for codon-optimized DsCAopt than intact sequence of DsCAopt. DsCAopt enzyme shows high-stability at pH 7.6/10.0. In final, we demonstrated that in the Ca2+ solution, DsCAopt enzyme can catalyze well the conversion of CO2 to CaCO3, as the calcite form.

Keywords: Carbonic anhydrase, Codon-optimization, Duneliala species, CO2 sequestration

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1646
689 XML Schema Automatic Matching Solution

Authors: Huynh Quyet Thang, Vo Sy Nam

Abstract:

Schema matching plays a key role in many different applications, such as schema integration, data integration, data warehousing, data transformation, E-commerce, peer-to-peer data management, ontology matching and integration, semantic Web, semantic query processing, etc. Manual matching is expensive and error-prone, so it is therefore important to develop techniques to automate the schema matching process. In this paper, we present a solution for XML schema automated matching problem which produces semantic mappings between corresponding schema elements of given source and target schemas. This solution contributed in solving more comprehensively and efficiently XML schema automated matching problem. Our solution based on combining linguistic similarity, data type compatibility and structural similarity of XML schema elements. After describing our solution, we present experimental results that demonstrate the effectiveness of this approach.

Keywords: XML Schema, Schema Matching, SemanticMatching, Automatic XML Schema Matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1830
688 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

Keywords: Metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2336
687 Fighter Aircraft Selection Using Technique for Order Preference by Similarity to Ideal Solution with Multiple Criteria Decision Making Analysis

Authors: C. Ardil

Abstract:

This paper presents a multiple criteria decision making analysis technique for selecting fighter aircraft for the national air force. The selection of military aircraft is a process consisting of contradictory goals and objectives. When a modern air force needs to choose fighter aircraft to upgrade existing fleets, a multiple criteria decision making analysis and scenario planning for defense acquisition has been put forward. The selection of fighter aircraft for the air defense force is a strategic decision making process, since the purchase or lease of fighter jets, maintenance and operating costs and having a fleet is the biggest cost for the air force. Multiple criteria decision making analysis methods are effectively applied to facilitate decision making from various available options. The selection criteria were determined using the literature on the problem of fighter aircraft selection. The selection of fighter aircraft to be purchased for the air defense forces is handled using a multiple criteria decision making analysis technique that also determines a suitable methodological approach for the defense procurement and fleet upgrade planning process. The aim of this study is to originate an approach to evaluate fighter aircraft alternatives, Su-35, F-35, and TF-X (MMU), based on technique for order preference by similarity to ideal solution (TOPSIS).

Keywords: Fighter Aircraft, Fighter Aircraft Selection, Technique for Order Preference by Similarity to Ideal Solution, TOPSIS, Multiple Criteria Decision Making, Multiple Criteria Decision Making Analysis, MCDMA, Su-35, F-35, TF-X (MMU)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 631
686 Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison

Authors: Frank Emmert-Streib, Matthias Dehmer, Jing Liu, Max Muhlhauser

Abstract:

The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As criterion to select genes we suggest to measure the local changes in the correlation graph of each gene and to select those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, generalized trees, graph alignment, DNA microarray data, cervical cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1752
685 Efficient HAAR Wavelet Transform with Embedded Zerotrees of Wavelet Compression for Color Images

Authors: S. Piramu Kailasam

Abstract:

This study is expected to compress true color image with compression algorithms in color spaces to provide high compression rates. The need of high compression ratio is to improve storage space. Alternative aim is to rank compression algorithms in a suitable color space. The dataset is sequence of true color images with size 128 x 128. HAAR Wavelet is one of the famous wavelet transforms, has great potential and maintains image quality of color images. HAAR wavelet Transform using Set Partitioning in Hierarchical Trees (SPIHT) algorithm with different color spaces framework is applied to compress sequence of images with angles. Embedded Zerotrees of Wavelet (EZW) is a powerful standard method to sequence data. Hence the proposed compression frame work of HAAR wavelet, xyz color space, morphological gradient and applied image with EZW compression, obtained improvement to other methods, in terms of Compression Ratio, Mean Square Error, Peak Signal Noise Ratio and Bits Per Pixel quality measures.

Keywords: Color Spaces, HAAR Wavelet, Morphological Gradient, Embedded Zerotrees Wavelet Compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 518
684 An Index based Forward Backward Multiple Pattern Matching Algorithm

Authors: Raju Bhukya, DVLN Somayajulu

Abstract:

Pattern matching is one of the fundamental applications in molecular biology. Searching DNA related data is a common activity for molecular biologists. In this paper we explore the applicability of a new pattern matching technique called Index based Forward Backward Multiple Pattern Matching algorithm(IFBMPM), for DNA Sequences. Our approach avoids unnecessary comparisons in the DNA Sequence due to this; the number of comparisons of the proposed algorithm is very less compared to other existing popular methods. The number of comparisons rapidly decreases and execution time decreases accordingly and shows better performance.

Keywords: Comparisons, DNA Sequence, Index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2374
683 Genome-Wide Analysis of BES1/BZR1 Gene Family in Five Plant Species

Authors: Jafar Ahmadi, Zhohreh Asiaban, Sedigheh Fabriki Ourang

Abstract:

Brassinosteroids (BRs) regulate cell elongation, vascular differentiation, senescence, and stress responses. BRs signal through the BES1/BZR1 family of transcription factors, which regulate hundreds of target genes involved in this pathway. In this research a comprehensive genome-wide analysis was carried out in BES1/BZR1 gene family in Arabidopsis thaliana, Cucumis sativus, Vitis vinifera, Glycin max and Brachypodium distachyon. Specifications of the desired sequences, dot plot and hydropathy plot were analyzed in the protein and genome sequences of five plant species. The maximum amino acid length was attributed to protein sequence Brdic3g with 374aa and the minimum amino acid length was attributed to protein sequence Gm7g with 163aa. The maximum Instability index was attributed to protein sequence AT1G19350 equal with 79.99 and the minimum Instability index was attributed to protein sequence Gm5g equal with 33.22. Aliphatic index of these protein sequences ranged from 47.82 to 78.79 in Arabidopsis thaliana, 49.91 to 57.50 in Vitis vinifera, 55.09 to 82.43 in Glycin max, 54.09 to 54.28 in Brachypodium distachyon 55.36 to 56.83 in Cucumis sativus. Overall, data obtained from our investigation contributes a better understanding of the complexity of the BES1/BZR1 gene family and provides the first step towards directing future experimental designs to perform systematic analysis of the functions of the BES1/BZR1 gene family.

Keywords: BES1/BZR1, Brassinosteroids, Phylogenetic analysis, Transcription factor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2255
682 Web Proxy Detection via Bipartite Graphs and One-Mode Projections

Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo

Abstract:

With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonate phishing pages to steal sensitive data or redirect victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become an increasingly challenging task to detect the proxy service due to their dynamic update and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of bipartite graphs for discovering social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent web proxy users behavior clusters. The web proxy URL may vary from time to time. Still, the inherent interest would not. So, based on the intuition, by dint of our private tools implemented by WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experiment results based on real datasets show that the behavior clusters not only reduce the number of URLs analysis but also provide an effective way to detect the web proxies, especially for the unknown web proxies.

Keywords: Bipartite graph, clustering, one-mode projection, web proxy detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 746
681 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources to understand broadcasting contents. Topic modeling is the method to get the summary of the broadcasting contents from its scripts. Generally, scripts represent contents descriptively with directions and speeches, and provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scene segments consist of speeches mainly, however, relatively small co-occurrences among words in the scene segments are observed. This causes inevitably the bad quality of topics by statistical learning method. To tackle this problem, we propose a method to improve topic quality with additional word co-occurrence information obtained using scene similarities. The main idea of improving topic quality is that the information that two or more texts are topically related can be useful to learn high quality of topics. In addition, more accurate topical representations lead to get information more accurate whether two texts are related or not. In this paper, we regard two scene segments are related if their topical similarity is high enough. We also consider that words are co-occurred if they are in topically related scene segments together. By iteratively inferring topics and determining semantically neighborhood scene segments, we draw a topic space represents broadcasting contents well. In the experiments, we showed the proposed method generates a higher quality of topics from Korean drama scripts than the baselines.

Keywords: Broadcasting contents, generalized P´olya urn model, scripts, text similarity, topic model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1817
680 Error-Robust Nature of Genome Profiling Applied for Clustering of Species Demonstrated by Computer Simulation

Authors: Shamim Ahmed Koichi Nishigaki

Abstract:

Genome profiling (GP), a genotype based technology, which exploits random PCR and temperature gradient gel electrophoresis, has been successful in identification/classification of organisms. In this technology, spiddos (Species identification dots) and PaSS (Pattern similarity score) were employed for measuring the closeness (or distance) between genomes. Based on the closeness (PaSS), we can buildup phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This fact was confirmed quantitatively in this study by computer-simulation, providing the limit of the reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for identification/classification of organisms.

Keywords: Fluctuation, Genome profiling (GP), Pattern similarity score (PaSS), Robustness, Spiddos-shift.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1538
679 A Novel Approach to Asynchronous State Machine Modeling on Multisim for Avoiding Function Hazards

Authors: L. Parisi, D. Hamili, N. Azlan

Abstract:

The aim of this study was to design and simulate a particular type of Asynchronous State Machine (ASM), namely a ‘traffic light controller’ (TLC), operated at a frequency of 0.5 Hz. The design task involved two main stages: firstly, designing a 4-bit binary counter using J-K flip flops as the timing signal and, subsequently, attaining the digital logic by deploying ASM design process. The TLC was designed such that it showed a sequence of three different colours, i.e. red, yellow and green, corresponding to set thresholds by deploying the least number of AND, OR and NOT gates possible. The software Multisim was deployed to design such circuit and simulate it for circuit troubleshooting in order for it to display the output sequence of the three different colours on the traffic light in the correct order. A clock signal, an asynchronous 4- bit binary counter that was designed through the use of J-K flip flops along with an ASM were used to complete this sequence, which was programmed to be repeated indefinitely. Eventually, the circuit was debugged and optimized, thus displaying the correct waveforms of the three outputs through the logic analyser. However, hazards occurred when the frequency was increased to 10 MHz. This was attributed to delays in the feedback being too high.

Keywords: Asynchronous State Machine, Traffic Light Controller, Circuit Design, Digital Electronics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3242
678 Destination Port Detection for Vessels: An Analytic Tool for Optimizing Port Authorities Resources

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

Port authorities have many challenges in congested ports to allocate their resources to provide a safe and secure loading/unloading procedure for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages Automatic Identification System (AIS) messages to monitor vessel’s movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely, Reference Route of Trajectory (RRoT) to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring AIS messages. Our RRo method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measures to identify the destination of a vessel using their recent movement. We evaluated five different similarity measures such as Discrete Frechet Distance (DFD), Dynamic Time ´ Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an f-measure of 99.08% using Dynamic Time Warping (DTW) similarity measure.

Keywords: Spatial temporal data mining, trajectory mining, trajectory similarity, resource optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 695
677 Applications of Rough Set Decompositions in Information Retrieval

Authors: Chen Wu, Xiaohua Hu

Abstract:

This paper proposes rough set models with three different level knowledge granules in incomplete information system under tolerance relation by similarity between objects according to their attribute values. Through introducing dominance relation on the discourse to decompose similarity classes into three subclasses: little better subclass, little worse subclass and vague subclass, it dismantles lower and upper approximations into three components. By using these components, retrieving information to find naturally hierarchical expansions to queries and constructing answers to elaborative queries can be effective. It illustrates the approach in applying rough set models in the design of information retrieval system to access different granular expanded documents. The proposed method enhances rough set model application in the flexibility of expansions and elaborative queries in information retrieval.

Keywords: Incomplete information system, Rough set model, tolerance relation, dominance relation, approximation, decomposition, elaborative query.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1611