Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25

Entropy Related Publications

25 Content-Based Image Retrieval Using HSV Color Space Features

Authors: Hamid Hassanpour, Hamed Qazanfari, Kazem Qazanfari

Abstract:

This paper presents a method for content-based image retrieval. A content-based image retrieval system searches an image database for images similar to a query image, based on visual content. To simulate the sensitivity of the human visual system to edges and color features, the method uses the color difference histogram (CDH). The CDH captures the perceptual color difference between two neighboring pixels with regard to colors and edge orientations. Since the HSV color space is close to the human visual system, the CDH is calculated in this color space. In addition, to improve the color features, the color histogram in HSV color space is also used as a feature. Among the extracted features, efficient features are selected using entropy and correlation criteria. The final features extract the content of images most efficiently. The proposed method has been evaluated on three standard databases: Corel 5k, Corel 10k and UKBench. Experimental results show that the accuracy of the proposed image retrieval method is significantly improved compared to recently developed methods.
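
As a rough illustration of the HSV color histogram feature mentioned above, the following sketch quantizes pixels into HSV bins and measures the Shannon entropy of the resulting histogram. It is not the authors' implementation: the bin counts, the function names `hsv_histogram` and `shannon_entropy`, and the use of histogram entropy as a feature score are all assumptions made for illustration, since the abstract does not specify the selection criterion in detail.

```python
import colorsys
import math


def hsv_histogram(pixels, bins=(8, 3, 3)):
    """Normalized HSV histogram from an iterable of (r, g, b) tuples in 0..255.

    The (8, 3, 3) quantization is an illustrative assumption.
    """
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    count = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a component equal to 1.0 falls into the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
        count += 1
    return [c / count for c in hist] if count else hist


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For example, `shannon_entropy(hsv_histogram(pixels))` is zero for a single-color image and grows as the colors spread over more bins.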

Keywords: correlation, Entropy, content-based image retrieval, color difference histogram, efficient features selection

Downloads: 89
24 Optimized and Secured Digital Watermarking Using Entropy, Chaotic Grid Map and Its Performance Analysis

Authors: R. Rama Kishore, Sunesh

Abstract:

This paper presents an optimized, robust, and secure watermarking technique that combines entropy with a chaotic grid map. The proposed methodology applies the Discrete Cosine Transform (DCT) to the host image. To improve imperceptibility, the host image DCT blocks in which the watermark is to be embedded are further optimized by considering the entropy of the blocks. The chaotic grid is used as a key to reorder the DCT blocks, which further increases security when selecting the watermark embedding locations and their sequence. Without the key, one cannot recover the exact watermark from the watermarked image. The proposed method is implemented on four different images. It gives better results in terms of imperceptibility, measured through PSNR, which is found to be above 50. To demonstrate the effectiveness of the method, a performance analysis is carried out after applying different attacks to the watermarked images. The methodology is found to be very strong against JPEG compression attack, even with the quality parameter up to 15. The experimental results confirm that the combination of entropy and chaotic grid map is robust and secure against different image processing attacks.

Keywords: Digital Watermarking, Entropy, discrete cosine transform, chaotic grid map

Downloads: 284
23 A Hypercube Social Feature Extraction and Multipath Routing in Delay Tolerant Networks

Authors: S. Balaji, E. Golden Julie, Y. Harold Robinson, M. Rajaram

Abstract:

Delay Tolerant Networks (DTNs) require sufficient state information, including trajectory and contact information, to maintain routing efficiency. However, state information is dynamic and hard to obtain without a global and/or long-term collection process. To deal with these problems, the internal social features of each node are introduced in the network to perform the routing process. This approach is motivated by human contact networks, in which people contact each other more frequently if they have more social features in common. Two processes were developed for this purpose: social feature extraction and multipath routing. The routing method then becomes a hypercube-based feature matching process. Furthermore, the effectiveness of multipath routing is evaluated and compared to that of single-path routing.

Keywords: Entropy, Delay Tolerant Networks, Multipath Routing, human contact networks, hyper cubes, social features

Downloads: 925
22 Frank Norris’ McTeague: An Entropic Melodrama

Authors: Monireh Arvin, Mohsen Masoomi, Fazel Asadi Amjad

Abstract:

According to Naturalistic principles, human destiny, in the form of blind chance and determinism, entraps the individual, so man is a defenceless creature unable to escape from the ruthless paws of a stoical universe. In Naturalism, nonetheless, melodrama mirrors a conscious alternative with a peculiar function. A typical American Naturalistic character thus cannot be a subject for social criticism of American society since they are not victims of the ongoing virtual slavery, capitalist system, nor of a ruined milieu, but of their own volition, and more importantly, their character frailty. Through a Postmodern viewpoint, each Naturalistic work can encompass some entropic trends and changes culminating in an entire failure and devastation. Frank Norris in McTeague displays the futile struggles of ordinary men and how they end up brutes. McTeague encompasses intoxication, abuse, violation, and ruthless homicides. Norris’ depictions of the falling individual as a demon represent the entropic dimension of Naturalistic novels. McTeague’s defeat is somewhat his own fault, the result of his own blunders and resolution, not the result of sheer accident. Throughout the novel, each character is a kind of insane quester indicating McTeague’s decadence and, by inference, the decadence of Western civilisation. McTeague seems to designate Norris’ solicitude for a community fabricated by the elements of human negative demeanours and conducts hauling acute symptoms of infectious dehumanisation. The aim of this article is to illustrate how one specific negative human disposition can gradually, like a running fire, spread everywhere and burn everything in its path. The author applies the concept of entropy metaphorically to describe the individual devolutions that necessarily comprise community entropy in McTeague, a dying universe.

Keywords: Entropy, Gypsy, animal imagery, melodrama

Downloads: 921
21 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory is used to handle uncertainty and incomplete information by applying two accurate sets, the lower approximation and the upper approximation. In this paper, rough clustering algorithms are improved by adopting Similarity, Dissimilarity-Similarity and Entropy based initial centroid selection methods. Three clustering algorithms, namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM), were developed and run on the yeast dataset. The rough clustering algorithms are validated by cluster validity indexes, namely the Rand and Adjusted Rand indexes. Experimental results show that the ERKM clustering algorithm performs effectively and delivers better results than the other clustering methods. Outlier detection is an important task in data mining; outliers are objects that are very different from the rest of the objects in the clusters. The Entropy based Rough Outlier Factor (EROF) method is suitable for detecting outliers effectively in the yeast dataset. In the rough K-Means method, tuning the epsilon (ε) value from 0.8 to 1.08 makes it possible to detect outliers in the boundary region, and the RKM algorithm delivers better results when the value of epsilon (ε) is chosen in this range. Experimental results show that the EROF method on the clustering algorithms performed very well and is suitable for detecting outliers effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.

Keywords: Clustering, Entropy, outlier, Rough K-Means, validity index

Downloads: 970
20 Thermodynamic Approach of Lanthanide-Iron Double Oxides Formation

Authors: Vera Varazashvili, Murman Tsarakhov, Tamar Mirianashvili, Teimuraz Pavlenishvili, Tengiz Machaladze, Mzia Khundadze

Abstract:

The standard Gibbs energy of formation ΔGfor(298.15) of lanthanide-iron double oxides with garnet-type crystal structure, R3Fe5O12 (RIG, where R denotes a rare earth ion), from the initial oxides is evaluated. The calculation is based on the standard entropies S298.15 and standard enthalpies of formation ΔH298.15 of the compounds involved in garnet synthesis. The Gibbs energy of formation is presented as a temperature function ΔGfor(T) for the range 300-1600 K. The necessary starting thermodynamic data were obtained from a calorimetric study of heat capacity as a function of temperature and by using a semi-empirical method for the calculation of ΔH298.15 of formation. The thermodynamic functions at standard temperature (enthalpy, entropy and Gibbs energy) are recommended as reference data for technological evaluations. Across the structural series of rare earth-iron garnets, the correlation between thermodynamic properties and characteristics of the lanthanide ions is elucidated.
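
The calculation described above can be summarized with the standard thermodynamic relations below. This is a textbook sketch rather than the authors' exact working: it assumes formation from the sesquioxides R2O3 and Fe2O3, no first-order phase transition between 298.15 K and T, and uses ΔCp as shorthand (introduced here) for the heat capacity of the garnet minus that of the stoichiometric oxide mixture.

```latex
\begin{align}
\tfrac{3}{2}\,\mathrm{R_2O_3} + \tfrac{5}{2}\,\mathrm{Fe_2O_3}
  &\rightarrow \mathrm{R_3Fe_5O_{12}} \\
\Delta G_{\mathrm{for}}(T)
  &= \Delta H_{\mathrm{for}}(T) - T\,\Delta S_{\mathrm{for}}(T) \\
\Delta H_{\mathrm{for}}(T)
  &= \Delta H_{\mathrm{for}}(298.15) + \int_{298.15}^{T} \Delta C_p(T')\,dT' \\
\Delta S_{\mathrm{for}}(T)
  &= \Delta S_{\mathrm{for}}(298.15) + \int_{298.15}^{T} \frac{\Delta C_p(T')}{T'}\,dT' \\
\Delta S_{\mathrm{for}}(298.15)
  &= S_{298.15}(\mathrm{R_3Fe_5O_{12}})
   - \tfrac{3}{2}\,S_{298.15}(\mathrm{R_2O_3})
   - \tfrac{5}{2}\,S_{298.15}(\mathrm{Fe_2O_3})
\end{align}
```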

Keywords: Entropy, Calorimetry, Enthalpy, Heat Capacity, rare earth iron garnets, gibbs energy of formation

Downloads: 1494
19 Information Theoretical Analysis of Neural Spiking Activity with Temperature Modulation

Authors: Young-Seok Choi

Abstract:

This work assesses the cortical and sub-cortical neural activity recorded from rodents using entropy and mutual information based approaches to study how hypothermia affects neural activity. By applying multi-scale entropy and Shannon entropy, we quantify the degree of regularity embedded in the cortical and sub-cortical neurons and characterize the dependency of the entropy of these regions on temperature. We also study the degree of mutual information on the thalamocortical pathway as a function of temperature. The latter is most likely an indicator of coupling between these highly connected structures in response to temperature manipulation leading to arousal after global cerebral ischemia.

Keywords: Entropy, mutual information, Spiking activity, temperature modulation

Downloads: 1299
18 Assessing Complexity of Neuronal Multiunit Activity by Information Theoretic Measure

Authors: Young-Seok Choi

Abstract:

This paper provides a quantitative measure of time-varying multiunit neuronal spiking activity (MUA) using an entropy-based approach. To verify the status embedded in the neuronal activity of a population of neurons, the discrete wavelet transform (DWT) is used to isolate the inherent spiking activity of the MUA. Due to the de-correlating property of the DWT, the spiking activity is preserved while the non-spiking component is reduced. By evaluating the entropy of the wavelet coefficients of the de-noised MUA, a multiresolution Shannon entropy (MRSE) of the MUA signal is developed. The proposed entropy was tested in the analysis of both simulated noisy MUA and actual MUA recorded from the cortex in a rodent model. Simulation and experimental results demonstrate that the dynamics of a population can be quantified by using the proposed entropy.
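
To make the idea of an entropy over wavelet coefficients more concrete, here is a minimal sketch. It is not the MRSE defined in the paper: the choice of the Haar wavelet, the normalization of squared coefficients into a probability distribution, and the function names are assumptions for illustration only, and the de-noising step is omitted.

```python
import math


def haar_dwt(signal, levels):
    """Haar DWT; returns detail-coefficient lists, finest level first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = range(0, len(approx) - 1, 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details.append(detail)
    return details


def coefficient_entropy(coeffs):
    """Shannon entropy (bits) of the normalized coefficient energies."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0:
        return 0.0  # no energy at this scale, so no uncertainty
    return -sum((e / total) * math.log2(e / total) for e in energy if e > 0)


def multiresolution_entropy(signal, levels=3):
    """One entropy value per decomposition level (assumed definition)."""
    return [coefficient_entropy(d) for d in haar_dwt(signal, levels)]
```

A constant signal has zero detail energy at every scale, so each level's entropy is zero, whereas a rapidly alternating signal spreads its energy evenly over the finest-scale coefficients.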

Keywords: Entropy, discrete wavelet transform, multiresolution, Multiunit activity

Downloads: 1459
17 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data, for example, which products are often purchased together. Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data increase, the amount of time and resources required to mine the data increases at an exponential rate. In this investigation, a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. FASTER for frequent itemset mining can produce a speed-up of 3.1 times compared to the original algorithm while maintaining an accuracy of 71%.
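
For readers unfamiliar with frequent itemset mining itself, the sketch below counts the support of every itemset up to a given size by brute-force enumeration. It illustrates the task that FASTER pre-processes for, not FASTER itself; the function name, the `max_size` cap, and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets with support >= min_support.

    Brute-force enumeration (no Apriori pruning), so it is only practical
    for small inputs.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    total = len(transactions)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
frequent = frequent_itemsets(baskets, min_support=0.5)
```

The number of candidate itemsets grows combinatorially with the number of distinct items, which is the bottleneck that record reduction and feature selection aim to ease.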

Keywords: classification, Entropy, Feature selection, outliers, rough-sets, frequent itemset mining

Downloads: 2067
16 Entropy Generation Analysis of Heat Recovery Vapor Generator for Ammonia-Water Mixture

Authors: Kyoung Hoon Kim, Chul Ho Han

Abstract:

This paper carries out a performance analysis based on the first and second laws of thermodynamics for heat recovery vapor generator (HRVG) of ammonia-water mixture when the heat source is low-temperature energy in the form of sensible heat. In the analysis, effects of the ammonia mass concentration and mass flow ratio of the binary mixture are investigated on the system performance including the effectiveness of heat transfer, entropy generation, and exergy efficiency. The results show that the ammonia concentration and the mass flow ratio of the mixture have significant effects on the system performance of HRVG.

Keywords: Exergy, Entropy, ammonia-water mixture, heat exchanger

Downloads: 1738
15 Genetic Algorithm for In-Theatre Military Logistics Search-and-Delivery Path Planning

Authors: Jean Berger, Mohamed Barkaoui

Abstract:

Discrete search path planning in a time-constrained uncertain environment relying upon imperfect sensors is known to be hard, and problem-solving techniques proposed so far to compute efficient path plans in near real time are mainly limited to solutions of a few moves. A new information-theoretic open-loop decision model, explicitly incorporating false alarm sensor readings, is presented to solve a single-agent military logistics search-and-delivery path planning problem with anticipated feedback. The decision model consists in minimizing expected entropy considering anticipated possible observation outcomes over a given time horizon. The model captures uncertainty associated with observation events for all possible scenarios. Entropy represents a measure of uncertainty about the searched target location. Feedback information resulting from possible sensor observation outcomes along the projected path plan is exploited to update anticipated unit target occupancy beliefs. For the first time, a compact belief update formulation is generalized to explicitly include false positive observation events that may occur during plan execution. A novel genetic algorithm is then proposed to efficiently solve search path planning, providing near-optimal solutions for practical realistic problem instances. Given the run-time performance of the algorithm, a natural extension to a closed-loop environment to progressively integrate real visit outcomes on a rolling time horizon can be easily envisioned. Computational results show the value of the approach in comparison to alternate heuristics.
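
The notion of expected entropy over anticipated observation outcomes can be illustrated with a plain Bayes update. The sketch below assumes a single target located in exactly one cell and a sensor characterized by a detection probability and a false alarm probability; these modeling choices, the default parameter values, and all function names are assumptions for illustration and are not taken from the paper's formulation.

```python
import math


def entropy(belief):
    """Shannon entropy (bits) of the target-location belief."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


def update_belief(belief, cell, detected, p_detect=0.9, p_false=0.1):
    """Bayes update after observing `cell`.

    Returns (posterior, probability of this observation outcome).
    """
    weighted = []
    for index, prior in enumerate(belief):
        if index == cell:
            likelihood = p_detect if detected else 1.0 - p_detect
        else:
            # Target is elsewhere: a detection here is a false alarm.
            likelihood = p_false if detected else 1.0 - p_false
        weighted.append(prior * likelihood)
    evidence = sum(weighted)
    if evidence == 0:
        return list(belief), 0.0  # outcome cannot occur under this belief
    return [w / evidence for w in weighted], evidence


def expected_entropy(belief, cell, p_detect=0.9, p_false=0.1):
    """Entropy after visiting `cell`, averaged over both sensor outcomes."""
    total = 0.0
    for detected in (True, False):
        posterior, prob = update_belief(belief, cell, detected, p_detect, p_false)
        total += prob * entropy(posterior)
    return total
```

A planner in this spirit would prefer the visit sequence with the lowest expected entropy; searching over such sequences is where a heuristic like a genetic algorithm comes in.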

Keywords: Entropy, Genetic Algorithm, search path planning, false alarm, search-and-delivery

Downloads: 1581
14 Intuitive Robot Control Using Surface EMG and Accelerometer Signals

Authors: Hyun-Chool Shin, Kiwon Rhee, Kyung-Jin You

Abstract:

This paper proposes a method of remotely controlling robots with arm gestures using surface electromyography (EMG) and accelerometer sensors attached to the operator’s wrists. The EMG and accelerometer sensors receive signals from the arm gestures of the operator and infer the corresponding movements to execute the command to control the robot. The movements of the robot include moving forward and backward and turning left and right. The accuracy is over 99% and movements can be controlled in real time.

Keywords: Entropy, EMG, accelerometer, k-NN

Downloads: 1498
13 An Advanced Time-Frequency Domain Method for PD Extraction with Non-Intrusive Measurement

Authors: Guomin Luo, Daming Zhang, Yong Kwee Koh, Kim Teck Ng, Helmi Kurniawan, Weng Hoe Leong

Abstract:

Partial discharge (PD) detection is an important method for evaluating the insulation condition of metal-clad apparatus. Non-intrusive sensors, which are easy to install and do not interrupt operation, are preferred in onsite PD detection. However, such detection often lacks accuracy due to interference in the PD signals. In this paper, a novel PD extraction method that uses frequency analysis and entropy-based time-frequency (TF) analysis is introduced. The repetitive pulses from the converter are first removed via frequency analysis. Then, the relative entropy and relative peak frequency of each pulse (i.e. time-indexed vector TF spectrum) are calculated, and all pulses with similar parameters are grouped. According to the characteristics of the non-intrusive sensor and the frequency distribution of PDs, the pulses of PD and interferences are separated. Finally, the PD signal and interferences are recovered via the inverse TF transform. The de-noised result of noisy PD data demonstrates that the combination of frequency and time-frequency techniques can discriminate PDs from interferences with various frequency distributions.

Keywords: Entropy, Time-Frequency Analysis, Fourier Analysis, partial discharge, non-intrusive measurement

Downloads: 1136
12 Applications of Entropy Measures in Field of Queuing Theory

Authors: R. K. Tuli

Abstract:

In the present communication, we study variations in entropy measures across the different states of queueing processes. In the case of a steady-state queueing process, it is shown that the uncertainty increases as the arrival rate increases, whereas in the case of a non-steady birth-death process, the uncertainty varies differently: it first increases and attains its maximum value, and then, with the passage of time, decreases and attains its minimum value.
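
The steady-state behavior can be checked on the simplest textbook case. The sketch below uses an M/M/1 queue, which is an assumption made here for illustration (the paper's keywords refer to M/G/1 and G/M/1 systems). In steady state the number in system is geometric, p_n = (1 - ρ)ρ^n with ρ = λ/μ, which gives the closed-form entropy H = -ln(1 - ρ) - ρ ln(ρ)/(1 - ρ) in nats.

```python
import math


def mm1_entropy(arrival_rate, service_rate):
    """Entropy (nats) of the steady-state queue length of an M/M/1 queue."""
    rho = arrival_rate / service_rate
    if not 0 < rho < 1:
        raise ValueError("steady state requires 0 < arrival_rate < service_rate")
    return -math.log(1 - rho) - rho * math.log(rho) / (1 - rho)


# Entropy for increasing arrival rates at a fixed service rate of 1.0.
entropies = [mm1_entropy(lam, 1.0) for lam in (0.2, 0.5, 0.8)]
```

With the service rate held fixed, the values in `entropies` grow with the arrival rate, consistent with the steady-state result stated in the abstract.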

Keywords: Entropy, steady state, Birth-death process, M/G/1 system, G/M/1 system, Non-steady state

Downloads: 1230
11 Information Measures Based on Sampling Distributions

Authors: C. P. Gandhi, Om Parkash, A. K. Thukral

Abstract:

Information theory and statistics play an important role in the biological sciences when information measures are used for the study of diversity and equitability. In this communication, we develop the link among the three disciplines and prove that sampling distributions can be used to develop new information measures. Our study is interdisciplinary and will find applications in biological systems.

Keywords: Diversity, Entropy, Symmetry, concavity, arithmetic mean, equitability

Downloads: 1043
10 Selection and Exergy Analysis of Fuel Cell System to Meet all Energy Needs of Residential Buildings

Authors: N. Hedayat, G. R. Ashari, S. Shalbaf, E. Hajidavalloo

Abstract:

In this paper, a polymer electrolyte membrane (PEM) fuel cell power system including a burner, steam reformer, heat exchanger and water heater is considered to meet the electrical, heating, cooling and domestic hot water loads of a residential building in Tehran. The system uses natural gas as fuel and works in CHP mode. The design and operating conditions of the PEM fuel cell system are considered in this study. The energy requirements of the residential building and the number of fuel cell stacks needed to meet them have been estimated. The method involves exergy analysis and entropy generation throughout the months of the year. Results show that all the energy needs of the building can be met with 12 fuel cell stacks at a nominal capacity of 8.5 kW. Exergy analysis of the CHP system shows that an increase in the ambient air temperature from 1 °C to 40 °C increases entropy generation by 5.73%. Maximum entropy generation, for 15 hour on the 15th of June and the 15th of July, is estimated at 12624 kW/K. Entropy generation of this system over a year is estimated at 1004.54 GJ/(K·year).

Keywords: Exergy, Entropy, CHP mode, no of fuel cell stacks

Downloads: 1554
9 MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes

Authors: John R. Rose, Achraf El Allali

Abstract:

A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on mutual information has shown good accuracy in classifying DNA sequences into coding and noncoding. In this paper we describe a species independent iterative approach that distinguishes coding from non-coding sequences using the mutual information measure (MIM). A set of sixty prokaryotes is used to extract universal training data. To facilitate comparisons with the published results of other researchers, a test set of 51 bacterial and archaeal genomes was used to evaluate MIM. These results demonstrate that MIM produces superior results while remaining species independent.
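
The mutual information quantity underlying this kind of classifier can be sketched as follows. This is a generic estimate from symbol counts, not the MIM procedure of the paper; the lagged variant, which compares a sequence with a shifted copy of itself, and both function names are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (bits) between two equal-length sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with counts.
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in joint.items()
    )


def lagged_mutual_information(sequence, lag):
    """Mutual information between symbols `lag` positions apart."""
    if lag < 1 or lag >= len(sequence):
        raise ValueError("lag must satisfy 1 <= lag < len(sequence)")
    return mutual_information(sequence[:-lag], sequence[lag:])
```

Coding regions are often reported to show periodic dependence between nucleotide positions, which is the kind of signal a lagged estimate like this can pick up; how the paper turns that into a classifier is not described in the abstract.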

Keywords: Entropy, mutual information, Coding Non-coding Classification, Gene Recognition

Downloads: 1343
8 Applications of Trigonometric Measures of Fuzzy Entropy to Geometry

Authors: Om Parkash, C. P. Gandhi

Abstract:

In the literature of fuzzy measures, there exist many well known parametric and non-parametric measures, each with its own merits and limitations. But our main emphasis is on applications of these measures to a variety of disciplines. To extend the scope of applications of these fuzzy measures to geometry, we need some special fuzzy measures. In this communication, we have introduced two new fuzzy measures involving trigonometric functions and simultaneously provided their applications to obtain the basic results already existing in the literature of geometry.

Keywords: Uncertainty, Entropy, Symmetry, fuzzy entropy, concavity

Downloads: 1133
7 Thermodynamic Study for Aggregation Behavior of Hydrotropic Solution

Authors: Jigisha Parikh, Meghal Desai

Abstract:

The aggregation behavior of sodium salicylate and sodium cumene sulfonate was studied in aqueous solution at different temperatures. Specific conductivity and relative viscosity were measured at different temperatures to find the minimum hydrotropic concentration. The thermodynamic parameters (free energy, enthalpy and entropy) were evaluated in the temperature range of 30°C-70°C. The free energy decreased with increase in temperature. The aggregation was found to be exothermic in nature and favored by a positive value of entropy.

Keywords: Entropy, Free Energy, Enthalpy, Hydrotropes, Minimum Hydrotropic Concentration

Downloads: 1365
6 Clustering Categorical Data Using Hierarchies (CLUCDUH)

Authors: Gökhan Silahtaroğlu

Abstract:

Clustering large populations is an important problem when the data contain noise and different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the size grows. Using hierarchies, we developed a new algorithm that splits attributes according to the values they have and chooses the dimension for splitting so as to divide the database into roughly equal parts as far as possible. At each node we calculate certain descriptive statistical features of the data residing there, and by pruning we generate the natural clusters with a complexity of O(n).

Keywords: Clustering, Entropy, tree, pruning, split, gini

Downloads: 1116
5 An Attribute-Centre Based Decision Tree Classification Algorithm

Authors: Gökhan Silahtaroğlu

Abstract:

Decision tree algorithms have a very important place among the classification models of data mining. In the literature, algorithms use the entropy concept or the Gini index to form the tree. The shape of the classes and their closeness to each other are some of the factors that affect the performance of the algorithm. In this paper, we introduce a new decision tree algorithm which employs a data (attribute) folding method and the variation of the class variables over the branches to be created. A comparative performance analysis has been carried out between the proposed algorithm and C4.5.
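
For reference, the two conventional split criteria mentioned above (entropy and the Gini index) can be written in a few lines. This sketch shows only those standard measures, not the attribute-folding algorithm proposed in the paper; the function names are chosen here for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy reduction obtained by splitting `parent` into `children`."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted
```

A tree builder of the C4.5 family evaluates candidate splits with a measure of this kind (C4.5 itself normalizes the gain into a gain ratio) and keeps the best-scoring split.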

Keywords: classification, Entropy, Decision Tree, pruning, split, gini

Downloads: 1062
4 Entropy Based Data Hiding for Document Images

Authors: Sridhar G., Sridhar V., Swetha Kurup

Abstract:

In this paper we present a novel technique for data hiding in binary document images. We use the concept of entropy in order to identify document-specific least distortive areas throughout the binary document image. The document image is treated as any other image and the proposed method utilizes the standard document characteristics for the embedding process. The proposed method minimizes perceptual distortion due to embedding and allows watermark extraction without the requirement of any side information at the decoder end.

Keywords: Watermarking, Entropy, Steganography

Downloads: 1195
3 A File Splitting Technique for Reducing the Entropy of Text Files

Authors: Abdel-Rahman M. Jaradat, Mansour I. Irshid, Talha T. Nassar

Abstract:

A novel file splitting technique for the reduction of the nth-order entropy of text files is proposed. The technique is based on mapping the original text file into a non-ASCII binary file using a new codeword assignment method; the resulting binary file is then split into several subfiles, each containing one or more bits from each codeword of the mapped binary file. The statistical properties of the subfiles are studied and it is found that they reflect the statistical properties of the original text file, which is not the case when the ASCII code is used as a mapper. The nth-order entropies of these subfiles are determined and it is found that their sum is less than the entropy of the original text file for the same values of extensions. These interesting statistical properties of the resulting subfiles can be used to achieve better compression ratios when conventional compression techniques are applied to these subfiles individually and on a bit-wise basis rather than on a character-wise basis.
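
The splitting step can be pictured with the following sketch, which maps characters to fixed-width codewords and writes each bit position to its own subfile. The codeword assignment used here (rank by descending character frequency) is only a stand-in assumption, since the abstract does not describe the authors' assignment method, and the entropy computed is a simple zeroth-order bit entropy rather than the nth-order entropy studied in the paper.

```python
import math
from collections import Counter


def rank_codewords(text):
    """Assign each character an integer code by descending frequency.

    Hypothetical mapping; the paper's own assignment method is not specified.
    """
    ranked = [ch for ch, _ in Counter(text).most_common()]
    width = max(1, (len(ranked) - 1).bit_length())
    return {ch: code for code, ch in enumerate(ranked)}, width


def split_bit_planes(text):
    """Split the mapped text into one bit list per codeword bit (MSB first)."""
    codes, width = rank_codewords(text)
    planes = [[] for _ in range(width)]
    for ch in text:
        code = codes[ch]
        for position in range(width):
            planes[position].append((code >> (width - 1 - position)) & 1)
    return planes


def bit_entropy(bits):
    """Zeroth-order entropy (bits per symbol) of a list of 0/1 values."""
    if not bits:
        return 0.0
    p_one = sum(bits) / len(bits)
    return -sum(p * math.log2(p) for p in (p_one, 1.0 - p_one) if p > 0)
```

With a frequency-ranked mapping, frequent characters receive small codes, so the most significant bit planes are dominated by zeros and have low entropy; whether the summed entropies fall below that of the original file depends on the mapping and the entropy order, which this sketch does not attempt to reproduce.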

Keywords: Entropy, Bit-wise compression, file splitting, source mapping

Downloads: 1133
2 An Efficient Segmentation Method Based on Local Entropy Characteristics of Iris Biometrics

Authors: Amir Sepasi Zahmati, Ali Asghar Beheshti Shirazi, Ali Shojaee Bakhtiari

Abstract:

An efficient iris segmentation method based on analyzing the local entropy characteristics of the iris image is proposed in this paper, and the strengths and weaknesses of the method are analyzed for practical purposes. The method shows special strength in providing designers with an adequate degree of freedom in choosing the proper sections of the iris for their application purposes.

Keywords: Entropy, Iris segmentation, biocryptosystem, biometric identification

Downloads: 1108
1 Computing Entropy for Ortholog Detection

Authors: Hsing-Kuo Pao, John Case

Abstract:

Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences, in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data, better than with good, previously known alternatives (which do not employ some means to handle short sequences well). Also empirically compared are the new entropy-based attribute set and a number of other, more standard similarity attribute sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion: the new, entropy-based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.
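
The baseline that the abstract refers to, approximating Kolmogorov complexity with a general-purpose compressor, can be sketched with the normalized compression distance (NCD). This illustrates the conventional compression approximation only, not the authors' new conditional approach; the choice of `zlib` as the compressor and the function names are assumptions.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

For very short inputs the fixed overhead of the compressed stream dominates `compressed_size`, which is one way to see why compression-based estimates become unreliable on short sequences.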

Keywords: Entropy, Compression, Decision Tree, ROC, ortholog

Downloads: 1465