Search results for: similarity measures
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1192

1132 Optimal Model Order Selection for Transient Error Autoregressive Moving Average (TERA) MRI Reconstruction Method

Authors: Abiodun M. Aibinu, Athaur Rahman Najeeb, Momoh J. E. Salami, Amir A. Shafie

Abstract:

An alternative to the Discrete Fourier Transform (DFT) for Magnetic Resonance Imaging (MRI) reconstruction is the use of parametric modeling techniques. This method is suitable for problems in which the image can be modeled by explicit known source functions with a few adjustable parameters. Despite the success reported for modeling as an alternative MRI reconstruction technique, two important problems challenge the applicability of this method: estimation of the model order and determination of the model coefficients. In this paper, five of the suggested methods for selecting the model order are evaluated: the Final Prediction Error (FPE), Akaike Information Criterion (AIC), Residual Variance (RV), Minimum Description Length (MDL) and Hannan and Quinn (HNQ) criterion. These criteria were evaluated on MRI data sets based on the Transient Error Reconstruction Algorithm (TERA). The result for each criterion is compared to the result obtained with a fixed order technique, and three measures of similarity were evaluated. The results show that MDL gives the highest measure of similarity to the result obtained with a fixed order technique.

Keywords: Autoregressive Moving Average (ARMA), Magnetic Resonance Imaging (MRI), Parametric modeling, Transient Error.

Downloads: 1563
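
As a rough illustration of the order-selection step described in entry 1132, the sketch below scores candidate model orders with common textbook forms of FPE, AIC, MDL and the Hannan-Quinn criterion. The exact formulas, the function names `order_criteria` and `select_order`, and the dictionary of residual variances are assumptions made for illustration; the paper may use different variants.

```python
import math

def order_criteria(res_var, n_samples, order):
    """Textbook scores for one candidate order (assumed forms, not the paper's).

    res_var   -- residual variance of the model fitted at this order
    n_samples -- number of data samples used in the fit
    order     -- number of free model parameters
    """
    log_var = n_samples * math.log(res_var)
    return {
        "FPE": res_var * (n_samples + order) / (n_samples - order),
        "AIC": log_var + 2 * order,
        "MDL": log_var + order * math.log(n_samples),
        "HNQ": log_var + 2 * order * math.log(math.log(n_samples)),
    }

def select_order(res_vars, n_samples, criterion="MDL"):
    """Pick the order with the lowest score; res_vars maps order -> variance."""
    scores = {p: order_criteria(v, n_samples, p)[criterion]
              for p, v in res_vars.items()}
    return min(scores, key=scores.get)

# Hypothetical residual variances for orders 1..3 fitted on 100 samples.
variances = {1: 1.0, 2: 0.5, 3: 0.49}
chosen = select_order(variances, 100, "MDL")
```

With these made-up variances, the heavier MDL penalty stops at the order where the residual variance stops improving noticeably, while AIC tolerates one more parameter.
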
1131 Recursive Similarity Hashing of Fractal Geometry

Authors: Timothee G. Leleu

Abstract:

A new technique of topological multi-scale analysis is introduced. By performing a clustering recursively to build a hierarchy, and analyzing the co-scale and intra-scale similarities, an Iterated Function System can be extracted from any data set. The study of fractals shows that this method is efficient at extracting self-similarities, and can find elegant solutions to the inverse problem of building fractals. The theoretical aspects and practical implementations are discussed, together with examples of analyses of simple fractals.

Keywords: hierarchical clustering, multi-scale analysis, Similarity hashing.

Downloads: 1805
1130 Towards Clustering of Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf

Abstract:

Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important given the number of documents available online. The task of clustering web-based document structures therefore has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than usual directed rooted trees, e.g., DOM-Trees. As an important preprocessing step we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach in Web Usage Mining as future work.

Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining

Downloads: 1454
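
The clustering step in entry 1130 (agglomerative clustering applied to a precomputed similarity matrix) can be sketched as follows. This is a minimal average-linkage version; the linkage rule, the function name `agglomerate` and the toy matrix are assumptions, and the tree similarity measure d from the paper is not reproduced here.

```python
def agglomerate(sim, n_clusters):
    """Average-linkage agglomerative clustering on a symmetric similarity matrix.

    sim        -- square list of lists, sim[i][j] = similarity of items i and j
    n_clusters -- stop merging once this many clusters remain
    """
    clusters = [[i] for i in range(len(sim))]

    def linkage(a, b):
        # Mean pairwise similarity between two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        x, y = max(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Toy similarity matrix: items 0/1 and items 2/3 resemble each other.
toy = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
groups = agglomerate(toy, 2)
```
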
1129 Incorporating Semantic Similarity Measure in Genetic Algorithm : An Approach for Searching the Gene Ontology Terms

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias, Hany T. Alashwal, Rohayanti Hassan, Farhan Mohamed

Abstract:

The most important property of the Gene Ontology is the terms. These controlled vocabularies are defined to provide consistent descriptions of gene products that are shareable and computationally accessible by humans, software agents, or other machine-readable meta-data. Each term is associated with information such as definition, synonyms, database references, amino acid sequences, and relationships to other terms. This information has made the Gene Ontology broadly applied in microarray and proteomic analysis. However, the process of searching the terms is still carried out using a traditional approach based on keyword matching. The weaknesses of this approach are that it ignores semantic relationships between terms and depends heavily on a specialist to find similar terms. Therefore, this study combines a semantic similarity measure and a genetic algorithm to perform a better retrieval process for searching semantically similar terms. The semantic similarity measure is used to compute the similitude strength between two terms. Then, the genetic algorithm is employed to perform batch retrievals and to handle the large search space of the Gene Ontology graph. The computational results are presented to show the effectiveness of the proposed algorithm.

Keywords: Gene Ontology, Semantic similarity measure, Genetic algorithm, Ontology search

Downloads: 1435
1128 Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns

Authors: Haider A Ramadhan, Khalil Shihab

Abstract:

Clustering techniques have been used by many intelligent software agents to group similar access patterns of Web users into high-level themes which express users' intentions and interests. However, such techniques have mostly focused on one salient feature of the Web document visited by the user, namely the extracted keywords. The major aim of these techniques is to come up with an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we focus on both keyword and similarity thresholds to generate more concentrated themes, and hence build a sounder model of user behavior. The purpose of this paper is twofold: to use distance-based clustering methods to recognize overall themes from the Proxy log file, and to suggest efficient cut-off levels for the keyword and similarity thresholds which tend to produce better clusters with better focus and efficient size.

Keywords: Data mining, knowledge discovery, clustering, data analysis, Web log analysis, theme based searching.

Downloads: 1403
1127 Using Genetic Algorithm to Improve Information Retrieval Systems

Authors: Ahmed A. A. Radwan, Bahgat A. Abdel Latef, Abdel Mgeid A. Ali, Osman A. Sadek

Abstract:

This study investigates the use of genetic algorithms in information retrieval. The method is shown to be applicable to three well-known document collections, where more relevant documents are presented to users with the genetic modification. In this paper we present a new fitness function for approximate information retrieval which is faster and more flexible than the cosine similarity fitness function.

Keywords: Cosine similarity, Fitness function, Genetic Algorithm, Information Retrieval, Query learning.

Downloads: 2693
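
Entry 1127 compares its new fitness function against a cosine similarity fitness function. The abstract does not describe the new function, so the sketch below shows only the cosine baseline between a query vector and a document vector; the name `cosine_similarity` and the term-weight vectors are illustrative assumptions.

```python
import math

def cosine_similarity(query, doc):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(q * d for q, d in zip(query, doc))
    norm = (math.sqrt(sum(q * q for q in query))
            * math.sqrt(sum(d * d for d in doc)))
    # Define the similarity as 0 when either vector is all zeros.
    return dot / norm if norm else 0.0

# Hypothetical term weights over a three-term vocabulary.
query_vec = [1.0, 1.0, 0.0]
doc_vec = [1.0, 0.0, 0.0]
score = cosine_similarity(query_vec, doc_vec)
```

In a genetic algorithm for query learning, a score of this kind would typically be summed or averaged over known relevant documents to rate each candidate query.
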
1126 Generalized Measures of Fuzzy Entropy and their Properties

Authors: K.C. Deshmukh, P.G. Khot, Nikhil

Abstract:

In the present communication, we propose some new generalized measures of fuzzy entropy based upon real parameters, discuss their desirable properties, and present these measures graphically. An important property, the monotonicity of the proposed measures, has also been studied.

Keywords: Fuzzy numbers, Fuzzy entropy, Characteristic function, Crisp set, Monotonicity.

Downloads: 1417
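
The generalized measures proposed in entry 1126 are not given in the abstract. For orientation only, the sketch below implements the classical De Luca and Termini fuzzy entropy, which such generalizations usually extend; it is zero for a crisp set and largest when every membership value is 0.5. The function name `fuzzy_entropy` is an assumption.

```python
import math

def fuzzy_entropy(memberships):
    """Classical De Luca-Termini fuzzy entropy (natural logarithm).

    memberships -- membership values in [0, 1], one per element of the set
    """
    def term(m):
        # By convention 0 * log(0) = 0, so crisp values contribute nothing.
        if m in (0.0, 1.0):
            return 0.0
        return -(m * math.log(m) + (1.0 - m) * math.log(1.0 - m))

    return sum(term(m) for m in memberships)

crisp = fuzzy_entropy([0.0, 1.0, 1.0])   # crisp set: no fuzziness
fuzziest = fuzzy_entropy([0.5])          # maximally fuzzy element
```
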
1125 Non-Overlapping Hierarchical Index Structure for Similarity Search

Authors: Mounira Taileb, Sid Lamrous, Sami Touati

Abstract:

In order to accelerate similarity search in high-dimensional databases, we propose a new hierarchical indexing method. It is composed of offline and online phases, and our contribution concerns both. In the offline phase, after gathering all of the data into clusters and constructing a hierarchical index, the main originality of our contribution consists in developing a method to construct bounding forms of clusters that avoid overlapping. In the online phase, our idea considerably improves the performance of similarity search; for this second phase, we have also developed an adapted search algorithm. Our method, named NOHIS (Non-Overlapping Hierarchical Index Structure), uses Principal Direction Divisive Partitioning (PDDP) as its clustering algorithm. The principle of PDDP is to divide data recursively into two sub-clusters; the division is done by using the hyper-plane orthogonal to the principal direction derived from the covariance matrix and passing through the centroid of the cluster to divide. The data of each of the two sub-clusters obtained are enclosed by a minimum bounding rectangle (MBR). The two MBRs are directed according to the principal direction. Consequently, non-overlapping between the two forms is assured. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and SR-tree in processing k-nearest neighbors.

Keywords: K-nearest neighbour search, multi-dimensional indexing, multimedia databases, similarity search.

Downloads: 1519
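
The PDDP split described in entry 1125 (cut the cluster with the hyper-plane through its centroid, orthogonal to the principal direction) can be sketched in plain Python as below. Power iteration is used here to approximate the principal direction; that choice, the function name `pddp_split` and the sample points are assumptions, and the bounding rectangles of NOHIS are not constructed.

```python
import math

def pddp_split(points, iterations=100):
    """Split points into two sub-clusters with one PDDP-style cut."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[k] for p in points) / n for k in range(dim)]
    centered = [[p[k] - centroid[k] for k in range(dim)] for p in points]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Power iteration on the scatter matrix X^T X, without forming it.
    # Note: a start vector orthogonal to the principal direction would fail.
    v = [1.0] * dim
    for _ in range(iterations):
        proj = [dot(row, v) for row in centered]
        w = [sum(proj[i] * centered[i][k] for i in range(n))
             for k in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # The sign of the projection decides the side of the hyper-plane.
    left = [p for p, row in zip(points, centered) if dot(row, v) <= 0.0]
    right = [p for p, row in zip(points, centered) if dot(row, v) > 0.0]
    return left, right

sample = [(0.0, 0.0), (1.0, 0.1), (10.0, 0.0), (11.0, 0.1)]
side_a, side_b = pddp_split(sample)
```

Applying the same split recursively to each side would yield the kind of binary hierarchy the abstract describes.
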
1124 3D Objects Indexing Using Spherical Harmonic for Optimum Measurement Similarity

Authors: S. Hellam, Y. Oulahrir, F. El Mounchid, A. Sadiq, S. Mbarki

Abstract:

In this paper, we propose a method for three-dimensional (3-D) model indexing based on a new descriptor that uses spherical harmonics. The purpose of the method is to minimize the processing time on the database of object models and the time needed to search for objects similar to a query object. We first define the new descriptor using a new division of the 3-D object within a sphere. We then define a new distance which is used in the search for similar objects in the database.

Keywords: 3D indexation, spherical harmonic, similarity of 3D objects.

Downloads: 2173
1123 Medical Image Fusion Based On Redundant Wavelet Transform and Morphological Processing

Authors: P. S. Gomathi, B. Kalaavathi

Abstract:

Image fusion is the process in which the complementary information from multiple images is integrated to provide a composite image that contains more information than the original input images. Medical image fusion provides useful information from multimodality medical images, giving the doctor additional information for better diagnosis of diseases. This paper presents a wavelet-based medical image fusion algorithm for different multimodality medical images. In order to fuse the medical images, the images are decomposed using the Redundant Wavelet Transform (RWT). The high frequency coefficients are convolved with a morphological operator followed by the maximum-selection (MS) rule. The low frequency coefficients are processed by the MS rule. The reconstructed image is obtained by the inverse RWT. Quantitative measures, which include Mean, Standard Deviation, Average Gradient, Spatial Frequency and Edge based Similarity Measures, are considered for evaluating the fused images. The performance of the proposed method is compared with Pixel averaging, PCA, and DWT fusion methods. When compared with conventional methods, the proposed framework provides better performance for analysis of multimodality medical images.

Keywords: Discrete Wavelet Transform (DWT), Image Fusion, Morphological Processing, Redundant Wavelet Transform (RWT).

Downloads: 2090
1122 Improved Weighted Matching for Speaker Recognition

Authors: Ozan Mut, Mehmet Göktürk

Abstract:

Matching algorithms have significant importance in speaker recognition. Feature vectors of the unknown utterance are compared to feature vectors of the modeled speakers as a last step in speaker recognition. A similarity score is found for every model in the speaker database. Depending on the type of speaker recognition, these scores are used to determine the author of unknown speech samples. For speaker verification, the similarity score is tested against a predefined threshold and either an acceptance or a rejection result is obtained. In the case of speaker identification, the result depends on whether the identification is open set or closed set. In closed set identification, the model that yields the best similarity score is accepted. In open set identification, the best score is tested against a threshold, so there is one more possible output, corresponding to the case that the speaker is not one of the registered speakers in the existing database. This paper focuses on closed set speaker identification using a modified version of a well-known matching algorithm. The results of the new matching algorithm indicate better performance on the YOHO international speaker recognition database.

Keywords: Automatic Speaker Recognition, Voice Recognition, Pattern Recognition, Digital Audio Signal Processing.

Downloads: 1685
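
The three decision rules outlined in entry 1122 (verification, closed set identification, open set identification) can be sketched as follows. The function names, the speaker labels and the convention that a higher score means a closer match are assumptions; the paper's modified matching algorithm, which produces the scores, is not shown.

```python
def verify(score, threshold):
    """Speaker verification: accept when the score reaches the threshold."""
    return score >= threshold

def identify_closed_set(scores):
    """Closed set: the model with the best similarity score is accepted."""
    return max(scores, key=scores.get)

def identify_open_set(scores, threshold):
    """Open set: the best model is accepted only if it passes the threshold.

    Returns None when the speaker is judged not to be registered.
    """
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None

# Hypothetical similarity scores of one utterance against three models.
model_scores = {"speaker_a": 0.42, "speaker_b": 0.77, "speaker_c": 0.31}
closed = identify_closed_set(model_scores)
opened = identify_open_set(model_scores, threshold=0.9)
```
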
1121 Optimal Measures in Production Developing an Universal Decision Supporter for Evaluating Measures in a Production

Authors: Michael Grigutsch, Marco Kennemann, Peter Nyhuis

Abstract:

Due to the recovering global economy, enterprises are increasingly focusing on logistics. Investing in logistic measures for a production generates a large potential for achieving a good starting point within a competitive field. Unlike during the global economic crisis, enterprises are now challenged with investing available capital to maximize profits. In order to be able to create an informed and quantifiably comprehensible basis for a decision, enterprises need an adequate model for logistically and monetarily evaluating measures in production. The Collaborate Research Centre 489 (SFB 489) at the Institute for Production Systems (IFA) developed a Logistic Information System which provides support in making decisions and is designed specifically for the forging industry. The aim of a project that has been applied for is to now transfer this process in order to develop a universal approach to logistically and monetarily evaluate measures in production.

Keywords: Measures in Production, Logistic Operating Curves, Transfer Functions, Production Logistics

Downloads: 1432
1120 Product Configuration Strategy Based On Product Family Similarity

Authors: Heejung Lee

Abstract:

To offer a large variety of products while maintaining low costs, high speed, and high quality in a mass customization product development environment, platform based product development has much benefit and usefulness in many industry fields. This paper proposes a product configuration strategy by similarity measure, incorporating the knowledge engineering principles such as product information model, ontology engineering, and formal concept analysis.

Keywords: Platform, product family, ontology, formal concept analysis.

Downloads: 1736
1119 Flow and Heat Transfer of a Nanofluid over a Shrinking Sheet

Authors: N. Bachok, N. L. Aleng, N. M. Arifin, A. Ishak, N. Senu

Abstract:

The problem of laminar fluid flow which results from the shrinking of a permeable surface in a nanofluid has been investigated numerically. The model used for the nanofluid incorporates the effects of Brownian motion and thermophoresis. A similarity solution is presented which depends on the mass suction parameter S, Prandtl number Pr, Lewis number Le, Brownian motion number Nb and thermophoresis number Nt. It was found that the reduced Nusselt number is a decreasing function of each dimensionless number.

Keywords: Boundary layer, Nanofluid, Shrinking sheet, Brownian motion, Thermophoresis, Similarity solution.

Downloads: 2743
1118 Accuracy of Divergence Measures for Detection of Abrupt Changes

Authors: P. Bergl

Abstract:

Numerous divergence measures (spectral distance, cepstral distance, difference of the cepstral coefficients, Kullback-Leibler divergence, distance given by the General Likelihood Ratio, distance defined by the Recursive Bayesian Changepoint Detector and the Mahalanobis measure) are compared in this study. The measures are used for detection of abrupt spectral changes in synthetic AR signals via the sliding window algorithm. Two experiments are performed; the first focuses on detection of a single boundary, while the second concentrates on detection of a couple of boundaries. Accuracy of detection is judged for each method; the measures are compared according to the results of both experiments.

Keywords: Abrupt changes detection, autoregressive model, divergence measure.

Downloads: 1394
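
To illustrate the sliding window idea in entry 1118, the sketch below slides two adjacent windows over a signal and reports where a divergence between them peaks. It deliberately swaps in a simple stand-in: a symmetric Kullback-Leibler divergence between Gaussians fitted to the raw samples of each window, not the spectral or cepstral measures on AR models that the study compares. The names `gaussian_kl` and `detect_change` and the synthetic signal are assumptions.

```python
import math
from statistics import fmean, pvariance

def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(p || q) between two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q - 1.0)

def detect_change(signal, width, eps=1e-12):
    """Return the index where the left and right windows differ most."""
    best_index, best_score = None, -1.0
    for t in range(width, len(signal) - width + 1):
        left, right = signal[t - width:t], signal[t:t + width]
        # eps keeps the variances strictly positive for constant windows.
        m1, v1 = fmean(left), pvariance(left) + eps
        m2, v2 = fmean(right), pvariance(right) + eps
        score = gaussian_kl(m1, v1, m2, v2) + gaussian_kl(m2, v2, m1, v1)
        if score > best_score:
            best_index, best_score = t, score
    return best_index

# Synthetic signal whose level jumps at sample 20.
synthetic = [0.1, -0.1] * 10 + [5.1, 4.9] * 10
boundary = detect_change(synthetic, width=6)
```
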
1117 Actionable Rules: Issues and New Directions

Authors: Harleen Kaur

Abstract:

Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown, hidden and interesting patterns from a huge amount of data stored in databases. Data mining is a stage of the KDD process that aims at selecting and applying a particular data mining algorithm to extract interesting and useful knowledge. Data mining methods are expected to find patterns in databases that are interesting according to some measures. It is of vital importance to define good measures of interestingness that allow the system to discover only the useful patterns. Measures of interestingness are divided into objective and subjective measures. Objective measures are those that depend only on the structure of a pattern and can be quantified by using statistical methods. Subjective measures, in contrast, depend only on the subjectivity and understanding of the user who examines the patterns. These subjective measures are further divided into actionable, unexpected and novel. The key issue facing the data mining community is how to take action on the basis of discovered knowledge. For a pattern to be actionable, the user's subjectivity is captured by providing his/her background knowledge about the domain. Here, we consider the actionability of the discovered knowledge as a measure of interestingness and raise important issues which need to be addressed to discover actionable knowledge.

Keywords: Data Mining Community, Knowledge Discovery in Databases (KDD), Interestingness, Subjective Measures, Actionability.

Downloads: 1892
1116 Mixed Convection Boundary Layer Flows Induced by a Permeable Continuous Surface Stretched with Prescribed Skin Friction

Authors: Mohamed Ali

Abstract:

The boundary layer flow and heat transfer on a stretched surface moving with prescribed skin friction is studied for a permeable surface. The surface temperature is assumed to vary inversely with the vertical direction x for n = -1. The skin friction at the surface scales as x^(-1/2) at m = 0. The constants m and n are the indices of the power law velocity and temperature exponent respectively. Similarity solutions are obtained for the boundary layer equations subject to power law temperature and velocity variation. The effects of various governing parameters, such as the buoyancy parameter λ and the suction/injection parameter fw, are studied for air (Pr = 0.72). The choice of n and m ensures that the similarity solutions used are independent of x. The results show that assisting flow (λ > 0) enhances the heat transfer coefficient along the surface for any constant value of fw. Furthermore, injection increases the heat transfer coefficient but suction reduces it at constant λ.

Keywords: Stretching surface, Boundary layers, Prescribed skin friction, Suction or injection, similarity solutions, buoyancy effects.

Downloads: 1798
1115 A Simplified and Effective Algorithm Used to Mine Similar Processes: An Illustrated Example

Authors: Min-Hsun Kuo, Yun-Shiow Chen

Abstract:

The running logs of a process hold valuable information about its executed activity behavior and generated activity logic structure. These informative logs can be extracted, analyzed and utilized to improve the efficiency of the process's execution and conduct. One of the techniques used to accomplish such process improvement is called process mining, and mining similar processes is one improvement task within it. Rather than directly mining similar processes using a single comparison coefficient or a complicated fitness function, this paper presents a simplified heuristic process mining algorithm with two similarity comparisons that are able to relatively conform the activity logic sequences (traces) of the mined processes with those of a normalized (regularized) one. The relative process conformance finds which of the mined processes match the required activity sequences and relationships, as a basis for necessary and sufficient applications of the mined processes to process improvements. One similarity is defined by the relationships in terms of the number of similar activity sequences existing in different processes; the other expresses the degree of the similar (identical) activity sequences among the conforming processes. Since these two similarities relate to certain typical behavior (activity sequences) occurring in an entire process, common problems that often appear in other process conformance techniques, such as the inappropriateness of an absolute comparison and the inability to elicit intrinsic information, can be solved by the relative process comparison presented in this paper. To demonstrate the potential of the proposed algorithm, a numerical example is given.

Keywords: process mining, process similarity, artificial intelligence, process conformance.

Downloads: 1398
1114 Information Measures Based on Sampling Distributions

Authors: Om Parkash, A. K. Thukral, C. P. Gandhi

Abstract:

Information theory and Statistics play an important role in Biological Sciences when we use information measures for the study of diversity and equitability. In this communication, we develop the link among the three disciplines and prove that sampling distributions can be used to develop new information measures. Our study is interdisciplinary and will find applications in Biological systems.

Keywords: Entropy, concavity, symmetry, arithmetic mean, diversity, equitability.

Downloads: 1335
1113 Codes and Formulation of Appropriate Constraints via Entropy Measures

Authors: R. K. Tuli

Abstract:

In the present communication, we have developed suitable constraints for the given mean codeword length and measures of entropy. This development has proved that Renyi's entropy gives the minimum value of the log of the harmonic mean and the log of the power mean. We have also developed an important relation between the best 1:1 code and the uniquely decipherable code by using different measures of entropy.

Keywords: Codeword, Instantaneous code, Prefix code, Uniquely decipherable code, Best one-one code, Mean codeword length

Downloads: 1238
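
As background to the relation between entropy and mean codeword length discussed in entry 1113, the sketch below computes the Shannon entropy of a source, the mean length of a code, and the Kraft sum that any uniquely decipherable code must keep at or below 1. It covers only the classical Shannon case, not the Renyi-based constraints developed in the paper; the function names and the example source are assumptions.

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy of a probability distribution, in units of `base`."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def mean_codeword_length(probs, lengths):
    """Expected codeword length: sum of p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))

def kraft_sum(lengths, base=2):
    """Sum of base**(-l_i); at most 1 for a uniquely decipherable code."""
    return sum(base ** -l for l in lengths)

# Hypothetical source with a binary prefix code 0, 10, 11.
source = [0.5, 0.25, 0.25]
code_lengths = [1, 2, 2]
entropy = shannon_entropy(source)
mean_length = mean_codeword_length(source, code_lengths)
```

For this dyadic source the mean length equals the entropy, which is the lower bound on the mean length of any uniquely decipherable binary code.
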
1112 Computing the Similarity and the Diversity in the Species Based on Cronobacter Genome

Authors: E. Al Daoud

Abstract:

The purpose of computing the similarity and the diversity in the species is to trace the process of evolution, to find the relationship between the species, and to discover the unique, the special, the common and the universal proteins. The proteins of the whole genome of 40 species are compared with the Cronobacter genome, which is used as the reference genome. More than 3 billion pairwise alignments are performed using blastp. Several findings are reported in this study; for example, we found 172 proteins in the Cronobacter genome which have insignificant hits in other species, 116 significant proteins in all the tested species with a very high score value, and 129 proteins that are common in the plants but have insignificant hits in mammals, birds, fishes, and insects.

Keywords: Genome, species, blastp, conserved genes, cronobacter.

Downloads: 951
1111 Ranking of Performance Measures of GSCM towards Sustainability: Using Analytic Hierarchy Process

Authors: Dixit Garg, S. Luthra, A. Haleem

Abstract:

During recent years, the natural environment has become a challenging topic that business organizations must consider due to its economic and ecological impacts and increasing awareness of environmental protection in society. Organizations are trying to achieve the goals of environmental improvement, low cost, high quality, flexibility and greater customer satisfaction. Performance measurement frameworks are very useful for monitoring the performance of any organization. The basic goal of this paper is to identify and rank the performance measures of a GSCM performance measurement framework oriented towards sustainability. Five perspectives (Environment, Economic, Social, Operational and Cost performances) and nineteen performance measures of GSCM performance towards sustainability have been identified from an extensive literature review. The Analytical Hierarchy Process (AHP) technique has been utilized for ranking these performance perspectives and measures. All pair comparisons in AHP have been made on the basis of the opinions of experts selected from academia and industry. Ranking these performance perspectives and measures will help to understand the importance of environmental, economic, social, operational and cost performances in the supply chain.

Keywords: Analytical Hierarchy Process (AHP), Green Supply Chain Management, Performance Measures (PM), Sustainability.

Downloads: 3180
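
The AHP ranking step mentioned in entry 1111 turns a matrix of pairwise comparisons into priority weights. The sketch below uses the row geometric mean approximation, one common way to derive the weights; the abstract does not say which derivation the authors used, and the function name `ahp_priorities` and the comparison values are made up for illustration.

```python
import math

def ahp_priorities(matrix):
    """Approximate AHP priority weights by normalized row geometric means.

    matrix -- reciprocal pairwise comparison matrix, where matrix[i][j]
              states how much more important item i is than item j
    """
    size = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / size) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgments for three perspectives (perfectly consistent).
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_priorities(comparisons)
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

For a perfectly consistent matrix such as this one, the weights match the exact eigenvector solution (4/7, 2/7, 1/7).
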
1110 Group Similarity Transformation of a Time Dependent Chemical Convective Process

Authors: M. M. Kassem, A. S. Rashed

Abstract:

The time dependent progress of a chemical reaction over a flat horizontal plate is considered here. The problem is solved through the group similarity transformation method, which reduces the number of independent variables by one and leads to a set of nonlinear ordinary differential equations. The problem shows a singularity at the chemical reaction order n=1 and is analytically solved through the perturbation method. The behavior of the process is then numerically investigated for n≠1 and different Schmidt numbers. Graphical results for the velocity and concentration of chemicals based on the analytical and numerical solutions are presented and discussed.

Keywords: Time dependent, chemical convection, group transformation method, perturbation method.

Downloads: 1568
1109 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory is used to handle uncertainty and incomplete information by applying two accurate sets, the lower approximation and the upper approximation. In this paper, rough clustering algorithms are improved by adopting Similarity, Dissimilarity–Similarity and Entropy based initial centroid selection methods. Three clustering algorithms, namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM), were developed and executed on the yeast dataset. The rough clustering algorithms are validated by cluster validity indexes, namely the Rand and Adjusted Rand indexes. Experimental results show that the ERKM clustering algorithm performs effectively and delivers better results than the other clustering methods. Outlier detection is an important task in data mining; outliers are very different from the rest of the objects in the clusters. The Entropy based Rough Outlier Factor (EROF) method is suitable for detecting outliers effectively in the yeast dataset. In the rough K-Means method, tuning the epsilon (ε) value from 0.8 to 1.08 makes it possible to detect outliers in the boundary region, and the RKM algorithm delivers better results when the value of epsilon (ε) is chosen in the specified range. Experimental results show that the EROF method on the clustering algorithm performed very well and is suitable for detecting outliers effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.

Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.

Downloads: 1351
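
Entry 1109 validates its clusterings with the Rand index. A minimal sketch of that index is given below: it is the fraction of item pairs on which two labelings agree about being in the same cluster or in different clusters. The function name `rand_index` and the label lists are illustrative; the Adjusted Rand index and the rough K-Means variants themselves are not shown.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index between two labelings of the same items (0 to 1)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Same partition with renamed clusters: perfect agreement.
same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that cuts across the first one.
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```
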
1108 Mean Codeword Lengths and Their Correspondence with Entropy Measures

Authors: R. K. Tuli

Abstract:

The objective of the present communication is to develop new genuine exponentiated mean codeword lengths and to study deeply the problem of correspondence between well known measures of entropy and mean codeword lengths. With the help of some standard measures of entropy, we have illustrated such a correspondence. In literature, we usually come across many inequalities which are frequently used in information theory. Keeping this idea in mind, we have developed such inequalities via coding theory approach.

Keywords: Codeword, Code alphabet, Uniquely decipherable code, Mean codeword length, Uncertainty, Noiseless channel

Downloads: 1648
1107 Mining News Sites to Create Special Domain News Collections

Authors: David B. Bracewell, Fuji Ren, Shingo Kuroiwa

Abstract:

We present a method to create special domain collections from news sites. The method only requires a single sample article as a seed. No prior corpus statistics are needed and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms it does not require a second “general” corpus to compute statistics. Third, in our testing the algorithm outperformed others in creating collections made up of highly relevant articles.

Keywords: Information Retrieval, News, Special Domain Collections

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1431
1106 Genetic Diversity Based Population Study of Freshwater Mud Eel (Monopterus cuchia) in Bangladesh

Authors: M. F. Miah, K. M. A. Zinnah, M. J. Raihan, H. Ali, M. N. Naser

Abstract:

Because genetic diversity is essential for the existence, breeding and production of any fish, this study investigated the genetic diversity of the freshwater mud eel, Monopterus cuchia, at the population level. Three ecological populations were considered: the flooded area of Sylhet (P1), the open water of Moulvibazar (P2) and the open water of Sunamganj (P3) districts of Bangladesh. Four arbitrary RAPD primers (OPB-12, C0-4, B-03 and OPB-08) were screened, and RAPD banding patterns were analyzed among the populations using 15 individuals from each population. In total, 174, 138 and 149 bands were detected in P1, P2 and P3, respectively; however, each primer revealed fewer bands in each population. All loci (100%) were polymorphic in P2 and P3, whereas one monomorphic locus was observed in P1, giving 97.5% polymorphism. Different genetic parameters, such as inter-individual pairwise similarity, genetic distance, Nei genetic similarity, linkage distance, cluster analysis and allelic information, were considered for measuring genetic diversity. The average inter-individual pairwise similarity was 2.98, 1.47 and 1.35 in P1, P2 and P3, respectively. In the genetic distance analysis, the highest distance, 1, was recorded in P2 and P3, and the lowest, 0.444, was found in P2. The average Nei genetic similarity was 0.19, 0.16 and 0.13 in P1, P2 and P3, respectively, whereas the average linkage distance was 24.92, 17.14 and 15.28 in P1, P3 and P2, respectively. Based on linkage distance, genetic clusters were generated for the three populations: 6 clades and 7 clusters in P1, 3 clades and 5 clusters in P2, and 4 clades and 7 clusters in P3. In addition, the frequencies of the p and q alleles were 0.093 and 0.907 in P1, 0.076 and 0.924 in P2, and 0.074 and 0.926 in P3, respectively. The average gene diversity was highest in P2 (0.132), followed by P3 (0.131) and P1 (0.121).

Keywords: Genetic diversity, Monopterus cuchia, population, RAPD, Bangladesh.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1760
1105 A Review on Important Aspects of Information Retrieval

Authors: Yogesh Gupta, Ashish Saini, A.K. Saxena

Abstract:

Information retrieval has become an important field of study and research in computer science owing to the explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. Research is ongoing on various aspects of information retrieval systems to improve their efficiency and reliability. This paper presents a comprehensive study that discusses not only the emergence and evolution of information retrieval but also different information retrieval models and some important aspects such as document representation, similarity measures and query expansion.
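
To make the notion of a similarity measure in the vector space model concrete, here is a minimal sketch in Python of cosine similarity over raw term-frequency vectors. It is an illustration under stated assumptions, not a method taken from the paper: the function name `cosine_similarity`, the whitespace tokenizer and the sample strings are all hypothetical, and practical systems usually add weighting such as TF-IDF.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two term-frequency vectors.

    Hypothetical helper for illustration; documents are plain strings
    tokenized by lowercasing and splitting on whitespace.
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Assumed sample query and document.
query = "similarity measure for retrieval"
document = "a similarity measure ranks documents for retrieval"
print(cosine_similarity(query, document))
```

Documents can then be ranked against a query by sorting them on this score; documents that share no terms with the query score 0.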

Keywords: Information Retrieval, query expansion, similarity measure, vector space model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3268
1104 Applying Similarity Theory and Hilbert Huang Transform for Estimating the Differences of Pig's Blood Pressure Signals between Situations of Intestinal Artery Blocking and Unblocking

Authors: Jia-Rong Yeh, Tzu-Yu Lin, Jiann-Shing Shieh, Yun Chen

Abstract:

A mammal's body can be seen as a blood vessel with complex tunnels. When the heart pumps blood periodically, blood runs through the blood vessels and rebounds from their walls. Blood pressure signals can be measured as complex but periodic patterns. When an artery is clamped during a surgical operation, the spectrum of the blood pressure signals differs from that of the normal situation. In this investigation, intestinal artery clamping operations were performed on a pig to simulate intestinal blocking during a surgical operation. Similarity theory is a convenient and easy tool for showing that the blood pressure signal patterns for intestinal artery blocking and unblocking are different. In addition, the Hilbert Huang Transform algorithm can be applied to extract the characteristic parameters of the blood pressure pattern. In conclusion, the patterns of blood pressure signals in the two situations, intestinal artery blocking and unblocking, can be distinguished by the characteristic parameters defined in this paper.

Keywords: Blood pressure, spectrum, intestinal artery, similarity theory, Hilbert Huang Transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1566
1103 Similarity Based Retrieval in Case Based Reasoning for Analysis of Medical Images

Authors: M. Das Gupta, S. Banerjee

Abstract:

Content Based Image Retrieval (CBIR) coupled with Case Based Reasoning (CBR) is a paradigm that is becoming increasingly popular in the diagnosis and therapy planning of medical ailments, utilizing the digital content of medical images. This paper presents a survey of some of the promising approaches used in the detection of abnormalities in retina images, as well as in mammographic screening and in the detection of regions of interest in MRI scans of the brain. We also describe our proposed algorithm to detect hard exudates in fundus images of the retina of Diabetic Retinopathy patients.

Keywords: Case based reasoning, Exudates, Retina image, Similarity based retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2069