Search results for: informative theoretic similarity metrics.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 715

Search results for: informative theoretic similarity metrics.

565 Application of Scientific Metrics to Evaluate Academic Reputation in Different Research Areas

Authors: Cristiano R. Cervi, Renata Galante, José Palazzo M. de Oliveira

Abstract:

In this paper, we address the problem of identifying academic reputation of researchers using scientific metrics in different research areas. Due to the characteristics of each area, researchers can present different behaviors. In previous work, we define Rep-Index that makes use of a profile template to individually identify the reputation of researchers. The Rep-Index is comprehensive and adaptive because involves hole trajectory of the researcher built throughout his career and can be used in different areas and in different contexts. Now, we compare our metric (Rep-Index) with the h-index and the g-index through experiments with researchers in the fields of Economics, Dentistry and Computer Science. We analyze the trajectory of 830 Brazilian researchers from the National Council of Technological and Scientific Development (CNPq), which receive grants research productivity. The grants are aimed at productivity researchers that stand out among their peers, enhancing their scientific normative criteria established by CNPq. Of the 830 researchers, 210 are in the area of Economics, 216 of Dentistry e 404 of Computer Science. The experiments show that our metric is strongly correlated with h-index, g-index and CNPq ranking. We also show good results for our hypothesis that our metric can be used to evaluate research in several areas. We apply our metric (Rep-Index) to compare the behavior of researchers in relation to their h-index and g-index through extensive experiments. The experiments showed that our metric is strongly correlated with h-index, g-index and CNPq ranking.

Keywords: Researcher reputation, profile model, scientific metrics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1959
564 Palmprint Recognition by Wavelet Transform with Competitive Index and PCA

Authors: Deepti Tamrakar, Pritee Khanna

Abstract:

This manuscript presents, palmprint recognition by combining different texture extraction approaches with high accuracy. The Region of Interest (ROI) is decomposed into different frequencytime sub-bands by wavelet transform up-to two levels and only the approximate image of two levels is selected, which is known as Approximate Image ROI (AIROI). This AIROI has information of principal lines of the palm. The Competitive Index is used as the features of the palmprint, in which six Gabor filters of different orientations convolve with the palmprint image to extract the orientation information from the image. The winner-take-all strategy is used to select dominant orientation for each pixel, which is known as Competitive Index. Further, PCA is applied to select highly uncorrelated Competitive Index features, to reduce the dimensions of the feature vector, and to project the features on Eigen space. The similarity of two palmprints is measured by the Euclidean distance metrics. The algorithm is tested on Hong Kong PolyU palmprint database. Different AIROI of different wavelet filter families are also tested with the Competitive Index and PCA. AIROI of db7 wavelet filter achievs Equal Error Rate (EER) of 0.0152% and Genuine Acceptance Rate (GAR) of 99.67% on the palm database of Hong Kong PolyU.

Keywords: DWT, EER, Euclidean Distance, Gabor filter, PCA, ROI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1702
563 eLearning Tools Evaluation based on Quality Concept Distance Computing. A Case Study

Authors: Mihai Caramihai, Irina Severin

Abstract:

Despite the extensive use of eLearning systems, there is no consensus on a standard framework for evaluating this kind of quality system. Hence, there is only a minimum set of tools that can supervise this judgment and gives information about the course content value. This paper presents two kinds of quality set evaluation indicators for eLearning courses based on the computational process of three known metrics, the Euclidian, Hamming and Levenshtein distances. The “distance" calculus is applied to standard evaluation templates (i.e. the European Commission Programme procedures vs. the AFNOR Z 76-001 Standard), determining a reference point in the evaluation of the e-learning course quality vs. the optimal concept(s). The case study, based on the results of project(s) developed in the framework of the European Programme “Leonardo da Vinci", with Romanian contractors, try to put into evidence the benefits of such a method.

Keywords: eLearning, European programme, metrics, quality evaluation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1484
562 Key Frames Extraction for Sign Language Video Analysis and Recognition

Authors: Jaroslav Polec, Petra Heribanová, Tomáš Hirner

Abstract:

In this paper we proposed a method for finding video frames representing one sign in the finger alphabet. The method is based on determining hands location, segmentation and the use of standard video quality evaluation metrics. Metric calculation is performed only in regions of interest. Sliding mechanism for finding local extrema and adaptive threshold based on local averaging is used for key frames selection. The success rate is evaluated by recall, precision and F1 measure. The method effectiveness is compared with metrics applied to all frames. Proposed method is fast, effective and relatively easy to realize by simple input video preprocessing and subsequent use of tools designed for video quality measuring.

Keywords: Key frame, video, quality, metric, MSE, MSAD, SSIM, VQM, sign language, finger alphabet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1992
561 Environmental Performance Assessment Model as a Sustainability Decision Tool for Small and Middle Sized Enterprises

Authors: Pavol Molnar, Martin Dolinsky

Abstract:

Paper deals with environmental metrics and assessment systems devoted to Small and Medium Sized Enterprises. Authors are presenting proposed assessment model which has an ability to discover current environmental strengths and weaknesses of Small and Middle Sized Enterprise. Suggested model has also an ambition to become a Sustainability Decision Tool. Model is able to identify "best environmental devision" in the company, and to quantify how this decision contributed into overall environmental improvement. Authors understand environmental improvements as environmental innovations (product, process and organizational). Suggested model is based on its own concept; however, authors are also utilizing already existing environmental assessment tools.

Keywords: Corporate Social Responsibility, (e)IMPACT model, Environmental metrics, , Small and Middle Sized Enterprises

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1504
560 Generation of Sets of Synthetic Classifiers for the Evaluation of Abstract-Level Combination Methods

Authors: N. Greco, S. Impedovo, R.Modugno, G. Pirlo

Abstract:

This paper presents a new technique for generating sets of synthetic classifiers to evaluate abstract-level combination methods. The sets differ in terms of both recognition rates of the individual classifiers and degree of similarity. For this purpose, each abstract-level classifier is considered as a random variable producing one class label as the output for an input pattern. From the initial set of classifiers, new slightly different sets are generated by applying specific operators, which are defined at the purpose. Finally, the sets of synthetic classifiers have been used to estimate the performance of combination methods for abstract-level classifiers. The experimental results demonstrate the effectiveness of the proposed approach.

Keywords: Abstract-level Classifier, Dempster-Shafer Rule, Multi-expert Systems, Similarity Index, System Evaluation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1439
559 Cognitive Weighted Polymorphism Factor: A Comprehension Augmented Complexity Metric

Authors: T. Francis Thamburaj, A. Aloysius

Abstract:

Polymorphism is one of the main pillars of objectoriented paradigm. It induces hidden forms of class dependencies which may impact software quality, resulting in higher cost factor for comprehending, debugging, testing, and maintaining the software. In this paper, a new cognitive complexity metric called Cognitive Weighted Polymorphism Factor (CWPF) is proposed. Apart from the software structural complexity, it includes the cognitive complexity on the basis of type. The cognitive weights are calibrated based on 27 empirical studies with 120 persons. A case study and experimentation of the new software metric shows positive results. Further, a comparative study is made and the correlation test has proved that CWPF complexity metric is a better, more comprehensive, and more realistic indicator of the software complexity than Abreu’s Polymorphism Factor (PF) complexity metric.

Keywords: Cognitive complexity metric, cognitive weighted polymorphism factor, object-oriented metrics, polymorphism factor, software metrics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2182
558 Position Based Routing Protocol with More Reliability in Mobile Ad Hoc Network

Authors: Mahboobeh Abdoos, Karim Faez, Masoud Sabaei

Abstract:

Position based routing protocols are the kinds of routing protocols, which they use of nodes location information, instead of links information to routing. In position based routing protocols, it supposed that the packet source node has position information of itself and it's neighbors and packet destination node. Greedy is a very important position based routing protocol. In one of it's kinds, named MFR (Most Forward Within Radius), source node or packet forwarder node, sends packet to one of it's neighbors with most forward progress towards destination node (closest neighbor to destination). Using distance deciding metric in Greedy to forward packet to a neighbor node, is not suitable for all conditions. If closest neighbor to destination node, has high speed, in comparison with source node or intermediate packet forwarder node speed or has very low remained battery power, then packet loss probability is increased. Proposed strategy uses combination of metrics distancevelocity similarity-power, to deciding about giving the packet to which neighbor. Simulation results show that the proposed strategy has lower lost packets average than Greedy, so it has more reliability.

Keywords: Mobile Ad Hoc Network, Position Based, Reliability, Routing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1722
557 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Authors: Loai AbdAllah, Mahmoud Kaiyal

Abstract:

Missing values in real-world datasets are a common problem. Many algorithms were developed to deal with this problem, most of them replace the missing values with a fixed value that was computed based on the observed values. In our work, we used a distance function based on Bhattacharyya distance to measure the distance between objects with missing values. Bhattacharyya distance, which measures the similarity of two probability distributions. The proposed distance distinguishes between known and unknown values. Where the distance between two known values is the Mahalanobis distance. When, on the other hand, one of them is missing the distance is computed based on the distribution of the known values, for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps to improve prevention of chronic diseases such as diabetes and cancer. In order for Wikaya’s recommendation system to work distance between users need to be measured. Since there are missing values in the collected data, there is a need to develop a distance function distances between incomplete users profiles. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between different objects, when some of them contain missing values, we integrated it within the framework of k nearest neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that kNN classifier using our proposed distance function outperforms the kNN using other existing methods.

Keywords: Missing values, distance metric, Bhattacharyya distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 739
556 Asymmetrical Informative Estimation for Macroeconomic Model: Special Case in the Tourism Sector of Thailand

Authors: Chukiat Chaiboonsri, Satawat Wannapan

Abstract:

This paper used an asymmetric informative concept to apply in the macroeconomic model estimation of the tourism sector in Thailand. The variables used to statistically analyze are Thailand international and domestic tourism revenues, the expenditures of foreign and domestic tourists, service investments by private sectors, service investments by the government of Thailand, Thailand service imports and exports, and net service income transfers. All of data is a time-series index which was observed between 2002 and 2015. Empirically, the tourism multiplier and accelerator were estimated by two statistical approaches. The first was the result of the Generalized Method of Moments model (GMM) based on the assumption which the tourism market in Thailand had perfect information (Symmetrical data). The second was the result of the Maximum Entropy Bootstrapping approach (MEboot) based on the process that attempted to deal with imperfect information and reduced uncertainty in data observations (Asymmetrical data). In addition, the tourism leakages were investigated by a simple model based on the injections and leakages concept. The empirical findings represented the parameters computed from the MEboot approach which is different from the GMM method. However, both of the MEboot estimation and GMM model suggests that Thailand’s tourism sectors are in a period capable of stimulating the economy.

Keywords: Thailand tourism, maximum entropy bootstrapping approach, macroeconomic model, asymmetric information.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1215
555 Genetic Characterization of Barley Genotypes via Inter-Simple Sequence Repeat

Authors: Mustafa Yorgancılar, Emine Atalay, Necdet Akgün, Ali Topal

Abstract:

In this study, polymerase chain reaction based Inter-simple sequence repeat (ISSR) from DNA fingerprinting techniques were used to investigate the genetic relationships among barley crossbreed genotypes in Turkey. It is important that selection based on the genetic base in breeding programs via ISSR, in terms of breeding time. 14 ISSR primers generated a total of 97 bands, of which 81 (83.35%) were polymorphic. The highest total resolution power (RP) value was obtained from the F2 (0.53) and M16 (0.51) primers. According to the ISSR result, the genetic similarity index changed between 0.64–095; Lane 3 with Line 6 genotypes were the closest, while Line 36 were the most distant ones. The ISSR markers were found to be promising for assessing genetic diversity in barley crossbreed genotypes.

Keywords: Barley, crossbreed, genetic similarity, ISSR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 873
554 XML Schema Automatic Matching Solution

Authors: Huynh Quyet Thang, Vo Sy Nam

Abstract:

Schema matching plays a key role in many different applications, such as schema integration, data integration, data warehousing, data transformation, E-commerce, peer-to-peer data management, ontology matching and integration, semantic Web, semantic query processing, etc. Manual matching is expensive and error-prone, so it is therefore important to develop techniques to automate the schema matching process. In this paper, we present a solution for XML schema automated matching problem which produces semantic mappings between corresponding schema elements of given source and target schemas. This solution contributed in solving more comprehensively and efficiently XML schema automated matching problem. Our solution based on combining linguistic similarity, data type compatibility and structural similarity of XML schema elements. After describing our solution, we present experimental results that demonstrate the effectiveness of this approach.

Keywords: XML Schema, Schema Matching, SemanticMatching, Automatic XML Schema Matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1795
553 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

Keywords: Metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2283
552 Fighter Aircraft Selection Using Technique for Order Preference by Similarity to Ideal Solution with Multiple Criteria Decision Making Analysis

Authors: C. Ardil

Abstract:

This paper presents a multiple criteria decision making analysis technique for selecting fighter aircraft for the national air force. The selection of military aircraft is a process consisting of contradictory goals and objectives. When a modern air force needs to choose fighter aircraft to upgrade existing fleets, a multiple criteria decision making analysis and scenario planning for defense acquisition has been put forward. The selection of fighter aircraft for the air defense force is a strategic decision making process, since the purchase or lease of fighter jets, maintenance and operating costs and having a fleet is the biggest cost for the air force. Multiple criteria decision making analysis methods are effectively applied to facilitate decision making from various available options. The selection criteria were determined using the literature on the problem of fighter aircraft selection. The selection of fighter aircraft to be purchased for the air defense forces is handled using a multiple criteria decision making analysis technique that also determines a suitable methodological approach for the defense procurement and fleet upgrade planning process. The aim of this study is to originate an approach to evaluate fighter aircraft alternatives, Su-35, F-35, and TF-X (MMU), based on technique for order preference by similarity to ideal solution (TOPSIS).

Keywords: Fighter Aircraft, Fighter Aircraft Selection, Technique for Order Preference by Similarity to Ideal Solution, TOPSIS, Multiple Criteria Decision Making, Multiple Criteria Decision Making Analysis, MCDMA, Su-35, F-35, TF-X (MMU)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 568
551 Design, Development by Functional Analysis in UML and Static Test of a Multimedia Voice and Video Communication Platform on IP for a Use Adapted to the Context of Local Businesses in Lubumbashi

Authors: Blaise Fyama, Elie Museng, Grace Mukoma

Abstract:

In this article we present a java implementation of video telephony using the SIP protocol (Session Initiation Protocol). After a functional analysis of the SIP protocol, we relied on the work of Italian researchers of University of Parma-Italy to acquire adequate libraries for the development of our own communication tool. In order to optimize the code and improve the prototype, we used, in an incremental approach, test techniques based on a static analysis based on the evaluation of the complexity of the software with the application of metrics and the number cyclomatic of Mccabe. The objective is to promote the emergence of local start-ups producing IP video in a well understood local context. We have arrived at the creation of a video telephony tool whose code is optimized.

Keywords: Static analysis, coding, complexity, mccabe metrics, Sip, uml.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 336
550 Computing Entropy for Ortholog Detection

Authors: Hsing-Kuo Pao, John Case

Abstract:

Biological sequences from different species are called or-thologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences -in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In prac-tice one can approximate it by computable compression methods. How-ever, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new ap-proach to overcome the problem that compression approximations may notwork well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empir-ical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and nega-tive (non-ortholog) data - better than with good, previously known alter-natives (which do not employ some means to handle short sequences well).Also empirically compared are the new entropy based attribute set and a number of other, more standard similarity attributes sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclu-sion: the new, entropy based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.

Keywords: compression, decision tree, entropy, ortholog, ROC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1795
549 A Framework for Review Spam Detection Research

Authors: Mohammadali Tavakoli, Atefeh Heydari, Zuriati Ismail, Naomie Salim

Abstract:

With the increasing number of people reviewing products online in recent years, opinion sharing websites has become the most important source of customers’ opinions. Unfortunately, spammers generate and post fake reviews in order to promote or demote brands and mislead potential customers. These are notably destructive not only for potential customers, but also for business holders and manufacturers. However, research in this area is not adequate, and many critical problems related to spam detection have not been solved to date. To provide green researchers in the domain with a great aid, in this paper, we have attempted to create a highquality framework to make a clear vision on review spam-detection methods. In addition, this report contains a comprehensive collection of detection metrics used in proposed spam-detection approaches. These metrics are extremely applicable for developing novel detection methods.

Keywords: Fake reviews, Feature collection, Opinion spam, Spam detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2473
548 Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison

Authors: Frank Emmert-Streib, Matthias Dehmer, Jing Liu, Max Muhlhauser

Abstract:

The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As criterion to select genes we suggest to measure the local changes in the correlation graph of each gene and to select those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer whereas each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks we extract one tree for each gene by a local decomposition of the correlation network. The interpretation of a tree is that it represents the n-nearest neighbor genes on the n-th level of a tree, measured by the Dijkstra distance, and, hence, gives the local embedding of a gene within the correlation network. For the obtained trees we measure the pairwise similarity between trees rooted by the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top ranked genes. For these genes the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result we find that the top ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.

Keywords: Graph similarity, generalized trees, graph alignment, DNA microarray data, cervical cancer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1718
547 Web Proxy Detection via Bipartite Graphs and One-Mode Projections

Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo

Abstract:

With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonate phishing pages to steal sensitive data or redirect victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become an increasingly challenging task to detect the proxy service due to their dynamic update and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of bipartite graphs for discovering social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent web proxy users behavior clusters. The web proxy URL may vary from time to time. Still, the inherent interest would not. So, based on the intuition, by dint of our private tools implemented by WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experiment results based on real datasets show that the behavior clusters not only reduce the number of URLs analysis but also provide an effective way to detect the web proxies, especially for the unknown web proxies.

Keywords: Bipartite graph, clustering, one-mode projection, web proxy detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 703
546 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources to understand broadcasting contents. Topic modeling is the method to get the summary of the broadcasting contents from its scripts. Generally, scripts represent contents descriptively with directions and speeches, and provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scene segments consist of speeches mainly, however, relatively small co-occurrences among words in the scene segments are observed. This causes inevitably the bad quality of topics by statistical learning method. To tackle this problem, we propose a method to improve topic quality with additional word co-occurrence information obtained using scene similarities. The main idea of improving topic quality is that the information that two or more texts are topically related can be useful to learn high quality of topics. In addition, more accurate topical representations lead to get information more accurate whether two texts are related or not. In this paper, we regard two scene segments are related if their topical similarity is high enough. We also consider that words are co-occurred if they are in topically related scene segments together. By iteratively inferring topics and determining semantically neighborhood scene segments, we draw a topic space represents broadcasting contents well. In the experiments, we showed the proposed method generates a higher quality of topics from Korean drama scripts than the baselines.

Keywords: Broadcasting contents, generalized P´olya urn model, scripts, text similarity, topic model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1782
545 Error-Robust Nature of Genome Profiling Applied for Clustering of Species Demonstrated by Computer Simulation

Authors: Shamim Ahmed Koichi Nishigaki

Abstract:

Genome profiling (GP), a genotype based technology, which exploits random PCR and temperature gradient gel electrophoresis, has been successful in identification/classification of organisms. In this technology, spiddos (Species identification dots) and PaSS (Pattern similarity score) were employed for measuring the closeness (or distance) between genomes. Based on the closeness (PaSS), we can buildup phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This fact was confirmed quantitatively in this study by computer-simulation, providing the limit of the reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for identification/classification of organisms.

Keywords: Fluctuation, Genome profiling (GP), Pattern similarity score (PaSS), Robustness, Spiddos-shift.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1497
544 Destination Port Detection for Vessels: An Analytic Tool for Optimizing Port Authorities Resources

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

Port authorities have many challenges in congested ports to allocate their resources to provide a safe and secure loading/unloading procedure for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages Automatic Identification System (AIS) messages to monitor vessel’s movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely, Reference Route of Trajectory (RRoT) to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring AIS messages. Our RRo method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measures to identify the destination of a vessel using their recent movement. We evaluated five different similarity measures such as Discrete Frechet Distance (DFD), Dynamic Time ´ Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an f-measure of 99.08% using Dynamic Time Warping (DTW) similarity measure.

Keywords: Spatial temporal data mining, trajectory mining, trajectory similarity, resource optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 637
543 Applications of Rough Set Decompositions in Information Retrieval

Authors: Chen Wu, Xiaohua Hu

Abstract:

This paper proposes rough set models with three different level knowledge granules in incomplete information system under tolerance relation by similarity between objects according to their attribute values. Through introducing dominance relation on the discourse to decompose similarity classes into three subclasses: little better subclass, little worse subclass and vague subclass, it dismantles lower and upper approximations into three components. By using these components, retrieving information to find naturally hierarchical expansions to queries and constructing answers to elaborative queries can be effective. It illustrates the approach in applying rough set models in the design of information retrieval system to access different granular expanded documents. The proposed method enhances rough set model application in the flexibility of expansions and elaborative queries in information retrieval.

Keywords: Incomplete information system, Rough set model, tolerance relation, dominance relation, approximation, decomposition, elaborative query.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1576
542 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses to the research in the area of regression testing with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature is pointed out. The difference in approach between industry, with business model as basis, and academia, with focus on data mining, is highlighted. Test Metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. The comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 882
541 Social Media Idea Ontology: A Concept for Semantic Search of Product Ideas in Customer Knowledge through User-Centered Metrics and Natural Language Processing

Authors: Martin H¨ausl, Maximilian Auch, Johannes Forster, Peter Mandl, Alexander Schill

Abstract:

In order to survive on the market, companies must constantly develop improved and new products. These products are designed to serve the needs of their customers in the best possible way. The creation of new products is also called innovation and is primarily driven by a company’s internal research and development department. However, a new approach has been taking place for some years now, involving external knowledge in the innovation process. This approach is called open innovation and identifies customer knowledge as the most important source in the innovation process. This paper presents a concept of using social media posts as an external source to support the open innovation approach in its initial phase, the Ideation phase. For this purpose, the social media posts are semantically structured with the help of an ontology and the authors are evaluated using graph-theoretical metrics such as density. For the structuring and evaluation of relevant social media posts, we also use the findings of Natural Language Processing, e. g. Named Entity Recognition, specific dictionaries, Triple Tagger and Part-of-Speech-Tagger. The selection and evaluation of the tools used are discussed in this paper. Using our ontology and metrics to structure social media posts enables users to semantically search these posts for new product ideas and thus gain an improved insight into the external sources such as customer needs.

Keywords: Idea ontology, innovation management, open innovation, semantic search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 751
540 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1487
539 MCDM Spectrum Handover Models for Cognitive Wireless Networks

Authors: Cesar Hernández, Diego Giral, Fernando Santa

Abstract:

Spectrum handover is a significant topic in the cognitive radio networks to assure an efficient data transmission in the cognitive radio user’s communications. This paper proposes a comparison between three spectrum handover models: VIKOR, SAW and MEW. Four evaluation metrics are used. These metrics are, accumulative average of failed handover, accumulative average of handover performed, accumulative average of transmission bandwidth and, accumulative average of the transmission delay. As a difference with related work, the performance of the three spectrum handover models was validated with captured data of spectrum occupancy in experiments performed at the GSM frequency band (824 MHz - 849 MHz). These data represent the actual behavior of the licensed users for this wireless frequency band. The results of the comparison show that VIKOR Algorithm provides a 15.8% performance improvement compared to SAW Algorithm and, it is 12.1% better than the MEW Algorithm.

Keywords: Cognitive radio, decision making, MEW, SAW, spectrum handover, VIKOR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2108
538 Image Retrieval Using Fused Features

Authors: K. Sakthivel, R. Nallusamy, C. Kavitha

Abstract:

The system is designed to show images which are related to the query image. Extracting color, texture, and shape features from an image plays a vital role in content-based image retrieval (CBIR). Initially RGB image is converted into HSV color space due to its perceptual uniformity. From the HSV image, Color features are extracted using block color histogram, texture features using Haar transform and shape feature using Fuzzy C-means Algorithm. Then, the characteristics of the global and local color histogram, texture features through co-occurrence matrix and Haar wavelet transform and shape are compared and analyzed for CBIR. Finally, the best method of each feature is fused during similarity measure to improve image retrieval effectiveness and accuracy.

Keywords: Color Histogram, Haar Wavelet Transform, Fuzzy C-means, Co-occurrence matrix; Similarity measure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2099
537 A Numerical Solution Based On Operational Matrix of Differentiation of Shifted Second Kind Chebyshev Wavelets for a Stefan Problem

Authors: Rajeev, N. K. Raigar

Abstract:

In this study, one dimensional phase change problem (a Stefan problem) is considered and a numerical solution of this problem is discussed. First, we use similarity transformation to convert the governing equations into ordinary differential equations with its boundary conditions. The solutions of ordinary differential equation with the associated boundary conditions and interface condition (Stefan condition) are obtained by using a numerical approach based on operational matrix of differentiation of shifted second kind Chebyshev wavelets. The obtained results are compared with existing exact solution which is sufficiently accurate.

Keywords: Operational matrix of differentiation, Similarity transformation, Shifted second kind Chebyshev wavelets, Stefan problem.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1961
536 Generation of Photo-Mosaic Images through Block Matching and Color Adjustment

Authors: Hae-Yeoun Lee

Abstract:

Mosaic refers to a technique that makes image by gathering lots of small materials in various colors. This paper presents an automatic algorithm that makes the photo-mosaic image using photos. The algorithm is composed of 4 steps: partition and feature extraction, block matching, redundancy removal and color adjustment. The input image is partitioned in the small block to extract feature. Each block is matched to find similar photo in database by comparing similarity with Euclidean difference between blocks. The intensity of the block is adjusted to enhance the similarity of image by replacing the value of light and darkness with that of relevant block. Further, the quality of image is improved by minimizing the redundancy of tiles in the adjacent blocks. Experimental results support that the proposed algorithm is excellent in quantitative analysis and qualitative analysis.

Keywords: Photo-mosaic, Euclidean distance, Block matching, Intensity adjustment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3532