Search results for: Clinical documents

594 The Usefulness of Logical Structure in Flexible Document Categorization

Authors: Jebari Chaker, Ounalli Habib

Abstract:

This paper presents a new approach for automatic document categorization. Exploiting the logical structure of the document, our approach assigns a HTML document to one or more categories (thesis, paper, call for papers, email, ...). Using a set of training documents, our approach generates a set of rules used to categorize new documents. The approach flexibility is carried out with rule weight association representing your importance in the discrimination between possible categories. This weight is dynamically modified at each new document categorization. The experimentation of the proposed approach provides satisfactory results.

Keywords: categorization rule, document categorization, flexible categorization, logical structure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1201

593 Ontology-based Concept Weighting for Text Documents

Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt

Abstract:

Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.

Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2358

592 Language and Retrieval Accuracy

Authors: Ahmed Abdelali, Jim Cowie, Hamdy S. Soliman

Abstract:

One of the major challenges in the Information Retrieval field is handling the massive amount of information available to Internet users. Existing ranking techniques and strategies that govern the retrieval process fall short of expected accuracy. Often relevant documents are buried deep in the list of documents returned by the search engine. In order to improve retrieval accuracy we examine the issue of language effect on the retrieval process. Then, we propose a solution for a more biased, user-centric relevance for retrieved data. The results demonstrate that using indices based on variations of the same language enhances the accuracy of search engines for individual users.

Keywords: Information Search and Retrieval, LanguageVariants, Search Engine, Retrieval Accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1420

591 Web Page Watermarking: XML files using Synonyms and Acronyms

Authors: Nighat Mir, Sayed Afaq Hussain

Abstract:

Advent enhancements in the field of computing have increased massive use of web based electronic documents. Current Copyright protection laws are inadequate to prove the ownership for electronic documents and do not provide strong features against copying and manipulating information from the web. This has opened many channels for securing information and significant evolutions have been made in the area of information security. Digital Watermarking has developed into a very dynamic area of research and has addressed challenging issues for digital content. Watermarking can be visible (logos or signatures) and invisible (encoding and decoding). Many visible watermarking techniques have been studied for text documents but there are very few for web based text. XML files are used to trade information on the internet and contain important information. In this paper, two invisible watermarking techniques using Synonyms and Acronyms are proposed for XML files to prove the intellectual ownership and to achieve the security. Analysis is made for different attacks and amount of capacity to be embedded in the XML file is also noticed. A comparative analysis for capacity is also made for both methods. The system has been implemented using C# language and all tests are made practically to get the results.

Keywords: Watermarking, Extensible Markup Language (XML), Synonyms, Acronyms, Copyright protection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2226

590 Selection of Relevant Servers in Distributed Information Retrieval System

Authors: Benhamouda Sara, Guezouli Larbi

Abstract:

Nowadays, the dissemination of information touches the distributed world, where selecting the relevant servers to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions and many approaches of collection selection have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contains terms of the query and the weights of those terms in these documents. We tested our method and our studies show that this technique can compete with other state-of-the-art algorithms that we choose to test the performance of our approach.

Keywords: Distributed information retrieval, relevance, server selection, collection selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1323

589 Elimination of Redundant Links in Web Pages– Mathematical Approach

Authors: G. Poonkuzhali, K.Thiagarajan, K.Sarukesi

Abstract:

With the enormous growth on the web, users get easily lost in the rich hyper structure. Thus developing user friendly and automated tools for providing relevant information without any redundant links to the users to cater to their needs is the primary task for the website owners. Most of the existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent one that are likely to contain the outlying data such as noise, irrelevant and redundant data. This paper proposes new algorithm for mining the web content by detecting the redundant links from the web documents using set theoretical(classical mathematics) such as subset, union, intersection etc,. Then the redundant links is removed from the original web content to get the required information by the user..

Keywords: Web documents, Web content mining, redundantlink, outliers, set theory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1959

588 Kano’s Model for Clinical Laboratory

Authors: Khaled N. El-Hashmi, Omar K.Gnieber

Abstract:

The clinical laboratory has received considerable recognition globally due to the rapid development of advanced technology, economic demands and its role in a patient’s treatment cycle. Although various cross-domain experiments and practices with respect to clinical laboratory projects are ready for the full swing, the customer needs are still ambiguous and debatable. The purpose of this study is to apply Kano’s model and customer satisfaction matrix to categorize service quality attributes in order to see how well these attributes are able to satisfy customer needs. The result reveals that ten of the 26 service quality attributes have greater impacts on highly increasing customer’s satisfaction and should be taken in consideration firstly.

Keywords: Clinical laboratory, Customer satisfaction matrix, Kano’s Model, Quality Attributes, Voice of Customer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2753

587 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.

Keywords: Search algorithm, centroid, query, keyword, cooccurrence, categorisation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 576

586 AnQL: A Query Language for Annotation Documents

Authors: Neerja Bhatnagar, Ben A. Juliano, Renee S. Renner

Abstract:

This paper presents data annotation models at five levels of granularity (database, relation, column, tuple, and cell) of relational data to address the problem of unsuitability of most relational databases to express annotations. These models do not require any structural and schematic changes to the underlying database. These models are also flexible, extensible, customizable, database-neutral, and platform-independent. This paper also presents an SQL-like query language, named Annotation Query Language (AnQL), to query annotation documents. AnQL is simple to understand and exploits the already-existent wide knowledge and skill set of SQL.

Keywords: Annotation query language, data annotations, data annotation models, semantic data annotations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1745

585 Extraction of Significant Phrases from Text

Authors: Yuan J. Lui

Abstract:

Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain independent keyphrase extraction algorithm. The algorithm approaches the problem of keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new machine learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs better than other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000-s AutoSummarize feature. The domain independence of this algorithm has also been confirmed in our experiments.

Keywords: classification, keyphrase extraction, machine learning, summarization

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2006

584 An Overview of Technology Availability to Support Remote Decentralized Clinical Trials

Authors: S. Huber, B. Schnalzer, B. Alcalde, S. Hanke, L. Mpaltadoros, T. G. Stavropoulos, S. Nikolopoulos, I. Kompatsiaris, L. Pérez-Breva, V. Rodrigo-Casares, J. Fons-Martínez, J. de Bruin

Abstract:

Developing new medicine and health solutions and improving patient health currently rely on the successful execution of clinical trials, which generate relevant safety and efficacy data. For their success, recruitment and retention of participants are some of the most challenging aspects of protocol adherence. Main barriers include: i) lack of awareness of clinical trials; ii) long distance from the clinical site; iii) the burden on participants, including the duration and number of clinical visits, and iv) high dropout rate. Most of these aspects could be addressed with a new paradigm, namely the Remote Decentralized Clinical Trials (RDCTs). Furthermore, the COVID-19 pandemic has highlighted additional advantages and challenges for RDCTs in practice, allowing participants to join trials from home and not depending on site visits, etc. Nevertheless, RDCTs should follow the process and the quality assurance of conventional clinical trials, which involve several processes. For each part of the trial, the Building Blocks, existing software and technologies were assessed through a systematic search. The technology needed to perform RDCTs is widely available and validated but is yet segmented and developed in silos, as different software solutions address different parts of the trial and at various levels. The current paper is analyzing the availability of technology to perform RDCTs, identifying gaps and providing an overview of Basic Building Blocks and functionalities that need to be covered to support the described processes.

Keywords: architectures and frameworks for health informatics systems, clinical trials, information and communications technology, remote decentralized clinical trials, technology availability

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 681

583 Clinical Signs of Neonatal Calves in Experimental Colisepticemia

Authors: Samad Lotfollahzadeh

Abstract:

Escherichia coli (E. coli) is the most isolated bacteria from blood circulation of septicemic calves. Given the prevalence of septicemia in animals and its economic importance in veterinary practice, better understanding of changes in clinical signs following disease, may contribute to early detection of disorder. The present study has been carried out to detect changes of clinical signs in induced sepsis in calves with E. coli. Colisepticemia has been induced in 10 twenty-day old healthy Holstein- Frisian calves with intravenous injection of 1.5 X 109 colony forming units (cfu) of O111:H8 strain of E. coli. Clinical signs including rectal temperature, heart rate, respiratory rate, shock, appetite, sucking reflex, feces consistency, general behavior, dehydration and standing ability were recorded in experimental calves during 24 hours after induction of colisepticemia. Blood culture was also carried out from calves four times during experiment. ANOVA with repeated measure is used to see changes of calves’ clinical signs to experimental colisepticemia, and values of P≤ 0.05 was considered statistically significant. Mean values of rectal temperature and heart rate as well as median values of respiratory rate, appetite, suckling reflex, standing ability and feces consistency of experimental calves increased significantly during study (P<0.05). In the present study median value of shock score was not significantly increased in experimental calves (P> 0.05). The results of present study showed that total score of clinical signs in calves with experimental colisepticemia increased significantly, although score of some clinical signs such as shock did not change significantly.

Keywords: Calves, Clinical signs scoring, E. coli O111:H8, Experimental colisepticemia,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2052

582 Popularization of the Communist Manifesto in 19th Century Europe

Authors: Xuanyu Bai

Abstract:

“The Communist Manifesto”, written by Karl Marx and Friedrich Engels, is one of the most significant documents throughout the whole history which covers across different fields including Economic, Politic, Sociology and Philosophy. Instead of discussing the Communist ideas presented in the Communist Manifesto, the essay focuses on exploring the reasons that contributed to the popularization of the document and its influence on political revolutions in 19^th century Europe by concentrating on the document itself along with other primary and secondary sources and temporal artwork. Combining the details from the Communist Manifesto and other documents, Marx’s writing style and word choice, his convincible notions about a new society dominated by proletariats, and the revolutionary idea of class destruction has led to the popularization of the Communist Manifesto and influenced the latter political revolutions.

Keywords: Communist Manifesto, The Wealth of Nations, 19th century Europe, word choice, capitalism, communism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 489

581 Distributed Splay Suffix Arrays: A New Structure for Distributed String Search

Authors: Tu Kun, Gu Nai-jie, Bi Kun, Liu Gang, Dong Wan-li

Abstract:

As a structure for processing string problem, suffix array is certainly widely-known and extensively-studied. But if the string access pattern follows the “90/10" rule, suffix array can not take advantage of the fact that we often find something that we have just found. Although the splay tree is an efficient data structure for small documents when the access pattern follows the “90/10" rule, it requires many structures and an excessive amount of pointer manipulations for efficiently processing and searching large documents. In this paper, we propose a new and conceptually powerful data structure, called splay suffix arrays (SSA), for string search. This data structure combines the features of splay tree and suffix arrays into a new approach which is suitable to implementation on both conventional and clustered computers.

Keywords: suffix arrays, splay tree, string search, distributedalgorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1726

580 Slovenian Spatial Legislation over Time and Its Issues

Authors: Andreja Benko

Abstract:

Article presents a short overview of the architects’ profession over time with outlined work of the architectural theoreticians. In the continuation is described a former affiliation of Slovenia as well as the spatial planning documents that were in use until the Slovenia joint Yugoslavia (last part in 1919). This legislation from former Austro-Hungarian monarchy was valid almost until 1950 in some parts of Yugoslavia even longer. Upon that will be mentioned some valid Slovenian spatial documents which will be compared with the German legislation. Analyzed will be the number of architect and spatial planners in Slovenia and also their number upon certain region in Slovenia. Based on that will be given also the number from statistical office of Slovenia of the number of buildings between years 2007 and 2012, and described also the collapse of the major construction companies in Slovenia and consequences of that. At the end will be outlined the morality and ethics by spatial interventions and lack of the architectural law in Slovenia as well as the problematic of minimal collaboration between the Ministry of infrastructure and spatial planning with the profession.

Keywords: Architect, history, legislation, Slovenia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1588

579 Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Authors: Matthias Dehmer, Frank Emmert Streib, Alexander Mehler, Jürgen Kilian

Abstract:

Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measures the similarity in the framework of the well known vector space model. In contrast to these we present a new approach to measuring the structural similarity of web-based documents represented by so called generalized trees which are more general than DOM-Trees which represent only directed rooted trees.We will design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as strings of linear integers, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments to solve a novel and challenging problem: Measuring the structural similarity of generalized trees. More precisely, we first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.

Keywords: Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2514

578 Policy Management Framework for Managing Enterprise Policies

Authors: Dahir A. Ga'al, Wardah Zainal Abidin

Abstract:

Policy management in organizations became rising issue in the last decade. It’s because of today’s regulatory requirements in the organizations. To manage policies in large organizations is an imperative work. However, major challenges facing organizations in the last decade is managing all the policies in the organization and making them an active documents rather than simple (inactive) documents stored in computer hard drive or on a shelf. Because of this challenge, organizations need policy management program. This policy management program can be either manual or automated. This paper presents suggestions towards managing policies in organizations. As well as possible policy management solution or program to be utilized, manual or automated. The research first examines the models and frameworks used for managing policies from various perspectives in the literature of the research area/domain. At the end of this paper, a policy management framework is proposed for managing enterprise policies effectively and in a simplified manner.

Keywords: Policy, policy management, policy management program, policy repository.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2563

577 Optometric-lab: a Stereophotogrammetry Tool for Eye Movements Records

Authors: E. F. P. Leme, L. J. R. Lopez, D. G. Goroso

Abstract:

In this paper as showed a non-invasive 3D eye tracker for optometry clinical applications. Measurements of biomechanical variables in clinical practice have many font of errors associated with traditional procedments such cover test (CT), near point of accommodation (NPC), eye ductions (ED), eye vergences (EG) and, eye versions (ES). Ocular motility should always be tested but all evaluations have a subjective interpretations by practitioners, the results is based in clinical experiences, repeatability and accuracy don-t exist. Optometric-lab is a tool with 3 (tree) analogical video cameras triggered and synchronized in one acquisition board AD. The variables globe rotation angle and velocity can be quantified. Data record frequency was performed with 27Hz, camera calibration was performed in a know volume and image radial distortion adjustments.

Keywords: Eye Tracking, strabismus, eye movements, optometry.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1759

576 Neural-Symbolic Machine-Learning for Knowledge Discovery and Adaptive Information Retrieval

Authors: Hager Kammoun, Jean Charles Lamirel, Mohamed Ben Ahmed

Abstract:

In this paper, a model for an information retrieval system is proposed which takes into account that knowledge about documents and information need of users are dynamic. Two methods are combined, one qualitative or symbolic and the other quantitative or numeric, which are deemed suitable for many clustering contexts, data analysis, concept exploring and knowledge discovery. These two methods may be classified as inductive learning techniques. In this model, they are introduced to build “long term" knowledge about past queries and concepts in a collection of documents. The “long term" knowledge can guide and assist the user to formulate an initial query and can be exploited in the process of retrieving relevant information. The different kinds of knowledge are organized in different points of view. This may be considered an enrichment of the exploration level which is coherent with the concept of document/query structure.

Keywords: Information Retrieval Systems, machine learning, classification, Galois lattices, Self Organizing Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1146

575 Algorithm for Information Retrieval Optimization

Authors: Kehinde K. Agbele, Kehinde Daniel Aruleba, Eniafe F. Ayetiran

Abstract:

When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates optimization of IRS to individual information needs in order of relevance. The study addressed development of algorithms that optimize the ranking of documents retrieved from IRS. This study discusses and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated databases environment. Conversely, as the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (

Keywords: Internet ranking,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1427

574 Event Template Generation for News Articles

Authors: A. Kowcika, E. Umamaheswari, T.V. Geetha

Abstract:

In this paper we focus on event extraction from Tamil news article. This system utilizes a scoring scheme for extracting and grouping event-specific sentences. Using this scoring scheme eventspecific clustering is performed for multiple documents. Events are extracted from each document using a scoring scheme based on feature score and condition score. Similarly event specific sentences are clustered from multiple documents using this scoring scheme. The proposed system builds the Event Template based on user specified query. The templates are filled with event specific details like person, location and timeline extracted from the formed clusters. The proposed system applies these methodologies for Tamil news articles that have been enconverted into UNL graphs using a Tamil to UNL-enconverter. The main intention of this work is to generate an event based template.

Keywords: Event Extraction, Score based Clustering, Segmentation, Template Generation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1645

573 A Study on Finding Similar Document with Multiple Categories

Authors: R. Saraçoğlu, N. Allahverdi

Abstract:

Searching similar documents and document management subjects have important place in text mining. One of the most important parts of similar document research studies is the process of classifying or clustering the documents. In this study, a similar document search approach that includes discussion of out the case of belonging to multiple categories (multiple categories problem) has been carried. The proposed method that based on Fuzzy Similarity Classification (FSC) has been compared with Rocchio algorithm and naive Bayes method which are widely used in text mining. Empirical results show that the proposed method is quite successful and can be applied effectively. For the second stage, multiple categories vector method based on information of categories regarding to frequency of being seen together has been used. Empirical results show that achievement is increased almost two times, when proposed method is compared with classical approach.

Keywords: Document similarity, Fuzzy classification, Multiple categories, Text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1658

572 Disparities versus Similarities: WHO GPPQCL and ISO/IEC 17025:2017 International Standards for Quality Management Systems in Pharmaceutical Laboratories

Authors: M. A. Okezue, K. L. Clase, S. R. Byrn, P. Shivanand

Abstract:

Medicines regulatory authorities expect pharmaceutical companies and contract research organizations to seek ways to certify that their laboratory control measurements are reliable. Establishing and maintaining laboratory quality standards are essential in ensuring the accuracy of test results. ‘ISO/IEC 17025:2017’ and ‘WHO Good Practices for Pharmaceutical Quality Control Laboratories (GPPQCL)’ are two quality standards commonly employed in developing laboratory quality systems. A review was conducted on the two standards to elaborate on areas on convergence and divergence. The goal was to understand how differences in each standard's requirements may influence laboratories' choices as to which document is easier to adopt for quality systems. A qualitative review method compared similar items in the two standards while mapping out areas where there were specific differences in the requirements of the two documents. The review also provided a detailed description of the clauses and parts covering management and technical requirements in these laboratory standards. The review showed that both documents share requirements for over ten critical areas covering objectives, infrastructure, management systems, and laboratory processes. There were, however, differences in standard expectations where GPPQCL emphasizes system procedures for planning and future budgets that will ensure continuity. Conversely, ISO 17025 was more focused on the risk management approach to establish laboratory quality systems. Elements in the two documents form common standard requirements to assure the validity of laboratory test results that promote mutual recognition. The ISO standard currently has more global patronage than GPPQCL.

Keywords: ISO/IEC 17025:2017, laboratory standards, quality control, WHO GPPQCL

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 977

571 Experiments on Element and Document Statistics for XML Retrieval

Authors: Mohamed Ben Aouicha, Mohamed Tmar, Mohand Boughanem, Mohamed Abid

Abstract:

This paper presents an information retrieval model on XML documents based on tree matching. Queries and documents are represented by extended trees. An extended tree is built starting from the original tree, with additional weighted virtual links between each node and its indirect descendants allowing to directly reach each descendant. Therefore only one level separates between each node and its indirect descendants. This allows to compare the user query and the document with flexibility and with respect to the structural constraints of the query. The content of each node is very important to decide weither a document element is relevant or not, thus the content should be taken into account in the retrieval process. We separate between the structure-based and the content-based retrieval processes. The content-based score of each node is commonly based on the well-known Tf × Idf criteria. In this paper, we compare between this criteria and another one we call Tf × Ief. The comparison is based on some experiments into a dataset provided by INEX1 to show the effectiveness of our approach on one hand and those of both weighting functions on the other.

Keywords: XML retrieval, INEX, Tf × Idf, Tf × Ief

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1275

570 A Materialized Approach to the Integration of XML Documents: the OSIX System

Authors: H. Ahmad, S. Kermanshahani, A. Simonet, M. Simonet

Abstract:

The data exchanged on the Web are of different nature from those treated by the classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the needs for developing further techniques and algorithms to exploit and integrate such data, and extract relevant information for the user have been raised. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a viewbased data model whose indexing system supports semantic query optimization. We show that the problem of query processing on a XML source is optimized by the indexing approach proposed by Osiris.

Keywords: Data integration, semi-structured data, views, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1540

569 Semantic Enhanced Social Media Sentiments for Stock Market Prediction

Authors: K. Nirmala Devi, V. Murali Bhaskaran

Abstract:

Traditional document representation for classification follows Bag of Words (BoW) approach to represent the term weights. The conventional method uses the Vector Space Model (VSM) to exploit the statistical information of terms in the documents and they fail to address the semantic information as well as order of the terms present in the documents. Although, the phrase based approach follows the order of the terms present in the documents rather than semantics behind the word. Therefore, a semantic concept based approach is used in this paper for enhancing the semantics by incorporating the ontology information. In this paper a novel method is proposed to forecast the intraday stock market price directional movement based on the sentiments from Twitter and money control news articles. The stock market forecasting is a very difficult and highly complicated task because it is affected by many factors such as economic conditions, political events and investor’s sentiment etc. The stock market series are generally dynamic, nonparametric, noisy and chaotic by nature. The sentiment analysis along with wisdom of crowds can automatically compute the collective intelligence of future performance in many areas like stock market, box office sales and election outcomes. The proposed method utilizes collective sentiments for stock market to predict the stock price directional movements. The collective sentiments in the above social media have powerful prediction on the stock price directional movements as up/down by using Granger Causality test.

Keywords: Bag of Words, Collective Sentiments, Ontology, Semantic relations, Sentiments, Social media, Stock Prediction, Twitter, Vector Space Model and wisdom of crowds.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2745

568 Theoretical Literature Review on Lack of Cardiorespiratory Fitness and Its Effects on Children

Authors: E. Abdi

Abstract:

The purpose of this theoretical literature review is to study the relevant academic literature on lack of cardiorespiratory fitness and its effects on children. The total of thirty eight relevant documents were identified and considered for this review which nineteen of those were original research articles published in peer reviewed journals. The other nineteen articles were statistical documents. This literature review is structured to examine 5 effects in deficiency of cardiorespiratory fitness in school aged children (A) Relative Age Effect (RAE), (B) Obesity, (C) Inadequate fitness level (D) Unhealthy life style, and (E) Academics. The categories provide a theoretical framework for future studies where results are driven from the literature review. The study discusses that regular physical fitness assists children and adolescents to develop healthy physical activity behaviors which can be sustained throughout adult life. Conclusion suggests that advocacy for increasing physical activity and decreasing sedentary behaviors at school and home are necessary.

Keywords: Cardiorespiratory, endurance, physical activity, physical fitness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2178

567 The Relevance of Data Warehousing and Data Mining in the Field of Evidence-based Medicine to Support Healthcare Decision Making

Authors: Nevena Stolba, A Min Tjoa

Abstract:

Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to the healthcare practitioners at right time and in the right manner. External evidence-based knowledge can not be applied directly to the patient without adjusting it to the patient-s health condition. We propose a data warehouse based approach as a suitable solution for the integration of external evidence-based data sources into the existing clinical information system and data mining techniques for finding appropriate therapy for a given patient and a given disease. Through integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy to use decision support platform, which supports decision making process of care givers and clinical managers, is built. We present three case studies, which show, that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, which has a great relevance for the practice and acceptance of evidence-based medicine.

Keywords: data mining, data warehousing, decision-support systems, evidence-based medicine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3730

566 Concept Indexing using Ontology and Supervised Machine Learning

Authors: Rossitza M. Setchi, Qiao Tang

Abstract:

Nowadays, ontologies are the only widely accepted paradigm for the management of sharable and reusable knowledge in a way that allows its automatic interpretation. They are collaboratively created across the Web and used to index, search and annotate documents. The vast majority of the ontology based approaches, however, focus on indexing texts at document level. Recently, with the advances in ontological engineering, it became clear that information indexing can largely benefit from the use of general purpose ontologies which aid the indexing of documents at word level. This paper presents a concept indexing algorithm, which adds ontology information to words and phrases and allows full text to be searched, browsed and analyzed at different levels of abstraction. This algorithm uses a general purpose ontology, OntoRo, and an ontologically tagged corpus, OntoCorp, both developed for the purpose of this research. OntoRo and OntoCorp are used in a two-stage supervised machine learning process aimed at generating ontology tagging rules. The first experimental tests show a tagging accuracy of 78.91% which is encouraging in terms of the further improvement of the algorithm.

Keywords: Concepts, indexing, machine learning, ontology, tagging.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1629

565 Multidisciplinary Approach to Diagnosis of Primary Progressive Aphasia in a Younger Middle Aged Patient

Authors: Robert Krause

Abstract:

Primary progressive aphasia (PPA) is a neurodegenerative disease similar to frontotemporal and semantic dementia, while having a different clinical image and anatomic pathology topography. Nonetheless, they are often included under an umbrella term: frontotemporal lobar degeneration (FTLD). In the study, examples of diagnosing PPA are presented through the multidisciplinary lens of specialists from different fields (neurologists, psychiatrists, clinical speech therapists, clinical neuropsychologists and others) using a variety of diagnostic tools such as MR, PET/CT, genetic screening and neuropsychological and logopedic methods. Thanks to that, specialists can get a better and clearer understanding of PPA diagnosis. The study summarizes the concrete procedures and results of different specialists while diagnosing PPA in a patient of younger middle age and illustrates the importance of multidisciplinary approach to differential diagnosis of PPA.

Keywords: Primary progressive aphasia, etiology, diagnosis, younger middle age.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 576