Search results for: Query
168 Functional and Efficient Query Interpreters: Principle, Application and Performances’ Comparison
Authors: Laurent Thiry, Michel Hassenforder
Abstract:
This paper presents a general approach to implementing efficient query interpreters in a functional programming language. Indeed, most of the standard tools currently available use an imperative and/or object-oriented language for the implementation (e.g. Java for Jena-Fuseki), but other paradigms are possible and may offer better performance. To proceed, the paper first explains how to model data structures and queries from a functional point of view. Then, it proposes a general methodology for measuring performance (i.e. the number of computation steps needed to answer a query) and explains how to integrate some optimization techniques (short-cut fusion and, more importantly, data transformations). It then compares the proposed functional server to a standard tool (Fuseki), demonstrating that the former can answer queries two to ten times faster.
Keywords: data transformation, functional programming, information server, optimization
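As a rough illustration of the functional modelling idea summarised above (and not the authors' actual implementation, which targets a functional language and Fuseki-style RDF data), the following Python sketch expresses a triple-pattern query as composed lazy generators, the kind of pipeline that short-cut fusion optimises by avoiding intermediate collections:

```python
# Illustrative sketch only (not the paper's implementation): a query over
# RDF-like triples expressed as composed generators, so no intermediate
# lists are built -- the idea behind short-cut fusion / deforestation.

TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]

def match(triples, s=None, p=None, o=None):
    """Lazily yield triples matching an (s, p, o) pattern; None is a wildcard."""
    return (
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or s == ts)
        and (p is None or p == tp)
        and (o is None or o == to)
    )

def project(triples, position):
    """Lazily project one component of each matching triple."""
    return (t[position] for t in triples)

# "Who does alice know?" -- the two stages fuse into a single pass over TRIPLES.
print(list(project(match(TRIPLES, s="alice", p="knows"), 2)))  # ['bob']
```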
Procedia PDF Downloads 159
167 On the Interactive Search with Web Documents
Authors: Mario Kubek, Herwig Unger
Abstract:
Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy, usually linearly ordered result lists of web search engines, which do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. In particular, the process of formulating queries with proper terms representing specific information needs requires much effort from the user. This problem becomes even bigger when the user's knowledge of a subject and its technical terms is not sufficient to do so. This article presents the new, interactive search application DocAnalyser, which addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.
Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking
Procedia PDF Downloads 394
166 Evaluation of Firearm Injury Syndromic Surveillance in Utah
Authors: E. Bennion, A. Acharya, S. Barnes, D. Ferrell, S. Luckett-Cole, G. Mower, J. Nelson, Y. Nguyen
Abstract:
Objective: This study aimed to evaluate the validity of a firearm injury query in the Early Notification of Community-based Epidemics syndromic surveillance system. Syndromic surveillance data are used at the Utah Department of Health for early detection of and rapid response to unusually high rates of violence and injury, among other health outcomes. The query of interest was defined by the Centers for Disease Control and Prevention and used chief complaint and discharge diagnosis codes to capture initial emergency department encounters for firearm injury of all intents. Design: Two epidemiologists manually reviewed electronic health records of emergency department visits captured by the query from April-May 2020, compared results, and sent conflicting determinations to two arbiters. Results: Of the 85 unique records captured, 67 were deemed probable, 19 were ruled out, and two were undetermined, resulting in a positive predictive value of 75.3%. Common reasons for false positives included non-initial encounters and misleading keywords. Conclusion: Improving the validity of syndromic surveillance data would better inform outbreak response decisions made by state and local health departments. The firearm injury definition could be refined to exclude non-initial encounters by negating words such as “last month,” “last week,” and “aftercare”; and to exclude non-firearm injury by negating words such as “pellet gun,” “air gun,” “nail gun,” “bullet bike,” and “exit wound” when a firearm is not mentioned.
Keywords: evaluation, health information system, firearm injury, syndromic surveillance
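The refinement proposed in the conclusion can be illustrated with a minimal keyword-with-negation sketch. Only the negation terms are quoted from the abstract; the inclusion terms and the simple logic are invented for illustration and do not reproduce the actual CDC definition:

```python
# Minimal sketch of keyword-based query refinement with negation terms.
# Inclusion terms are hypothetical; negation terms are quoted in the abstract.
# A real definition would apply some negations (e.g. "exit wound") only when
# no firearm term is present, which this simplification does not do.

FIREARM_TERMS = ["gunshot", "gsw", "shot with a gun", "firearm"]
NEGATION_TERMS = [
    "last month", "last week", "aftercare",          # non-initial encounters
    "pellet gun", "air gun", "nail gun", "bullet bike", "exit wound",
]

def flag_visit(chief_complaint: str) -> bool:
    """Flag a visit as a probable initial firearm-injury encounter."""
    text = chief_complaint.lower()
    if not any(term in text for term in FIREARM_TERMS):
        return False
    return not any(term in text for term in NEGATION_TERMS)

print(flag_visit("GSW to left leg, arrived by EMS"))          # True
print(flag_visit("Aftercare for gunshot wound last month"))   # False
```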
Procedia PDF Downloads 168
165 Design and Implementation of Partial Denoising Boundary Image Matching Using Indexing Techniques
Authors: Bum-Soo Kim, Jin-Uk Kim
Abstract:
In this paper, we design and implement a partial denoising boundary image matching system using indexing techniques. Converting boundary images to time-series makes it feasible to perform fast searches using indexes, even on a very large image database. Using this conversion method, we develop a client-server system, based on previous partial denoising research, in a GUI (graphical user interface) environment. The client first converts a query image given by a user to a time-series and sends the denoising parameters and the tolerance, together with this time-series, to the server. The server identifies similar images from the index by evaluating a range query, which is constructed using the inputs given by the client, and sends the resulting images to the client. Experimental results show that our system provides intuitive and accurate matching results.
Keywords: boundary image matching, indexing, partial denoising, time-series matching
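The abstract does not specify the boundary-to-time-series conversion; one common choice, assumed here purely for illustration, is the centroid-distance series, sketched below together with the tolerance test used by a range query:

```python
# Sketch of one common boundary-to-time-series conversion (centroid distance),
# assumed here for illustration; the paper's exact conversion may differ.
import numpy as np

def boundary_to_time_series(points: np.ndarray, length: int = 128) -> np.ndarray:
    """Convert an (N, 2) array of boundary points into a fixed-length series of
    distances from the shape centroid, which can then be indexed and compared."""
    centroid = points.mean(axis=0)
    dists = np.linalg.norm(points - centroid, axis=1)
    # Resample to a fixed length so all images are comparable in the index.
    idx = np.linspace(0, len(dists) - 1, length)
    return np.interp(idx, np.arange(len(dists)), dists)

# A range query then reduces to a distance test between two series.
def within_tolerance(query_series: np.ndarray, candidate: np.ndarray, tol: float) -> bool:
    return float(np.linalg.norm(query_series - candidate)) <= tol
```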
Procedia PDF Downloads 141
164 2D Fingerprint Performance for PubChem Chemical Database
Authors: Fatimah Zawani Abdullah, Shereena Mohd Arif, Nurul Malim
Abstract:
The study of molecular similarity search in chemical databases is increasingly widespread, especially in the area of drug discovery. Similarity search is an application in the field of chemoinformatics that measures the similarity between a molecular structure, known as the query, and the structures of chemical compounds in the database. Similarity search is also one of the approaches used in virtual screening, which involves computational techniques and scoring the probabilities of activity. The main objective of this work is to determine which of the six fingerprints selected in this study performs best on the PubChem chemical dataset. This paper discusses the similarity searching process conducted using six types of descriptors, namely ECFP4, ECFC4, FCFP4, FCFC4, SRECFC4 and SRFCFC4, on 15 activity classes of the PubChem dataset, using the Tanimoto coefficient to calculate the similarity between the query structures and each database structure. The results suggest that ECFP4 performs best with the Tanimoto coefficient on the PubChem dataset.
Keywords: 2D fingerprints, Tanimoto, PubChem, similarity searching, chemoinformatics
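For binary fingerprints, the Tanimoto coefficient used here has a simple closed form, T(A, B) = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c the number of shared bits. A short Python sketch with toy fingerprints:

```python
# Sketch of the Tanimoto (Jaccard) coefficient on binary 2D fingerprints,
# the similarity measure used in the study; fingerprints shown are toy examples.

def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity for fingerprints represented as sets of 'on' bits:
    T(A, B) = c / (a + b - c), where c is the number of shared bits."""
    c = len(fp_a & fp_b)
    return c / (len(fp_a) + len(fp_b) - c) if (fp_a or fp_b) else 0.0

query = {1, 5, 9, 42, 77}
database_compound = {1, 5, 9, 40, 77, 80}
print(round(tanimoto(query, database_compound), 3))  # 0.571
```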
Procedia PDF Downloads 294
163 Structuring Paraphrases: The Impact Sentence Complexity Has on Key Leader Engagements
Authors: Meaghan Bowman
Abstract:
Soldiers are taught about the importance of effective communication with repetition of the phrase, “Communication is key.” They receive training in preparing for, and carrying out, interactions between foreign and domestic leaders to gain crucial information about a mission. These interactions are known as Key Leader Engagements (KLEs). For the training of KLEs, doctrine mandates the skills needed to conduct these “engagements,” such as how to behave appropriately, identify key leaders, and employ effective strategies. Army officers in training learn how to confront leaders, what information to gain, and how to ask questions respectfully. Unfortunately, soldiers rarely learn how to formulate questions optimally. Since less complex questions are easier to understand, we hypothesize that semantic complexity affects content understanding, and that age and education level may affect one's ability to form paraphrases and judge their quality. In this study, we looked at paraphrases of queries as well as judgments of both the paraphrases' naturalness and their semantic similarity to the query. Queries were divided into three complexity categories based on the number of relations (the first number) and the number of knowledge graph edges (the second number). Two crowd-sourced tasks were completed by Amazon Mechanical Turk participants, also known as turkers, to answer the research questions: (i) Are more complex queries harder to paraphrase and judge? (ii) Do age and education level affect the ability to understand complex queries? We ran statistical tests as follows: MANOVA for query understanding and two-way ANOVA to understand the relationship between query complexity and education and age. A probe of the number of given-level queries selected for paraphrasing by crowd-sourced workers in seven age ranges yielded promising results. We found significant evidence that age plays a role and marginally significant evidence that education level plays a role. These preliminary tests, with output p-values of 0.0002 and 0.068, respectively, suggest the importance of content understanding in a communication skill set. This basic ability to communicate, which may differ by age and education, permits reproduction and quality assessment and is crucial in training soldiers for effective participation in KLEs.
Keywords: engagement, key leader, paraphrasing, query complexity, understanding
Procedia PDF Downloads 162
162 SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space
Authors: Sanaa Chafik, Imane Daoudi, Mounim A. El Yacoubi, Hamid El Ouardi
Abstract:
Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving the nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH and has been successfully applied in many multimedia applications. However, Euclidean LSH has limitations that affect structure and query performance. Its main limitation is large memory consumption: in order to achieve good accuracy, a large number of hash tables is required. In this paper, we propose a new hashing algorithm to overcome the storage space problem and improve query time, while maintaining accuracy similar to that achieved by the original Euclidean LSH. Experimental results on a real large-scale dataset show that the proposed approach achieves good performance and consumes less memory than Euclidean LSH.
Keywords: approximate nearest neighbor search, content based image retrieval (CBIR), curse of dimensionality, locality sensitive hashing, multidimensional indexing, scalability
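For context, the standard Euclidean (p-stable) LSH hash family that such schemes build upon can be sketched as follows; this is background illustration only, not the proposed SC-LSH algorithm, and the parameter values are arbitrary:

```python
# Sketch of the standard Euclidean (p-stable) LSH hash family; not SC-LSH itself.
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim: int, w: float = 4.0):
    """h(v) = floor((a . v + b) / w) with a ~ N(0, 1)^dim and b ~ U[0, w)."""
    a = rng.normal(size=dim)
    b = rng.uniform(0, w)
    return lambda v: int(np.floor((np.dot(a, v) + b) / w))

# A hash-table key is a tuple of k such hashes; L tables are kept in memory,
# which is exactly the storage cost that SC-LSH aims to reduce.
k, L, dim = 4, 8, 64
tables = [[make_hash(dim) for _ in range(k)] for _ in range(L)]
vector = rng.normal(size=dim)
keys = [tuple(h(vector) for h in table) for table in tables]
print(keys[0])
```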
Procedia PDF Downloads 322
161 Enhance Security in XML Databases: XLog File for Severity-Aware Trust-Based Access Control
Authors: A. Asmawi, L. S. Affendey, N. I. Udzir, R. Mahmod
Abstract:
The topic of enhancing security in XML databases is important, as it includes protecting sensitive data and providing a secure environment to users. In order to improve security and provide dynamic access control for XML databases, we present an XLog file used to calculate user trust values by recording users' bad transactions, errors and query severities. Severity-aware trust-based access control for XML databases manages the access policy depending on users' trust values and prevents unauthorized processes, malicious transactions and insider threats. Privileges are automatically modified and adjusted over time depending on user behaviour and query severity. Logging in databases is an important process used for recovery and security purposes. In this paper, the XLog file is presented as a dynamic and temporary log file for XML databases to enhance the level of security.
Keywords: XML database, trust-based access control, severity-aware, trust values, log file
Procedia PDF Downloads 300
160 Personalization of Context Information Retrieval Model via User Search Behaviours for Ranking Document Relevance
Authors: Kehinde Agbele, Longe Olumide, Daniel Ekong, Dele Seluwa, Akintoye Onamade
Abstract:
One major problem of most existing information retrieval systems (IRS) is that they provide the same access and retrieval results to all individual users, based solely on the query terms the user issued to the system. When using an IRS, users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user's information need and the context of that information. At the same time, the volume and range of Internet documents are growing exponentially, which makes it difficult for a user to obtain information that precisely matches the user's interest. Diverse combination techniques are used to achieve this goal. This is due, firstly, to the fact that users often do not present queries to an IRS that optimally represent the information they want, and secondly, to the fact that the measure of a document's relevance is highly subjective across diverse users. In this paper, we address the problem by investigating the optimization of an IRS to individual information needs in order of relevance, and we develop algorithms that optimize the ranking of documents retrieved from the IRS. We take a two-fold approach to retrieving domain-specific documents. The first is the design of the context of information: the context of a query determines the relevance of retrieved information using personalization and context-awareness, so executing the same query in diverse contexts often leads to diverse result rankings based on user preferences. The second is that the relevant context aspects should be incorporated in a way that supports the knowledge domain representing users' interests. Evolutionary algorithms are incorporated to improve the effectiveness of the IRS. A context-based information retrieval system that learns individual needs from user-provided relevance feedback is developed, and its retrieval effectiveness is evaluated using precision and recall metrics. The results demonstrate how attributes of user interaction behavior can be used to improve IR effectiveness.
Keywords: context, document relevance, information retrieval, personalization, user search behaviors
Procedia PDF Downloads 464
159 Performance Evaluation of Hierarchical Location-Based Services Coupled to the Greedy Perimeter Stateless Routing Protocol for Wireless Sensor Networks
Authors: Rania Khadim, Mohammed Erritali, Abdelhakim Maaden
Abstract:
Nowadays, Wireless Sensor Networks have attracted worldwide research and industrial interest because they can be applied in various areas. Geographic routing protocols are well suited to these networks because they use location information when routing packets. This location information is maintained by Location-Based Services provided by network nodes in a distributed way. In this paper, we evaluate the performance of two hierarchical rendezvous location-based services, GLS (Grid Location Service) and HLS (Hierarchical Location Service), coupled to the GPSR routing protocol (Greedy Perimeter Stateless Routing), for Wireless Sensor Networks. The simulations were performed using the NS2 simulator to evaluate the performance and power of the two services in terms of location overhead, request travel time (RTT) and query success ratio (QSR). This work also presents a new scalability performance study of both GLS and HLS; specifically, what happens when the number of nodes N increases. The study will focus on three qualitative metrics: the location maintenance cost, the location query cost and the storage cost.
Keywords: location based-services, routing protocols, scalability, wireless sensor networks
Procedia PDF Downloads 373
158 Healthcare Big Data Analytics Using Hadoop
Authors: Chellammal Surianarayanan
Abstract:
The healthcare industry generates large amounts of data driven by various needs such as record keeping, physicians' prescriptions, medical imaging, sensor data, Electronic Patient Records (EPR), laboratory, pharmacy, etc. Healthcare data are so big and complex that they cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from the large volume of data, the velocity with which the data are accumulated, and the different varieties of data: structured, semi-structured and unstructured. Despite this complexity, if the trends and patterns that exist within big data are uncovered and analyzed, higher quality healthcare can be provided at lower cost. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include the Hadoop Distributed File System, which offers a way to store large amounts of data across multiple machines, and MapReduce, which offers a way to process large data sets with a parallel, distributed algorithm on a cluster. The Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher-level query language for MapReduce), HBase (a columnar data store), etc. In this paper, an analysis is presented of how healthcare big data can be processed and analyzed using the Hadoop ecosystem.
Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare
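The MapReduce model mentioned above can be illustrated with a Hadoop Streaming-style mapper and reducer written in Python; the record layout and field positions are hypothetical, and the driver at the bottom only simulates the framework locally:

```python
# Hadoop Streaming-style sketch of the MapReduce model: a mapper and reducer
# counting diagnosis codes in patient records. Record layout is hypothetical.
from itertools import groupby

def mapper(lines):
    """Emit (diagnosis_code, 1) for each input record 'patient_id,diagnosis_code,...'."""
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 2:
            yield fields[1], 1

def reducer(pairs):
    """Sum the counts for each diagnosis code (pairs arrive sorted by key)."""
    for code, group in groupby(pairs, key=lambda kv: kv[0]):
        yield code, sum(count for _, count in group)

if __name__ == "__main__":
    # Local simulation of the shuffle-and-sort phase that Hadoop would perform.
    records = ["p1,E11.9", "p2,I10", "p3,E11.9"]
    counts = dict(reducer(sorted(mapper(records))))
    print(counts)  # {'E11.9': 2, 'I10': 1}
```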
Procedia PDF Downloads 415
157 Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data
Authors: Sana Hamdi, Emna Bouazizi, Sami Faiz
Abstract:
In recent years, real-time spatial applications, like location-aware services and traffic monitoring, have become more and more important. Such applications result in dynamic environments where data as well as queries are continuously moving. As a result, a tremendous amount of real-time spatial data is generated every day. The growth of the data volume seems to outpace the advance of our computing infrastructure. For instance, in real-time spatial Big Data, users expect to receive the results of each query within a short time period, regardless of the load on the system. But with a huge amount of real-time spatial data generated, system performance degrades rapidly, especially in overload situations. To solve this problem, we propose the use of data partitioning as an optimization technique. Traditional horizontal and vertical partitioning can increase the performance of the system and simplify data management, but they remain insufficient for real-time spatial Big Data: they cannot deal with real-time and stream queries efficiently. Thus, in this paper, we propose a novel data partitioning approach for real-time spatial Big Data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big Data). This contribution is an implementation of the Matching algorithm for traditional vertical partitioning. We first find the optimal attribute sequence by the use of the Matching algorithm. Then, we propose a new cost model used for database partitioning, for keeping the amount of data in each partition balanced and for providing parallel execution guarantees for the most frequent queries. VPA-RTSBD aims to obtain a real-time partitioning scheme and deals with stream data. It improves the performance of query execution by maximizing the degree of parallel execution. This improves QoS (Quality of Service) in real-time spatial Big Data, especially with a huge volume of stream data. The performance of our contribution is evaluated via simulation experiments. The results show that the proposed algorithm is both efficient and scalable, and that it outperforms comparable algorithms.
Keywords: real-time spatial big data, quality of service, vertical partitioning, horizontal partitioning, matching algorithm, hamming distance, stream query
Procedia PDF Downloads 158
156 Improved Image Retrieval for Efficient Localization in Urban Areas Using Location Uncertainty Data
Authors: Mahdi Salarian, Xi Xu, Rashid Ansari
Abstract:
Accurate localization of mobile devices based on camera-acquired visual media information usually requires a search over a very large GPS-referenced image database. This paper proposes an efficient method for limiting the search space of the image retrieval engine by extracting and leveraging additional media information about Estimated Positional Error (EPE) to address complexity and accuracy issues in the search, especially for compensating GPS location inaccuracy in dense urban areas. The improved performance is achieved by up to a hundred-fold reduction in the search area used in available reference methods, while providing improved accuracy. To test our procedure, we created a database by acquiring Google Street View (GSV) images for downtown Chicago. Other available databases are not suitable for our approach due to the lack of EPE for the query images. We tested the procedure using more than 200 query images, along with their EPE, acquired mostly in the densest areas of Chicago with different phones and in different conditions such as low illumination and from under rail tracks. The effectiveness of our approach and the effect of the size and sector angle of the search area are discussed, and experimental results demonstrate how our proposed method can improve performance simply by utilizing data that are available on mobile systems such as smartphones.
Keywords: localization, retrieval, GPS uncertainty, bag of word
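The abstract does not give the exact search-space restriction rule; one plausible sketch, assumed here purely for illustration, keeps only reference images within a radius proportional to the reported EPE around the GPS fix:

```python
# Assumed sketch of the search-space restriction idea: keep only reference
# images within a radius proportional to the reported Estimated Positional
# Error (EPE) around the GPS fix, before running image retrieval.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_images(db, fix_lat, fix_lon, epe_m, margin=1.5):
    """db: list of (image_id, lat, lon). The radius grows with the EPE estimate;
    the margin factor is an illustrative assumption."""
    radius = margin * epe_m
    return [img for img, lat, lon in db
            if haversine_m(fix_lat, fix_lon, lat, lon) <= radius]
```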
Procedia PDF Downloads 283
155 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights
Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan
Abstract:
The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyze huge datasets, efficient NoSQL databases are needed. This research integrates several datasets to analyze post-COVID-19 health and well-being outcomes and to evaluate the effectiveness of government efforts during the pandemic, cutting down query processing time and creating predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets across a sharded database and performing indexing on individual shards. The key goal is the analysis of connections between governmental activities, poverty levels, and post-pandemic well-being. We evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels by utilising advanced data analysis and visualisations. The findings provide relevant data that supports the advancement of the UN sustainable development goals, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems in global health.
Keywords: big data, COVID-19, health, indexing, NoSQL, sharding, scalability, well-being
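A minimal pymongo sketch of the indexing and sharding set-up described above is given below; the database, collection, and field names are hypothetical, while enableSharding and shardCollection are standard MongoDB admin commands issued against a mongos router:

```python
# Sketch of the indexing and sharding set-up using pymongo.
# Database, collection and field names are hypothetical illustrations.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")  # a mongos router in a sharded cluster
db = client["pandemic"]

# Index the fields used by the analytical queries on each shard's data.
db.wellbeing.create_index([("country", ASCENDING), ("date", ASCENDING)])

# Distribute the collection across shards on a hashed key for an even data spread.
client.admin.command("enableSharding", "pandemic")
client.admin.command("shardCollection", "pandemic.wellbeing",
                     key={"country": "hashed"})

# A typical query then touches only the relevant shard(s) via the index.
cursor = db.wellbeing.find({"country": "CA"}).sort("date", ASCENDING)
```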
Procedia PDF Downloads 71
154 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks
Authors: Paul Shize Li, Frank Alber
Abstract:
A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has made efforts to discover and elucidate the molecular mechanisms of individual genes over the past decades, about 40% of human genes still have unknown functions, not to mention the diseases they may be related to. For biologists who are interested in a particular gene with unknown functions, a powerful computational method tailored for inferring the functions and disease relevance of uncharacterized genes is strongly needed. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that the densely connected subgraphs in multiple biological networks are useful for the functional and phenotypic annotation of uncharacterized genes. Therefore, in this work, we have developed an integrative network approach to identify frequent local clusters, defined as densely connected subgraphs that frequently occur in multiple biological networks and contain the query gene that has few or no disease or function annotations. This is a local clustering algorithm that models multiple biological networks sharing the same gene set as a three-dimensional matrix, the so-called tensor, and employs a tensor-based optimization method to efficiently find the frequent local clusters. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological, and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these gene co-expression networks, for a given uncharacterized gene of interest to a biologist, the proposed method can be applied to identify the frequent local clusters that contain this uncharacterized gene. Finally, those frequent local clusters are used for function and disease annotation of this uncharacterized gene. This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the use of the proposed method on real data comprising hundreds of gene co-expression networks and showed that it can comprehensively characterize the query gene. Therefore, this study provides a new tool for annotating uncharacterized genes and has great potential to assist clinical genomic diagnostics.
Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation
Procedia PDF Downloads 169
153 A Conversational Chatbot for Cricket Analytics
Authors: Kishan Bharadwaj Shridhar
Abstract:
Cricket is a data-rich sport, generating vast amounts of information, much of which is captured as textual commentary. Leading cricket data providers, such as ESPN Cricinfo, include valuable Decision Review System (DRS) statistics within these commentaries, often as footnotes. Despite the significance of this data, accessing and analyzing it efficiently remains a challenge. This paper presents the development of a sophisticated chatbot designed to answer queries specifically about DRS in cricket. It supports up to seven distinct query types, including individual player statistics, umpire performance, player vs umpire dynamics, comparisons between batter and bowler, a player's record at specific venues, and more. Additionally, it enables stateful conversations, allowing a user to seamlessly build upon previous queries for a fluid and interactive experience. Leveraging advanced text-to-SQL methodologies and open-source frameworks such as LangGraph, it ensures low latency and robust performance. A distinct prompt engineering module enables the system to accurately interpret query intent, dynamically transitioning to an assisted text-to-SQL approach or a rule-based engine, as needed. This solution is one of a kind in cricket analytics, offering unparalleled insights into cricket through an intuitive interface. It can be extended to other facets of cricket data and, beyond that, to other sports that generate textual data.
Keywords: conversational AI, cricket data analytics, text to SQL, large language models, stateful conversations
Procedia PDF Downloads 4
152 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform
Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu
Abstract:
Given a fixed fund, purchasing fewer hosts of higher capability or, inversely, more hosts of lower capability is an unavoidable trade-off in practice when building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing involves SQL queries with aggregates, joins, and space-time condition selections executed on massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of candidate host clusters for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem, HDFS+Hive+Spark. A suitable metric was introduced to measure the performance of Hadoop clusters in HBDP, which was tested and compared with its predicted counterpart on executing three kinds of typical SQL query tasks. Tests were conducted with respect to the factors of CPU benchmark, memory size, virtual host division, and the number of physical hosts in the cluster. The research has been applied to practical cluster procurement for housing big data computing.
Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance predicting formula, typical SQL query tasks
Procedia PDF Downloads 232
151 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores
Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay
Abstract:
Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction, and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online hard-negative mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using the Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.
Keywords: retail stores, faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition
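The triplet loss used to train the ResNet-18 encoder can be sketched in a few lines of NumPy; the margin value and the random embeddings below are illustrative only, and with online hard-negative mining the negative would be the hardest one found within the batch:

```python
# NumPy sketch of the triplet loss used to train the product-image encoder;
# margin and embeddings are illustrative only.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distances, averaged
    over the batch. With online hard-negative mining, 'negative' would be the
    hardest negative found within the batch for each anchor."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(8, 128)) for _ in range(3))
print(triplet_loss(a, p, n))
```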
Procedia PDF Downloads 157
150 An Extensible Software Infrastructure for Computer Aided Custom Monitoring of Patients in Smart Homes
Authors: Ritwik Dutta, Marylin Wolf
Abstract:
This paper describes the trade-offs and the design from scratch of a self-contained, easy-to-use health dashboard software system that provides customizable data tracking for patients in smart homes. The system is made up of different software modules and comprises a front-end and a back-end component. Built with HTML, CSS, and JavaScript, the front-end allows adding users, logging into the system, selecting metrics, and specifying health goals. The back-end consists of a NoSQL Mongo database, a Python script, and a SimpleHTTPServer written in Python. The database stores user profiles and health data in JSON format. The Python script makes use of the PyMongo driver library to query the database and displays formatted data as a daily snapshot of user health metrics against target goals. Any number of standard and custom metrics can be added to the system, and the corresponding health data can be fed automatically, via sensor APIs, or manually, as text or picture data files. A real-time METAR request API permits correlating weather data with patient health, and an advanced query system is implemented to allow trend analysis of selected health metrics over custom time intervals. Available on GitHub, the project is free to use for academic purposes of learning and experimenting, or for practical purposes by building on it.
Keywords: flask, Java, JavaScript, health monitoring, long-term care, Mongo, Python, smart home, software engineering, webserver
Procedia PDF Downloads 391
149 Transcriptomine: The Nuclear Receptor Signaling Transcriptome Database
Authors: Scott A. Ochsner, Christopher M. Watkins, Apollo McOwiti, David L. Steffen, Lauren B. Becnel, Neil J. McKenna
Abstract:
Understanding signaling by nuclear receptors (NRs) requires an appreciation of their cognate ligand- and tissue-specific transcriptomes. While target gene regulation data are abundant in this field, they reside in hundreds of discrete publications in formats refractory to routine query and analysis and, accordingly, their full value to the NR signaling community has not been realized. One of the mandates of the Nuclear Receptor Signaling Atlas (NURSA) is to facilitate the community's access to existing public datasets. Pursuant to this mandate, we are developing a freely accessible community web resource, Transcriptomine, to bring together the sum total of available expression array and RNA-Seq data points generated by the field in a single location. Transcriptomine currently contains over 25,000,000 gene fold-change datapoints from over 1200 contrasts relevant to over 100 NRs, ligands and coregulators in over 200 tissues and cell lines. Transcriptomine is designed to accommodate a spectrum of end users, ranging from the bench researcher to those with advanced bioinformatic training. Visualization tools allow users to build custom charts to compare and contrast patterns of gene regulation across different tissues and in response to different ligands. Our resource affords an entirely new paradigm for leveraging gene expression data in the NR signaling field, empowering users to query gene fold changes across diverse regulatory molecules, tissues and cell lines, target genes, biological functions and disease associations, in a way that would otherwise be prohibitive in terms of time and effort. Transcriptomine will be regularly updated with gene lists from future genome-wide expression array and expression-sequencing datasets in the NR signaling field.
Keywords: target gene database, informatics, gene expression, transcriptomics
Procedia PDF Downloads 275
148 Static Application Security Testing Approach for Non-Standard Smart Contracts
Authors: Antonio Horta, Renato Marinho, Raimir Holanda
Abstract:
Considered an evolution of the blockchain, the Ethereum platform, besides allowing transactions of its cryptocurrency named Ether, allows the programming of decentralised applications (DApps) and smart contracts. However, this functionality has raised other types of threats, and the exploitation of smart contract vulnerabilities has led companies to experience big losses. This research intends to determine the number of contracts that are at risk of being drained. Through a deep investigation, more than two hundred thousand smart contracts currently available on the Ethereum platform were scanned, and the amount of money at risk was estimated. The experiment was based on a query run on Google BigQuery in July 2022, which returned 50,707,133 contracts published on the Ethereum platform. After applying the filtering criteria, the experiment got 430,584 smart contracts to download and analyse. The filtering criteria consisted of filtering out ERC20 and ERC721 contracts, contracts without transactions, and contracts without balance. From the 430,584 smart contracts selected, only 268,103 had source codes published on Etherscan; however, using a hashing process, we discovered that there was contract duplication. Removing the duplicated contracts, the process ended up with 20,417 source codes, which were analysed using the open-source SAST tool SmartBugs with the Oyente and Securify algorithms. In the end, there was nearly $100,000 at risk of being drained from the potentially vulnerable smart contracts. It is important to note that the tools used in this study may generate false positives, which may interfere with the number of vulnerable contracts. To address this point, our next step in this research is to develop an application to test the contracts in a parallel environment to verify the vulnerabilities. Finally, this study aims to alert users and companies to the risk of not properly creating and analysing their smart contracts before publishing them on the platform. Like any other application, smart contracts are at risk of having vulnerabilities which, in this case, may result in direct financial losses.
Keywords: blockchain, reentrancy, static application security testing, smart contracts
Procedia PDF Downloads 88
147 Automated Evaluation Approach for Time-Dependent Question Answering Pairs on Web Crawler Based Question Answering System
Authors: Shraddha Chaudhary, Raksha Agarwal, Niladri Chatterjee
Abstract:
This work demonstrates a web crawler-based, generalized, end-to-end open domain Question Answering (QA) system. An efficient QA system requires a significant amount of domain knowledge to answer any question, with the aim of finding an exact and correct answer in the form of a number, a noun, a short phrase, or a brief piece of text for the user's question. Analysis of the question, searching the relevant documents, and choosing an answer are three important steps in a QA system. This work uses a web scraper (Beautiful Soup) to extract K documents from the web. The value of K can be calibrated on the basis of a trade-off between time and accuracy. This is followed by a passage ranking step, using a ranker trained on the MS MARCO dataset of 500K queries, to extract the most relevant text passage and shorten the lengthy documents. Further, a QA system is used to extract the answers from the shortened documents based on the query and return the top 3 answers. For evaluation of such systems, accuracy is judged by the exact match between predicted answers and gold answers. But automatic evaluation methods fail due to the linguistic ambiguities inherent in the questions. Moreover, reference answers are often not exhaustive or are out of date. Hence, correct answers predicted by the system are often judged incorrect according to the automated metrics. One such scenario arises from the original Google Natural Questions (GNQ) dataset, which was collected and made available in the year 2016. Use of any such dataset proves to be inefficient with respect to any questions that have time-varying answers. For illustration, consider the query "Where will the next Olympics be?" The gold answer for this query, as given in the GNQ dataset, is "Tokyo". Since the dataset was collected in 2016, and the next Olympics after 2016 were the 2020 Games in Tokyo, this answer was absolutely correct. But if the same question is asked in 2022, then the answer is "Paris, 2024". Consequently, any evaluation based on the GNQ dataset will be incorrect. Such erroneous predictions are usually given to human evaluators for further validation, which is quite expensive and time-consuming. To address this erroneous evaluation, the present work proposes an automated approach for evaluating time-dependent question-answer pairs. In particular, it proposes a metric using the current timestamp along with the top-n predicted answers from a given QA system. To test the proposed approach, the GNQ dataset has been used, and the system achieved an accuracy of 78% for a test dataset comprising 100 QA pairs. This test data was automatically extracted, using an analysis-based approach, from 10K QA pairs of the GNQ dataset. The results obtained are encouraging. The proposed technique appears to have the possibility of developing into a useful scheme for gathering precise, reliable, and specific information in a real-time and efficient manner. Our subsequent experiments will be directed towards establishing the efficacy of the above system for a larger set of time-dependent QA pairs.
Keywords: web-based information retrieval, open domain question answering system, time-varying QA, QA evaluation
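One plausible reading of the proposed timestamp-aware metric, assumed here for illustration only, is to attach validity windows to gold answers and compare the reference in force at evaluation time against the top-n predictions:

```python
# Assumed sketch of a timestamp-aware evaluation of top-n answers: each gold
# answer carries a validity window, and the reference in force at evaluation
# time is compared against the system's predictions. Data are illustrative.
from datetime import date

GOLD = {
    "where will the next olympics be?": [
        {"answer": "tokyo", "valid_from": date(2016, 1, 1), "valid_to": date(2021, 8, 8)},
        {"answer": "paris", "valid_from": date(2021, 8, 9), "valid_to": date(2024, 8, 11)},
    ],
}

def correct_at(question, predictions, today):
    """True if any of the top-n predictions matches the gold answer valid today."""
    for ref in GOLD.get(question, []):
        if ref["valid_from"] <= today <= ref["valid_to"]:
            return any(ref["answer"] in p.lower() for p in predictions)
    return False

print(correct_at("where will the next olympics be?", ["Paris, 2024"], date(2022, 6, 1)))  # True
```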
Procedia PDF Downloads 101
146 An Integrated Mathematical Approach to Measure the Capacity of MMTS
Authors: Bayan Bevrani, Robert L. Burdett, Prasad K. D. V. Yarlagadda
Abstract:
This article focuses upon multi-modal transportation systems (MMTS) and the issues surrounding the determination of system capacity. For that purpose, a multi-objective framework is advocated that integrates all the different modes and many different competing capacity objectives. This framework is analytical in nature and facilitates a variety of capacity queries and capacity expansion planning.
Keywords: analytical model, capacity analysis, capacity query, multi-modal transportation system (MMTS)
Procedia PDF Downloads 361
145 Synthetic Method of Contextual Knowledge Extraction
Authors: Olga Kononova, Sergey Lyapin
Abstract:
The global information society requires transparency and reliability of data, as well as the ability to manage information resources independently: in particular, to search, analyze and evaluate information, thereby obtaining new expertise. Moreover, it is the satisfaction of society's information needs that increases the efficiency of enterprise management and public administration. The study of structurally organized thematic and semantic contexts of different types, automatically extracted from unstructured data, is one of the important tasks for the application of information technologies in education, science, culture, governance and business. The objectives of this study are the typologization of contextual knowledge and the selection or creation of effective tools for extracting and analyzing contextual knowledge. Explication of various kinds and forms of contextual knowledge involves the development and use of full-text search information systems. For implementation purposes, the authors use the services of the e-library 'Humanitariana', such as contextual search, different types of queries (paragraph-oriented queries, frequency-ranked queries), and automatic extraction of knowledge from scientific texts. The multifunctional e-library «Humanitariana» is realized in an Internet architecture in a WWS configuration (Web-browser / Web-server / SQL-server). An advantage of using 'Humanitariana' is the possibility of combining the resources of several organizations. Scholars and research groups may work in a local network mode and in distributed IT environments, with the ability to access resources on any participating organization's servers. The paper discusses some specific cases of contextual knowledge explication with the use of the e-library services and focuses on the possibilities of new types of contextual knowledge. The experimental research base consists of scientific texts about 'e-government' and 'computer games'. An analysis of trends in the subject-themed texts allowed us to propose a content analysis methodology that combines a full-text search with automatic construction of a 'terminogramma' and expert analysis of the selected contexts. A 'terminogramma' is presented as a table that contains a column with a frequency-ranked list of words (nouns), as well as columns indicating the absolute frequency (count) and the relative frequency of occurrence of the word (in ppm). The analysis of the 'e-government' materials showed that the state takes a dominant position in the processes of electronic interaction between the authorities and society in modern Russia. The media credited the main role in these processes to the government, which provided public services through specialized portals. Factor analysis revealed two factors statistically describing the terms used: human interaction (the user) and the state (government, processes organizer); interaction management (public officer, processes performer) and technology (infrastructure). Isolation of these factors will lead to changes in the model of electronic interaction between government and society. In this study, the dominant social problems and the prevalence of different categories of subjects of computer gaming in scientific papers from 2005 to 2015 were also identified.
Several types of contextual knowledge are thus identified: micro context; macro context; dynamic context; thematic collections of queries (interactive contextual knowledge expanding the composition of e-library information resources); and multimodal context (functional integration of iconographic and full-text resources through a hybrid quasi-semantic search algorithm). Further studies can be pursued both in terms of expanding the resource base on which they are held, and in terms of the development of appropriate tools.
Keywords: contextual knowledge, contextual search, e-library services, frequency-ranked query, paragraph-oriented query, technologies of the contextual knowledge extraction
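A 'terminogramma'-style frequency table can be sketched in a few lines of Python; part-of-speech filtering to nouns is omitted here for brevity, and the documents are invented examples:

```python
# Sketch of building a 'terminogramma'-style table: a frequency-ranked list of
# words with absolute counts and relative frequencies (ppm). Noun filtering
# (part-of-speech tagging) is omitted for brevity.
import re
from collections import Counter

def terminogramma(texts):
    words = [w for t in texts for w in re.findall(r"[a-zа-яё]+", t.lower())]
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, n, round(1e6 * n / total, 1))   # word, absolute count, relative freq (ppm)
            for w, n in counts.most_common()]

docs = ["E-government services are provided by the state through portals.",
        "The state organizes electronic interaction between government and society."]
for row in terminogramma(docs)[:5]:
    print(row)
```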
Procedia PDF Downloads 360
144 The Ecosystem of Food Allergy Clinical Trials: A Systematic Review
Authors: Eimar Yadir Quintero Tapias
Abstract:
Background: Science is not generally self-correcting; many clinical studies end with the same conclusion, "more research is needed." This study hypothesizes that, first, we need a better appraisal of the available (and unavailable) evidence instead of creating more of the same false inquiries. Methods: Systematic review of ClinicalTrials.gov study records using the following Boolean operators: (food OR nut OR milk OR egg OR shellfish OR wheat OR peanuts) AND (allergy OR allergies OR hypersensitivity OR hypersensitivities). Variables included the status of the study (e.g., active and completed), availability of results, sponsor type, and sample size, among others. To determine the rates of non-publication in journals indexed by PubMed, an advanced search query using the specific National Clinical Trial numbers (e.g., NCT000001 OR NCT000002 OR...) was performed. As a prophylactic measure to prevent P-hacking, data analyses only included descriptive statistics and not inferential approaches. Results: A total of 2092 study records matched the search query described above (date: September 13, 2019). Most studies were interventional (n = 1770; 84.6%) and the remainder observational (n = 322; 15.4%). Universities, hospitals, and research centers sponsored over half of these investigations (n = 1208; 57.7%), 308 studies (14.7%) were industry-funded, and 147 received NIH grants; the remaining studies received mixed sponsorship. Regarding completed studies (n = 1156; 55.2%), 248 (21.5%) have results available at the registry site, and 417 (36.1%) matched NCT numbers of journal papers indexed by PubMed. Conclusions: The internal and external validity of human research is critical for the appraisal of medical evidence. It is imperative to analyze the entire dataset of clinical studies, preferably as patient-level anonymized raw data, before rushing to conclusions with insufficient and inadequate information. Publication bias and non-registration of clinical trials limit the evaluation of the evidence concerning therapeutic interventions for food allergy, such as oral and sublingual immunotherapy, as well as any other medical condition. Over half of the food allergy human research remains unpublished.
Keywords: allergy, clinical trials, immunology, systematic reviews
Procedia PDF Downloads 138
143 Assessment of Students Skills in Error Detection in SQL Classes using Rubric Framework - An Empirical Study
Authors: Dirson Santos De Campos, Deller James Ferreira, Anderson Cavalcante Gonçalves, Uyara Ferreira Silva
Abstract:
Rubrics for learning research provide many evaluation criteria and expected performance standards linked to defined student activities for learning and pedagogical objectives. Although rubrics are used in education at all levels, academic literature on rubrics as a tool to support research in SQL education is quite rare. There is a large class of SQL queries that are syntactically correct, but certainly not all of them are semantically correct. Detecting and correcting errors is a recurring problem in SQL education. In this paper, we use the Rubric Abstract Framework (RAF), which consists of steps that allow us to map the information used to measure student performance, guided by didactic objectives defined by the teacher, with the domain modeling contextualized by the rubric. An empirical study was conducted that demonstrates how rubrics can mitigate students' difficulties in finding logical errors and ease the teacher's workload in SQL education. Detecting and correcting logical errors is an important skill for students, and researchers have proposed several ways to improve SQL education because skills in this paradigm are crucial in software engineering and computer science. The RAF instantiation was used in an empirical study carried out during the COVID-19 pandemic in a database course. The pandemic transformed face-to-face and remote education: without in-person classes, the lab activities were conducted remotely, which hinders the teaching-learning process, in particular, for this research, in verifying the evidence or statements of students' knowledge, skills, and abilities (KSAs). Much research in academia and industry involves databases. The innovation proposed in this paper is the approach used, in which the results obtained when using rubrics to map logical errors in query formulation are analyzed, with the gains obtained by students empirically verified. The research approach can be used in the post-pandemic period in both classroom and distance learning.
Keywords: rubric, logical error, structured query language (SQL), empirical study, SQL education
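The kind of syntactically correct but semantically wrong query the rubric targets can be illustrated with a small sqlite3 example; the schema and data are invented for the illustration:

```python
# Illustration of a syntactically correct but semantically wrong SQL query of
# the kind the rubric targets; the schema and data are invented examples.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Bruno');
    INSERT INTO enrollment VALUES (1, 'DB'), (1, 'SE'), (2, 'DB');
""")

# Logical error: missing join condition -> a cross product inflates the counts.
wrong = conn.execute(
    "SELECT s.name, COUNT(*) FROM student s, enrollment e GROUP BY s.name"
).fetchall()

# Corrected query: join on the key, so counts reflect each student's courses.
right = conn.execute(
    "SELECT s.name, COUNT(*) FROM student s JOIN enrollment e "
    "ON e.student_id = s.id GROUP BY s.name"
).fetchall()

print(wrong)  # [('Ana', 3), ('Bruno', 3)]  -- every student paired with every row
print(right)  # [('Ana', 2), ('Bruno', 1)]
```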
Procedia PDF Downloads 191
142 Challenges over Two Semantic Repositories - OWLIM and AllegroGraph
Authors: Paria Tajabor, Azin Azarbani
Abstract:
The purpose of this research study is to explore two kinds of semantic repositories with regard to various factors, in order to find the best approaches that an artificial manager can use to produce an ontology in a system, based on their interaction, association and research. To this end, since analysis is the best way to evaluate each system and compare it with others, several benchmarks over these two repositories were examined. These two semantic repositories, OWLIM and AllegroGraph, are the main core of this study. The general objective of this study is to be able to create, in an efficient and cost-effective manner, the reports required to support decision making in any large enterprise.
Keywords: OWLIM, allegrograph, RDF, reasoning, semantic repository, semantic-web, SPARQL, ontology, query
Procedia PDF Downloads 263
141 Development of Requirements Analysis Tool for Medical Autonomy in Long-Duration Space Exploration Missions
Authors: Lara Dutil-Fafard, Caroline Rhéaume, Patrick Archambault, Daniel Lafond, Neal W. Pollock
Abstract:
Improving resources for medical autonomy of astronauts in prolonged space missions, such as a Mars mission, requires not only technology development, but also decision-making support systems. The Advanced Crew Medical System - Medical Condition Requirements study, funded by the Canadian Space Agency, aimed to create knowledge content and a scenario-based query capability to support medical autonomy of astronauts. The key objective of this study was to create a prototype tool for identifying medical infrastructure requirements in terms of medical knowledge, skills and materials. A multicriteria decision-making method was used to prioritize the highest risk medical events anticipated in a long-term space mission. Starting with those medical conditions, event sequence diagrams (ESDs) were created in the form of decision trees where the entry point is the diagnosis and the end points are the predicted outcomes (full recovery, partial recovery, or death/severe incapacitation). The ESD formalism was adapted to characterize and compare possible outcomes of medical conditions as a function of available medical knowledge, skills, and supplies in a given mission scenario. An extensive literature review was performed and summarized in a medical condition database. A PostgreSQL relational database was created to allow query-based evaluation of health outcome metrics with different medical infrastructure scenarios. Critical decision points, skill and medical supply requirements, and probable health outcomes were compared across chosen scenarios. The three medical conditions with the highest risk rank were acute coronary syndrome, sepsis, and stroke. Our efforts demonstrate the utility of this approach and provide insight into the effort required to develop appropriate content for the range of medical conditions that may arise.
Keywords: decision support system, event-sequence diagram, exploration mission, medical autonomy, scenario-based queries, space medicine
Procedia PDF Downloads 128
140 Life in the Fermata
Authors: Liza Michaeli
Abstract:
Fermata, from fermare, means to be on hold, conjoining contenere (contain), fissare (fasten), and rimanere (stay). It is a musical indication that a note should be (that is, calls to be) prolonged beyond the "normal duration" the note value would indicate. To be "held up" is to be held impractically in the lunga pausa, where the breath "is to be taken," taken in, but the release is impeded. In this pause of unspecified length, life is being prolonged against your discretion but given to you to inhabit. That is why the note value, the recognized "normative value," does not "accept" the significance of the prolongation. The value is interested in "saving time." The query at the heart of "Life in the Fermata" will therefore be: is this life on hold, or rather, life holding? The paper shall meditate, via musical notation, on the significance of experience "inside": namely, the hold.
Keywords: fermata, experience, significance, the hold, prolongation
Procedia PDF Downloads 177
139 A Web-Based Systems Immunology Toolkit Allowing the Visualization and Comparative Analysis of Publically Available Collective Data to Decipher Immune Regulation in Early Life
Authors: Mahbuba Rahman, Sabri Boughorbel, Scott Presnell, Charlie Quinn, Darawan Rinchai, Damien Chaussabel, Nico Marr
Abstract:
Collections of large-scale datasets made available in public repositories can be used to identify and fill gaps in biomedical knowledge. But first, these data need to be made readily accessible to researchers for analysis and interpretation. Here, a collection of transcriptome datasets was made available to investigate the functional programming of human hematopoietic cells in early life. Thirty-two datasets were retrieved from the NCBI Gene Expression Omnibus (GEO) and loaded into a custom, interactive web application called the Gene Expression Browser (GXB), designed for visualization and querying of integrated large-scale data. Multiple sample groupings and gene rank lists were created based on the study design and variables in each dataset. Web links to customized graphical views can be generated by users and subsequently used to graphically present data in manuscripts for publication. The GXB tool also enables browsing of a single gene across datasets, which can provide information on the role of a given molecule across biological systems. The dataset collection is available online. As a proof of principle, one of the datasets (GSE25087) was re-analyzed to identify genes that are differentially expressed by regulatory T cells in early life. Re-analysis of this dataset and a cross-study comparison using multiple other datasets in the above-mentioned collection revealed that PMCH, a gene encoding a precursor of melanin-concentrating hormone (MCH), a cyclic neuropeptide, is highly expressed in a variety of other hematopoietic cell types, including neonatal erythroid cells as well as plasmacytoid dendritic cells upon viral infection. Our findings suggest an as yet unrecognized role of MCH in immune regulation, thereby highlighting the unique potential of the curated dataset collection and systems biology approach to generate new hypotheses which can be tested in future mechanistic studies.
Keywords: early-life, GEO datasets, PMCH, interactive query, systems biology
Procedia PDF Downloads 297