Search results for: querying web
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25

Search results for: querying web

25 Parallel Querying of Distributed Ontologies with Shared Vocabulary

Authors: Sharjeel Aslam, Vassil Vassilev, Karim Ouazzane

Abstract:

Ontologies and various semantic repositories became a convenient approach for implementing model-driven architectures of distributed systems on the Web. SPARQL is the standard query language for querying such. However, although SPARQL is well-established standard for querying semantic repositories in RDF and OWL format and there are commonly used APIs which supports it, like Jena for Java, its parallel option is not incorporated in them. This article presents a complete framework consisting of an object algebra for parallel RDF and an index-based implementation of the parallel query engine capable of dealing with the distributed RDF ontologies which share common vocabulary. It has been implemented in Java, and for validation of the algorithms has been applied to the problem of organizing virtual exhibitions on the Web.

Keywords: distributed ontologies, parallel querying, semantic indexing, shared vocabulary, SPARQL

Procedia PDF Downloads 157
24 Fuzzy Semantic Annotation of Web Resources

Authors: Sahar Maâlej Dammak, Anis Jedidi, Rafik Bouaziz

Abstract:

With the great mass of pages managed through the world, and especially with the advent of the Web, their manual annotation is impossible. We focus, in this paper, on the semiautomatic annotation of the web pages. We propose an approach and a framework for semantic annotation of web pages entitled “Querying Web”. Our solution is an enhancement of the first result of annotation done by the “Semantic Radar” Plug-in on the web resources, by annotations using an enriched domain ontology. The concepts of the result of Semantic Radar may be connected to several terms of the ontology, but connections may be uncertain. We represent annotations as possibility distributions. We use the hierarchy defined in the ontology to compute degrees of possibilities. We want to achieve an automation of the fuzzy semantic annotation of web resources.

Keywords: fuzzy semantic annotation, semantic web, domain ontologies, querying web

Procedia PDF Downloads 332
23 Application of IF Rough Data on Knowledge Towards Malaria of Rural Tribal Communities in Tripura

Authors: Chhaya Gangwal, R. N. Bhaumik, Shishir Kumar

Abstract:

Handling uncertainty and impreciseness of knowledge appears to be a challenging task in Information Systems. Intuitionistic fuzzy (IF) and rough set theory enhances databases by allowing it for the management of uncertainty and impreciseness. This paper presents a new efficient query optimization technique for the multi-valued or imprecise IF rough database. The usefulness of this technique was illustrated on malaria knowledge from the rural tribal communities of Tripura where most of the information is multi-valued and imprecise. Then, the querying about knowledge on malaria is executed into SQL server to make the implementation of IF rough data querying simpler.

Keywords: intuitionistic fuzzy set, rough set, relational database, IF rough relational database

Procedia PDF Downloads 399
22 User Guidance for Effective Query Interpretation in Natural Language Interfaces to Ontologies

Authors: Aliyu Isah Agaie, Masrah Azrifah Azmi Murad, Nurfadhlina Mohd Sharef, Aida Mustapha

Abstract:

Natural Language Interfaces typically support a restricted language and also have scopes and limitations that naïve users are unaware of, resulting in errors when the users attempt to retrieve information from ontologies. To overcome this challenge, an auto-suggest feature is introduced into the querying process where users are guided through the querying process using interactive query construction system. Guiding users to formulate their queries, while providing them with an unconstrained (or almost unconstrained) way to query the ontology results in better interpretation of the query and ultimately lead to an effective search. The approach described in this paper is unobtrusive and subtly guides the users, so that they have a choice of either selecting from the suggestion list or typing in full. The user is not coerced into accepting system suggestions and can express himself using fragments or full sentences.

Keywords: auto-suggest, expressiveness, habitability, natural language interface, query interpretation, user guidance

Procedia PDF Downloads 440
21 Spatial Data Mining by Decision Trees

Authors: Sihem Oujdi, Hafida Belbachir

Abstract:

Existing methods of data mining cannot be applied on spatial data because they require spatial specificity consideration, as spatial relationships. This paper focuses on the classification with decision trees, which are one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches Join materialization and Querying on the fly the different tables. Similar works have been done on these two main approaches, the first - Join materialization - favors the processing time in spite of memory space, whereas the second - Querying on the fly different tables- promotes memory space despite of the processing time. The modified C4.5 algorithm requires three entries tables: a target table, a neighbor table, and a spatial index join that contains the possible spatial relationship among the objects in the target table and those in the neighbor table. Thus, the proposed algorithms are applied to a spatial data pattern in the accidentology domain. A comparative study of our approach with other works of classification by spatial decision trees will be detailed.

Keywords: C4.5 algorithm, decision trees, S-CART, spatial data mining

Procedia PDF Downloads 577
20 An Integrated Mathematical Approach to Measure the Capacity of MMTS

Authors: Bayan Bevrani, Robert L. Burdett, Prasad K. D. V. Yarlagadda

Abstract:

This article focuses upon multi-modal transportation systems (MMTS) and the issues surrounding the determination of system capacity. For that purpose a multi-objective framework is advocated that integrates all the different modes and many different competing capacity objectives. This framework is analytical in nature and facilitates a variety of capacity querying and capacity expansion planning.

Keywords: analytical model, capacity analysis, capacity query, multi-modal transportation system (MMTS)

Procedia PDF Downloads 318
19 A Temporal QoS Ontology For ERTMS/ETCS

Authors: Marc Sango, Olimpia Hoinaru, Christophe Gransart, Laurence Duchien

Abstract:

Ontologies offer a means for representing and sharing information in many domains, particularly in complex domains. For example, it can be used for representing and sharing information of System Requirement Specification (SRS) of complex systems like the SRS of ERTMS/ETCS written in natural language. Since this system is a real-time and critical system, generic ontologies, such as OWL and generic ERTMS ontologies provide minimal support for modeling temporal information omnipresent in these SRS documents. To support the modeling of temporal information, one of the challenges is to enable representation of dynamic features evolving in time within a generic ontology with a minimal redesign of it. The separation of temporal information from other information can help to predict system runtime operation and to properly design and implement them. In addition, it is helpful to provide a reasoning and querying techniques to reason and query temporal information represented in the ontology in order to detect potential temporal inconsistencies. Indeed, a user operation, such as adding a new constraint on existing planning constraints can cause temporal inconsistencies, which can lead to system failures. To address this challenge, we propose a lightweight 3-layer temporal Quality of Service (QoS) ontology for representing, reasoning and querying over temporal and non-temporal information in a complex domain ontology. Representing QoS entities in separated layers can clarify the distinction between the non QoS entities and the QoS entities in an ontology. The upper generic layer of the proposed ontology provides an intuitive knowledge of domain components, specially ERTMS/ETCS components. The separation of the intermediate QoS layer from the lower QoS layer allows us to focus on specific QoS Characteristics, such as temporal or integrity characteristics. In this paper, we focus on temporal information that can be used to predict system runtime operation. To evaluate our approach, an example of the proposed domain ontology for handover operation, as well as a reasoning rule over temporal relations in this domain-specific ontology, are given.

Keywords: system requirement specification, ERTMS/ETCS, temporal ontologies, domain ontologies

Procedia PDF Downloads 375
18 A Framework for SQL Learning: Linking Learning Taxonomy, Cognitive Model and Cross Cutting Factors

Authors: Huda Al Shuaily, Karen Renaud

Abstract:

Databases comprise the foundation of most software systems. System developers inevitably write code to query these databases. The de facto language for querying is SQL and this, consequently, is the default language taught by higher education institutions. There is evidence that learners find it hard to master SQL, harder than mastering other programming languages such as Java. Educators do not agree about explanations for this seeming anomaly. Further investigation may well reveal the reasons. In this paper, we report on our investigations into how novices learn SQL, the actual problems they experience when writing SQL, as well as the differences between expert and novice SQL query writers. We conclude by presenting a model of SQL learning that should inform the instructional material design process better to support the SQL learning process.

Keywords: pattern, SQL, learning, model

Procedia PDF Downloads 223
17 Ontology for a Voice Transcription of OpenStreetMap Data: The Case of Space Apprehension by Visually Impaired Persons

Authors: Said Boularouk, Didier Josselin, Eitan Altman

Abstract:

In this paper, we present a vocal ontology of OpenStreetMap data for the apprehension of space by visually impaired people. Indeed, the platform based on produsage gives a freedom to data producers to choose the descriptors of geocoded locations. Unfortunately, this freedom, called also folksonomy leads to complicate subsequent searches of data. We try to solve this issue in a simple but usable method to extract data from OSM databases in order to send them to visually impaired people using Text To Speech technology. We focus on how to help people suffering from visual disability to plan their itinerary, to comprehend a map by querying computer and getting information about surrounding environment in a mono-modal human-computer dialogue.

Keywords: TTS, ontology, open street map, visually impaired

Procedia PDF Downloads 259
16 Application of Semantic Technologies in Rapid Reconfiguration of Factory Systems

Authors: J. Zhang, K. Agyapong-Kodua

Abstract:

Digital factory based on visual design and simulation has emerged as a mainstream to reduce digital development life cycle. Some basic industrial systems are being integrated via semantic modelling, and products (P) matching process (P)-resource (R) requirements are designed to fulfill current customer demands. Nevertheless, product design is still limited to fixed product models and known knowledge of product engineers. Therefore, this paper presents a rapid reconfiguration method based on semantic technologies with PPR ontologies to reuse known and unknown knowledge. In order to avoid the influence of big data, our system uses a cloud manufactory and distributed database to improve the efficiency of querying meeting PPR requirements.

Keywords: semantic technologies, factory system, digital factory, cloud manufactory

Procedia PDF Downloads 448
15 A Context-Centric Chatbot for Cryptocurrency Using the Bidirectional Encoder Representations from Transformers Neural Networks

Authors: Qitao Xie, Qingquan Zhang, Xiaofei Zhang, Di Tian, Ruixuan Wen, Ting Zhu, Ping Yi, Xin Li

Abstract:

Inspired by the recent movement of digital currency, we are building a question answering system concerning the subject of cryptocurrency using Bidirectional Encoder Representations from Transformers (BERT). The motivation behind this work is to properly assist digital currency investors by directing them to the corresponding knowledge bases that can offer them help and increase the querying speed. BERT, one of newest language models in natural language processing, was investigated to improve the quality of generated responses. We studied different combinations of hyperparameters of the BERT model to obtain the best fit responses. Further, we created an intelligent chatbot for cryptocurrency using BERT. A chatbot using BERT shows great potential for the further advancement of a cryptocurrency market tool. We show that the BERT neural networks generalize well to other tasks by applying it successfully to cryptocurrency.

Keywords: bidirectional encoder representations from transformers, BERT, chatbot, cryptocurrency, deep learning

Procedia PDF Downloads 100
14 Computing Continuous Skyline Queries without Discriminating between Static and Dynamic Attributes

Authors: Ibrahim Gomaa, Hoda M. O. Mokhtar

Abstract:

Although most of the existing skyline queries algorithms focused basically on querying static points through static databases; with the expanding number of sensors, wireless communications and mobile applications, the demand for continuous skyline queries has increased. Unlike traditional skyline queries which only consider static attributes, continuous skyline queries include dynamic attributes, as well as the static ones. However, as skyline queries computation is based on checking the domination of skyline points over all dimensions, considering both the static and dynamic attributes without separation is required. In this paper, we present an efficient algorithm for computing continuous skyline queries without discriminating between static and dynamic attributes. Our algorithm in brief proceeds as follows: First, it excludes the points which will not be in the initial skyline result; this pruning phase reduces the required number of comparisons. Second, the association between the spatial positions of data points is examined; this phase gives an idea of where changes in the result might occur and consequently enables us to efficiently update the skyline result (continuous update) rather than computing the skyline from scratch. Finally, experimental evaluation is provided which demonstrates the accuracy, performance and efficiency of our algorithm over other existing approaches.

Keywords: continuous query processing, dynamic database, moving object, skyline queries

Procedia PDF Downloads 179
13 Recommender System Based on Mining Graph Databases for Data-Intensive Applications

Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi

Abstract:

In recent years, many digital documents on the web have been created due to the rapid growth of ’social applications’ communities or ’Data-intensive applications’. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why using graph databases is the best technology for building a realtime recommendation system. Also, The paper will discuss several similarity metrics algorithms that can be used to compute a similarity score of pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will discover how NLP strategies offer the premise to improve the accuracy and coverage of realtime recommendations by extracting the information from the stored unstructured knowledge, which makes up the bulk of the world’s data to enrich the graph database with this information. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.

Keywords: graph databases, NLP, recommendation systems, similarity metrics

Procedia PDF Downloads 61
12 Merging and Comparing Ontologies Generically

Authors: Xiuzhan Guo, Arthur Berrill, Ajinkya Kulkarni, Kostya Belezko, Min Luo

Abstract:

Ontology operations, e.g., aligning and merging, were studied and implemented extensively in different settings, such as categorical operations, relation algebras, and typed graph grammars, with different concerns. However, aligning and merging operations in the settings share some generic properties, e.g., idempotence, commutativity, associativity, and representativity, labeled by (I), (C), (A), and (R), respectively, which are defined on an ontology merging system (D~M), where D is a non-empty set of the ontologies concerned, ~ is a binary relation on D modeling ontology aligning and M is a partial binary operation on D modeling ontology merging. Given an ontology repository, a finite set O ⊆ D, its merging closure Ô is the smallest set of ontologies, which contains the repository and is closed with respect to merging. If (I), (C), (A), and (R) are satisfied, then both D and Ô are partially ordered naturally by merging, Ô is finite and can be computed, compared, and sorted efficiently, including sorting, selecting, and querying some specific elements, e.g., maximal ontologies and minimal ontologies. We also show that the ontology merging system, given by ontology V -alignment pairs and pushouts, satisfies the properties: (I), (C), (A), and (R) so that the merging system is partially ordered and the merging closure of a given repository with respect to pushouts can be computed efficiently.

Keywords: ontology aligning, ontology merging, merging system, poset, merging closure, ontology V-alignment pair, ontology homomorphism, ontology V-alignment pair homomorphism, pushout

Procedia PDF Downloads 852
11 Toxic Ingredients Contained in Our Cosmetics

Authors: El Alia Boularas, H. Bekkar, H. Larachi, H. Rezk-kallah

Abstract:

Introduction: Notwithstanding cosmetics are used in life every day, these products are not all innocuous and harmless, as they may contain ingredients responsible for allergic reactions and, possibly, for other health problems. Additionally, environmental pollution should be taken into account. Thus, it is time to investigate what is ‘hidden behind beauty’. Aims: 1.To investigate prevalence of 13 chemical ingredients in cosmetics being object of concern, which the Algerians use regularly. 2.To know the profile of questioned consumers and describe their opinion on cosmetics. Methods: The survey was carried out in year 2013 over a period of 3 months, among Algerian Internet users having an e-mail address or a Facebook account.The study investigated 13 chemical agents showing health and environmental problems, selected after analysis of the recent studies published on the subject, the lists of national and international regulatory references on chemical hazards, and querying the database Skin Deep presented by the Environmental Working Group. Results: 300 people distributed all over the Algerian territory participated in the survey, providing information about 731 cosmetics; 86% aged from 20 to 39 years, with a sex ratio=0,27. A percentage of 43% of the analyzed cosmetics contained at least one of the 13 toxic ingredients. The targeted ingredient that has been most frequently reported was ‘perfume’ followed by parabens and PEG.85% of the participants declared that cosmetics ‘can contain toxic substances’, 27% asserted that they verify regularly the list of ingredients when they buy cosmetics, 61% said that they try to avoid the toxic ingredients, among whom 24 % were more vigilant on the presence of parabens, 95% were in favour of the strengthening of the Algerian laws on cosmetics. Conclusion: The results of the survey provide the indication of a widespread presence of toxic chemical ingredients in personal care products that Algerians use daily.

Keywords: Algerians consumers, cosmetics, survey, toxic ingredients

Procedia PDF Downloads 243
10 Ontology based Fault Detection and Diagnosis system Querying and Reasoning examples

Authors: Marko Batic, Nikola Tomasevic, Sanja Vranes

Abstract:

One of the strongholds in the ubiquitous efforts related to the energy conservation and energy efficiency improvement is represented by the retrofit of high energy consumers in buildings. In general, HVAC systems represent the highest energy consumers in buildings. However they usually suffer from mal-operation and/or malfunction, causing even higher energy consumption than necessary. Various Fault Detection and Diagnosis (FDD) systems can be successfully employed for this purpose, especially when it comes to the application at a single device/unit level. In the case of more complex systems, where multiple devices are operating in the context of the same building, significant energy efficiency improvements can only be achieved through application of comprehensive FDD systems relying on additional higher level knowledge, such as their geographical location, served area, their intra- and inter- system dependencies etc. This paper presents a comprehensive FDD system that relies on the utilization of common knowledge repository that stores all critical information. The discussed system is deployed as a test-bed platform at the two at Fiumicino and Malpensa airports in Italy. This paper aims at presenting advantages of implementation of the knowledge base through the utilization of ontology and offers improved functionalities of such system through examples of typical queries and reasoning that enable derivation of high level energy conservation measures (ECM). Therefore, key SPARQL queries and SWRL rules, based on the two instantiated airport ontologies, are elaborated. The detection of high level irregularities in the operation of airport heating/cooling plants is discussed and estimation of energy savings is reported.

Keywords: airport ontology, knowledge management, ontology modeling, reasoning

Procedia PDF Downloads 473
9 Graph Neural Network-Based Classification for Disease Prediction in Health Care Heterogeneous Data Structures of Electronic Health Record

Authors: Raghavi C. Janaswamy

Abstract:

In the healthcare sector, heterogenous data elements such as patients, diagnosis, symptoms, conditions, observation text from physician notes, and prescriptions form the essentials of the Electronic Health Record (EHR). The data in the form of clear text and images are stored or processed in a relational format in most systems. However, the intrinsic structure restrictions and complex joins of relational databases limit the widespread utility. In this regard, the design and development of realistic mapping and deep connections as real-time objects offer unparallel advantages. Herein, a graph neural network-based classification of EHR data has been developed. The patient conditions have been predicted as a node classification task using a graph-based open source EHR data, Synthea Database, stored in Tigergraph. The Synthea DB dataset is leveraged due to its closer representation of the real-time data and being voluminous. The graph model is built from the EHR heterogeneous data using python modules, namely, pyTigerGraph to get nodes and edges from the Tigergraph database, PyTorch to tensorize the nodes and edges, PyTorch-Geometric (PyG) to train the Graph Neural Network (GNN) and adopt the self-supervised learning techniques with the AutoEncoders to generate the node embeddings and eventually perform the node classifications using the node embeddings. The model predicts patient conditions ranging from common to rare situations. The outcome is deemed to open up opportunities for data querying toward better predictions and accuracy.

Keywords: electronic health record, graph neural network, heterogeneous data, prediction

Procedia PDF Downloads 50
8 Unsupervised Learning and Similarity Comparison of Water Mass Characteristics with Gaussian Mixture Model for Visualizing Ocean Data

Authors: Jian-Heng Wu, Bor-Shen Lin

Abstract:

The temperature-salinity relationship is one of the most important characteristics used for identifying water masses in marine research. Temperature-salinity characteristics, however, may change dynamically with respect to the geographic location and is quite sensitive to the depth at the same location. When depth is taken into consideration, however, it is not easy to compare the characteristics of different water masses efficiently for a wide range of areas of the ocean. In this paper, the Gaussian mixture model was proposed to analyze the temperature-salinity-depth characteristics of water masses, based on which comparison between water masses may be conducted. Gaussian mixture model could model the distribution of a random vector and is formulated as the weighting sum for a set of multivariate normal distributions. The temperature-salinity-depth data for different locations are first used to train a set of Gaussian mixture models individually. The distance between two Gaussian mixture models can then be defined as the weighting sum of pairwise Bhattacharyya distances among the Gaussian distributions. Consequently, the distance between two water masses may be measured fast, which allows the automatic and efficient comparison of the water masses for a wide range area. The proposed approach not only can approximate the distribution of temperature, salinity, and depth directly without the prior knowledge for assuming the regression family, but may restrict the complexity by controlling the number of mixtures when the amounts of samples are unevenly distributed. In addition, it is critical for knowledge discovery in marine research to represent, manage and share the temperature-salinity-depth characteristics flexibly and responsively. The proposed approach has been applied to a real-time visualization system of ocean data, which may facilitate the comparison of water masses by aggregating the data without degrading the discriminating capabilities. This system provides an interface for querying geographic locations with similar temperature-salinity-depth characteristics interactively and for tracking specific patterns of water masses, such as the Kuroshio near Taiwan or those in the South China Sea.

Keywords: water mass, Gaussian mixture model, data visualization, system framework

Procedia PDF Downloads 95
7 Towards an Environmental Knowledge System in Water Management

Authors: Mareike Dornhoefer, Madjid Fathi

Abstract:

Water supply and water quality are key problems of mankind at the moment and - due to increasing population - in the future. Management disciplines like water, environment and quality management therefore need to closely interact, to establish a high level of water quality and to guarantee water supply in all parts of the world. Groundwater remediation is one aspect in this process. From a knowledge management perspective it is only possible to solve complex ecological or environmental problems if different factors, expert knowledge of various stakeholders and formal regulations regarding water, waste or chemical management are interconnected in form of a knowledge base. In general knowledge management focuses the processes of gathering and representing existing and new knowledge in a way, which allows for inference or deduction of knowledge for e.g. a situation where a problem solution or decision support are required. A knowledge base is no sole data repository, but a key element in a knowledge based system, thus providing or allowing for inference mechanisms to deduct further knowledge from existing facts. In consequence this knowledge provides decision support. The given paper introduces an environmental knowledge system in water management. The proposed environmental knowledge system is part of a research concept called Green Knowledge Management. It applies semantic technologies or concepts such as ontology or linked open data to interconnect different data and information sources about environmental aspects, in this case, water quality, as well as background material enriching an established knowledge base. Examples for the aforementioned ecological or environmental factors threatening water quality are among others industrial pollution (e.g. leakage of chemicals), environmental changes (e.g. rise in temperature) or floods, where all kinds of waste are merged and transferred into natural water environments. Water quality is usually determined with the help of measuring different indicators (e.g. chemical or biological), which are gathered with the help of laboratory testing, continuous monitoring equipment or other measuring processes. During all of these processes data are gathered and stored in different databases. Meanwhile the knowledge base needs to be established through interconnecting data of these different data sources and enriching its semantics. Experts may add their knowledge or experiences of previous incidents or influencing factors. In consequence querying or inference mechanisms are applied for the deduction of coherence between indicators, predictive developments or environmental threats. Relevant processes or steps of action may be modeled in form of a rule based approach. Overall the environmental knowledge system supports the interconnection of information and adding semantics to create environmental knowledge about water environment, supply chain as well as quality. The proposed concept itself is a holistic approach, which links to associated disciplines like environmental and quality management. Quality indicators and quality management steps need to be considered e.g. for the process and inference layers of the environmental knowledge system, thus integrating the aforementioned management disciplines in one water management application.

Keywords: water quality, environmental knowledge system, green knowledge management, semantic technologies, quality management

Procedia PDF Downloads 185
6 Citation Analysis of New Zealand Court Decisions

Authors: Tobias Milz, L. Macpherson, Varvara Vetrova

Abstract:

The law is a fundamental pillar of human societies as it shapes, controls and governs how humans conduct business, behave and interact with each other. Recent advances in computer-assisted technologies such as NLP, data science and AI are creating opportunities to support the practice, research and study of this pervasive domain. It is therefore not surprising that there has been an increase in investments into supporting technologies for the legal industry (also known as “legal tech” or “law tech”) over the last decade. A sub-discipline of particular appeal is concerned with assisted legal research. Supporting law researchers and practitioners to retrieve information from the vast amount of ever-growing legal documentation is of natural interest to the legal research community. One tool that has been in use for this purpose since the early nineteenth century is legal citation indexing. Among other use cases, they provided an effective means to discover new precedent cases. Nowadays, computer-assisted network analysis tools can allow for new and more efficient ways to reveal the “hidden” information that is conveyed through citation behavior. Unfortunately, access to openly available legal data is still lacking in New Zealand and access to such networks is only commercially available via providers such as LexisNexis. Consequently, there is a need to create, analyze and provide a legal citation network with sufficient data to support legal research tasks. This paper describes the development and analysis of a legal citation Network for New Zealand containing over 300.000 decisions from 125 different courts of all areas of law and jurisdiction. Using python, the authors assembled web crawlers, scrapers and an OCR pipeline to collect and convert court decisions from openly available sources such as NZLII into uniform and machine-readable text. This facilitated the use of regular expressions to identify references to other court decisions from within the decision text. The data was then imported into a graph-based database (Neo4j) with the courts and their respective cases represented as nodes and the extracted citations as links. Furthermore, additional links between courts of connected cases were added to indicate an indirect citation between the courts. Neo4j, as a graph-based database, allows efficient querying and use of network algorithms such as PageRank to reveal the most influential/most cited courts and court decisions over time. This paper shows that the in-degree distribution of the New Zealand legal citation network resembles a power-law distribution, which indicates a possible scale-free behavior of the network. This is in line with findings of the respective citation networks of the U.S. Supreme Court, Austria and Germany. The authors of this paper provide the database as an openly available data source to support further legal research. The decision texts can be exported from the database to be used for NLP-related legal research, while the network can be used for in-depth analysis. For example, users of the database can specify the network algorithms and metrics to only include specific courts to filter the results to the area of law of interest.

Keywords: case citation network, citation analysis, network analysis, Neo4j

Procedia PDF Downloads 70
5 Worldwide GIS Based Earthquake Information System/Alarming System for Microzonation/Liquefaction and It’s Application for Infrastructure Development

Authors: Rajinder Kumar Gupta, Rajni Kant Agrawal, Jaganniwas

Abstract:

One of the most frightening phenomena of nature is the occurrence of earthquake as it has terrible and disastrous effects. Many earthquakes occur every day worldwide. There is need to have knowledge regarding the trends in earthquake occurrence worldwide. The recoding and interpretation of data obtained from the establishment of the worldwide system of seismological stations made this possible. From the analysis of recorded earthquake data, the earthquake parameters and source parameters can be computed and the earthquake catalogues can be prepared. These catalogues provide information on origin, time, epicenter locations (in term of latitude and longitudes) focal depths, magnitude and other related details of the recorded earthquakes. Theses catalogues are used for seismic hazard estimation. Manual interpretation and analysis of these data is tedious and time consuming. A geographical information system is a computer based system designed to store, analyzes and display geographic information. The implementation of integrated GIS technology provides an approach which permits rapid evaluation of complex inventor database under a variety of earthquake scenario and allows the user to interactively view results almost immediately. GIS technology provides a powerful tool for displaying outputs and permit to users to see graphical distribution of impacts of different earthquake scenarios and assumptions. An endeavor has been made in present study to compile the earthquake data for the whole world in visual Basic on ARC GIS Plate form so that it can be used easily for further analysis to be carried out by earthquake engineers. The basic data on time of occurrence, location and size of earthquake has been compiled for further querying based on various parameters. A preliminary analysis tool is also provided in the user interface to interpret the earthquake recurrence in region. The user interface also includes the seismic hazard information already worked out under GHSAP program. The seismic hazard in terms of probability of exceedance in definite return periods is provided for the world. The seismic zones of the Indian region are included in the user interface from IS 1893-2002 code on earthquake resistant design of buildings. The City wise satellite images has been inserted in Map and based on actual data the following information could be extracted in real time: • Analysis of soil parameters and its effect • Microzonation information • Seismic hazard and strong ground motion • Soil liquefaction and its effect in surrounding area • Impacts of liquefaction on buildings and infrastructure • Occurrence of earthquake in future and effect on existing soil • Propagation of earth vibration due of occurrence of Earthquake GIS based earthquake information system has been prepared for whole world in Visual Basic on ARC GIS Plate form and further extended micro level based on actual soil parameters. Individual tools has been developed for liquefaction, earthquake frequency etc. All information could be used for development of infrastructure i.e. multi story structure, Irrigation Dam & Its components, Hydro-power etc in real time for present and future.

Keywords: GIS based earthquake information system, microzonation, analysis and real time information about liquefaction, infrastructure development

Procedia PDF Downloads 281
4 SPARK: An Open-Source Knowledge Discovery Platform That Leverages Non-Relational Databases and Massively Parallel Computational Power for Heterogeneous Genomic Datasets

Authors: Thilina Ranaweera, Enes Makalic, John L. Hopper, Adrian Bickerstaffe

Abstract:

Data are the primary asset of biomedical researchers, and the engine for both discovery and research translation. As the volume and complexity of research datasets increase, especially with new technologies such as large single nucleotide polymorphism (SNP) chips, so too does the requirement for software to manage, process and analyze the data. Researchers often need to execute complicated queries and conduct complex analyzes of large-scale datasets. Existing tools to analyze such data, and other types of high-dimensional data, unfortunately suffer from one or more major problems. They typically require a high level of computing expertise, are too simplistic (i.e., do not fit realistic models that allow for complex interactions), are limited by computing power, do not exploit the computing power of large-scale parallel architectures (e.g. supercomputers, GPU clusters etc.), or are limited in the types of analysis available, compounded by the fact that integrating new analysis methods is not straightforward. Solutions to these problems, such as those developed and implemented on parallel architectures, are currently available to only a relatively small portion of medical researchers with access and know-how. The past decade has seen a rapid expansion of data management systems for the medical domain. Much attention has been given to systems that manage phenotype datasets generated by medical studies. The introduction of heterogeneous genomic data for research subjects that reside in these systems has highlighted the need for substantial improvements in software architecture. To address this problem, we have developed SPARK, an enabling and translational system for medical research, leveraging existing high performance computing resources, and analysis techniques currently available or being developed. It builds these into The Ark, an open-source web-based system designed to manage medical data. SPARK provides a next-generation biomedical data management solution that is based upon a novel Micro-Service architecture and Big Data technologies. The system serves to demonstrate the applicability of Micro-Service architectures for the development of high performance computing applications. When applied to high-dimensional medical datasets such as genomic data, relational data management approaches with normalized data structures suffer from unfeasibly high execution times for basic operations such as insert (i.e. importing a GWAS dataset) and the queries that are typical of the genomics research domain. SPARK resolves these problems by incorporating non-relational NoSQL databases that have been driven by the emergence of Big Data. SPARK provides researchers across the world with user-friendly access to state-of-the-art data management and analysis tools while eliminating the need for high-level informatics and programming skills. The system will benefit health and medical research by eliminating the burden of large-scale data management, querying, cleaning, and analysis. SPARK represents a major advancement in genome research technologies, vastly reducing the burden of working with genomic datasets, and enabling cutting edge analysis approaches that have previously been out of reach for many medical researchers.

Keywords: biomedical research, genomics, information systems, software

Procedia PDF Downloads 228
3 Enhancing Scalability in Ethereum Network Analysis: Methods and Techniques

Authors: Stefan K. Behfar

Abstract:

The rapid growth of the Ethereum network has brought forth the urgent need for scalable analysis methods to handle the increasing volume of blockchain data. In this research, we propose efficient methodologies for making Ethereum network analysis scalable. Our approach leverages a combination of graph-based data representation, probabilistic sampling, and parallel processing techniques to achieve unprecedented scalability while preserving critical network insights. Data Representation: We develop a graph-based data representation that captures the underlying structure of the Ethereum network. Each block transaction is represented as a node in the graph, while the edges signify temporal relationships. This representation ensures efficient querying and traversal of the blockchain data. Probabilistic Sampling: To cope with the vastness of the Ethereum blockchain, we introduce a probabilistic sampling technique. This method strategically selects a representative subset of transactions and blocks, allowing for concise yet statistically significant analysis. The sampling approach maintains the integrity of the network properties while significantly reducing the computational burden. Graph Convolutional Networks (GCNs): We incorporate GCNs to process the graph-based data representation efficiently. The GCN architecture enables the extraction of complex spatial and temporal patterns from the sampled data. This combination of graph representation and GCNs facilitates parallel processing and scalable analysis. Distributed Computing: To further enhance scalability, we adopt distributed computing frameworks such as Apache Hadoop and Apache Spark. By distributing computation across multiple nodes, we achieve a significant reduction in processing time and enhanced memory utilization. Our methodology harnesses the power of parallelism, making it well-suited for large-scale Ethereum network analysis. Evaluation and Results: We extensively evaluate our methodology on real-world Ethereum datasets covering diverse time periods and transaction volumes. The results demonstrate its superior scalability, outperforming traditional analysis methods. Our approach successfully handles the ever-growing Ethereum data, empowering researchers and developers with actionable insights from the blockchain. Case Studies: We apply our methodology to real-world Ethereum use cases, including detecting transaction patterns, analyzing smart contract interactions, and predicting network congestion. The results showcase the accuracy and efficiency of our approach, emphasizing its practical applicability in real-world scenarios. Security and Robustness: To ensure the reliability of our methodology, we conduct thorough security and robustness evaluations. Our approach demonstrates high resilience against adversarial attacks and perturbations, reaffirming its suitability for security-critical blockchain applications. Conclusion: By integrating graph-based data representation, GCNs, probabilistic sampling, and distributed computing, we achieve network scalability without compromising analytical precision. This approach addresses the pressing challenges posed by the expanding Ethereum network, opening new avenues for research and enabling real-time insights into decentralized ecosystems. Our work contributes to the development of scalable blockchain analytics, laying the foundation for sustainable growth and advancement in the domain of blockchain research and application.

Keywords: Ethereum, scalable network, GCN, probabilistic sampling, distributed computing

Procedia PDF Downloads 24
2 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a dominantly qualitative field, and computational/corpus linguistics and its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. These highly integrated multimodal behaviour is analysed based on video data aimed at uncovering so called “multimodal gestalts”, patterns of linguistic and embodied conduct that reoccur in specific sequential positions employed for specific purposes. Multimodal analyses (and other disciplines using videos) are so far dependent on time and resource intensive processes of manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which is often not in the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data which are suitable for qualitative analysis but not enough for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision, facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data. The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for extraction of gestures and other visual cues, as well as grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL, (that can also extend to other fields using video materials). It will allow the processing of large amounts of data automatically and, the implementation of quantitative analyses, combining it with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) with conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken and grammatical information from videos, and to correlate those different levels and perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 64
1 The Road Ahead: Merging Human Cyber Security Expertise with Generative AI

Authors: Brennan Lodge

Abstract:

Cybersecurity professionals have long been embroiled in a digital arms race, confronting increasingly sophisticated threats with innovative solutions. The field of cybersecurity is in an unending race against malicious adversaries. As threats evolve in complexity, the tools used to defend against them need to advance even faster. Burdened with a vast arsenal of tools and an expansive scope of threat intelligence, analysts frequently navigate a complex web, trying to discern patterns amidst information overload. Herein lies the potential of Retrieval Augmented Generation (RAG). By combining the capabilities of Large Language Models (LLMs) with a generative AI facet, RAG brings to the table an unparalleled ability for real-time cross-referencing, bridging the gap between raw data and actionable insights. Imagine an analyst named Sarah working at a global Fortune 500 company. Every day, Sarah navigates a maze of diverse knowledge bases, real-time threat intelligence, and her company's vast proprietary data, from network specifics to intricate technical blueprints. One day, she's challenged by a potential breach through a personal device due to the company's global "Bring Your Own Device" policy. With the clock ticking, Sarah has mere minutes to trace the malware's origin, all while considering complex regional regulations. As she races against the benchmark of Mean Time To Resolution (MTTR), she wonders: Could "Cozy Bear" with its notorious malware tactic, HAMMERTOSS, be behind this? Balancing policy intricacies, global network considerations, and ever-emerging cyber threats, Sarah's role epitomizes the intense challenges faced by today's cybersecurity analysts. While analysts grapple with this array of intricate, time-sensitive challenges, the necessity for precision and efficiency is key. RAG technology—a cutting-edge advancement in Gen AI—is a promising solution. Designed to assimilate diverse data sources such as cyber advisory notices, phishing email sentiment, secure and insecure code examples, information security policy documentation, and the MITRE ATT&CK framework, RAG equips analysts with real-time querying capabilities through a vector database and a cross referenced concise response from a Gen AI model. Traditional relational databases often necessitate a tedious process of filtering through numerous entries. Now, with the synergy of vector databases and Gen AI models, analysts can rapidly access both contextually or semantically akin data points. This augmented approach equips analysts with a comprehensive understanding of the prevailing cyber threats, elevating the robustness of cybersecurity defenses and upskilling the analyst and team, too. Vector databases underpin the knowledge translation in Gen AI. They bridge the gap between raw data and translation into meaningful insights, ensuring that analysts are equipped with comprehensive and relevant information. This superior capability of the RAG framework, with its impressive depth and precision, finds application across a broad spectrum of cybersecurity challenges. Let's delve into some use cases where its potential becomes particularly evident: Phishing Email Sentiment Analysis: Phishing remains a predominant vector for cybersecurity breaches. Leveraging RAG's capabilities, analysts can not only assess the potential malevolence of an email but can also understand the context behind it. By cross-referencing patterns from varied data sources in real-time, the detection process evolves from a mere content evaluation to a holistic understanding of attacker tactics, behaviors, and evolving profiles. This allows for the identification of nuanced phishing strategies that might otherwise go undetected. Insecure Code Analysis: Software vulnerabilities form a critical entry point for cyber adversaries. With RAG, the process of code evaluation undergoes a transformation. Instead of manual code reviews, the system pulls insights from vector databases and historical code snippets marked as insecure, enabling detection of vulnerabilities based on historical patterns, emerging threat vectors, and even predictive threat modeling. This ensures that even the most obfuscated or embedded vulnerabilities are identified, and corrective measures can be promptly implemented. Vulnerability and Upskill Advisory: In the fast-paced world of cybersecurity, staying updated is paramount. Through RAG's capabilities, analysts are not only made aware of real-time vulnerabilities but are also guided on the necessary skills and tools needed to combat them. By dynamically sourcing data through vulnerability advisories, news on advanced persistent threats, and tactics to defend, RAG ensures that analysts are not only reactive to threats but are also proactively upskilled, thereby bolstering their defense mechanisms. Information Security Policies for Compliance Teams: Compliance remains at the heart of many organizational cybersecurity strategies. However, with ever-shifting regulatory landscapes, staying compliant becomes a moving target. RAG's ability to source real-time data ensures that compliance teams always have access to the latest policy changes, guidelines, and best practices. This not only facilitates adherence to current standards but also anticipates future shifts, assists with audits, and ensures that organizations remain ahead of the compliance curve. Fusing a RAG architecture with platforms like Slack amplifies its practical utility. Slack, known for its real-time communication prowess, seamlessly evolves into more than just a messaging platform in this context. Cybersecurity analysts can pose intricate queries within Slack and, almost instantaneously, receive comprehensive feedback powered by the harmonious interplay of RAG and Gen AI. This integration effectively transforms Slack into an AI-augmented chatbot-like assistant for cybersecurity professionals, always ready to provide informed insights on-demand, making it an indispensable ally in the ever-evolving cyber battlefield. Navigating the vast landscape of cybersecurity, analysts often encounter unfamiliar terminologies and techniques., analysts require tools that not only detect or inform them of threats, like CISA (U.S Cybersecurity Infrastructure Security Agency) Advisories, but also interpret and communicate them effectively. Consider a junior cybersecurity analyst named Alex, who comes across the term "Kerberoasting" while reviewing a network log. Unfamiliar with its intricacies, Alex turns to Slack to pose a query: "chat explain is Kerberoasting, using CISA." Almost instantaneously, Slack, powered by the harmonious interplay of RAG and Gen AI, provides a detailed response, cross-referencing a recent cyber advisory on the technique. It explains how attackers can exploit the Kerberos Ticket Granting Service to decipher service account passwords, potentially compromising a network. In this dynamic realm of cybersecurity, the blend of RAG and Generative AI represents more than just a technological leap. It embodies a paradigm shift, promising a future where human expertise and AI-driven precision join forces. As cyber threats continue their relentless advance, this synergy ensures that defenders are equipped with an arsenal that's not just reactive, but also profoundly insightful. No longer should analysts be submerged in a deluge of data without direction. Instead, they should be empowered, to discern, act, and preempt with unparalleled clarity and confidence. By harmoniously intertwining human discernment with AI capabilities, we should chart a path towards a future where cybersecurity is not just about defense, but about achieving a strategic advantage, paving the way for a safer, informed and a more secure digital horizon.

Keywords: cybersecurity, gen AI, retrieval augmented generation, cybersecurity defense strategies

Procedia PDF Downloads 33