Search results for: graph partition
369 Hierarchical Clustering Algorithms in Data Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
Clustering is a process of grouping objects and data into groups of clusters to ensure that data objects from the same cluster are identical to each other. Clustering algorithms in one of the areas in data mining and it can be classified into partition, hierarchical, density based, and grid-based. Therefore, in this paper, we do a survey and review for four major hierarchical clustering algorithms called CURE, ROCK, CHAMELEON, and BIRCH. The obtained state of the art of these algorithms will help in eliminating the current problems, as well as deriving more robust and scalable algorithms for clustering.Keywords: clustering, unsupervised learning, algorithms, hierarchical
Procedia PDF Downloads 885368 Mechanisms Underlying Comprehension of Visualized Personal Health Information: An Eye Tracking Study
Authors: Da Tao, Mingfu Qin, Wenkai Li, Tieyan Wang
Abstract:
While the use of electronic personal health portals has gained increasing popularity in the healthcare industry, users usually experience difficulty in comprehending and correctly responding to personal health information, partly due to inappropriate or poor presentation of the information. The way personal health information is visualized may affect how users perceive and assess their personal health information. This study was conducted to examine the effects of information visualization format and visualization mode on the comprehension and perceptions of personal health information among personal health information users with eye tracking techniques. A two-factor within-subjects experimental design was employed, where participants were instructed to complete a series of personal health information comprehension tasks under varied types of visualization mode (i.e., whether the information visualization is static or dynamic) and three visualization formats (i.e., bar graph, instrument-like graph, and text-only format). Data on a set of measures, including comprehension performance, perceptions, and eye movement indicators, were collected during the task completion in the experiment. Repeated measure analysis of variance analyses (RM-ANOVAs) was used for data analysis. The results showed that while the visualization format yielded no effects on comprehension performance, it significantly affected users’ perceptions (such as perceived ease of use and satisfaction). The two graphic visualizations yielded significantly higher favorable scores on subjective evaluations than that of the text format. While visualization mode showed no effects on users’ perception measures, it significantly affected users' comprehension performance in that dynamic visualization significantly reduced users' information search time. Both visualization format and visualization mode had significant main effects on eye movement behaviors, and their interaction effects were also significant. While the bar graph format and text format had similar time to first fixation across dynamic and static visualizations, instrument-like graph format had a larger time to first fixation for dynamic visualization than for static visualization. The two graphic visualization formats yielded shorter total fixation duration compared with the text-only format, indicating their ability to improve information comprehension efficiency. The results suggest that dynamic visualization can improve efficiency in comprehending important health information, and graphic visualization formats were favored more by users. The findings are helpful in the underlying comprehension mechanism of visualized personal health information and provide important implications for optimal design and visualization of personal health information.Keywords: eye tracking, information comprehension, personal health information, visualization
Procedia PDF Downloads 109367 An Optimized Approach to Generate the Possible States of Football Tournaments Final Table
Authors: Mouslem Damkhi
Abstract:
This paper focuses on possible states of a football tournament final table according to the number of participating teams. Each team holds a position in the table with which it is possible to determine the highest and lowest points for that team. This paper proposes an optimized search space based on the minimum and maximum number of points which can be gained by each team to produce and enumerate the possible states for a football tournament final table. The proposed search space minimizes producing the invalid states which cannot occur during a football tournament. The generated states are filtered by a validity checking algorithm which seeks to reach a tournament graph based on a generated state. Thus, the algorithm provides a way to determine which team’s wins, draws and loses values guarantee a particular table position. The paper also presents and discusses the experimental results of the approach on the tournaments with up to eight teams. Comparing with a blind search algorithm, our proposed approach reduces generating the invalid states up to 99.99%, which results in a considerable optimization in term of the execution time.Keywords: combinatorics, enumeration, graph, tournament
Procedia PDF Downloads 122366 Source Identification Model Based on Label Propagation and Graph Ordinary Differential Equations
Authors: Fuyuan Ma, Yuhan Wang, Junhe Zhang, Ying Wang
Abstract:
Identifying the sources of information dissemination is a pivotal task in the study of collective behaviors in networks, enabling us to discern and intercept the critical pathways through which information propagates from its origins. This allows for the control of the information’s dissemination impact in its early stages. Numerous methods for source detection rely on pre-existing, underlying propagation models as prior knowledge. Current models that eschew prior knowledge attempt to harness label propagation algorithms to model the statistical characteristics of propagation states or employ Graph Neural Networks (GNNs) for deep reverse modeling of the diffusion process. These approaches are either deficient in modeling the propagation patterns of information or are constrained by the over-smoothing problem inherent in GNNs, which limits the stacking of sufficient model depth to excavate global propagation patterns. Consequently, we introduce the ODESI model. Initially, the model employs a label propagation algorithm to delineate the distribution density of infected states within a graph structure and extends the representation of infected states from integers to state vectors, which serve as the initial states of nodes. Subsequently, the model constructs a deep architecture based on GNNs-coupled Ordinary Differential Equations (ODEs) to model the global propagation patterns of continuous propagation processes. Addressing the challenges associated with solving ODEs on graphs, we approximate the analytical solutions to reduce computational costs. Finally, we conduct simulation experiments on two real-world social network datasets, and the results affirm the efficacy of our proposed ODESI model in source identification tasks.Keywords: source identification, ordinary differential equations, label propagation, complex networks
Procedia PDF Downloads 20365 A Polynomial Approach for a Graphical-based Integrated Production and Transport Scheduling with Capacity Restrictions
Authors: M. Ndeley
Abstract:
The performance of global manufacturing supply chains depends on the interaction of production and transport processes. Currently, the scheduling of these processes is done separately without considering mutual requirements, which leads to no optimal solutions. An integrated scheduling of both processes enables the improvement of supply chain performance. The integrated production and transport scheduling problem (PTSP) is NP-hard, so that heuristic methods are necessary to efficiently solve large problem instances as in the case of global manufacturing supply chains. This paper presents a heuristic scheduling approach which handles the integration of flexible production processes with intermodal transport, incorporating flexible land transport. The method is based on a graph that allows a reformulation of the PTSP as a shortest path problem for each job, which can be solved in polynomial time. The proposed method is applied to a supply chain scenario with a manufacturing facility in South Africa and shipments of finished product to customers within the Country. The obtained results show that the approach is suitable for the scheduling of large-scale problems and can be flexibly adapted to different scenarios.Keywords: production and transport scheduling problem, graph based scheduling, integrated scheduling
Procedia PDF Downloads 474364 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries
Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman
Abstract:
There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems
Procedia PDF Downloads 149363 Real-Time Network Anomaly Detection Systems Based on Machine-Learning Algorithms
Authors: Zahra Ramezanpanah, Joachim Carvallo, Aurelien Rodriguez
Abstract:
This paper aims to detect anomalies in streaming data using machine learning algorithms. In this regard, we designed two separate pipelines and evaluated the effectiveness of each separately. The first pipeline, based on supervised machine learning methods, consists of two phases. In the first phase, we trained several supervised models using the UNSW-NB15 data-set. We measured the efficiency of each using different performance metrics and selected the best model for the second phase. At the beginning of the second phase, we first, using Argus Server, sniffed a local area network. Several types of attacks were simulated and then sent the sniffed data to a running algorithm at short intervals. This algorithm can display the results of each packet of received data in real-time using the trained model. The second pipeline presented in this paper is based on unsupervised algorithms, in which a Temporal Graph Network (TGN) is used to monitor a local network. The TGN is trained to predict the probability of future states of the network based on its past behavior. Our contribution in this section is introducing an indicator to identify anomalies from these predicted probabilities.Keywords: temporal graph network, anomaly detection, cyber security, IDS
Procedia PDF Downloads 103362 Row Detection and Graph-Based Localization in Tree Nurseries Using a 3D LiDAR
Authors: Ionut Vintu, Stefan Laible, Ruth Schulz
Abstract:
Agricultural robotics has been developing steadily over recent years, with the goal of reducing and even eliminating pesticides used in crops and to increase productivity by taking over human labor. The majority of crops are arranged in rows. The first step towards autonomous robots, capable of driving in fields and performing crop-handling tasks, is for robots to robustly detect the rows of plants. Recent work done towards autonomous driving between plant rows offers big robotic platforms equipped with various expensive sensors as a solution to this problem. These platforms need to be driven over the rows of plants. This approach lacks flexibility and scalability when it comes to the height of plants or distance between rows. This paper proposes instead an algorithm that makes use of cheaper sensors and has a higher variability. The main application is in tree nurseries. Here, plant height can range from a few centimeters to a few meters. Moreover, trees are often removed, leading to gaps within the plant rows. The core idea is to combine row detection algorithms with graph-based localization methods as they are used in SLAM. Nodes in the graph represent the estimated pose of the robot, and the edges embed constraints between these poses or between the robot and certain landmarks. This setup aims to improve individual plant detection and deal with exception handling, like row gaps, which are falsely detected as an end of rows. Four methods were developed for detecting row structures in the fields, all using a point cloud acquired with a 3D LiDAR as an input. Comparing the field coverage and number of damaged plants, the method that uses a local map around the robot proved to perform the best, with 68% covered rows and 25% damaged plants. This method is further used and combined with a graph-based localization algorithm, which uses the local map features to estimate the robot’s position inside the greater field. Testing the upgraded algorithm in a variety of simulated fields shows that the additional information obtained from localization provides a boost in performance over methods that rely purely on perception to navigate. The final algorithm achieved a row coverage of 80% and an accuracy of 27% damaged plants. Future work would focus on achieving a perfect score of 100% covered rows and 0% damaged plants. The main challenges that the algorithm needs to overcome are fields where the height of the plants is too small for the plants to be detected and fields where it is hard to distinguish between individual plants when they are overlapping. The method was also tested on a real robot in a small field with artificial plants. The tests were performed using a small robot platform equipped with wheel encoders, an IMU and an FX10 3D LiDAR. Over ten runs, the system achieved 100% coverage and 0% damaged plants. The framework built within the scope of this work can be further used to integrate data from additional sensors, with the goal of achieving even better results.Keywords: 3D LiDAR, agricultural robots, graph-based localization, row detection
Procedia PDF Downloads 139361 A Study on the Computation of Gourava Indices for Poly-L Lysine Dendrimer and Its Biomedical Applications
Authors: M. Helen
Abstract:
Chemical graph serves as a convenient model for any real or abstract chemical system. Dendrimers are novel three dimensional hyper branched globular nanopolymeric architectures. Drug delivery scientists are especially enthusiastic about possible utility of dendrimers as drug delivery tool. Dendrimers like poly L lysine (PLL), poly-propylene imine (PPI) and poly-amidoamine (PAMAM), etc., are used as gene carrier in drug delivery system because of their chemical characteristics. These characteristics of chemical compounds are analysed using topological indices (invariants under graph isomorphism) such as Wiener index, Zagreb index, etc., Prof. V. R. Kulli motivated by the application of Zagreb indices in finding the total π energy and derived Gourava indices which is an improved version over Zagreb indices. In this paper, we study the structure of PLL-Dendrimer that has the following applications: reduction in toxicity, colon delivery, and topical delivery. Also, we determine first and second Gourava indices, first and second hyper Gourava indices, product and sum connectivity Gourava indices for PLL-Dendrimer. Gourava Indices have found applications in Quantitative Structure-Property Relationship (QSPR)/ Quantitative Structure-Activity Relationship (QSAR) studies.Keywords: connectivity Gourava indices, dendrimer, Gourava indices, hyper GouravaG indices
Procedia PDF Downloads 138360 Toward an Understanding of the Neurofunctional Dissociation between Animal and Tool Concepts: A Graph Theoretical Analysis
Authors: Skiker Kaoutar, Mounir Maouene
Abstract:
Neuroimaging studies have shown that animal and tool concepts rely on distinct networks of brain areas. Animal concepts depend predominantly on temporal areas while tool concepts rely on fronto-temporo-parietal areas. However, the origin of this neurofunctional distinction for processing animal and tool concepts remains still unclear. Here, we address this question from a network perspective suggesting that the neural distinction between animals and tools might reflect the differences in their structural semantic networks. We build semantic networks for animal and tool concepts derived from Mc Rae and colleagues’s behavioral study conducted on a large number of participants. These two networks are thus analyzed through a large number of graph theoretical measures for small-worldness: centrality, clustering coefficient, average shortest path length, as well as resistance to random and targeted attacks. The results indicate that both animal and tool networks have small-world properties. More importantly, the animal network is more vulnerable to targeted attacks compared to the tool network a result that correlates with brain lesions studies.Keywords: animals, tools, network, semantics, small-world, resilience to damage
Procedia PDF Downloads 547359 Accessibility and Visibility through Space Syntax Analysis of the Linga Raj Temple in Odisha, India
Authors: S. Pramanik
Abstract:
Since the early ages, the Hindu temples have been interpreted through various Vedic philosophies. These temples are visited by pilgrims which demonstrate the rituals and religious belief of communities, reflecting a variety of actions and behaviors. Darsana— a direct seeing, is a part of the pilgrimage activity. During the process of Darsana, a devotee is prepared for entry in the temple to realize the cognizing Truth culminating in visualizing the idol of God, placed at the Garbhagriha (sanctum sanctorum). For this, the pilgrim must pass through a sequential arrangement of spaces. During the process of progress, the pilgrims visualize the spaces differently from various points of views. The viewpoints create a variety of spatial patterns in the minds of pilgrims coherent to the Hindu philosophies. The space organization and its order are perceived by various techniques of spatial analysis. A temple, as examples of Kalinga stylistic variations, has been chosen for the study. This paper intends to demonstrate some visual patterns generated during the process of Darsana (visibility) and its accessibility by Point Isovist Studies and Visibility Graph Analysis from the entrance (Simha Dwara) to The Sanctum sanctorum (Garbhagriha).Keywords: Hindu temple architecture, point isovist, space syntax analysis, visibility graph analysis
Procedia PDF Downloads 120358 Optimizing the Location of Parking Areas Adapted for Dangerous Goods in the European Road Transport Network
Authors: María Dolores Caro, Eugenio M. Fedriani, Ángel F. Tenorio
Abstract:
The transportation of dangerous goods by lorries throughout Europe must be done by using the roads conforming the European Road Transport Network. In this network, there are several parking areas where lorry drivers can park to rest according to the regulations. According to the "European Agreement concerning the International Carriage of Dangerous Goods by Road", parking areas where lorries transporting dangerous goods can park to rest, must follow several security stipulations to keep safe the rest of road users. At this respect, these lorries must be parked in adapted areas with strict and permanent surveillance measures. Moreover, drivers must satisfy several restrictions about resting and driving time. Under these facts, one may expect that there exist enough parking areas for the transport of this type of goods in order to obey the regulations prescribed by the European Union and its member countries. However, the already-existing parking areas are not sufficient to cover all the stops required by drivers transporting dangerous goods. Our main goal is, starting from the already-existing parking areas and the loading-and-unloading location, to provide an optimal answer to the following question: how many additional parking areas must be built and where must they be located to assure that lorry drivers can transport dangerous goods following all the stipulations about security and safety for their stops? The sense of the word “optimal” is due to the fact that we give a global solution for the location of parking areas throughout the whole European Road Transport Network, adjusting the number of additional areas to be as lower as possible. To do so, we have modeled the problem using graph theory since we are working with a road network. As nodes, we have considered the locations of each already-existing parking area, each loading-and-unloading area each road bifurcation. Each road connecting two nodes is considered as an edge in the graph whose weight corresponds to the distance between both nodes in the edge. By applying a new efficient algorithm, we have found the additional nodes for the network representing the new parking areas adapted for dangerous goods, under the fact that the distance between two parking areas must be less than or equal to 400 km.Keywords: trans-european transport network, dangerous goods, parking areas, graph-based modeling
Procedia PDF Downloads 280357 Optimization of Feeder Bus Routes at Urban Rail Transit Stations Based on Link Growth Probability
Authors: Yu Song, Yuefei Jin
Abstract:
Urban public transportation can be integrated when there is an efficient connection between urban rail lines, however, there are currently no effective or quick solutions being investigated for this connection. This paper analyzes the space-time distribution and travel demand of passenger connection travel based on taxi track data and data from the road network, excavates potential bus connection stations based on potential connection demand data, and introduces the link growth probability model in the complex network to solve the basic connection bus lines in order to ascertain the direction of the bus lines that are the most connected given the demand characteristics. Then, a tree view exhaustive approach based on constraints is suggested based on graph theory, which can hasten the convergence of findings while doing chain calculations. This study uses WEI QU NAN Station, the Xi'an Metro Line 2 terminal station in Shaanxi Province, as an illustration, to evaluate the model's and the solution method's efficacy. According to the findings, 153 prospective stations have been dug up in total, the feeder bus network for the entire line has been laid out, and the best route adjustment strategy has been found.Keywords: feeder bus, route optimization, link growth probability, the graph theory
Procedia PDF Downloads 77356 Assessment of Metal Dynamics in Dissolved and Particulate Phase in Human Impacted Hooghly River Estuary, India
Authors: Soumita Mitra, Santosh Kumar Sarkar
Abstract:
Hooghly river estuary (HRE), situated at the north eastern part of Bay of Bengal has global significance due to its holiness. It is of immense importance to the local population as it gives perpetual water supply for various activities such as transportation, fishing, boating, bathing etc. to the local people who settled on both the banks of this estuary. This study was done to assess the dissolved and particulate trace metal in the estuary covering a stretch of about 175 Km. The water samples were collected from the surface (0-5 cm) along the salinity gradient and metal concentration were studied both in dissolved and particulate phase using Graphite Furnace Atomic Absorption Spectrophotometer (GF-AAS) along some physical characteristics such as water temperature, salinity, pH, turbidity and total dissolved solids. Although much significant spatial variation was noticed but little enrichment was found along the downstream of the estuary. The mean concentration of the metals in the dissolved and particulate phase followed the same trend and as follows: Fe>Mn>Cr>Zn>Cu>Ni>Pb. The concentration of the metals in the particulate phase were much greater than that in dissolved phase which was also depicted from the values of the partition coefficient (Kd)(ml mg-1). The Kdvalues ranged from 1.5x105 (in case of Pb) to 4.29x106 (in case of Cr). The high value of Kd for Cr denoted that the metal Cr is mostly bounded with the suspended particulate matter while the least value for Pb signified it presence more in dissolved phase. Moreover, the concentrations of all the studied metals in the dissolved phase were many folds higher than their respective permissible limits assested by WHO 2008, 2009 and 2011. On the other hand, according to Sediment Quality Guidelines (SQGs), Zn, Cu and Ni in the particulate phase lied between ERL and ERM values but Cr exceeded ERM values at all the stations confirming that the estuary is mostly contaminated with the particulate Cr and it might cause frequent adverse effects on the aquatic life. Multivariate statistics Cluster analysis was also performed which separated the stations according to the level of contamination from several point and nonpoint sources. Thus, it is found that the estuarine system is much polluted by the toxic metals and further investigation, toxicological studies should be implemented for full risk assessment of this system, better management and restoration of the water quality of this globally significant aquatic system.Keywords: dissolved and particulate phase, Hooghly river estuary, partition coefficient, surface water, toxic metals
Procedia PDF Downloads 280355 Quantifying Multivariate Spatiotemporal Dynamics of Malaria Risk Using Graph-Based Optimization in Southern Ethiopia
Authors: Yonas Shuke Kitawa
Abstract:
Background: Although malaria incidence has substantially fallen sharply over the past few years, the rate of decline varies by district, time, and malaria type. Despite this turn-down, malaria remains a major public health threat in various districts of Ethiopia. Consequently, the present study is aimed at developing a predictive model that helps to identify the spatio-temporal variation in malaria risk by multiple plasmodium species. Methods: We propose a multivariate spatio-temporal Bayesian model to obtain a more coherent picture of the temporally varying spatial variation in disease risk. The spatial autocorrelation in such a data set is typically modeled by a set of random effects that assign a conditional autoregressive prior distribution. However, the autocorrelation considered in such cases depends on a binary neighborhood matrix specified through the border-sharing rule. Over here, we propose a graph-based optimization algorithm for estimating the neighborhood matrix that merely represents the spatial correlation by exploring the areal units as the vertices of a graph and the neighbor relations as the series of edges. Furthermore, we used aggregated malaria count in southern Ethiopia from August 2013 to May 2019. Results: We recognized that precipitation, temperature, and humidity are positively associated with the malaria threat in the area. On the other hand, enhanced vegetation index, nighttime light (NTL), and distance from coastal areas are negatively associated. Moreover, nonlinear relationships were observed between malaria incidence and precipitation, temperature, and NTL. Additionally, lagged effects of temperature and humidity have a significant effect on malaria risk by either species. More elevated risk of P. falciparum was observed following the rainy season, and unstable transmission of P. vivax was observed in the area. Finally, P. vivax risks are less sensitive to environmental factors than those of P. falciparum. Conclusion: The improved inference was gained by employing the proposed approach in comparison to the commonly used border-sharing rule. Additionally, different covariates are identified, including delayed effects, and elevated risks of either of the cases were observed in districts found in the central and western regions. As malaria transmission operates in a spatially continuous manner, a spatially continuous model should be employed when it is computationally feasible.Keywords: disease mapping, MSTCAR, graph-based optimization algorithm, P. falciparum, P. vivax, waiting matrix
Procedia PDF Downloads 78354 Visualizing the Commercial Activity of a City by Analyzing the Data Information in Layers
Authors: Taras Agryzkov, Jose L. Oliver, Leandro Tortosa, Jose Vicent
Abstract:
This paper aims to demonstrate how network models can be used to understand and to deal with some aspects of urban complexity. As it is well known, the Theory of Architecture and Urbanism has been using for decades’ intellectual tools based on the ‘sciences of complexity’ as a strategy to propose theoretical approaches about cities and about architecture. In this sense, it is possible to find a vast literature in which for instance network theory is used as an instrument to understand very diverse questions about cities: from their commercial activity to their heritage condition. The contribution of this research consists in adding one step of complexity to this process: instead of working with one single primal graph as it is usually done, we will show how new network models arise from the consideration of two different primal graphs interacting in two layers. When we model an urban network through a mathematical structure like a graph, the city is usually represented by a set of nodes and edges that reproduce its topology, with the data generated or extracted from the city embedded in it. All this information is normally displayed in a single layer. Here, we propose to separate the information in two layers so that we can evaluate the interaction between them. Besides, both layers may be composed of structures that do not have to coincide: from this bi-layer system, groups of interactions emerge, suggesting reflections and in consequence, possible actions.Keywords: graphs, mathematics, networks, urban studies
Procedia PDF Downloads 180353 Parameter Estimation for Contact Tracing in Graph-Based Models
Authors: Augustine Okolie, Johannes Müller, Mirjam Kretzchmar
Abstract:
We adopt a maximum-likelihood framework to estimate parameters of a stochastic susceptible-infected-recovered (SIR) model with contact tracing on a rooted random tree. Given the number of detectees per index case, our estimator allows to determine the degree distribution of the random tree as well as the tracing probability. Since we do not discover all infectees via contact tracing, this estimation is non-trivial. To keep things simple and stable, we develop an approximation suited for realistic situations (contract tracing probability small, or the probability for the detection of index cases small). In this approximation, the only epidemiological parameter entering the estimator is the basic reproduction number R0. The estimator is tested in a simulation study and applied to covid-19 contact tracing data from India. The simulation study underlines the efficiency of the method. For the empirical covid-19 data, we are able to compare different degree distributions and perform a sensitivity analysis. We find that particularly a power-law and a negative binomial degree distribution meet the data well and that the tracing probability is rather large. The sensitivity analysis shows no strong dependency on the reproduction number.Keywords: stochastic SIR model on graph, contact tracing, branching process, parameter inference
Procedia PDF Downloads 78352 Language Development and Growing Spanning Trees in Children Semantic Network
Authors: Somayeh Sadat Hashemi Kamangar, Fatemeh Bakouie, Shahriar Gharibzadeh
Abstract:
In this study, we target to exploit Maximum Spanning Trees (MST) of children's semantic networks to investigate their language development. To do so, we examine the graph-theoretic properties of word-embedding networks. The networks are made of words children learn prior to the age of 30 months as the nodes and the links which are built from the cosine vector similarity of words normatively acquired by children prior to two and a half years of age. These networks are weighted graphs and the strength of each link is determined by the numerical similarities of the two words (nodes) on the sides of the link. To avoid changing the weighted networks to the binaries by setting a threshold, constructing MSTs might present a solution. MST is a unique sub-graph that connects all the nodes in such a way that the sum of all the link weights is maximized without forming cycles. MSTs as the backbone of the semantic networks are suitable to examine developmental changes in semantic network topology in children. From these trees, several parameters were calculated to characterize the developmental change in network organization. We showed that MSTs provides an elegant method sensitive to capture subtle developmental changes in semantic network organization.Keywords: maximum spanning trees, word-embedding, semantic networks, language development
Procedia PDF Downloads 145351 Game Structure and Spatio-Temporal Action Detection in Soccer Using Graphs and 3D Convolutional Networks
Authors: Jérémie Ochin
Abstract:
Soccer analytics are built on two data sources: the frame-by-frame position of each player on the terrain and the sequences of events, such as ball drive, pass, cross, shot, throw-in... With more than 2000 ball-events per soccer game, their precise and exhaustive annotation, based on a monocular video stream such as a TV broadcast, remains a tedious and costly manual task. State-of-the-art methods for spatio-temporal action detection from a monocular video stream, often based on 3D convolutional neural networks, are close to reach levels of performances in mean Average Precision (mAP) compatibles with the automation of such task. Nevertheless, to meet their expectation of exhaustiveness in the context of data analytics, such methods must be applied in a regime of high recall – low precision, using low confidence score thresholds. This setting unavoidably leads to the detection of false positives that are the product of the well documented overconfidence behaviour of neural networks and, in this case, their limited access to contextual information and understanding of the game: their predictions are highly unstructured. Based on the assumption that professional soccer players’ behaviour, pose, positions and velocity are highly interrelated and locally driven by the player performing a ball-action, it is hypothesized that the addition of information regarding surrounding player’s appearance, positions and velocity in the prediction methods can improve their metrics. Several methods are compared to build a proper representation of the game surrounding a player, from handcrafted features of the local graph, based on domain knowledge, to the use of Graph Neural Networks trained in an end-to-end fashion with existing state-of-the-art 3D convolutional neural networks. It is shown that the inclusion of information regarding surrounding players helps reaching higher metrics.Keywords: fine-grained action recognition, human action recognition, convolutional neural networks, graph neural networks, spatio-temporal action recognition
Procedia PDF Downloads 24350 High-Fidelity Materials Screening with a Multi-Fidelity Graph Neural Network and Semi-Supervised Learning
Authors: Akeel A. Shah, Tong Zhang
Abstract:
Computational approaches to learning the properties of materials are commonplace, motivated by the need to screen or design materials for a given application, e.g., semiconductors and energy storage. Experimental approaches can be both time consuming and costly. Unfortunately, computational approaches such as ab-initio electronic structure calculations and classical or ab-initio molecular dynamics are themselves can be too slow for the rapid evaluation of materials, often involving thousands to hundreds of thousands of candidates. Machine learning assisted approaches have been developed to overcome the time limitations of purely physics-based approaches. These approaches, on the other hand, require large volumes of data for training (hundreds of thousands on many standard data sets such as QM7b). This means that they are limited by how quickly such a large data set of physics-based simulations can be established. At high fidelity, such as configuration interaction, composite methods such as G4, and coupled cluster theory, gathering such a large data set can become infeasible, which can compromise the accuracy of the predictions - many applications require high accuracy, for example band structures and energy levels in semiconductor materials and the energetics of charge transfer in energy storage materials. In order to circumvent this problem, multi-fidelity approaches can be adopted, for example the Δ-ML method, which learns a high-fidelity output from a low-fidelity result such as Hartree-Fock or density functional theory (DFT). The general strategy is to learn a map between the low and high fidelity outputs, so that the high-fidelity output is obtained a simple sum of the physics-based low-fidelity and correction, Although this requires a low-fidelity calculation, it typically requires far fewer high-fidelity results to learn the correction map, and furthermore, the low-fidelity result, such as Hartree-Fock or semi-empirical ZINDO, is typically quick to obtain, For high-fidelity outputs the result can be an order of magnitude or more in speed up. In this work, a new multi-fidelity approach is developed, based on a graph convolutional network (GCN) combined with semi-supervised learning. The GCN allows for the material or molecule to be represented as a graph, which is known to improve accuracy, for example SchNet and MEGNET. The graph incorporates information regarding the numbers of, types and properties of atoms; the types of bonds; and bond angles. They key to the accuracy in multi-fidelity methods, however, is the incorporation of low-fidelity output to learn the high-fidelity equivalent, in this case by learning their difference. Semi-supervised learning is employed to allow for different numbers of low and high-fidelity training points, by using an additional GCN-based low-fidelity map to predict high fidelity outputs. It is shown on 4 different data sets that a significant (at least one order of magnitude) increase in accuracy is obtained, using one to two orders of magnitude fewer low and high fidelity training points. One of the data sets is developed in this work, pertaining to 1000 simulations of quinone molecules (up to 24 atoms) at 5 different levels of fidelity, furnishing the energy, dipole moment and HOMO/LUMO.Keywords: .materials screening, computational materials, machine learning, multi-fidelity, graph convolutional network, semi-supervised learning
Procedia PDF Downloads 41349 Topological Language for Classifying Linear Chord Diagrams via Intersection Graphs
Authors: Michela Quadrini
Abstract:
Chord diagrams occur in mathematics, from the study of RNA to knot theory. They are widely used in theory of knots and links for studying the finite type invariants, whereas in molecular biology one important motivation to study chord diagrams is to deal with the problem of RNA structure prediction. An RNA molecule is a linear polymer, referred to as the backbone, that consists of four types of nucleotides. Each nucleotide is represented by a point, whereas each chord of the diagram stands for one interaction for Watson-Crick base pairs between two nonconsecutive nucleotides. A chord diagram is an oriented circle with a set of n pairs of distinct points, considered up to orientation preserving diffeomorphisms of the circle. A linear chord diagram (LCD) is a special kind of graph obtained cutting the oriented circle of a chord diagram. It consists of a line segment, called its backbone, to which are attached a number of chords with distinct endpoints. There is a natural fattening on any linear chord diagram; the backbone lies on the real axis, while all the chords are in the upper half-plane. Each linear chord diagram has a natural genus of its associated surface. To each chord diagram and linear chord diagram, it is possible to associate the intersection graph. It consists of a graph whose vertices correspond to the chords of the diagram, whereas the chord intersections are represented by a connection between the vertices. Such intersection graph carries a lot of information about the diagram. Our goal is to define an LCD equivalence class in terms of identity of intersection graphs, from which many chord diagram invariants depend. For studying these invariants, we introduce a new representation of Linear Chord Diagrams based on a set of appropriate topological operators that permits to model LCD in terms of the relations among chords. Such set is composed of: crossing, nesting, and concatenations. The crossing operator is able to generate the whole space of linear chord diagrams, and a multiple context free grammar able to uniquely generate each LDC starting from a linear chord diagram adding a chord for each production of the grammar is defined. In other words, it allows to associate a unique algebraic term to each linear chord diagram, while the remaining operators allow to rewrite the term throughout a set of appropriate rewriting rules. Such rules define an LCD equivalence class in terms of the identity of intersection graphs. Starting from a modelled RNA molecule and the linear chord, some authors proposed a topological classification and folding. Our LCD equivalence class could contribute to the RNA folding problem leading to the definition of an algorithm that calculates the free energy of the molecule more accurately respect to the existing ones. Such LCD equivalence class could be useful to obtain a more accurate estimate of link between the crossing number and the topological genus and to study the relation among other invariants.Keywords: chord diagrams, linear chord diagram, equivalence class, topological language
Procedia PDF Downloads 201348 Comparison of Unit Hydrograph Models to Simulate Flood Events at the Field Scale
Authors: Imene Skhakhfa, Lahbaci Ouerdachi
Abstract:
To ensure the overall coherence of simulated results, it is necessary to develop a robust validation process. In many applications, it is no longer content to calibrate and validate the model only in relation to the hydro graph measured at the outlet, but we try to better simulate the functioning of the watershed in space. Therefore the timing also performs compared to other variables such as water level measurements in intermediate stations or groundwater levels. As part of this work, we limit ourselves to modeling flood of short duration for which the process of evapotranspiration is negligible. The main parameters to identify the models are related to the method of unit hydro graph (HU). Three different models were tested: SNYDER, CLARK and SCS. These models differ in their mathematical structure and parameters to be calibrated while hydrological data are the same, the initial water content and precipitation. The models are compared on the basis of their performance in terms six objective criteria, three global criteria and three criteria representing volume, peak flow, and the mean square error. The first type of criteria gives more weight to strong events whereas the second considers all events to be of equal weight. The results show that the calibrated parameter values are dependent and also highlight the problems associated with the simulation of low flow events and intermittent precipitation.Keywords: model calibration, intensity, runoff, hydrograph
Procedia PDF Downloads 486347 An Insite to the Probabilistic Assessment of Reserves in Conventional Reservoirs
Authors: Sai Sudarshan, Harsh Vyas, Riddhiman Sherlekar
Abstract:
The oil and gas industry has been unwilling to adopt stochastic definition of reserves. Nevertheless, Monte Carlo simulation methods have gained acceptance by engineers, geoscientists and other professionals who want to evaluate prospects or otherwise analyze problems that involve uncertainty. One of the common applications of Monte Carlo simulation is the estimation of recoverable hydrocarbon from a reservoir.Monte Carlo Simulation makes use of random samples of parameters or inputs to explore the behavior of a complex system or process. It finds application whenever one needs to make an estimate, forecast or decision where there is significant uncertainty. First, the project focuses on performing Monte-Carlo Simulation on a given data set using U. S Department of Energy’s MonteCarlo Software, which is a freeware e&p tool. Further, an algorithm for simulation has been developed for MATLAB and program performs simulation by prompting user for input distributions and parameters associated with each distribution (i.e. mean, st.dev, min., max., most likely, etc.). It also prompts user for desired probability for which reserves are to be calculated. The algorithm so developed and tested in MATLAB further finds implementation in Python where existing libraries on statistics and graph plotting have been imported to generate better outcome. With PyQt designer, codes for a simple graphical user interface have also been written. The graph so plotted is then validated with already available results from U.S DOE MonteCarlo Software.Keywords: simulation, probability, confidence interval, sensitivity analysis
Procedia PDF Downloads 382346 An Optimized Association Rule Mining Algorithm
Authors: Archana Singh, Jyoti Agarwal, Ajay Rana
Abstract:
Data Mining is an efficient technology to discover patterns in large databases. Association Rule Mining techniques are used to find the correlation between the various item sets in a database, and this co-relation between various item sets are used in decision making and pattern analysis. In recent years, the problem of finding association rules from large datasets has been proposed by many researchers. Various research papers on association rule mining (ARM) are studied and analyzed first to understand the existing algorithms. Apriori algorithm is the basic ARM algorithm, but it requires so many database scans. In DIC algorithm, less amount of database scan is needed but complex data structure lattice is used. The main focus of this paper is to propose a new optimized algorithm (Friendly Algorithm) and compare its performance with the existing algorithms A data set is used to find out frequent itemsets and association rules with the help of existing and proposed (Friendly Algorithm) and it has been observed that the proposed algorithm also finds all the frequent itemsets and essential association rules from databases as compared to existing algorithms in less amount of database scan. In the proposed algorithm, an optimized data structure is used i.e. Graph and Adjacency Matrix.Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph
Procedia PDF Downloads 421345 Study on Impact of Existence of an Open Boundary Foreign Enclave and a 24-Hours Open Corridor for Foreigners inside Indian Territory
Authors: Debarshi Bhattacharya
Abstract:
In 2015, historic Land Boundary Agreement (LBA) executed between India and Bangladesh finally settled almost seven decades long overdue critical enclave problems of the two neighbouring countries. Enclaves within India and Bangladesh were the awful outcome of the partition of India in 1947. As a dire consequence, the populace within these enclaves enormously suffered from getting basic rights and opportunities and governmental support services till long 67 years after India’s independence and partition. As per LBA, 2015, 51 Bangladeshi (BD) enclaves inside Indian territory and 111 Indian enclaves inside Bangladesh territory were actually transferred to each other. But, by virtue of LBA, 1974 executed earlier between the two countries, one BD enclave situated inside India, namely Dohogram-Angarpota (D-A) twin enclave, had not yet been exchanged by means of LBA, 2015 and it still remains as an integral part, may not be contiguous, of Bangladesh completely surrounded by Indian territory. A study was undertaken through an extensive field survey to assess the impact of the existence of D-A BD enclave inside Indian territory from India’s perspective. Field survey was conducted for the purpose in the form of an interview, group discussion, questionnaire survey, personal interaction etc. to gather information from the Indian people residing adjacent to D-A enclave and Tin Bigha Corridor (TBC), people of D-A enclave, officials of Border Security Forces of India and Bangladesh, public representatives, representatives of political organizations etc. The issue of the existence of D-A BD enclave inside Indian territory seriously brought apprehension of future problems to the people of Kuchlibari Region of Mekhligunj Block, India, on its contiguity with Indian mainland due to 24-hour open access for the BD people through TBC. The anxiety of the local Indian people regarding threats to the national security of India as well as to the law and order issues of the locality due to the open border of D-A BD enclave in the region. On the other hand, it was observed that 24 hours opening of TBC brought significant positive changes to the people of D-A BD enclave in terms of their socio-economic condition and security status.Keywords: enclave, exchange of enclaves, land boundary agreement, Dohogram-Angarpota (D-A) Bangladeshi (BD) enclave, Tin Bigha Corridor
Procedia PDF Downloads 85344 Domain specific Ontology-Based Knowledge Extraction Using R-GNN and Large Language Models
Authors: Andrey Khalov
Abstract:
The rapid proliferation of unstructured data in IT infrastructure management demands innovative approaches for extracting actionable knowledge. This paper presents a framework for ontology-based knowledge extraction that combines relational graph neural networks (R-GNN) with large language models (LLMs). The proposed method leverages the DOLCE framework as the foundational ontology, extending it with concepts from ITSMO for domain-specific applications in IT service management and outsourcing. A key component of this research is the use of transformer-based models, such as DeBERTa-v3-large, for automatic entity and relationship extraction from unstructured texts. Furthermore, the paper explores how transfer learning techniques can be applied to fine-tune large language models (LLaMA) for using to generate synthetic datasets to improve precision in BERT-based entity recognition and ontology alignment. The resulting IT Ontology (ITO) serves as a comprehensive knowledge base that integrates domain-specific insights from ITIL processes, enabling more efficient decision-making. Experimental results demonstrate significant improvements in knowledge extraction and relationship mapping, offering a cutting-edge solution for enhancing cognitive computing in IT service environments.Keywords: ontology mapping, R-GNN, knowledge extraction, large language models, NER, knowlege graph
Procedia PDF Downloads 16343 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting
Authors: Yiannis G. Smirlis
Abstract:
The classification and the prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores, without applying any DEA models.Keywords: data envelopment analysis, interval DEA, efficiency classification, efficiency prediction
Procedia PDF Downloads 164342 Examining Social Connectivity through Email Network Analysis: Study of Librarians' Emailing Groups in Pakistan
Authors: Muhammad Arif Khan, Haroon Idrees, Imran Aziz, Sidra Mushtaq
Abstract:
Social platforms like online discussion and mailing groups are well aligned with academic as well as professional learning spaces. Professional communities are increasingly moving to online forums for sharing and capturing the intellectual abilities. This study investigated dynamics of social connectivity of yahoo mailing groups of Pakistani Library and Information Science (LIS) professionals using Graph Theory technique. Design/Methodology: Social Network Analysis is the increasingly concerned domain for scientists in identifying whether people grow together through online social interaction or, whether they just reflect connectivity. We have conducted a longitudinal study using Network Graph Theory technique to analyze the large data-set of email communication. The data was collected from three yahoo mailing groups using network analysis software over a period of six months i.e. January to June 2016. Findings of the network analysis were reviewed through focus group discussion with LIS experts and selected respondents of the study. Data were analyzed in Microsoft Excel and network diagrams were visualized using NodeXL and ORA-Net Scene package. Findings: Findings demonstrate that professionals and students exhibit intellectual growth the more they get tied within a network by interacting and participating in communication through online forums. The study reports on dynamics of the large network by visualizing the email correspondence among group members in a network consisting vertices (members) and edges (randomized correspondence). The model pair wise relationship between group members was illustrated to show characteristics, reasons, and strength of ties. Connectivity of nodes illustrated the frequency of communication among group members through examining node coupling, diffusion of networks, and node clustering has been demonstrated in-depth. Network analysis was found to be a useful technique in investigating the dynamics of the large network.Keywords: emailing networks, network graph theory, online social platforms, yahoo mailing groups
Procedia PDF Downloads 239341 Malware Beaconing Detection by Mining Large-scale DNS Logs for Targeted Attack Identification
Authors: Andrii Shalaginov, Katrin Franke, Xiongwei Huang
Abstract:
One of the leading problems in Cyber Security today is the emergence of targeted attacks conducted by adversaries with access to sophisticated tools. These attacks usually steal senior level employee system privileges, in order to gain unauthorized access to confidential knowledge and valuable intellectual property. Malware used for initial compromise of the systems are sophisticated and may target zero-day vulnerabilities. In this work we utilize common behaviour of malware called ”beacon”, which implies that infected hosts communicate to Command and Control servers at regular intervals that have relatively small time variations. By analysing such beacon activity through passive network monitoring, it is possible to detect potential malware infections. So, we focus on time gaps as indicators of possible C2 activity in targeted enterprise networks. We represent DNS log files as a graph, whose vertices are destination domains and edges are timestamps. Then by using four periodicity detection algorithms for each pair of internal-external communications, we check timestamp sequences to identify the beacon activities. Finally, based on the graph structure, we infer the existence of other infected hosts and malicious domains enrolled in the attack activities.Keywords: malware detection, network security, targeted attack, computational intelligence
Procedia PDF Downloads 264340 The Malfatti’s Problem in Reuleaux Triangle
Authors: Ching-Shoei Chiang
Abstract:
The Malfatti’s Problem is to ask for fitting 3 circles into a right triangle such that they are tangent to each other, and each circle is also tangent to a pair of the triangle’s side. This problem has been extended to any triangle (called general Malfatti’s Problem). Furthermore, the problem has been extended to have 1+2+…+n circles, we call it extended general Malfatti’s problem, these circles whose tangency graph, using the center of circles as vertices and the edge connect two circles center if these two circles tangent to each other, has the structure as Pascal’s triangle, and the exterior circles of these circles tangent to three sides of the triangle. In the extended general Malfatti’s problem, there are closed-form solutions for n=1, 2, and the problem becomes complex when n is greater than 2. In solving extended general Malfatti’s problem (n>2), we initially give values to the radii of all circles. From the tangency graph and current radii, we can compute angle value between two vectors. These vectors are from the center of the circle to the tangency points with surrounding elements, and these surrounding elements can be the boundary of the triangle or other circles. For each circle C, there are vectors from its center c to its tangency point with its neighbors (count clockwise) pi, i=0, 1,2,..,n. We add all angles between cpi to cp(i+1) mod (n+1), i=0,1,..,n, call it sumangle(C) for circle C. Using sumangle(C), we can reduce/enlarge the radii for all circles in next iteration, until sumangle(C) is equal to 2πfor all circles. With a similar idea, this paper proposed an algorithm to find the radii of circles whose tangency has the structure of Pascal’s triangle, and the exterior circles of these circles are tangent to the unit Realeaux Triangle.Keywords: Malfatti’s problem, geometric constraint solver, computer-aided geometric design, circle packing, data visualization
Procedia PDF Downloads 132