Search results for: graph matching
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 932

Search results for: graph matching

662 Parameter Estimation for Contact Tracing in Graph-Based Models

Authors: Augustine Okolie, Johannes Müller, Mirjam Kretzchmar

Abstract:

We adopt a maximum-likelihood framework to estimate parameters of a stochastic susceptible-infected-recovered (SIR) model with contact tracing on a rooted random tree. Given the number of detectees per index case, our estimator allows to determine the degree distribution of the random tree as well as the tracing probability. Since we do not discover all infectees via contact tracing, this estimation is non-trivial. To keep things simple and stable, we develop an approximation suited for realistic situations (contract tracing probability small, or the probability for the detection of index cases small). In this approximation, the only epidemiological parameter entering the estimator is the basic reproduction number R0. The estimator is tested in a simulation study and applied to covid-19 contact tracing data from India. The simulation study underlines the efficiency of the method. For the empirical covid-19 data, we are able to compare different degree distributions and perform a sensitivity analysis. We find that particularly a power-law and a negative binomial degree distribution meet the data well and that the tracing probability is rather large. The sensitivity analysis shows no strong dependency on the reproduction number.

Keywords: stochastic SIR model on graph, contact tracing, branching process, parameter inference

Procedia PDF Downloads 70
661 Language Development and Growing Spanning Trees in Children Semantic Network

Authors: Somayeh Sadat Hashemi Kamangar, Fatemeh Bakouie, Shahriar Gharibzadeh

Abstract:

In this study, we target to exploit Maximum Spanning Trees (MST) of children's semantic networks to investigate their language development. To do so, we examine the graph-theoretic properties of word-embedding networks. The networks are made of words children learn prior to the age of 30 months as the nodes and the links which are built from the cosine vector similarity of words normatively acquired by children prior to two and a half years of age. These networks are weighted graphs and the strength of each link is determined by the numerical similarities of the two words (nodes) on the sides of the link. To avoid changing the weighted networks to the binaries by setting a threshold, constructing MSTs might present a solution. MST is a unique sub-graph that connects all the nodes in such a way that the sum of all the link weights is maximized without forming cycles. MSTs as the backbone of the semantic networks are suitable to examine developmental changes in semantic network topology in children. From these trees, several parameters were calculated to characterize the developmental change in network organization. We showed that MSTs provides an elegant method sensitive to capture subtle developmental changes in semantic network organization.

Keywords: maximum spanning trees, word-embedding, semantic networks, language development

Procedia PDF Downloads 130
660 High-Fidelity Materials Screening with a Multi-Fidelity Graph Neural Network and Semi-Supervised Learning

Authors: Akeel A. Shah, Tong Zhang

Abstract:

Computational approaches to learning the properties of materials are commonplace, motivated by the need to screen or design materials for a given application, e.g., semiconductors and energy storage. Experimental approaches can be both time consuming and costly. Unfortunately, computational approaches such as ab-initio electronic structure calculations and classical or ab-initio molecular dynamics are themselves can be too slow for the rapid evaluation of materials, often involving thousands to hundreds of thousands of candidates. Machine learning assisted approaches have been developed to overcome the time limitations of purely physics-based approaches. These approaches, on the other hand, require large volumes of data for training (hundreds of thousands on many standard data sets such as QM7b). This means that they are limited by how quickly such a large data set of physics-based simulations can be established. At high fidelity, such as configuration interaction, composite methods such as G4, and coupled cluster theory, gathering such a large data set can become infeasible, which can compromise the accuracy of the predictions - many applications require high accuracy, for example band structures and energy levels in semiconductor materials and the energetics of charge transfer in energy storage materials. In order to circumvent this problem, multi-fidelity approaches can be adopted, for example the Δ-ML method, which learns a high-fidelity output from a low-fidelity result such as Hartree-Fock or density functional theory (DFT). The general strategy is to learn a map between the low and high fidelity outputs, so that the high-fidelity output is obtained a simple sum of the physics-based low-fidelity and correction, Although this requires a low-fidelity calculation, it typically requires far fewer high-fidelity results to learn the correction map, and furthermore, the low-fidelity result, such as Hartree-Fock or semi-empirical ZINDO, is typically quick to obtain, For high-fidelity outputs the result can be an order of magnitude or more in speed up. In this work, a new multi-fidelity approach is developed, based on a graph convolutional network (GCN) combined with semi-supervised learning. The GCN allows for the material or molecule to be represented as a graph, which is known to improve accuracy, for example SchNet and MEGNET. The graph incorporates information regarding the numbers of, types and properties of atoms; the types of bonds; and bond angles. They key to the accuracy in multi-fidelity methods, however, is the incorporation of low-fidelity output to learn the high-fidelity equivalent, in this case by learning their difference. Semi-supervised learning is employed to allow for different numbers of low and high-fidelity training points, by using an additional GCN-based low-fidelity map to predict high fidelity outputs. It is shown on 4 different data sets that a significant (at least one order of magnitude) increase in accuracy is obtained, using one to two orders of magnitude fewer low and high fidelity training points. One of the data sets is developed in this work, pertaining to 1000 simulations of quinone molecules (up to 24 atoms) at 5 different levels of fidelity, furnishing the energy, dipole moment and HOMO/LUMO.

Keywords: .materials screening, computational materials, machine learning, multi-fidelity, graph convolutional network, semi-supervised learning

Procedia PDF Downloads 12
659 Topological Language for Classifying Linear Chord Diagrams via Intersection Graphs

Authors: Michela Quadrini

Abstract:

Chord diagrams occur in mathematics, from the study of RNA to knot theory. They are widely used in theory of knots and links for studying the finite type invariants, whereas in molecular biology one important motivation to study chord diagrams is to deal with the problem of RNA structure prediction. An RNA molecule is a linear polymer, referred to as the backbone, that consists of four types of nucleotides. Each nucleotide is represented by a point, whereas each chord of the diagram stands for one interaction for Watson-Crick base pairs between two nonconsecutive nucleotides. A chord diagram is an oriented circle with a set of n pairs of distinct points, considered up to orientation preserving diffeomorphisms of the circle. A linear chord diagram (LCD) is a special kind of graph obtained cutting the oriented circle of a chord diagram. It consists of a line segment, called its backbone, to which are attached a number of chords with distinct endpoints. There is a natural fattening on any linear chord diagram; the backbone lies on the real axis, while all the chords are in the upper half-plane. Each linear chord diagram has a natural genus of its associated surface. To each chord diagram and linear chord diagram, it is possible to associate the intersection graph. It consists of a graph whose vertices correspond to the chords of the diagram, whereas the chord intersections are represented by a connection between the vertices. Such intersection graph carries a lot of information about the diagram. Our goal is to define an LCD equivalence class in terms of identity of intersection graphs, from which many chord diagram invariants depend. For studying these invariants, we introduce a new representation of Linear Chord Diagrams based on a set of appropriate topological operators that permits to model LCD in terms of the relations among chords. Such set is composed of: crossing, nesting, and concatenations. The crossing operator is able to generate the whole space of linear chord diagrams, and a multiple context free grammar able to uniquely generate each LDC starting from a linear chord diagram adding a chord for each production of the grammar is defined. In other words, it allows to associate a unique algebraic term to each linear chord diagram, while the remaining operators allow to rewrite the term throughout a set of appropriate rewriting rules. Such rules define an LCD equivalence class in terms of the identity of intersection graphs. Starting from a modelled RNA molecule and the linear chord, some authors proposed a topological classification and folding. Our LCD equivalence class could contribute to the RNA folding problem leading to the definition of an algorithm that calculates the free energy of the molecule more accurately respect to the existing ones. Such LCD equivalence class could be useful to obtain a more accurate estimate of link between the crossing number and the topological genus and to study the relation among other invariants.

Keywords: chord diagrams, linear chord diagram, equivalence class, topological language

Procedia PDF Downloads 194
658 Comparison of Unit Hydrograph Models to Simulate Flood Events at the Field Scale

Authors: Imene Skhakhfa, Lahbaci Ouerdachi

Abstract:

To ensure the overall coherence of simulated results, it is necessary to develop a robust validation process. In many applications, it is no longer content to calibrate and validate the model only in relation to the hydro graph measured at the outlet, but we try to better simulate the functioning of the watershed in space. Therefore the timing also performs compared to other variables such as water level measurements in intermediate stations or groundwater levels. As part of this work, we limit ourselves to modeling flood of short duration for which the process of evapotranspiration is negligible. The main parameters to identify the models are related to the method of unit hydro graph (HU). Three different models were tested: SNYDER, CLARK and SCS. These models differ in their mathematical structure and parameters to be calibrated while hydrological data are the same, the initial water content and precipitation. The models are compared on the basis of their performance in terms six objective criteria, three global criteria and three criteria representing volume, peak flow, and the mean square error. The first type of criteria gives more weight to strong events whereas the second considers all events to be of equal weight. The results show that the calibrated parameter values are dependent and also highlight the problems associated with the simulation of low flow events and intermittent precipitation.

Keywords: model calibration, intensity, runoff, hydrograph

Procedia PDF Downloads 475
657 Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data

Authors: Sana Hamdi, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, real-time spatial applications, like location-aware services and traffic monitoring, have become more and more important. Such applications result dynamic environments where data as well as queries are continuously moving. As a result, there is a tremendous amount of real-time spatial data generated every day. The growth of the data volume seems to outspeed the advance of our computing infrastructure. For instance, in real-time spatial Big Data, users expect to receive the results of each query within a short time period without holding in account the load of the system. But with a huge amount of real-time spatial data generated, the system performance degrades rapidly especially in overload situations. To solve this problem, we propose the use of data partitioning as an optimization technique. Traditional horizontal and vertical partitioning can increase the performance of the system and simplify data management. But they remain insufficient for real-time spatial Big data; they can’t deal with real-time and stream queries efficiently. Thus, in this paper, we propose a novel data partitioning approach for real-time spatial Big data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big data). This contribution is an implementation of the Matching algorithm for traditional vertical partitioning. We find, firstly, the optimal attribute sequence by the use of Matching algorithm. Then, we propose a new cost model used for database partitioning, for keeping the data amount of each partition more balanced limit and for providing a parallel execution guarantees for the most frequent queries. VPA-RTSBD aims to obtain a real-time partitioning scheme and deals with stream data. It improves the performance of query execution by maximizing the degree of parallel execution. This affects QoS (Quality Of Service) improvement in real-time spatial Big Data especially with a huge volume of stream data. The performance of our contribution is evaluated via simulation experiments. The results show that the proposed algorithm is both efficient and scalable, and that it outperforms comparable algorithms.

Keywords: real-time spatial big data, quality of service, vertical partitioning, horizontal partitioning, matching algorithm, hamming distance, stream query

Procedia PDF Downloads 147
656 An Insite to the Probabilistic Assessment of Reserves in Conventional Reservoirs

Authors: Sai Sudarshan, Harsh Vyas, Riddhiman Sherlekar

Abstract:

The oil and gas industry has been unwilling to adopt stochastic definition of reserves. Nevertheless, Monte Carlo simulation methods have gained acceptance by engineers, geoscientists and other professionals who want to evaluate prospects or otherwise analyze problems that involve uncertainty. One of the common applications of Monte Carlo simulation is the estimation of recoverable hydrocarbon from a reservoir.Monte Carlo Simulation makes use of random samples of parameters or inputs to explore the behavior of a complex system or process. It finds application whenever one needs to make an estimate, forecast or decision where there is significant uncertainty. First, the project focuses on performing Monte-Carlo Simulation on a given data set using U. S Department of Energy’s MonteCarlo Software, which is a freeware e&p tool. Further, an algorithm for simulation has been developed for MATLAB and program performs simulation by prompting user for input distributions and parameters associated with each distribution (i.e. mean, st.dev, min., max., most likely, etc.). It also prompts user for desired probability for which reserves are to be calculated. The algorithm so developed and tested in MATLAB further finds implementation in Python where existing libraries on statistics and graph plotting have been imported to generate better outcome. With PyQt designer, codes for a simple graphical user interface have also been written. The graph so plotted is then validated with already available results from U.S DOE MonteCarlo Software.

Keywords: simulation, probability, confidence interval, sensitivity analysis

Procedia PDF Downloads 368
655 An Optimized Association Rule Mining Algorithm

Authors: Archana Singh, Jyoti Agarwal, Ajay Rana

Abstract:

Data Mining is an efficient technology to discover patterns in large databases. Association Rule Mining techniques are used to find the correlation between the various item sets in a database, and this co-relation between various item sets are used in decision making and pattern analysis. In recent years, the problem of finding association rules from large datasets has been proposed by many researchers. Various research papers on association rule mining (ARM) are studied and analyzed first to understand the existing algorithms. Apriori algorithm is the basic ARM algorithm, but it requires so many database scans. In DIC algorithm, less amount of database scan is needed but complex data structure lattice is used. The main focus of this paper is to propose a new optimized algorithm (Friendly Algorithm) and compare its performance with the existing algorithms A data set is used to find out frequent itemsets and association rules with the help of existing and proposed (Friendly Algorithm) and it has been observed that the proposed algorithm also finds all the frequent itemsets and essential association rules from databases as compared to existing algorithms in less amount of database scan. In the proposed algorithm, an optimized data structure is used i.e. Graph and Adjacency Matrix.

Keywords: association rules, data mining, dynamic item set counting, FP-growth, friendly algorithm, graph

Procedia PDF Downloads 407
654 Factor Study Affecting Visual Awareness on Dynamic Object Monitoring

Authors: Terry Liang Khin Teo, Sun Woh Lye, Kai Lun Brendon Goh

Abstract:

As applied to dynamic monitoring situations, the prevailing approach to situation awareness (SA) assumes that the relevant areas of interest (AOI) be perceived before that information can be processed further to affect decision-making and, thereafter, action. It is not entirely clear whether this is the case. This study seeks to investigate the monitoring of dynamic objects through matching eye fixations with the relevant AOIs in boundary-crossing scenarios. By this definition, a match is where a fixation is registered on the AOI. While many factors may affect monitoring characteristics, traffic simulations were designed in this study to explore two factors, namely: the number of inbounds/outbound traffic transfers and the number of entry and/or exit points in a radar monitoring sector. These two factors were graded into five levels of difficulty ranging from low to high traffic flow numbers. Combined permutation in terms of levels of difficulty of these two factors yielded a total of thirty scenarios. Through this, results showed that changes in the traffic flow numbers on transfer resulted in greater variations having match limits ranging from 29%-100%, as compared to the number of sector entry/exit points of range limit from 80%-100%. The subsequent analysis is able to determine the type and combination of traffic scenarios where imperfect matching is likely to occur.

Keywords: air traffic simulation, eye-tracking, visual monitoring, focus attention

Procedia PDF Downloads 48
653 The Impact of Informal Care on Health Behavior among Older People with Chronic Diseases: A Study in China Using Propensity Score Matching

Authors: Hong Wu, Naiji Lu

Abstract:

Improvement of health behavior among people with chronic diseases is vital for increasing longevity and enhancing quality of life. This paper researched the causal effects of informal care on the compliance with doctor’s health advices – smoking control, dietetic regulation, weight control and keep exercising – among older people with chronic diseases in China, which is facing the challenge of aging. We addressed the selection bias by using propensity score matching in the estimation process. We used the 2011-2012 national baseline data of the China Health and Retirement Longitudinal Study. Our results showed informal care can help improve health behavior of older people. First, informal care improved the compliance of smoking controls: whether smoke, frequency of smoking, and the time lag between wake up and the first cigarette was all lower for these older people with informal care; Second, for dietetic regulation, older people with informal care had more meals every day than older people without informal care; Third, three variables: BMI, whether gain weight and whether lose weight were used to measure the outcome of weight control. There were no significant difference between group with informal care and that without for BMI and the possibility of losing weight. Older people with informal care had lower possibility of gain weight than that without; Last, for the advice of keeping exercising, informal care increased the probability of walking exercise, however, the difference between groups for moderate and vigorous exercise were not significant. Our results indicate policy makers who aim to decrease accidents should take informal care to elders into account and provide an appropriate policy to meet the demand of informal care. Our birth policy and postponed retirement policy may decrease the informal caregiving hours, so adjustments of these policies are important and urgent to meet the current situation of aged tendency of population. In addition, government could give more support to develop organizations to provide formal care, such as nursing home. We infer that formal care is also useful for health behavior improvements.

Keywords: chronic diseases, compliance, CHARLS, health advice, informal care, older people, propensity score matching

Procedia PDF Downloads 398
652 An Information Matrix Goodness-of-Fit Test of the Conditional Logistic Model for Matched Case-Control Studies

Authors: Li-Ching Chen

Abstract:

The case-control design has been widely applied in clinical and epidemiological studies to investigate the association between risk factors and a given disease. The retrospective design can be easily implemented and is more economical over prospective studies. To adjust effects for confounding factors, methods such as stratification at the design stage and may be adopted. When some major confounding factors are difficult to be quantified, a matching design provides an opportunity for researchers to control the confounding effects. The matching effects can be parameterized by the intercepts of logistic models and the conditional logistic regression analysis is then adopted. This study demonstrates an information-matrix-based goodness-of-fit statistic to test the validity of the logistic regression model for matched case-control data. The asymptotic null distribution of this proposed test statistic is inferred. It needs neither to employ a simulation to evaluate its critical values nor to partition covariate space. The asymptotic power of this test statistic is also derived. The performance of the proposed method is assessed through simulation studies. An example of the real data set is applied to illustrate the implementation of the proposed method as well.

Keywords: conditional logistic model, goodness-of-fit, information matrix, matched case-control studies

Procedia PDF Downloads 278
651 Examining Social Connectivity through Email Network Analysis: Study of Librarians' Emailing Groups in Pakistan

Authors: Muhammad Arif Khan, Haroon Idrees, Imran Aziz, Sidra Mushtaq

Abstract:

Social platforms like online discussion and mailing groups are well aligned with academic as well as professional learning spaces. Professional communities are increasingly moving to online forums for sharing and capturing the intellectual abilities. This study investigated dynamics of social connectivity of yahoo mailing groups of Pakistani Library and Information Science (LIS) professionals using Graph Theory technique. Design/Methodology: Social Network Analysis is the increasingly concerned domain for scientists in identifying whether people grow together through online social interaction or, whether they just reflect connectivity. We have conducted a longitudinal study using Network Graph Theory technique to analyze the large data-set of email communication. The data was collected from three yahoo mailing groups using network analysis software over a period of six months i.e. January to June 2016. Findings of the network analysis were reviewed through focus group discussion with LIS experts and selected respondents of the study. Data were analyzed in Microsoft Excel and network diagrams were visualized using NodeXL and ORA-Net Scene package. Findings: Findings demonstrate that professionals and students exhibit intellectual growth the more they get tied within a network by interacting and participating in communication through online forums. The study reports on dynamics of the large network by visualizing the email correspondence among group members in a network consisting vertices (members) and edges (randomized correspondence). The model pair wise relationship between group members was illustrated to show characteristics, reasons, and strength of ties. Connectivity of nodes illustrated the frequency of communication among group members through examining node coupling, diffusion of networks, and node clustering has been demonstrated in-depth. Network analysis was found to be a useful technique in investigating the dynamics of the large network.

Keywords: emailing networks, network graph theory, online social platforms, yahoo mailing groups

Procedia PDF Downloads 225
650 Measuring the Height of a Person in Closed Circuit Television Video Footage Using 3D Human Body Model

Authors: Dojoon Jung, Kiwoong Moon, Joong Lee

Abstract:

The height of criminals is one of the important clues that can determine the scope of the suspect's search or exclude the suspect from the search target. Although measuring the height of criminals by video alone is limited by various reasons, the 3D data of the scene and the Closed Circuit Television (CCTV) footage are matched, the height of the criminal can be measured. However, it is still difficult to measure the height of CCTV footage in the non-contact type measurement method because of variables such as position, posture, and head shape of criminals. In this paper, we propose a method of matching the CCTV footage with the 3D data on the crime scene and measuring the height of the person using the 3D human body model in the matched data. In the proposed method, the height is measured by using 3D human model in various scenes of the person in the CCTV footage, and the measurement value of the target person is corrected by the measurement error of the replay CCTV footage of the reference person. We tested for 20 people's walking CCTV footage captured from an indoor and an outdoor and corrected the measurement values with 5 reference persons. Experimental results show that the measurement error (true value-measured value) average is 0.45 cm, and this method is effective for the measurement of the person's height in CCTV footage.

Keywords: human height, CCTV footage, 2D/3D matching, 3D human body model

Procedia PDF Downloads 241
649 Malware Beaconing Detection by Mining Large-scale DNS Logs for Targeted Attack Identification

Authors: Andrii Shalaginov, Katrin Franke, Xiongwei Huang

Abstract:

One of the leading problems in Cyber Security today is the emergence of targeted attacks conducted by adversaries with access to sophisticated tools. These attacks usually steal senior level employee system privileges, in order to gain unauthorized access to confidential knowledge and valuable intellectual property. Malware used for initial compromise of the systems are sophisticated and may target zero-day vulnerabilities. In this work we utilize common behaviour of malware called ”beacon”, which implies that infected hosts communicate to Command and Control servers at regular intervals that have relatively small time variations. By analysing such beacon activity through passive network monitoring, it is possible to detect potential malware infections. So, we focus on time gaps as indicators of possible C2 activity in targeted enterprise networks. We represent DNS log files as a graph, whose vertices are destination domains and edges are timestamps. Then by using four periodicity detection algorithms for each pair of internal-external communications, we check timestamp sequences to identify the beacon activities. Finally, based on the graph structure, we infer the existence of other infected hosts and malicious domains enrolled in the attack activities.

Keywords: malware detection, network security, targeted attack, computational intelligence

Procedia PDF Downloads 249
648 The Malfatti’s Problem in Reuleaux Triangle

Authors: Ching-Shoei Chiang

Abstract:

The Malfatti’s Problem is to ask for fitting 3 circles into a right triangle such that they are tangent to each other, and each circle is also tangent to a pair of the triangle’s side. This problem has been extended to any triangle (called general Malfatti’s Problem). Furthermore, the problem has been extended to have 1+2+…+n circles, we call it extended general Malfatti’s problem, these circles whose tangency graph, using the center of circles as vertices and the edge connect two circles center if these two circles tangent to each other, has the structure as Pascal’s triangle, and the exterior circles of these circles tangent to three sides of the triangle. In the extended general Malfatti’s problem, there are closed-form solutions for n=1, 2, and the problem becomes complex when n is greater than 2. In solving extended general Malfatti’s problem (n>2), we initially give values to the radii of all circles. From the tangency graph and current radii, we can compute angle value between two vectors. These vectors are from the center of the circle to the tangency points with surrounding elements, and these surrounding elements can be the boundary of the triangle or other circles. For each circle C, there are vectors from its center c to its tangency point with its neighbors (count clockwise) pi, i=0, 1,2,..,n. We add all angles between cpi to cp(i+1) mod (n+1), i=0,1,..,n, call it sumangle(C) for circle C. Using sumangle(C), we can reduce/enlarge the radii for all circles in next iteration, until sumangle(C) is equal to 2πfor all circles. With a similar idea, this paper proposed an algorithm to find the radii of circles whose tangency has the structure of Pascal’s triangle, and the exterior circles of these circles are tangent to the unit Realeaux Triangle.

Keywords: Malfatti’s problem, geometric constraint solver, computer-aided geometric design, circle packing, data visualization

Procedia PDF Downloads 120
647 Brain Tumor Segmentation Based on Minimum Spanning Tree

Authors: Simeon Mayala, Ida Herdlevær, Jonas Bull Haugsøen, Shamundeeswari Anandan, Sonia Gavasso, Morten Brun

Abstract:

In this paper, we propose a minimum spanning tree-based method for segmenting brain tumors. The proposed method performs interactive segmentation based on the minimum spanning tree without tuning parameters. The steps involve preprocessing, making a graph, constructing a minimum spanning tree, and a newly implemented way of interactively segmenting the region of interest. In the preprocessing step, a Gaussian filter is applied to 2D images to remove the noise. Then, the pixel neighbor graph is weighted by intensity differences and the corresponding minimum spanning tree is constructed. The image is loaded in an interactive window for segmenting the tumor. The region of interest and the background are selected by clicking to split the minimum spanning tree into two trees. One of these trees represents the region of interest and the other represents the background. Finally, the segmentation given by the two trees is visualized. The proposed method was tested by segmenting two different 2D brain T1-weighted magnetic resonance image data sets. The comparison between our results and the standard gold segmentation confirmed the validity of the minimum spanning tree approach. The proposed method is simple to implement and the results indicate that it is accurate and efficient.

Keywords: brain tumor, brain tumor segmentation, minimum spanning tree, segmentation, image processing

Procedia PDF Downloads 111
646 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu

Abstract:

Character recognition is the process of converting a text image file into editable and searchable text file. Feature Extraction is the heart of any character recognition system. The character recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features for one character are used in character recognition. Basically, there are three steps of character recognition such as character segmentation, feature extraction and classification. In segmentation step, horizontal cropping method is used for line segmentation and vertical cropping method is used for character segmentation. In the Feature extraction step, features are extracted in two ways. The first way is that the 8 features are extracted from the entire input character using eight direction chain code frequency extraction. The second way is that the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as a feature for one block. Therefore, 16 features are extracted from that 16 blocks in the second way. We use the number of holes feature to cluster the similar characters. We can recognize the almost Myanmar common characters with various font sizes by using these features. All these 25 features are used in both training part and testing part. In the classification step, the characters are classified by matching the all features of input character with already trained features of characters.

Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation

Procedia PDF Downloads 304
645 Inclusive Business and Its Contribution to Farmers Wellbeing in Arsi Ethiopia: Empirical Evidence

Authors: Senait G. Worku, Ellen Mangnus

Abstract:

Inclusive business models which integrates low-income people with companies value chain in a commercially viable way has gained momentum for the perceived potential to contribute to poverty alleviation and food security in developing countries. This article investigates the impact of Community Revenue Enhancement through Technology Extension (CREATE) project of Heineken brewery on smallholder farmers’ wellbeing in Arsi zone Oromia regional state of Ethiopia. CREATE is a Public-Private Partnership (PPP) between Ministry of Foreign Affairs of the Netherlands and Heineken N.V. which source malt barely from smallholder farmers in three zones of Oromia. The study assessed the impact of CREATE on malt barley productivity, food security and new asset purchase in Arsi zone by comparing households that participate in the project with non-participating households using propensity score matching method. The finding indicated that households that participated in the CREATE project had higher malt barley productivity and purchased more new assets than non-participating households. However, there is no significant difference on food security status of participating and non-participating households indicating that the project has a profound impact on asset accumulation than on food security improvement.

Keywords: inclusive business, malt barley, propensity score matching, wellbeing

Procedia PDF Downloads 139
644 Iris Feature Extraction and Recognition Based on Two-Dimensional Gabor Wavelength Transform

Authors: Bamidele Samson Alobalorun, Ifedotun Roseline Idowu

Abstract:

Biometrics technologies apply the human body parts for their unique and reliable identification based on physiological traits. The iris recognition system is a biometric–based method for identification. The human iris has some discriminating characteristics which provide efficiency to the method. In order to achieve this efficiency, there is a need for feature extraction of the distinct features from the human iris in order to generate accurate authentication of persons. In this study, an approach for an iris recognition system using 2D Gabor for feature extraction is applied to iris templates. The 2D Gabor filter formulated the patterns that were used for training and equally sent to the hamming distance matching technique for recognition. A comparison of results is presented using two iris image subjects of different matching indices of 1,2,3,4,5 filter based on the CASIA iris image database. By comparing the two subject results, the actual computational time of the developed models, which is measured in terms of training and average testing time in processing the hamming distance classifier, is found with best recognition accuracy of 96.11% after capturing the iris localization or segmentation using the Daughman’s Integro-differential, the normalization is confined to the Daugman’s rubber sheet model.

Keywords: Daugman rubber sheet, feature extraction, Hamming distance, iris recognition system, 2D Gabor wavelet transform

Procedia PDF Downloads 51
643 A New Bound on the Average Information Ratio of Perfect Secret-Sharing Schemes for Access Structures Based on Bipartite Graphs of Larger Girth

Authors: Hui-Chuan Lu

Abstract:

In a perfect secret-sharing scheme, a dealer distributes a secret among a set of participants in such a way that only qualified subsets of participants can recover the secret and the joint share of the participants in any unqualified subset is statistically independent of the secret. The access structure of the scheme refers to the collection of all qualified subsets. In a graph-based access structures, each vertex of a graph G represents a participant and each edge of G represents a minimal qualified subset. The average information ratio of a perfect secret-sharing scheme realizing a given access structure is the ratio of the average length of the shares given to the participants to the length of the secret. The infimum of the average information ratio of all possible perfect secret-sharing schemes realizing an access structure is called the optimal average information ratio of that access structure. We study the optimal average information ratio of the access structures based on bipartite graphs. Based on some previous results, we give a bound on the optimal average information ratio for all bipartite graphs of girth at least six. This bound is the best possible for some classes of bipartite graphs using our approach.

Keywords: secret-sharing scheme, average information ratio, star covering, deduction, core cluster

Procedia PDF Downloads 351
642 A Network Economic Analysis of Friendship, Cultural Activity, and Homophily

Authors: Siming Xie

Abstract:

In social networks, the term homophily refers to the tendency of agents with similar characteristics to link with one another and is so robustly observed across many contexts and dimensions. The starting point of my research is the observation that the “type” of agents is not a single exogenous variable. Agents, despite their differences in race, religion, and other hard to alter characteristics, may share interests and engage in activities that cut across those predetermined lines. This research aims to capture the interactions of homophily effects in a model where agents have two-dimension characteristics (i.e., race and personal hobbies such as basketball, which one either likes or dislikes) and with biases in meeting opportunities and in favor of same-type friendships. A novel feature of my model is providing a matching process with biased meeting probability on different dimensions, which could help to understand the structuring process in multidimensional networks without missing layer interdependencies. The main contribution of this study is providing a welfare based matching process for agents with multi-dimensional characteristics. In particular, this research shows that the biases in meeting opportunities on one dimension would lead to the emergence of homophily on the other dimension. The objective of this research is to determine the pattern of homophily in network formations, which will shed light on our understanding of segregation and its remedies. By constructing a two-dimension matching process, this study explores a method to describe agents’ homophilous behavior in a social network with multidimension and construct a game in which the minorities and majorities play different strategies in a society. It also shows that the optimal strategy is determined by the relative group size, where society would suffer more from social segregation if the two racial groups have a similar size. The research also has political implications—cultivating the same characteristics among agents helps diminishing social segregation, but only if the minority group is small enough. This research includes both theoretical models and empirical analysis. Providing the friendship formation model, the author first uses MATLAB to perform iteration calculations, then derives corresponding mathematical proof on previous results, and last shows that the model is consistent with empirical evidence from high school friendships. The anonymous data comes from The National Longitudinal Study of Adolescent Health (Add Health).

Keywords: homophily, multidimension, social networks, friendships

Procedia PDF Downloads 158
641 Altered Network Organization in Mild Alzheimer's Disease Compared to Mild Cognitive Impairment Using Resting-State EEG

Authors: Chia-Feng Lu, Yuh-Jen Wang, Shin Teng, Yu-Te Wu, Sui-Hing Yan

Abstract:

Brain functional networks based on resting-state EEG data were compared between patients with mild Alzheimer’s disease (mAD) and matched patients with amnestic subtype of mild cognitive impairment (aMCI). We integrated the time–frequency cross mutual information (TFCMI) method to estimate the EEG functional connectivity between cortical regions and the network analysis based on graph theory to further investigate the alterations of functional networks in mAD compared with aMCI group. We aimed at investigating the changes of network integrity, local clustering, information processing efficiency, and fault tolerance in mAD brain networks for different frequency bands based on several topological properties, including degree, strength, clustering coefficient, shortest path length, and efficiency. Results showed that the disruptions of network integrity and reductions of network efficiency in mAD characterized by lower degree, decreased clustering coefficient, higher shortest path length, and reduced global and local efficiencies in the delta, theta, beta2, and gamma bands were evident. The significant changes in network organization can be used in assisting discrimination of mAD from aMCI in clinical.

Keywords: EEG, functional connectivity, graph theory, TFCMI

Procedia PDF Downloads 421
640 Hybrid Collaborative-Context Based Recommendations for Civil Affairs Operations

Authors: Patrick Cummings, Laura Cassani, Deirdre Kelliher

Abstract:

In this paper we present findings from a research effort to apply a hybrid collaborative-context approach for a system focused on Marine Corps civil affairs data collection, aggregation, and analysis called the Marine Civil Information Management System (MARCIMS). The goal of this effort is to provide operators with information to make sense of the interconnectedness of entities and relationships in their area of operation and discover existing data to support civil military operations. Our approach to build a recommendation engine was designed to overcome several technical challenges, including 1) ensuring models were robust to the relatively small amount of data collected by the Marine Corps civil affairs community; 2) finding methods to recommend novel data for which there are no interactions captured; and 3) overcoming confirmation bias by ensuring content was recommended that was relevant for the mission despite being obscure or less well known. We solve this by implementing a combination of collective matrix factorization (CMF) and graph-based random walks to provide recommendations to civil military operations users. We also present a method to resolve the challenge of computation complexity inherent from highly connected nodes through a precomputed process.

Keywords: Recommendation engine, collaborative filtering, context based recommendation, graph analysis, coverage, civil affairs operations, Marine Corps

Procedia PDF Downloads 115
639 Prospective Validation of the FibroTest Score in Assessing Liver Fibrosis in Hepatitis C Infection with Genotype 4

Authors: G. Shiha, S. Seif, W. Samir, K. Zalata

Abstract:

Prospective Validation of the FibroTest Score in assessing Liver Fibrosis in Hepatitis C Infection with Genotype 4 FibroTest (FT) is non-invasive score of liver fibrosis that combines the quantitative results of 5 serum biochemical markers (alpha-2-macroglobulin, haptoglobin, apolipoprotein A1, gamma glutamyl transpeptidase (GGT) and bilirubin) and adjusted with the patient's age and sex in a patented algorithm to generate a measure of fibrosis. FT has been validated in patients with chronic hepatitis C (CHC) (Halfon et al., Gastroenterol. Clin Biol.( 2008), 32 6suppl 1, 22-39). The validation of fibro test ( FT) in genotype IV is not well studied. Our aim was to evaluate the performance of FibroTest in an independent prospective cohort of hepatitis C patients with genotype 4. Subject was 122 patients with CHC. All liver biopsies were scored using METAVIR system. Our fibrosis score(FT) were measured, and the performance of the cut-off score were done using ROC curve. Among patients with advanced fibrosis, the FT was identically matched with the liver biopsy in 18.6%, overestimated the stage of fibrosis in 44.2% and underestimated the stage of fibrosis in 37.7% of cases. Also in patients with no/mild fibrosis, identical matching was detected in 39.2% of cases with overestimation in 48.1% and underestimation in 12.7%. So, the overall results of the test were identical matching, overestimation and underestimation in 32%, 46.7% and 21.3% respectively. Using ROC curve it was found that (FT) at the cut-off point of 0.555 could discriminate early from advanced stages of fibrosis with an area under ROC curve (AUC) of 0.72, sensitivity of 65%, specificity of 69%, PPV of 68%, NPV of 66% and accuracy of 67%. As FibroTest Score overestimates the stage of advanced fibrosis, it should not be considered as a reliable surrogate for liver biopsy in hepatitis C infection with genotype 4.

Keywords: fibrotest, chronic Hepatitis C, genotype 4, liver biopsy

Procedia PDF Downloads 404
638 Optimal and Critical Path Analysis of State Transportation Network Using Neo4J

Authors: Pallavi Bhogaram, Xiaolong Wu, Min He, Onyedikachi Okenwa

Abstract:

A transportation network is a realization of a spatial network, describing a structure which permits either vehicular movement or flow of some commodity. Examples include road networks, railways, air routes, pipelines, and many more. The transportation network plays a vital role in maintaining the vigor of the nation’s economy. Hence, ensuring the network stays resilient all the time, especially in the face of challenges such as heavy traffic loads and large scale natural disasters, is of utmost importance. In this paper, we used the Neo4j application to develop the graph. Neo4j is the world's leading open-source, NoSQL, a native graph database that implements an ACID-compliant transactional backend to applications. The Southern California network model is developed using the Neo4j application and obtained the most critical and optimal nodes and paths in the network using centrality algorithms. The edge betweenness centrality algorithm calculates the critical or optimal paths using Yen's k-shortest paths algorithm, and the node betweenness centrality algorithm calculates the amount of influence a node has over the network. The preliminary study results confirm that the Neo4j application can be a suitable tool to study the important nodes and the critical paths for the major congested metropolitan area.

Keywords: critical path, transportation network, connectivity reliability, network model, Neo4j application, edge betweenness centrality index

Procedia PDF Downloads 123
637 Revolutionizing Financial Forecasts: Enhancing Predictions with Graph Convolutional Networks (GCN) - Long Short-Term Memory (LSTM) Fusion

Authors: Ali Kazemi

Abstract:

Those within the volatile and interconnected international economic markets, appropriately predicting market trends, hold substantial fees for traders and financial establishments. Traditional device mastering strategies have made full-size strides in forecasting marketplace movements; however, monetary data's complicated and networked nature calls for extra sophisticated processes. This observation offers a groundbreaking method for monetary marketplace prediction that leverages the synergistic capability of Graph Convolutional Networks (GCNs) and Long Short-Term Memory (LSTM) networks. Our suggested algorithm is meticulously designed to forecast the traits of inventory market indices and cryptocurrency costs, utilizing a comprehensive dataset spanning from January 1, 2015, to December 31, 2023. This era, marked by sizable volatility and transformation in financial markets, affords a solid basis for schooling and checking out our predictive version. Our algorithm integrates diverse facts to construct a dynamic economic graph that correctly reflects market intricacies. We meticulously collect opening, closing, and high and low costs daily for key inventory marketplace indices (e.g., S&P 500, NASDAQ) and widespread cryptocurrencies (e.g., Bitcoin, Ethereum), ensuring a holistic view of marketplace traits. Daily trading volumes are also incorporated to seize marketplace pastime and liquidity, providing critical insights into the market's shopping for and selling dynamics. Furthermore, recognizing the profound influence of the monetary surroundings on financial markets, we integrate critical macroeconomic signs with hobby fees, inflation rates, GDP increase, and unemployment costs into our model. Our GCN algorithm is adept at learning the relational patterns amongst specific financial devices represented as nodes in a comprehensive market graph. Edges in this graph encapsulate the relationships based totally on co-movement styles and sentiment correlations, enabling our version to grasp the complicated community of influences governing marketplace moves. Complementing this, our LSTM algorithm is trained on sequences of the spatial-temporal illustration discovered through the GCN, enriched with historic fee and extent records. This lets the LSTM seize and expect temporal marketplace developments accurately. Inside the complete assessment of our GCN-LSTM algorithm across the inventory marketplace and cryptocurrency datasets, the version confirmed advanced predictive accuracy and profitability compared to conventional and opportunity machine learning to know benchmarks. Specifically, the model performed a Mean Absolute Error (MAE) of 0.85%, indicating high precision in predicting day-by-day charge movements. The RMSE was recorded at 1.2%, underscoring the model's effectiveness in minimizing tremendous prediction mistakes, which is vital in volatile markets. Furthermore, when assessing the model's predictive performance on directional market movements, it achieved an accuracy rate of 78%, significantly outperforming the benchmark models, averaging an accuracy of 65%. This high degree of accuracy is instrumental for techniques that predict the course of price moves. This study showcases the efficacy of mixing graph-based totally and sequential deep learning knowledge in economic marketplace prediction and highlights the fee of a comprehensive, records-pushed evaluation framework. Our findings promise to revolutionize investment techniques and hazard management practices, offering investors and economic analysts a powerful device to navigate the complexities of cutting-edge economic markets.

Keywords: financial market prediction, graph convolutional networks (GCNs), long short-term memory (LSTM), cryptocurrency forecasting

Procedia PDF Downloads 38
636 New-Born Children and Marriage Stability: An Evaluation of Divorce Risk Based on 2010-2018 China Family Panel Studies Data

Authors: Yuchao Yao

Abstract:

As two of the main characteristics of Chinese demographic trends, increasing divorce rates and decreasing fertility rates both shaped the population structure in the recent decade. Figuring out to what extent can be having a child make a difference in the divorce rate of a couple will not only draw a picture of Chinese families but also bring about a new perspective to evaluate the Chinese child-breeding policies. Based on China Family Panel Studies (CFPS) Data 2010-2018, this paper provides a systematic evaluation of how children influence a couple’s marital stability through a series of empirical models. Using survival analysis and propensity score matching (PSM) model, this paper finds that the number and age of children that a couple has mattered in consolidating marital relationship, and these effects vary little over time; during the last decade, newly having children can in fact decrease the possibility of divorce for Chinese couples; the such decreasing effect is largely due to the birth of a second child. As this is an inclusive attempt to study and compare not only the effects but also the causality of children on divorce risk in the last decade, the results of this research will do a good summary of the status quo of divorce in China. Furthermore, this paper provides implications for further reforming the current marriage and child-breeding policies.

Keywords: divorce risk, fertility, China, survival analysis, propensity score matching

Procedia PDF Downloads 68
635 Student Debt Loans and Labor Market Outcomes: A Lesson in Unintended Consequences

Authors: Sun-Ki Choi

Abstract:

The U.S. student loan policy was initiated to improve the equality of educational opportunity and help low-income families to provide higher education opportunities for their children. However, with the increase in the average student loan amount, college graduates with student loans experience problems and restrictions in their early-career choices. This study examines the early career labor market choices of college graduates who obtained student loans to finance their higher education. In this study, National Survey of College Graduates (NSCG) data for 2017 and 2019 was used to estimate the effects of student loans on the employment status and current job wages of graduates with student loans. In the analysis, two groups of workers, those with student loans and those without loans, were compared. Using basic models and Mahalanobis distance matching, it was found that graduates who rely on student loans to finance their education are more likely to participate in the labor market than those who do not. Moreover, in entry-level jobs, graduates with student loans receive lower salaries than those without student loans. College graduates make job-related decisions based on their current and future wages and fringe benefits. Graduates with student loans tend to demonstrate risk-averse behaviors due to their financial restrictions. Thus, student loan debt creates inequity in the early-career labor market for college graduates. Furthermore, this study has implications for policymakers and researchers in terms of the student loan policy.

Keywords: student loan, wage differential, unintended consequences, mahalanobis distance matching

Procedia PDF Downloads 105
634 Web Proxy Detection via Bipartite Graphs and One-Mode Projections

Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo

Abstract:

With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonate phishing pages to steal sensitive data or redirect victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become an increasingly challenging task to detect the proxy service due to their dynamic update and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of bipartite graphs for discovering social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent web proxy users behavior clusters. The web proxy URL may vary from time to time. Still, the inherent interest would not. So, based on the intuition, by dint of our private tools implemented by WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experiment results based on real datasets show that the behavior clusters not only reduce the number of URLs analysis but also provide an effective way to detect the web proxies, especially for the unknown web proxies.

Keywords: bipartite graph, one-mode projection, clustering, web proxy detection

Procedia PDF Downloads 233
633 Quantum Graph Approach for Energy and Information Transfer through Networks of Cables

Authors: Mubarack Ahmed, Gabriele Gradoni, Stephen C. Creagh, Gregor Tanner

Abstract:

High-frequency cables commonly connect modern devices and sensors. Interestingly, the proportion of electric components is rising fast in an attempt to achieve lighter and greener devices. Modelling the propagation of signals through these cable networks in the presence of parameter uncertainty is a daunting task. In this work, we study the response of high-frequency cable networks using both Transmission Line and Quantum Graph (QG) theories. We have successfully compared the two theories in terms of reflection spectra using measurements on real, lossy cables. We have derived a generalisation of the vertex scattering matrix to include non-uniform networks – networks of cables with different characteristic impedances and propagation constants. The QG model implicitly takes into account the pseudo-chaotic behavior, at the vertices, of the propagating electric signal. We have successfully compared the asymptotic growth of eigenvalues of the Laplacian with the predictions of Weyl law. We investigate the nearest-neighbour level-spacing distribution of the resonances and compare our results with the predictions of Random Matrix Theory (RMT). To achieve this, we will compare our graphs with the generalisation of Wigner distribution for open systems. The problem of scattering from networks of cables can also provide an analogue model for wireless communication in highly reverberant environments. In this context, we provide a preliminary analysis of the statistics of communication capacity for communication across cable networks, whose eventual aim is to enable detailed laboratory testing of information transfer rates using software defined radio. We specialise this analysis in particular for the case of MIMO (Multiple-Input Multiple-Output) protocols. We have successfully validated our QG model with both TL model and laboratory measurements. The growth of Eigenvalues compares well with Weyl’s law and the level-spacing distribution agrees so well RMT predictions. The results we achieved in the MIMO application compares favourably with the prediction of a parallel on-going research (sponsored by NEMF21.)

Keywords: eigenvalues, multiple-input multiple-output, quantum graph, random matrix theory, transmission line

Procedia PDF Downloads 158