Search results for: nested named entity
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1022


1022 A Chinese Nested Named Entity Recognition Model Based on Lexical Features

Authors: Shuo Liu, Dan Liu

Abstract:

In the field of named entity recognition, most of the research has been conducted around simple entities. However, for nested named entities, which still contain entities within entities, it has been difficult to identify them accurately due to their boundary ambiguity. In this paper, a hierarchical recognition model is constructed based on the grammatical structure and semantic features of Chinese text for boundary calculation based on lexical features. The analysis is carried out at different levels in terms of granularity, semantics, and lexicality, respectively, avoiding repetitive work to reduce computational effort and using the semantic features of words to calculate the boundaries of entities to improve the accuracy of the recognition work. The results of the experiments carried out on web-based microblogging data show that the model achieves an accuracy of 86.33% and an F1 value of 89.27% in recognizing nested named entities, making up for the shortcomings of some previous recognition models and improving the efficiency of recognition of nested named entities.

Keywords: coarse-grained, nested named entity, Chinese natural language processing, word embedding, T-SNE dimensionality reduction algorithm

Procedia PDF Downloads 96
1021 A Framework for Chinese Domain-Specific Distant Supervised Named Entity Recognition

Authors: Qin Long, Li Xiaoge

Abstract:

Knowledge graphs have become a new form of knowledge representation. However, there is no consensus on a plausible definition of entities and relationships in domain-specific knowledge graphs. Further, owing to several limitations and deficiencies, existing approaches to recognizing domain-specific entities and relationships are far from perfect. Specifically, named entity recognition in Chinese domains is a critical task for natural language processing applications. However, a bottleneck for Chinese named entity recognition in new domains is the lack of annotated data. To address this challenge, a domain distant supervised named entity recognition framework is proposed. The framework is divided into two stages: first, the distant supervised corpus is generated by an entity linking model based on a graph attention neural network; second, the generated corpus is used as training input for the distant supervised named entity recognition model, which then extracts named entities. The linking model is verified on the ccks2019 entity linking corpus, and its F1 value is 2% higher than that of the benchmark method. The re-pre-trained BERT language model is added to the benchmark method, and the results show that it is more suitable for distant supervised named entity recognition tasks. Finally, the framework is applied in the computer field, and the results show that it can obtain domain named entities.

Keywords: distant named entity recognition, entity linking, knowledge graph, graph attention neural network

Procedia PDF Downloads 67
1020 The Role of Named Entity Recognition for Information Extraction

Authors: Girma Yohannis Bade, Olga Kolesnikova, Grigori Sidorov

Abstract:

Named entity recognition (NER) is a building block for information extraction. Although the information extraction process has been automated using a variety of techniques to find and extract relevant information from unstructured documents, the discovery of targeted knowledge still poses a number of research difficulties because of the variability and lack of structure in Web data. NER, a subtask of information extraction (IE), emerged to ease such difficulties. It deals with finding proper names (named entities), such as the names of persons, countries, locations, organizations, dates, and events in a document, and categorizing them under predetermined labels, which is an initial step in IE tasks. This survey paper presents the roles and importance of NER to IE from the perspective of different algorithms and application domains. It summarizes how researchers have implemented NER in particular application areas such as finance, medicine, defense, business, food science, archeology, and so on. It also outlines the three types of sequence labeling algorithms for NER: feature-based, neural network-based, and rule-based. Finally, the state of the art and evaluation metrics of NER are presented.

Keywords: the role of NER, named entity recognition, information extraction, sequence labeling algorithms, named entity application area

Procedia PDF Downloads 44
1019 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising

Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri

Abstract:

Although data nowadays takes multiple forms, from text to images and from audio to video, text is still the most widely used one at the public level. At the academic and research level, and unlike other forms, text can be considered the easiest form to process. Therefore, a branch of data mining research called "Text Mining" has always existed under its shadow. Its concept is just like data mining's: finding valuable patterns in large collections and tremendous volumes of data, in this case text. Named entity recognition (NER) is one of text mining's disciplines; it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations, and more in a given text. Our approach, "Octopub", does not aim to find new ways to improve the named entity recognition process. Rather, it is about finding a new and smart way to use NER so that we can extract the sentiments of millions of people, using social networks as a limitless information source and marketing for product promotion as the main domain of application.

Keywords: text mining, named entity recognition (NER), sentiment analysis, social media networks (SN, SMN), business intelligence (BI), marketing

Procedia PDF Downloads 557
1018 Why Do We Need Hierarchical Linear Models?

Authors: Mustafa Aydın, Ali Murat Sunbul

Abstract:

Hierarchical or nested data structures are seen in many research areas. In the field of education especially, most studies involve nested structures: students in classes, classes in schools, schools in cities, and cities in regions. In such a hierarchical structure, students in the same class share the same physical conditions, have similar experiences, and learn from the same teachers, so they behave more like one another than like students in other classes.

Keywords: hierarchical linear modeling, nested data, hierarchical structure, data structure

Procedia PDF Downloads 624
1017 A Boundary Fitted Nested Grid Model for Tsunami Computation along Penang Island in Peninsular Malaysia

Authors: Md. Fazlul Karim, Ahmad Izani Md. Ismail, Mohammed Ashaque Meah

Abstract:

This paper focuses on the development of a 2-D Boundary Fitted and Nested Grid (BFNG) model to compute the propagation of the 2004 Indonesian tsunami along the coastal region of Penang in Peninsular Malaysia. In the presence of a curvilinear coastline, boundary fitted grids are suitable for representing the model boundaries accurately. On the other hand, when a large velocity gradient is expected within a confined area, a nested grid system is appropriate for improving the numerical accuracy with the fewest grid points. This paper constructs a shallow water nested and orthogonal boundary fitted grid model and presents computational results of the tsunami impact on the Penang coast due to the Indonesian tsunami of 2004. The results of the numerical simulations are compared with available data.

Keywords: boundary fitted nested model, tsunami, Penang Island, 2004 Indonesian Tsunami

Procedia PDF Downloads 291
1016 Named Entity Recognition System for Tigrinya Language

Authors: Sham Kidane, Fitsum Gaim, Ibrahim Abdella, Sirak Asmerom, Yoel Ghebrihiwot, Simon Mulugeta, Natnael Ambassager

Abstract:

The lack of annotated datasets is a bottleneck to the progress of NLP in low-resourced languages. The work presented here consists of large-scale annotated datasets and models for the named entity recognition (NER) system for the Tigrinya language. Our manually constructed corpus comprises over 340K words tagged for NER, with over 118K of the tokens also having parts-of-speech (POS) tags, annotated with 12 distinct classes of entities, represented using several types of tagging schemes. We conducted extensive experiments covering convolutional neural networks and transformer models; the highest performance achieved is 88.8% weighted F1-score. These results are especially noteworthy given the unique challenges posed by Tigrinya’s distinct grammatical structure and complex word morphologies. The system can be an essential building block for the advancement of NLP systems in Tigrinya and other related low-resourced languages and serve as a bridge for cross-referencing against higher-resourced languages.

Keywords: Tigrinya NER corpus, TiBERT, TiRoBERTa, BiLSTM-CRF

Procedia PDF Downloads 63
1015 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations: a cooccurrence graph. Indeed, each cooccurrence highlights some degree of semantic correlation between the words, because related words are more commonly found close to each other than on opposite sides of the text. Some authors have used sliding windows for this problem: they count all the occurrences within a sliding window running over the whole text. In this paper we generalise this technique, arriving at a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is counted with a weight depending on the distance between the items: a closer distance implies stronger evidence of a relationship. We develop an experiment to support this intuition by applying the technique to a data set consisting of the text of the Bible, split into verses.
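The weighted-distance sliding window described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `weighted_cooccurrences`, the default window size, and the inverse-distance weight `1/d` are all assumptions, since the abstract only states that the weight depends on the distance between items.

```python
from collections import defaultdict


def weighted_cooccurrences(tokens, window=5, weight=lambda d: 1.0 / d):
    """Build a weighted cooccurrence graph from a token sequence.

    Every pair of distinct tokens whose positions are fewer than `window`
    apart adds `weight(distance)` to the undirected edge between them.
    The 1/d default is an illustrative choice, not taken from the paper.
    """
    graph = defaultdict(float)
    for i, left in enumerate(tokens):
        # Look ahead at most window - 1 positions from token i.
        for j in range(i + 1, min(i + window, len(tokens))):
            right = tokens[j]
            if left == right:
                continue  # skip self-loops
            edge = tuple(sorted((left, right)))  # undirected edge key
            graph[edge] += weight(j - i)
    return dict(graph)


# One verse treated as one text unit, as in the experiment's verse split.
verse = "in the beginning god created the heaven and the earth".split()
graph = weighted_cooccurrences(verse, window=3)
for edge, score in sorted(graph.items(), key=lambda kv: -kv[1])[:3]:
    print(edge, round(score, 2))
```

Passing `weight=lambda d: 1.0` recovers the plain sliding-window count that the abstract generalises.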

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 114
1014 Integrated Nested Laplace Approximations For Quantile Regression

Authors: Kajingulu Malandala, Ranganai Edmore

Abstract:

The asymmetric Laplace distribution (ADL) is commonly used as the likelihood function in Bayesian quantile regression, and it offers different families of likelihood methods for quantile regression. Notwithstanding its popularity and practicality, the ADL is not smooth, which makes its likelihood difficult to maximize. Furthermore, Bayesian inference is time-consuming, and the choice of likelihood may mislead the inference, as Bayes' theorem does not automatically establish the posterior inference. In addition, the ADL does not account for greater skewness and kurtosis. This paper develops a new quantile regression approach for count data based on the inverse of the cumulative distribution function of the Poisson, binomial, and Delaporte distributions, using integrated nested Laplace approximations. Our results validate the benefit of using integrated nested Laplace approximations and support the approach for count data.
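For context, the standard form of the asymmetric Laplace density used as a working likelihood in quantile regression can be written as below. The symbols (location \mu, scale \sigma, quantile level \tau, check function \rho_\tau) are the usual textbook notation and are an assumption here, since the abstract gives no formulas.

```latex
f(y \mid \mu, \sigma, \tau)
  = \frac{\tau (1 - \tau)}{\sigma}
    \exp\!\left\{ -\rho_\tau\!\left( \frac{y - \mu}{\sigma} \right) \right\},
\qquad
\rho_\tau(u) = u \left( \tau - \mathbf{1}\{ u < 0 \} \right),
\qquad 0 < \tau < 1,\ \sigma > 0 .
```

Maximizing this likelihood in \mu is equivalent to minimizing the check loss \sum_i \rho_\tau(y_i - \mu). Because \rho_\tau has a kink at zero, the log-likelihood is not differentiable there, which is the non-smoothness the abstract refers to.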

Keywords: quantile regression, Delaporte distribution, count data, integrated nested Laplace approximation

Procedia PDF Downloads 132
1013 Corporate Law and Its View Point of Locking in Capital

Authors: Saad Saeed Althiabi

Abstract:

This paper discusses corporate positioning and how it became popular as a way to systematize production because of the unique manner in which incorporation allowed organizers to secure financial capital by locking it in. The power to lock in capital comes from the fact that a corporation exists as a separate legal entity, whose survival and governance are separated from any of its participants. The law essentially creates a different legal person when a corporation is created. Although this idea has been played down in the legal scholarship of recent decades in favor of the view that a corporation is purely something through which natural persons interrelate, recent legal research has begun to reassess the importance of entity status. Entity status under the law, and the related separation of governance from the input of financial capital through the configuration of a corporation, sanctioned corporate participants to do somewhat more than connect in a series of business transactions.

Keywords: corporate law, entity status, locking in capital, financial capital

Procedia PDF Downloads 522
1012 A Boundary-Fitted Nested Grid Model for Modeling Tsunami Propagation of 2004 Indonesian Tsunami along Southern Thailand

Authors: Fazlul Karim, Esa Al-Islam

Abstract:

Many problems in oceanography and environmental sciences require the solution of shallow water equations on physical domains having curvilinear coastlines and abrupt changes of ocean depth near the shore. Finite-difference techniques for the shallow water equations that represent the boundary as a stair step may give inaccurate results near the coastline, where results are of greatest interest for various applications. This suggests the use of methods capable of incorporating the irregular boundary in coastal belts. At the same time, a large velocity gradient is expected near the beach and islands, as water depth varies abruptly near the coast. A nested numerical scheme with fine resolution is the best way to enhance numerical accuracy with the fewest grid points in the region of interest, where the velocity changes rapidly; such resolution is unnecessary away from that region. This paper describes the development of a boundary fitted nested grid (BFNG) model to compute the propagation of the 2004 Indonesian tsunami in Southern Thailand coastal waters. In this paper, we develop a numerical model employing the shallow water nested model and an orthogonal boundary fitted grid to investigate the tsunami impact on Southern Thailand due to the Indonesian tsunami of 2004. Comparisons of water surface elevation obtained from numerical simulations and field measurements are made.

Keywords: Indonesian tsunami of 2004, Boundary-fitted nested grid model, Southern Thailand, finite difference method

Procedia PDF Downloads 402
1011 Identification of Anaplasma Species in Cattle of Khouzestan Province from Iran by PCR

Authors: Ali Bagherpour

Abstract:

The aim of this study was to determine the variety of Anaplasma species among cattle of Khuzestan province, Iran. From April 2013 to June 2013, a total of 200 blood samples were collected via the jugular vein from healthy cattle (100), randomly. The DNA extracted from blood cells was amplified with Anaplasma-all primers, which amplify an approximately 1468 bp DNA fragment from a region of the 16S rRNA gene of various members of the genus Anaplasma. To raise the test sensitivity, the PCR products were amplified with primers designed from the region flanked by the first primers. The amplified nested PCR product had the expected length of 345 nucleotides. 44 out of 100 cattle blood samples were Anaplasma spp. positive by first PCR and nested PCR. All positive cattle samples were further analyzed for the presence of A. centrale, A. bovis and A. phagocytophilum by specific nested PCR. A. phagocytophilum was identified by specific nested PCR in 3% of cattle blood samples. The DNA extracted from positive Anaplasma spp. samples was amplified with Anaplasma marginale/ovis specific primers, which amplify an approximately 866 bp DNA fragment from a region of the msp4 gene. 41 out of 100 cattle blood samples (41%) were positive for Anaplasma marginale and Anaplasma ovis, respectively.

Keywords: Iran, Khuzestan, Anaplasma species, Cattle, A. marginale, A. ovis, A. phagocytophilum, PCR

Procedia PDF Downloads 467
1010 An Alternative Framework of Multi-Resolution Nested Weighted Essentially Non-Oscillatory Schemes for Solving Euler Equations with Adaptive Order

Authors: Zhenming Wang, Jun Zhu, Yuchen Yang, Ning Zhao

Abstract:

In the present paper, an alternative framework is proposed to construct a class of finite difference multi-resolution nested weighted essentially non-oscillatory (WENO) schemes with an increasingly higher order of accuracy for solving inviscid Euler equations. These WENO schemes firstly obtain a set of reconstruction polynomials by a hierarchy of nested central spatial stencils, and then recursively achieve a higher order approximation through the lower-order precision WENO schemes. The linear weights of such WENO schemes can be set as any positive numbers with a requirement that their sum equals one and they will not pollute the optimal order of accuracy in smooth regions and could simultaneously suppress spurious oscillations near discontinuities. Numerical results obtained indicate that these alternative finite-difference multi-resolution nested WENO schemes with different accuracies are very robust with low dissipation and use as few reconstruction stencils as possible while maintaining the same efficiency, achieving the high-resolution property without any equivalent multi-resolution representation. Besides, its finite volume form is easier to implement in unstructured grids.

Keywords: finite-difference, WENO schemes, high order, inviscid Euler equations, multi-resolution

Procedia PDF Downloads 116
1009 Autonomous Flight Control for Multirotor by Alternative Input Output State Linearization with Nested Saturations

Authors: Yong Eun Yoon, Eric N. Johnson, Liling Ren

Abstract:

The multirotor is one of the most popular types of small unmanned aircraft systems and has already been used in many areas including transport, military, surveillance, and leisure. Together with its popularity, the need for proper flight control is growing, because most applications require it to conduct its missions autonomously, which in many respects depends on autonomous flight control. There have been many studies on flight control for multirotors, but there is still room for enhancement in terms of performance and efficiency. This paper presents an autonomous flight control method for multirotors based on alternative input-output linearization coupled with nested saturations. With an alternative choice of the output of the multirotor flight control system, we can reduce the computational cost associated with Lie algebra, and the linearized system can be stabilized by introducing nested saturations with real poles of our own design. Stabilization of the internal dynamics is also based on nested saturations and accompanies the determination of part of the desired states. In particular, outer control loops involving state variables that are not originally included in the output of the flight control system are naturally rendered through this internal dynamics stabilization. We can also observe that the desired tilting angles are determined by the error dynamics from the outer loops. Simulation results show that in any tracking situation the multirotor stabilizes itself with small time constants, preceded by a tuning process for the control parameters with a relatively low degree of complexity. Future work includes control of the piecewise linear behavior of the multirotor under actuator saturations and the optimal determination of desired states while tracking multiple waypoints.

Keywords: automatic flight control, input output linearization, multirotor, nested saturations

Procedia PDF Downloads 197
1008 Modeling the Risk Perception of Pedestrians Using a Nested Logit Structure

Authors: Babak Mirbaha, Mahmoud Saffarzadeh, Atieh Asgari Toorzani

Abstract:

Pedestrians are the most vulnerable road users since they do not have a protective shell. One of the most common collisions involving them is the pedestrian-vehicle collision at intersections. In order to develop appropriate countermeasures to improve their safety, research has to be conducted to identify the factors that affect the risk of getting involved in such collisions. More specifically, this study investigates the influence on pedestrians' risk perception of factors such as walking alone or having a baby while crossing the street, the observable age of the pedestrian, the speed of pedestrians, and the speed of approaching vehicles. A nested logit model was used to model the behavioral structure of pedestrians. The results show that the presence of more lanes at intersections and not being alone, especially having a baby while crossing, decrease the probability of taking a risk among pedestrians. It also seems that teenagers show riskier behavior in crossing the street than other age groups. The speed of approaching vehicles was also found to be significant: the probability of risk taking among pedestrians decreases as the speed of the approaching vehicle increases in both the first and the second lanes of crossings.

Keywords: pedestrians, intersection, nested logit, risk

Procedia PDF Downloads 150
1007 A Named Data Networking Stack for Contiki-NG-OS

Authors: Sedat Bilgili, Alper K. Demir

Abstract:

The Internet has become dominant, with continuing growth in home, medical, health, smart city, and industrial automation applications. The Internet of Things (IoT) is an emerging technology that enables such applications in our lives. Moreover, Named Data Networking (NDN) is emerging as a Future Internet architecture that fits the communication needs of IoT networks. The aim of this study is to provide an NDN protocol stack implementation running on the Contiki operating system (OS). Contiki OS is an OS developed for constrained IoT devices. In this study, an NDN protocol stack that can work on top of the IEEE 802.15.4 link and physical layers has been developed and presented.

Keywords: internet of things (IoT), named-data, named data networking (NDN), operating system

Procedia PDF Downloads 130
1006 Adopting the Two-Stage Nested Mixed Analysis of Variance Test to the Eco Indicator 99 to Evaluate Building Technologies under LCA Uncertainties

Authors: Svetlana Pushkar

Abstract:

Eco-indicator 99 (EI99) considers fundamental life cycle assessment (LCA) uncertainties via egalitarian/egalitarian (e/e), hierarchist/hierarchist (h/h), individualist/individualist (i/i), individualist/average (i/a), egalitarian/average (e/a), and hierarchist/average (h/a) methodological options. The objective of this study is to provide a reliable two-stage nested mixed balanced Analysis of Variance (ANOVA) test as a supplemental test to EI99 to address the problematic combination of similarly and not similarly produced materials usually found in building technologies. The robustness of the test was determined from both the “EI99 (all options)” stage (including e/e, i/i, h/h, e/a, i/a, and h/a - all methodological options) and the “EI99 (perspectives)” stage (including e/e, i/i, and h/h methodological options of EI99 - the methodological options with their particular weighting set or e/a, i/a, and h/a methodological options of EI99 - the methodological options with the average weighting set) of evaluating building technologies.

Keywords: building technologies, LCA uncertainty, Eco-indicator 99, two-stage nested mixed ANOVA test

Procedia PDF Downloads 277
1005 Molecular Identification of Pneumocystis SPP Isolated from Wild Rats in Tehran, Iran

Authors: Babak Rezavand

Abstract:

Pneumocystis carinii pneumonia (PCP) is one of the main causes of morbidity and mortality among immunocompromised and HIV-positive patients and remains one of the most important common opportunistic infections in these individuals worldwide. Pneumocystis infection has been reported in many mammals. The aim of this study was to determine the Pneumocystis infection in wild rats as natural reservoirs of this organism in Tehran city, Iran. Fifty-three rats (Rattus rattus) were live trapped in different areas of Tehran city, Iran. After isolation of their lung tissues and homogenization in sterile conditions, DNA was extracted. DNAs from all of the Pneumocystis species were amplified by pAZ102-H and pAZ102-E primers, and nested PCR was performed using pAZ102-X and pAZ102-W primers on the initial PCR product for all the species of Pneumocystis. Amplification of the genome revealed the presence of Pneumocystis in the lungs of 17 rats (32%) through a PCR product with a band of 346 bp. In the nested PCR amplification of the PCR product from the 53 rats, 64.2% of the samples were positive with a band of 261 bp. Pneumocystis spp. infection is highly prevalent among wild rats in Tehran city, indicating the existence of infection in the natural ecosystem of these rodents. As a host, the rat plays an important role in the transmission of the microorganism in the world.

Keywords: pneumocystis SPP, rattus rattus, nested PCR, Tehran

Procedia PDF Downloads 178
1004 Fault-Tolerant Configuration for T-Type Nested Neutral Point Clamped Converter

Authors: S. Masoud Barakati, Mohsen Rahmani Haredasht

Abstract:

Recently, the use of the T-type nested neutral point clamped (T-NNPC) converter has increased in medium voltage applications. However, the reliability and continuous operation of the T-NNPC converter architecture are at risk because it includes semiconductor switches, which are prone to open-circuit faults. As a result, fault-tolerant converters are required to improve the system's reliability and continuous functioning. This study's primary goal is to provide a fault-tolerant T-NNPC converter configuration. The proposed design uses the cold reservation approach: a redundant phase is provided, which replaces the faulty phase once a fault is diagnosed in any phase. The suggested fault-tolerant configuration can be easily implemented in practical applications because it uses a simple PWM control mechanism. The performance evaluation of the proposed configuration under different scenarios in the MATLAB-Simulink environment proves its efficiency.

Keywords: T-type nested neutral point clamped converter, reliability, continuous operation, open-circuit faults, fault-tolerant converters

Procedia PDF Downloads 84
1003 Identification of Anaplasma Species in Sheep of Khouzestan Province by PCR

Authors: Masoud Soltanialvar, Ali Bagherpour

Abstract:

The aim of this study was to determine the variety of Anaplasma species among sheep of Khouzestan province, Iran. From April 2013 to June 2013, a total of 200 blood samples were collected via the jugular vein from healthy sheep (100), randomly. The DNA extracted from blood cells was amplified with Anaplasma-all primers, which amplify an approximately 1468 bp DNA fragment from a region of the 16S rRNA gene of various members of the genus Anaplasma. To raise the test sensitivity, the PCR products were amplified with primers designed from the region flanked by the first primers. The amplified nested PCR product had the expected length of 345 nucleotides. Of 100 sheep blood samples, 7 were Anaplasma spp. positive by first PCR and nested PCR. The results showed that 2 of the 100 blood samples (2%) were A. phagocytophilum positive by specific nested PCR based on the 16S rRNA gene. The DNA extracted from positive Anaplasma spp. samples was amplified with Anaplasma ovis specific primers, which amplify an approximately 866 bp DNA fragment from a region of the msp4 gene. 5 out of 100 sheep blood samples (5%) were positive for Anaplasma ovis. This study is the first molecular detection of A. ovis and A. phagocytophilum from sheep in Iran.

Keywords: Iran, anaplasma species, sheep, A. ovis, A. phagocytophilum, PCR

Procedia PDF Downloads 500
1002 Management and Agreement Protocol in Computer Security

Authors: Abdulameer K. Hussain

Abstract:

In a cryptographic system, the parties perform many activities, the most prominent of which is the process of agreeing on how to carry out the system's tasks so that it is more secure, trustworthy, and reliable. The most common agreement among parties is key agreement, alongside other types of agreement. Although there have been attempts to find other effective agreement methods, these methods are limited to the traditional agreements. This paper presents different parameters for performing the agreement task more effectively, including the key alternative, agreement on the encryption method used, and agreement to prevent denial of service. To manage and achieve these goals, the method proposes a control and monitoring entity that manages these agreements by collecting statistical information on the opinions of the authorized parties in the cryptographic system. These statistics help this entity make the proper decision about the agreement factors. This entity is called the Agreement Manager (AM).

Keywords: agreement parameters, key agreement, key exchange, security management

Procedia PDF Downloads 383
1001 ESRA: An End-to-End System for Re-identification and Anonymization of Swiss Court Decisions

Authors: Joel Niklaus, Matthias Sturmer

Abstract:

The publication of judicial proceedings is a cornerstone of many democracies. It holds the court system accountable by ensuring that justice is administered in accordance with the law. Equally important is privacy, as a fundamental human right (Article 12 in the Declaration of Human Rights). Therefore, it is important that the parties (especially minors, victims, or witnesses) involved in these court decisions be anonymized securely. Today, the anonymization of court decisions in Switzerland is performed either manually or semi-automatically using primitive software. While much research has been conducted on anonymization for tabular data, the literature on anonymization for unstructured text documents is thin and virtually non-existent for court decisions. In 2019, it was shown that manual anonymization is not secure enough. In 21 of 25 attempted Swiss federal court decisions related to pharmaceutical companies, the pharmaceuticals and legal parties involved could be manually re-identified. This was achieved by linking the decisions with external databases using regular expressions. An automated re-identification system serves as an automated test for the safety of existing anonymizations and thus promotes the right to privacy. Manual anonymization is very expensive (recurring annual costs of over CHF 20M in Switzerland alone, according to an estimate). Consequently, many Swiss courts only publish a fraction of their decisions. An automated anonymization system reduces these costs substantially, freeing capacity to publish court decisions much more comprehensively. For the re-identification system, topic modeling with latent Dirichlet allocation is used to cluster over 500K Swiss court decisions into meaningful related categories.
A comprehensive knowledge base with publicly available data (such as social media, newspapers, government documents, geographical information systems, business registers, online address books, obituary portal, web archive, etc.) is constructed to serve as an information hub for re-identifications. For the actual re-identification, a general-purpose language model is fine-tuned on the respective part of the knowledge base for each category of court decisions separately. The input to the model is the court decision to be re-identified, and the output is a probability distribution over named entities constituting possible re-identifications. For the anonymization system, named entity recognition (NER) is used to recognize the tokens that need to be anonymized. Since the focus lies on Swiss court decisions in German, a corpus for Swiss legal texts will be built for training the NER model. The recognized named entities are replaced by the category determined by the NER model and an identifier to preserve context. This work is part of an ongoing research project conducted by an interdisciplinary research consortium. Both a legal analysis and the implementation of the proposed system design ESRA will be performed within the next three years. This study introduces the system design of ESRA, an end-to-end system for re-identification and anonymization of Swiss court decisions. Firstly, the re-identification system tests the safety of existing anonymizations and thus promotes privacy. Secondly, the anonymization system substantially reduces the costs of manual anonymization of court decisions and thus introduces a more comprehensive publication practice.
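
The replacement step described in the abstract (each recognized entity becomes its category plus an identifier) can be illustrated with a minimal Python sketch. This is not the ESRA implementation: the function names `anonymize` and `spans_of`, the `[CATEGORY_n]` placeholder format, and the string-matching stand-in for a trained NER model are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def spans_of(text, surface, category):
    """Hypothetical stand-in for an NER model: return (start, end, category)
    for every literal occurrence of `surface` in `text`."""
    return [(m.start(), m.end(), category)
            for m in re.finditer(re.escape(surface), text)]


def anonymize(text, entities):
    """Replace each non-overlapping (start, end, category) span with a
    placeholder such as [PERSON_1]. The same surface string within a
    category always maps to the same identifier, which preserves context."""
    placeholders = {}            # (category, surface) -> placeholder
    counters = defaultdict(int)  # category -> identifiers issued so far
    pieces = []
    cursor = 0
    for start, end, category in sorted(entities):
        key = (category, text[start:end])
        if key not in placeholders:
            counters[category] += 1
            placeholders[key] = f"[{category}_{counters[category]}]"
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(placeholders[key])
        cursor = end
    pieces.append(text[cursor:])           # untouched tail
    return "".join(pieces)


if __name__ == "__main__":
    decision = "Anna Muster sued Beispiel AG. Anna Muster won."
    found = (spans_of(decision, "Anna Muster", "PERSON")
             + spans_of(decision, "Beispiel AG", "ORG"))
    print(anonymize(decision, found))
```

Reusing one identifier per distinct entity keeps the anonymized decision readable, because a reader can still tell that the two mentions refer to the same party.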

Keywords: artificial intelligence, courts, legal tech, named entity recognition, natural language processing, privacy, topic modeling

Procedia PDF Downloads 106
1000 Optimization of Polymerase Chain Reaction Condition to Amplify Exon 9 of PIK3CA Gene in Preventing False Positive Detection Caused by Pseudogene Existence in Breast Cancer

Authors: Dina Athariah, Desriani Desriani, Bugi Ratno Budiarto, Abinawanto Abinawanto, Dwi Wulandari

Abstract:

Breast cancer is regulated by many genes. Defects in the PIK3CA gene, especially at exon 9 (E542K and E545K), called hot spot mutations, induce early transformation of breast cells. Early detection of breast cancer based on the mutation profile of this hot spot region is hampered by the existence of a pseudogene, marked by a substitution mutation at base 1658 (E545A) and a deletion at 1659, which has previously been demonstrated in several cancers. To the best of the authors’ knowledge, no studies have yet been reported on the pseudogene phenomenon in breast cancer. Here, we report PCR optimization to distinguish the true exon 9 of the PIK3CA gene from its pseudogene, hence increasing the validity of the data. Materials and methods: two genomic DNA samples, coded Dev and En, were used in this experiment. Two pairs of primers were designed for the standard PCR method; the sizes of the PCR products for the two primer pairs are 200 bp and 400 bp. Another primer was designed for Nested-PCR followed by DNA sequencing. For Nested-PCR, we optimized the annealing temperature in the first and second PCR runs, and the number of PCR cycles for the first run (15x versus 25x). Results: standard PCR using both primer pairs failed to detect the true PIK3CA gene; the sequence chromatograms of the PCR products showed a substitution mutation at 1658 and a deletion at 1659, indicating the pseudogene. Meanwhile, Nested-PCR under optimum conditions (annealing temperature of 55°C for the first round and 60.7°C for the second round, with 15x PCR cycles) could detect the true PIK3CA gene. The Dev sample was identified as WT, while the En sample contained one substitution mutation at position 545 of exon 9, indicating an amino acid change from E to K. In conclusion, the pseudogene also exists in breast cancer, and the application of the optimized Nested-PCR in this study could detect the true exon 9 of the PIK3CA gene.

Keywords: breast cancer, exon 9, hotspot mutation, PIK3CA, pseudogene

Procedia PDF Downloads 212
999 Cross-Knowledge Graph Relation Completion for Non-Isomorphic Cross-Lingual Entity Alignment

Authors: Yuhong Zhang, Dan Lu, Chenyang Bu, Peipei Li, Kui Yu, Xindong Wu

Abstract:

The Cross-Lingual Entity Alignment (CLEA) task aims to find the aligned entities that refer to the same identity from two knowledge graphs (KGs) in different languages. It is an effective way to enhance the performance of data mining for KGs with scarce resources. In real-world applications, the neighborhood structures of the same entities in different KGs tend to be non-isomorphic, which makes the representation of entities contain diverse semantic information and then poses a great challenge for CLEA. In this paper, we try to address this challenge from two perspectives. On the one hand, the cross-KG relation completion rules are designed with the alignment constraint of entities and relations to improve the topology isomorphism of two KGs. On the other hand, a representation method combining isomorphic weights is designed to include more isomorphic semantics for counterpart entities, which will benefit the CLEA. Experiments show that our model can improve the isomorphism of two KGs and the alignment performance, especially for two non-isomorphic KGs.
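
One way to read the relation completion idea is sketched below in Python. The abstract does not spell out the rules, so this is an assumed simplification: a triple from one KG is copied to the other only when its head, relation, and tail all have aligned counterparts. The function name `complete_relations` and the toy triples are hypothetical.

```python
def complete_relations(kg_src, kg_tgt, entity_align, relation_align):
    """Propose triples missing from kg_tgt.

    kg_src, kg_tgt: iterables of (head, relation, tail) triples.
    entity_align, relation_align: dicts mapping source names to target names.
    A source triple is transferred only if head, relation and tail are all
    aligned (the alignment constraint) and the mapped triple is not already
    present in the target KG.
    """
    existing = set(kg_tgt)
    proposed = set()
    for head, relation, tail in kg_src:
        if (head in entity_align and tail in entity_align
                and relation in relation_align):
            mapped = (entity_align[head],
                      relation_align[relation],
                      entity_align[tail])
            if mapped not in existing:
                proposed.add(mapped)
    return proposed


if __name__ == "__main__":
    kg_en = [("Paris", "capital_of", "France"),
             ("Paris", "located_in", "Europe")]
    kg_fr = [("Paris_fr", "capitale_de", "France_fr")]
    entities = {"Paris": "Paris_fr", "France": "France_fr",
                "Europe": "Europe_fr"}
    relations = {"capital_of": "capitale_de", "located_in": "situe_en"}
    print(complete_relations(kg_en, kg_fr, entities, relations))
```

Adding the proposed triples to the target KG makes the neighborhoods of aligned entities more similar, which is the sense in which completion improves isomorphism.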

Keywords: knowledge graphs, cross-lingual entity alignment, non-isomorphic, relation completion

Procedia PDF Downloads 95
998 In Vitro Studies on Antimicrobial Activities of Lactic Acid Bacteria Isolated from Fresh Fruits for Biocontrol of Pathogens

Authors: Okolie Pius Ifeanyi, Emerenini Emilymary Chima

Abstract:

Aims: The study investigated the diversity and identities of Lactic Acid Bacteria (LAB) isolated from different fresh fruits using molecular nested PCR analysis, and the efficacy of cell-free supernatants from these LAB for in vitro control of some tomato pathogens. Study Design: A nested PCR approach was used, employing universal 16S rRNA gene primers in the first-round PCR and LAB-specific primers in the second-round PCR, with a view to generating specific nested PCR products for the LAB diversity present in the samples. The inhibitory potential of supernatants obtained from the molecularly characterized LAB isolates of fruit origin was investigated against some tomato phytopathogens using the agar-well method, with a view to developing biological control agents for some tomato disease-causing organisms. Methodology: Gram-positive, catalase-negative strains of LAB were isolated from fresh fruits on de Man, Rogosa and Sharpe agar (Lab M) using the streaking method. The isolates obtained were molecularly characterized using a genomic DNA extraction kit (Norgen Biotek, Canada). Standard methods were used for nested Polymerase Chain Reaction (PCR) amplification targeting the 16S rRNA gene using universal 16S rRNA gene primers and LAB-specific primers, agarose gel electrophoresis, and purification and sequencing of the generated nested PCR products (Macrogen Inc., USA). The partial sequences obtained were identified by BLAST searches against the non-redundant nucleotide database of the National Center for Biotechnology Information (NCBI). The antimicrobial activities of the characterized LAB against some tomato phytopathogenic bacteria (Xanthomonas campestris, Erwinia carotovora, and Pseudomonas syringae) were determined using the agar well diffusion method. Results: The partial sequences obtained were deposited in the database of the National Center for Biotechnology Information (NCBI).
Isolates were identified based on the sequences as Weissella cibaria (4, 18.18%), Weissella confusa (3, 13.64%), Leuconostoc paramesenteroides (1, 4.55%), Lactobacillus plantarum (8, 36.36%), Lactobacillus paraplantarum (1, 4.55%) and Lactobacillus pentosus (1, 4.55%). The cell-free supernatants of LAB of fresh fruit origin (Weissella cibaria, Weissella confusa, Leuconostoc paramesenteroides, Lactobacillus plantarum, Lactobacillus paraplantarum and Lactobacillus pentosus) inhibited these bacteria, creating clear zones of inhibition around the wells containing the cell-free supernatants of the above-mentioned strains. Conclusion: This study shows that LAB can potentially be characterized quickly to species level by nested PCR analysis of the bacterial isolates' genomic DNA using universal 16S rRNA primers and LAB-specific primers. Tomato disease-causing organisms can most likely be controlled biologically using extracts from LAB. This finding could reduce the potential hazard from the use of chemical herbicides on plants.

Keywords: nested pcr, molecular characterization, 16s rRNA gene, lactic acid bacteria

Procedia PDF Downloads 370
997 Modeling User Context Using CEAR Diagram

Authors: Ravindra Dastikop, G. S. Thyagaraju, U. P. Kulkarni

Abstract:

Even though the number of context-aware applications and their users is increasing day by day, there is still no generic programming paradigm for context-aware applications. This situation could be remedied by designing and developing an appropriate context modeling and programming paradigm for such applications. In this paper, we propose a static context model and metrics for validating the expressiveness and understandability of the model. The proposed context modeling is a way of describing a user's situation using context entities, attributes and relationships. The model, which is an extended and hybrid version of the ER model, ontology model and graphical model, is specifically meant for expressing and understanding the user's situation in a context-aware environment. The model is useful for understanding context-aware problems, preparing documentation, and designing programs and databases. The model makes use of the context entity attributes relationship (CEAR) diagram to represent associations between context entities and attributes. We have identified a new set of graphical notations for improving the expressiveness and understandability of context from the end user's perspective.

Keywords: user context, context entity, context entity attributes, situation, sensors, devices, relationships, actors, expressiveness, understandability

Procedia PDF Downloads 314
996 Multi-Stream Graph Attention Network for Recommendation with Knowledge Graph

Authors: Zhifei Hu, Feng Xia

Abstract:

In recent years, graph neural networks have been widely used in knowledge graph recommendation. Existing recommendation methods based on graph neural networks extract information from the knowledge graph through entities and relations, which may not be an efficient form of information extraction. To better extract entity information in the knowledge graph that is useful for the current recommendation task, we propose an end-to-end neural network model based on a multi-stream graph attention mechanism (MSGAT), which can effectively integrate the knowledge graph into the recommendation system by evaluating the importance of entities from the perspective of both users and items. Specifically, we use the attention mechanism from the user's perspective to distil the domain node information of the predicted item in the knowledge graph, to enhance the user's information on items, and to generate the feature representation of the predicted item. Because a user's historical click items can reflect the user's interest distribution, we propose a multi-stream attention mechanism that, based on the user's preference for entities and relationships and the similarity between the items to be predicted and entities, aggregates the neighborhood entity information of the user's historical click items in the knowledge graph and generates the user's feature representation. We evaluate our model on three real recommendation datasets: Movielens-1M (ML-1M), LFM-1B 2015 (LFM-1B), and Amazon-Book (AZ-book). Experimental results show that, compared with the most advanced models, our proposed model can better capture the entity information in the knowledge graph, which demonstrates the validity and accuracy of the model.
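
The general idea of attention-weighted aggregation over neighboring entities can be sketched in a few lines of Python. The abstract does not give MSGAT's scoring function, so dot-product scores with a softmax are used here purely as an assumption, and the name `attention_aggregate` and the toy vectors are hypothetical; this shows a single attention stream, not the multi-stream model.

```python
import math


def attention_aggregate(query, neighbors):
    """Aggregate neighbor embeddings with softmax attention.

    query: list of floats (e.g. a user or item embedding).
    neighbors: non-empty list of embeddings with the same length as query.
    Returns (weights, aggregated_vector). Scores are plain dot products,
    which is an illustrative choice rather than the paper's formulation.
    """
    scores = [sum(q * n for q, n in zip(query, vec)) for vec in neighbors]
    top = max(scores)                       # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * vec[i] for w, vec in zip(weights, neighbors))
        for i in range(len(query))
    ]
    return weights, aggregated


if __name__ == "__main__":
    user = [1.0, 0.0]
    entity_neighbors = [[1.0, 0.0], [0.0, 1.0]]
    weights, summary = attention_aggregate(user, entity_neighbors)
    print(weights, summary)
```

Neighbors that score higher against the query contribute more to the aggregated representation, which is how an attention mechanism expresses the relative importance of entities.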

Keywords: graph attention network, knowledge graph, recommendation, information propagation

Procedia PDF Downloads 85
995 Kýklos Dimensional Geometry: Entity Specific Core Measurement System

Authors: Steven D. P Moore

Abstract:

A novel method referred to as Kýklos (Ky) dimensional geometry is proposed as an entity-specific core geometric dimensional measurement system. Ky geometric measures can construct scaled multi-dimensional models using regular and irregular sets in ℝⁿ. This entity-specific geometric measurement system shares similarities with fractal methods in which a ‘fractal transformation operator’ is applied to a set S to produce a union of N copies. The Kýklos inputs use 1D geometry as a core measure. One-dimensional inputs include the radius interval of a circle/sphere or the semiminor/semimajor axis intervals of an ellipse or spheroid. These geometric inputs have finite values that can be measured in SI distance units. The outputs for each interval are divided and subdivided 1D subcomponents whose union equals the interval geometry/length. Setting a limit on subdivision iterations creates a finite value for each 1D subcomponent. The uniqueness of this method lies in allowing the simplest 1D inputs to define entity-specific subclass geometric core measurements that can also be used to derive length measures. Current methodologies for celestial-based measurement of time, as defined within SI units, fit within this methodology, thus combining spatial and temporal features into geometric core measures. The novel Ky method discussed here offers geometric measures to construct scaled multi-dimensional structures, even models. Ky classes proposed for consideration include celestial and even subatomic classes. The applications are wide-ranging; for example, geometric architecture that can represent scaled celestial models incorporating planets (spheroids) and celestial motion (elliptical orbits).

Keywords: Kyklos, geometry, measurement, celestial, dimension

Procedia PDF Downloads 143
994 Accounting Policies in Polish and International Legal Regulations

Authors: Piotr Prewysz-Kwinto, Grazyna Voss

Abstract:

Accounting policies are a set of solutions compliant with legal regulations that an entity selects and adopts, and which guarantee a proper quality of financial statements. Those solutions may differ depending on whether the entity adopts national or international accounting standards. The aim of this article is to present accounting principles (policies) in Polish and international legal regulations and their adoption in selected Polish companies listed on the Warsaw Stock Exchange. The research method adopted in this work is the analysis and evaluation of legal conditions in Polish companies.

Keywords: accounting policies, international financial reporting standards, financial statement, method of measuring

Procedia PDF Downloads 342
993 Molecular Epidemiology of Egyptian Biomphalaria Snail: The Identification of Species, Diagnostic of the Parasite in Snails and Host Parasite Relationship

Authors: Hanaa M. Abu El Einin, Ahmed T. Sharaf El- Din

Abstract:

Biomphalaria snails play an integral role in the transmission of Schistosoma mansoni, the causative agent of human schistosomiasis. Two species of Biomphalaria were reported from Egypt, Biomphalaria alexandrina and Biomphalaria glabrata, and later a hybrid of B. alexandrina and B. glabrata was reported in streams in the Nile Delta. All were known to be excellent hosts of S. mansoni. The host-parasite relationship can be viewed in terms of snail susceptibility and parasite infectivity. This study highlights the progress that has been made in using molecular approaches for the correct identification of the snail species participating in the transmission of schistosomiasis, the rapid diagnosis of infection, and the determination of susceptibility and resistance types. Snails were identified using molecular methods involving Randomly Amplified Polymorphic DNA (RAPD), Polymerase Chain Reaction-Restriction Fragment Length Polymorphism (PCR-RFLP), and species-specific PCR. The molecular approaches used to diagnose the parasite in snails from Egypt were a nested PCR assay and the small subunit (SSU) rRNA gene. RAPD-PCR was also used to study the susceptible and resistant phenotypes. The RAPD-PCR, PCR-RFLP, and species-specific PCR techniques confirmed that there is no evidence for the presence of B. glabrata in Egypt: all Biomphalaria snails collected were identified as B. alexandrina, i.e., B. alexandrina is common, and there is no evidence of hybridization with B. glabrata. The adopted specific nested PCR assay showed much higher sensitivity, enabling the detection of S. mansoni-infected snails as early as 3 days post infection. The nested PCR method for detecting infected snails uses S. mansoni fructose-1,6-bisphosphate aldolase (SMALDO) primers; these primers are specific to S. mansoni and do not cross-react with other schistosome or molluscan aldolases. Nested PCR for this gene is sensitive enough to detect a single cercaria. Analysis of genetic variation between B. alexandrina strains that are susceptible and resistant to Schistosoma infection using RAPD-PCR showed that 39.8% of the examined snails collected from the field were resistant, while 60.2% showed high infection rates. In conclusion, the genetics of the intermediate host plays an important role in the epidemiological control of schistosomiasis.

Keywords: biomphalaria, molecular differentiation, parasite detection, schistosomiasis

Procedia PDF Downloads 171