Search results for: microarray data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25547

Search results for: microarray data mining

24797 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses to the research in the area of regression testing with emphasis on automated tools as well as prioritization of test cases. The uniqueness of regression testing and its cyclic nature is pointed out. The difference in approach between industry, with business model as basis, and academia, with focus on data mining, is highlighted. Test Metrics are discussed as a prelude to our formula for prioritization; a case study is further discussed to illustrate this methodology. An industrial case study is also described in the paper, where the number of test cases is so large that they have to be grouped as Test Suites. In such situations, a genetic algorithm proposed by us can be used to reconfigure these Test Suites in each cycle of regression testing. The comparison is made between a proprietary tool and an open source tool using the above-mentioned metrics. Our approach is clarified through several tables.

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool

Procedia PDF Downloads 434
24796 Monitoring the Pollution Status of the Goan Coast Using Genotoxicity Biomarkers in the Bivalve, Meretrix ovum

Authors: Avelyno D'Costa, S. K. Shyama, M. K. Praveen Kumar

Abstract:

The coast of Goa, India receives constant anthropogenic stress through its major rivers which carry mining rejects of iron and manganese ores from upstream mining sites and petroleum hydrocarbons from shipping and harbor-related activities which put the aquatic fauna such as bivalves at risk. The present study reports the pollution status of the Goan coast by the above xenobiotics employing genotoxicity studies. This is further supplemented by the quantification of total petroleum hydrocarbons (TPHs) and various trace metals (iron, manganese, copper, cadmium, and lead) in gills of the estuarine clam, Meretrix ovum as well as from the surrounding water and sediment, over a two-year sampling period, from January 2013 to December 2014. Bivalves were collected from a probable unpolluted site at Palolem and a probable polluted site at Vasco, based upon the anthropogenic activities at these sites. Genotoxicity was assessed in the gill cells using the comet assay and micronucleus test. The quantity of TPHs and trace metals present in gill tissue, water and sediments were analyzed using spectrofluorometry and atomic absorption spectrophotometry (AAS), respectively. The statistical significance of data was analyzed employing Student’s t-test. The relationship between DNA damage and pollutant concentrations was evaluated using multiple regression analysis. Significant DNA damage was observed in the bivalves collected from Vasco which is a region of high industrial activity. Concentrations of TPHs and trace metals (iron, manganese, and cadmium) were also found to be significantly high in gills of the bivalves collected from Vasco compared to those collected from Palolem. Further, the concentrations of these pollutants were also found to be significantly high in the water and sediments at Vasco compared to that of Palolem. This may be due to the lack of industrial activity at Palolem. A high positive correlation was observed between the pollutant levels and DNA damage in the bivalves collected from Vasco suggesting the genotoxic nature of these pollutants. Further, M. ovum can be used as a bioindicator species for monitoring the level of pollution of the estuarine/coastal regions by TPHs and trace metals.

Keywords: comet assay, metals, micronucleus test, total petroleum Hydrocarbons

Procedia PDF Downloads 236
24795 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including unstructured data and text have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually provided with a specific category for the convenience of the users. In the past, the categorization was performed manually. However, in the case of manual categorization, not only can the accuracy of the categorization be not guaranteed but the categorization also requires a large amount of time and huge costs. Many studies have been conducted towards the automatic creation of categories to solve the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics because the methods work by assuming that one document can be categorized into one category only. In order to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process involves training using a multi-categorized document set. These methods therefore cannot be applied to multi-categorization of most documents unless multi-categorized training sets are provided. To overcome the limitation of the requirement of a multi-categorized training set by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend a category of a single-categorized document to multiple categorizes by analyzing relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 271
24794 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still a lot of technology barriers as well as cognition chaos in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it’s a key task to make better use of new technologies for deeper delving and wider application of geological big data. In this paper, we briefly introduce the basic principle of blockchain technology at the beginning and then make an analysis of the application dilemma of geological data. Based on the current analysis, we bring forward some feasible patterns and scenarios for the blockchain application in geological big data and put forward serval suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 87
24793 Learning about the Strengths and Weaknesses of Urban Climate Action Plans

Authors: Prince Dacosta Aboagye, Ayyoob Sharifi

Abstract:

Cities respond to climate concerns mainly through their climate action plans (CAPs). A comprehensive content analysis of the dynamics in existing urban CAPs is not well represented in the literature. This literature void presents a difficulty in appreciating the strengths and weaknesses of urban CAPs. Here, we perform a qualitative content analysis (QCA) on CAPs from 278 cities worldwide and use text-mining tools to map and visualize the relevant data. Our analysis showed a decline in the number of CAPs developed and published following the global COVID-19 lockdown period. Evidently, megacities are leading the deep decarbonisation agenda. We also observed a transition from developing mainly mitigation-focused CAPs pre-COP21 to both mitigation and adaptation CAPs. A lack of inclusiveness in local climate planning was common among European and North American cities. The evidence is a catalyst for understanding the trends in existing urban CAPs to shape future urban climate planning.

Keywords: urban, climate action plans, strengths, weaknesses

Procedia PDF Downloads 94
24792 Hydrogeophysical Investigations And Mapping of Ingress Channels Along The Blesbokspruit Stream In The East Rand Basin Of The Witwatersrand, South Africa

Authors: Melvin Sethobya, Sithule Xanga, Sechaba Lenong, Lunga Nolakana, Gbenga Adesola

Abstract:

Mining has been the cornerstone of the South African economy for the last century. Most of the gold mining in South Africa was conducted within the Witwatersrand basin, which contributed to the rapid growth of the city of Johannesburg and capitulated the city to becoming the business and wealth capital of the country. But with gradual depletion of resources, a stoppage in the extraction of underground water from mines and other factors relating to survival of the mining operations over a lengthy period, most of the mines were abandoned and left to pollute the local waterways and groundwater with toxins, heavy metal residue and increased acid mine drainage ensued. The Department of Mineral Resources and Energy commissioned a project whose aim is to monitor, maintain, and mitigate the adverse environmental impacts of polluted water mine water flowing into local streams affecting local ecosystems and livelihoods downstream. As part of mitigation efforts, the diagnosis and monitoring of groundwater or surface water polluted sites has become important. Geophysical surveys, in particular, Resistivity and Magnetics surveys, were selected as some of most suitable techniques for investigation of local ingress points along of one the major streams cutting through the Witwatersrand basin, namely the Blesbokspruit, which is found in the eastern part of the basin. The aim of the surveys was to provide information that could be used to assist in determining possible water loss/ ingress from the Blesbokspriut stream. Modelling of geophysical surveys results offered an in-depth insight into the interaction and pathways of polluted water through mapping of possible ingress channels near the Blesbokspruit. The resistivity - depth profile of the surveyed site exhibit a three(3) layered model with low resistivity values (10 to 200 Ω.m) overburden, which is underlain by a moderate resistivity weathered layer (>300 Ω.m), which sits on a more resistive crystalline bedrock (>500 Ω.m). Two locations of potential ingress channels were mapped across the two traverses at the site. The magnetic survey conducted at the site mapped a major NE-SW trending regional linearment with a strong magnetic signature, which was modeled to depth beyond 100m, with the potential to act as a conduit for dispersion of stream water away from the stream, as it shared a similar orientation with the potential ingress channels as mapped using the resistivity method.

Keywords: eletrictrical resistivity, magnetics survey, blesbokspruit, ingress

Procedia PDF Downloads 63
24791 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased towards imbalanced datasets because of the skewness of the distribution of such datasets. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and prevents the reduction of majority observations in imbalance datasets classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets- cancer dataset, german credit dataset, and banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate the performance of the proposed decision tree algorithm on the datasets. The evaluation results show that for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results compared to that of the ID3 decision tree and decision tree induced with minority entropy for all three datasets.

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 133
24790 Research on the Risks of Railroad Receiving and Dispatching Trains Operators: Natural Language Processing Risk Text Mining

Authors: Yangze Lan, Ruihua Xv, Feng Zhou, Yijia Shan, Longhao Zhang, Qinghui Xv

Abstract:

Receiving and dispatching trains is an important part of railroad organization, and the risky evaluation of operating personnel is still reflected by scores, lacking further excavation of wrong answers and operating accidents. With natural language processing (NLP) technology, this study extracts the keywords and key phrases of 40 relevant risk events about receiving and dispatching trains and reclassifies the risk events into 8 categories, such as train approach and signal risks, dispatching command risks, and so on. Based on the historical risk data of personnel, the K-Means clustering method is used to classify the risk level of personnel. The result indicates that the high-risk operating personnel need to strengthen the training of train receiving and dispatching operations towards essential trains and abnormal situations.

Keywords: receiving and dispatching trains, natural language processing, risk evaluation, K-means clustering

Procedia PDF Downloads 89
24789 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment becomes an important task used to predict defaulter customers and classify customers as good or bad payers. To this objective, numerous techniques have been applied for credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models such as neural networks, SVM, etc. They generate applicants’ classes without any explanation. In this paper, we propose to assess credit risk using rules classification method. Our output is a set of rules which describe and explain the decision. To this end, we will compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART and Genetic programming (GP)) where the goal is to find the best rules satisfying many criteria: accuracy, sensitivity, and specificity. The obtained results confirm the efficiency of the GP algorithm for German and Australian datasets compared to other rule-based techniques to predict the credit risk.

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 181
24788 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even exerting a lot of effort, the necessary development might not always be possible. In this post, an effort to examine the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully. Which will assist in minimizing the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 81
24787 An Automated Approach to the Nozzle Configuration of Polycrystalline Diamond Compact Drill Bits for Effective Cuttings Removal

Authors: R. Suresh, Pavan Kumar Nimmagadda, Ming Zo Tan, Shane Hart, Sharp Ugwuocha

Abstract:

Polycrystalline diamond compact (PDC) drill bits are extensively used in the oil and gas industry as well as the mining industry. Industry engineers continually improve upon PDC drill bit designs and hydraulic conditions. Optimized injection nozzles play a key role in improving the drilling performance and efficiency of these ever changing PDC drill bits. In the first part of this study, computational fluid dynamics (CFD) modelling is performed to investigate the hydrodynamic characteristics of drilling fluid flow around the PDC drill bit. An Open-source CFD software – OpenFOAM simulates the flow around the drill bit, based on the field input data. A specifically developed console application integrates the entire CFD process including, domain extraction, meshing, and solving governing equations and post-processing. The results from the OpenFOAM solver are then compared with that of the ANSYS Fluent software. The data from both software programs agree. The second part of the paper describes the parametric study of the PDC drill bit nozzle to determine the effect of parameters such as number of nozzles, nozzle velocity, nozzle radial position and orientations on the flow field characteristics and bit washing patterns. After analyzing a series of nozzle configurations, the best configuration is identified and recommendations are made for modifying the PDC bit design.

Keywords: ANSYS Fluent, computational fluid dynamics, nozzle configuration, OpenFOAM, PDC dill bit

Procedia PDF Downloads 420
24786 Geological Structure Identification in Semilir Formation: An Correlated Geological and Geophysical (Very Low Frequency) Data for Zonation Disaster with Current Density Parameters and Geological Surface Information

Authors: E. M. Rifqi Wilda Pradana, Bagus Bayu Prabowo, Meida Riski Pujiyati, Efraim Maykhel Hagana Ginting, Virgiawan Arya Hangga Reksa

Abstract:

The VLF (Very Low Frequency) method is an electromagnetic method that uses low frequencies between 10-30 KHz which results in a fairly deep penetration. In this study, the VLF method was used for zonation of disaster-prone areas by identifying geological structures in the form of faults. Data acquisition was carried out in Trimulyo Region, Jetis District, Bantul Regency, Special Region of Yogyakarta, Indonesia with 8 measurement paths. This study uses wave transmitters from Japan and Australia to obtain Tilt and Elipt values that can be used to create RAE (Rapat Arus Ekuivalen or Current Density) sections that can be used to identify areas that are easily crossed by electric current. This section will indicate the existence of a geological structure in the form of faults in the study area which is characterized by a high RAE value. In data processing of VLF method, it is obtained Tilt vs Elliptical graph and Moving Average (MA) Tilt vs Moving Average (MA) Elipt graph of each path that shows a fluctuating pattern and does not show any intersection at all. Data processing uses Matlab software and obtained areas with low RAE values that are 0%-6% which shows medium with low conductivity and high resistivity and can be interpreted as sandstone, claystone, and tuff lithology which is part of the Semilir Formation. Whereas a high RAE value of 10% -16% which shows a medium with high conductivity and low resistivity can be interpreted as a fault zone filled with fluid. The existence of the fault zone is strengthened by the discovery of a normal fault on the surface with strike N550W and dip 630E at coordinates X= 433256 and Y= 9127722 so that the activities of residents in the zone such as housing, mining activities and other activities can be avoided to reduce the risk of natural disasters.

Keywords: current density, faults, very low frequency, zonation

Procedia PDF Downloads 172
24785 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 104
24784 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 175
24783 Identification of Conserved Domains and Motifs for GRF Gene Family

Authors: Jafar Ahmadi, Nafiseh Noormohammadi, Sedegeh Fabriki Ourang

Abstract:

GRF, Growth regulating factor, genes encode a novel class of plant-specific transcription factors. The GRF proteins play a role in the regulation of cell numbers in young and growing tissues and may act as transcription activations in growth and development of plants. Identification of GRF genes and their expression are important in plants to performance of the growth and development of various organs. In this study, to better understanding the structural and functional differences of GRFs family, 45 GRF proteins sequences in A. thaliana, Z. mays, O. sativa, B. napus, B. rapa, H. vulgare, and S. bicolor, have been collected and analyzed through bioinformatics data mining. As a result, in secondary structure of GRFs, the number of alpha helices was more than beta sheets and in all of them QLQ domains were completely in the biggest alpha helix. In all GRFs, QLQ, and WRC domains were completely protected except in AtGRF9. These proteins have no trans-membrane domain and due to have nuclear localization signals act in nuclear and they are component of unstable proteins in the test tube.

Keywords: domain, gene family, GRF, motif

Procedia PDF Downloads 456
24782 Smart in Performance: More to Practical Life than Hardware and Software

Authors: Faten Hatem

Abstract:

This paper promotes the importance of focusing on spatial aspects and affective factors that impact smart urbanism. This helps to better inform city governance, spatial planning, and policymaking to focus on what Smart does and what it can achieve for cities in terms of performance rather than on using the notion for prestige in a worldwide trend towards becoming a smart city. By illustrating how this style of practice compromises the social aspects and related elements of space making through an interdisciplinary comparative approach, the paper clarifies the impact of this compromise on the overall smart city performance. In response, this paper recognizes the importance of establishing a new meaning for urban progress by moving beyond improving basic services of the city to enhance the actual human experience which is essential for the development of authentic smart cities. The topic is presented under five overlooked areas that discuss the relation between smart cities’ potential and efficiency paradox, the social aspect, connectedness with nature, the human factor, and untapped resources. However, these themes are not meant to be discussed in silos, instead, they are presented to collectively examine smart cities in performance, arguing there is more to the practical life of smart cities than software and hardware inventions. The study is based on a case study approach, presenting Milton Keynes as a living example to learn from while engaging with various methods for data collection including multi-disciplinary semi-structured interviews, field observations, and data mining.

Keywords: smart design, the human in the city, human needs and urban planning, sustainability, smart cities, smart

Procedia PDF Downloads 99
24781 Electrochemical APEX for Genotyping MYH7 Gene: A Low Cost Strategy for Minisequencing of Disease Causing Mutations

Authors: Ahmed M. Debela, Mayreli Ortiz , Ciara K. O´Sullivan

Abstract:

The completion of the human genome Project (HGP) has paved the way for mapping the diversity in the overall genome sequence which helps to understand the genetic causes of inherited diseases and susceptibility to drugs or environmental toxins. Arrayed primer extension (APEX) is a microarray based minisequencing strategy for screening disease causing mutations. It is derived from Sanger DNA sequencing and uses fluorescently dideoxynucleotides (ddNTPs) for termination of a growing DNA strand from a primer with its 3´- end designed immediately upstream of a site where single nucleotide polymorphism (SNP) occurs. The use of DNA polymerase offers a very high accuracy and specificity to APEX which in turn happens to be a method of choice for multiplex SNP detection. Coupling the high specificity of this method with the high sensitivity, low cost and compatibility for miniaturization of electrochemical techniques would offer an excellent platform for detection of mutation as well as sequencing of DNA templates. We are developing an electrochemical APEX for the analysis of SNPs found in the MYH7 gene for group of cardiomyopathy patients. ddNTPs were labeled with four different redox active compounds with four distinct potentials. Thiolated oligonucleotide probes were immobilised on gold and glassy carbon substrates which are followed by hybridisation with complementary target DNA just adjacent to the base to be extended by polymerase. Electrochemical interrogation was performed after the incorporation of the redox labelled dedioxynucleotide. The work involved the synthesis and characterisation of the redox labelled ddNTPs, optimisation and characterisation of surface functionalisation strategies and the nucleotide incorporation assays.

Keywords: array based primer extension, labelled ddNTPs, electrochemical, mutations

Procedia PDF Downloads 245
24780 Intelligent Software Architecture and Automatic Re-Architecting Based on Machine Learning

Authors: Gebremeskel Hagos Gebremedhin, Feng Chong, Heyan Huang

Abstract:

Software system is the combination of architecture and organized components to accomplish a specific function or set of functions. A good software architecture facilitates application system development, promotes achievement of functional requirements, and supports system reconfiguration. We describe three studies demonstrating the utility of our architecture in the subdomain of mobile office robots and identify software engineering principles embodied in the architecture. The main aim of this paper is to analyze prove architecture design and automatic re-architecting using machine learning. Intelligence software architecture and automatic re-architecting process is reorganizing in to more suitable one of the software organizational structure system using the user access dataset for creating relationship among the components of the system. The 3-step approach of data mining was used to analyze effective recovery, transformation and implantation with the use of clustering algorithm. Therefore, automatic re-architecting without changing the source code is possible to solve the software complexity problem and system software reuse.

Keywords: intelligence, software architecture, re-architecting, software reuse, High level design

Procedia PDF Downloads 117
24779 Crime Prevention with Artificial Intelligence

Authors: Mehrnoosh Abouzari, Shahrokh Sahraei

Abstract:

Today, with the increase in quantity and quality and variety of crimes, the discussion of crime prevention has faced a serious challenge that human resources alone and with traditional methods will not be effective. One of the developments in the modern world is the presence of artificial intelligence in various fields, including criminal law. In fact, the use of artificial intelligence in criminal investigations and fighting crime is a necessity in today's world. The use of artificial intelligence is far beyond and even separate from other technologies in the struggle against crime. Second, its application in criminal science is different from the discussion of prevention and it comes to the prediction of crime. Crime prevention in terms of the three factors of the offender, the offender and the victim, following a change in the conditions of the three factors, based on the perception of the criminal being wise, and therefore increasing the cost and risk of crime for him in order to desist from delinquency or to make the victim aware of self-care and possibility of exposing him to danger or making it difficult to commit crimes. While the presence of artificial intelligence in the field of combating crime and social damage and dangers, like an all-seeing eye, regardless of time and place, it sees the future and predicts the occurrence of a possible crime, thus prevent the occurrence of crimes. The purpose of this article is to collect and analyze the studies conducted on the use of artificial intelligence in predicting and preventing crime. How capable is this technology in predicting crime and preventing it? The results have shown that the artificial intelligence technologies in use are capable of predicting and preventing crime and can find patterns in the data set. find large ones in a much more efficient way than humans. In crime prediction and prevention, the term artificial intelligence can be used to refer to the increasing use of technologies that apply algorithms to large sets of data to assist or replace police. The use of artificial intelligence in our debate is in predicting and preventing crime, including predicting the time and place of future criminal activities, effective identification of patterns and accurate prediction of future behavior through data mining, machine learning and deep learning, and data analysis, and also the use of neural networks. Because the knowledge of criminologists can provide insight into risk factors for criminal behavior, among other issues, computer scientists can match this knowledge with the datasets that artificial intelligence uses to inform them.

Keywords: artificial intelligence, criminology, crime, prevention, prediction

Procedia PDF Downloads 75
24778 Cross-Knowledge Graph Relation Completion for Non-Isomorphic Cross-Lingual Entity Alignment

Authors: Yuhong Zhang, Dan Lu, Chenyang Bu, Peipei Li, Kui Yu, Xindong Wu

Abstract:

The Cross-Lingual Entity Alignment (CLEA) task aims to find the aligned entities that refer to the same identity from two knowledge graphs (KGs) in different languages. It is an effective way to enhance the performance of data mining for KGs with scarce resources. In real-world applications, the neighborhood structures of the same entities in different KGs tend to be non-isomorphic, which makes the representation of entities contain diverse semantic information and then poses a great challenge for CLEA. In this paper, we try to address this challenge from two perspectives. On the one hand, the cross-KG relation completion rules are designed with the alignment constraint of entities and relations to improve the topology isomorphism of two KGs. On the other hand, a representation method combining isomorphic weights is designed to include more isomorphic semantics for counterpart entities, which will benefit the CLEA. Experiments show that our model can improve the isomorphism of two KGs and the alignment performance, especially for two non-isomorphic KGs.

Keywords: knowledge graphs, cross-lingual entity alignment, non-isomorphic, relation completion

Procedia PDF Downloads 118
24777 Passive Attenuation of Nitrogen Species at Northern Mine Sites

Authors: Patrick Mueller, Alan Martin, Justin Stockwell, Robert Goldblatt

Abstract:

Elevated concentrations of inorganic nitrogen (N) compounds (nitrate, nitrite, and ammonia) are a ubiquitous feature to mine-influenced drainages due to the leaching of blasting residues and use of cyanide in the milling of gold ores. For many mines, the management of N is a focus for environmental protection, therefore understanding the factors controlling the speciation and behavior of N is central to effective decision making. In this paper, the passive attenuation of ammonia and nitrite is described for three northern water bodies (two lakes and a tailings pond) influenced by mining activities. In two of the water bodies, inorganic N compounds originate from explosives residues in mine water and waste rock. The third water body is a decommissioned tailings impoundment, with N compounds largely originating from the breakdown of cyanide compounds used in the processing of gold ores. Empirical observations from water quality monitoring indicate nitrification (the oxidation of ammonia to nitrate) occurs in all three waterbodies, where enrichment of nitrate occurs commensurately with ammonia depletion. The N species conversions in these systems occurred more rapidly than chemical oxidation kinetics permit, indicating that microbial mediated conversion was occurring, despite the cool water temperatures. While nitrification of ammonia and nitrite to nitrate was the primary process, in all three waterbodies nitrite was consistently present at approximately 0.5 to 2.0 % of total N, even following ammonia depletion. The persistence of trace amounts of nitrite under these conditions suggests the co-occurrence denitrification processes in the water column and/or underlying substrates. The implications for N management in mine waters are discussed.

Keywords: explosives, mining, nitrification, water

Procedia PDF Downloads 317
24776 Monoallelic and Biallelic Deletions of 13q14 in a Group of 36 CLL Patients Investigated by CGH Haematological Cancer and SNP Array (8x60K)

Authors: B. Grygalewicz, R. Woroniecka, J. Rygier, K. Borkowska, A. Labak, B. Nowakowska, B. Pienkowska-Grela

Abstract:

Introduction: Chronic lymphocytic leukemia (CLL) is the most common form of adult leukemia in the Western world. Hemizygous and or homozygous loss at 13q14 occur in more than half of cases and constitute the most frequent chromosomal abnormality in CLL. It is believed that deletions 13q14 play a role in CLL pathogenesis. Two microRNA genes miR-15a and miR- 16-1 are targets of 13q14 deletions and plays a tumor suppressor role by targeting antiapoptotic BCL2 gene. Deletion size, as a single change detected in FISH analysis, has haprognostic significance. Patients with small deletions, without RB1 gene involvement, have the best prognosis and the longest overall survival time (OS 133 months). In patients with bigger deletion region, containing RB1 gene, prognosis drops to intermediate, like in patients with normal karyotype and without changes in FISH with overall survival 111 months. Aim: Precise delineation of 13q14 deletions regions in two groups of CLL patients, with mono- and biallelic deletions and qualifications of their prognostic significance. Methods: Detection of 13q14 deletions was performed by FISH analysis with CLL probe panel (D13S319, LAMP1, TP53, ATM, CEP-12). Accurate deletion size detection was performed by CGH Haematological Cancer and SNP array (8x60K). Results: Our investigated group of CLL patients with the 13q14 deletion, detected by FISH analysis, comprised two groups: 18 patients with monoallelic deletions and 18 patients with biallelic deletions. In FISH analysis, in the monoallelic group the range of cells with deletion, was 43% to 97%, while in biallelic group deletion was detected in 11% to 94% of cells. Microarray analysis revealed precise deletion regions. In the monoallelic group, the range of size was 348,12 Kb to 34,82 Mb, with median deletion size 7,93 Mb. In biallelic group discrepancy of total deletions, size was 135,27 Kb to 33,33 Mb, with median deletion size 2,52 Mb. The median size of smaller deletion regions on one copy chromosome 13 was 1,08 Mb while the average region of bigger deletion on the second chromosome 13 was 4,04 Mb. In the monoallelic group, in 8/18 deletion region covered RB1 gene. In the biallelic group, in 4/18 cases, revealed deletion on one copy of biallelic deletion and in 2/18 showed deletion of RB1 gene on both deleted 13q14 regions. All minimal deleted regions included miR-15a and miR-16-1 genes. Genetic results will be correlated with clinical data. Conclusions: Application of CGH microarrays technique in CLL allows accurately delineate the size of 13q14 deletion regions, what have a prognostic value. All deleted regions included miR15a and miR-16-1, what confirms the essential role of these genes in CLL pathogenesis. In our investigated groups of CLL patients with mono- and biallelic 13q14 deletions, patients with biallelic deletion presented smaller deletion sizes (2,52 Mb vs 7,93 Mb), what is connected with better prognosis.

Keywords: CLL, deletion 13q14, CGH microarrays, SNP array

Procedia PDF Downloads 254
24775 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: big data, big data analytics, Hadoop, cloud

Procedia PDF Downloads 308
24774 Semantic Data Schema Recognition

Authors: Aïcha Ben Salem, Faouzi Boufares, Sebastiao Correia

Abstract:

The subject covered in this paper aims at assisting the user in its quality approach. The goal is to better extract, mix, interpret and reuse data. It deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, inculding the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

Keywords: schema recognition, semantic data profiling, meta-categorisation, semantic dependencies inter columns

Procedia PDF Downloads 416
24773 Access Control System for Big Data Application

Authors: Winfred Okoe Addy, Jean Jacques Dominique Beraud

Abstract:

Access control systems (ACs) are some of the most important components in safety areas. Inaccuracies of regulatory frameworks make personal policies and remedies more appropriate than standard models or protocols. This problem is exacerbated by the increasing complexity of software, such as integrated Big Data (BD) software for controlling large volumes of encrypted data and resources embedded in a dedicated BD production system. This paper proposes a general access control strategy system for the diffusion of Big Data domains since it is crucial to secure the data provided to data consumers (DC). We presented a general access control circulation strategy for the Big Data domain by describing the benefit of using designated access control for BD units and performance and taking into consideration the need for BD and AC system. We then presented a generic of Big Data access control system to improve the dissemination of Big Data.

Keywords: access control, security, Big Data, domain

Procedia PDF Downloads 132
24772 A Data Envelopment Analysis Model in a Multi-Objective Optimization with Fuzzy Environment

Authors: Michael Gidey Gebru

Abstract:

Most of Data Envelopment Analysis models operate in a static environment with input and output parameters that are chosen by deterministic data. However, due to ambiguity brought on shifting market conditions, input and output data are not always precisely gathered in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to expand crisp Data Envelopment Analysis into Data Envelopment Analysis with fuzzy environment. In this study, the input and output data are regarded as fuzzy triangular numbers. Then, the Data Envelopment Analysis model with fuzzy environment is solved using a multi-objective method to gauge the Decision Making Units' efficiency. Finally, the developed Data Envelopment Analysis model is illustrated with an application on real data 50 educational institutions.

Keywords: efficiency, Data Envelopment Analysis, fuzzy, higher education, input, output

Procedia PDF Downloads 56
24771 Efficient Fuzzy Classified Cryptographic Model for Intelligent Encryption Technique towards E-Banking XML Transactions

Authors: Maher Aburrous, Adel Khelifi, Manar Abu Talib

Abstract:

Transactions performed by financial institutions on daily basis require XML encryption on large scale. Encrypting large volume of message fully will result both performance and resource issues. In this paper a novel approach is presented for securing financial XML transactions using classification data mining (DM) algorithms. Our strategy defines the complete process of classifying XML transactions by using set of classification algorithms, classified XML documents processed at later stage using element-wise encryption. Classification algorithms were used to identify the XML transaction rules and factors in order to classify the message content fetching important elements within. We have implemented four classification algorithms to fetch the importance level value within each XML document. Classified content is processed using element-wise encryption for selected parts with "High", "Medium" or “Low” importance level values. Element-wise encryption is performed using AES symmetric encryption algorithm and proposed modified algorithm for AES to overcome the problem of computational overhead, in which substitute byte, shift row will remain as in the original AES while mix column operation is replaced by 128 permutation operation followed by add round key operation. An implementation has been conducted using data set fetched from e-banking service to present system functionality and efficiency. Results from our implementation showed a clear improvement in processing time encrypting XML documents.

Keywords: XML transaction, encryption, Advanced Encryption Standard (AES), XML classification, e-banking security, fuzzy classification, cryptography, intelligent encryption

Procedia PDF Downloads 408
24770 Frequent-Pattern Tree Algorithm Application to S&P and Equity Indexes

Authors: E. Younsi, H. Andriamboavonjy, A. David, S. Dokou, B. Lemrabet

Abstract:

Software and time optimization are very important factors in financial markets, which are competitive fields, and emergence of new computer tools further stresses the challenge. In this context, any improvement of technical indicators which generate a buy or sell signal is a major issue. Thus, many tools have been created to make them more effective. This worry about efficiency has been leading in present paper to seek best (and most innovative) way giving largest improvement in these indicators. The approach consists in attaching a signature to frequent market configurations by application of frequent patterns extraction method which is here most appropriate to optimize investment strategies. The goal of proposed trading algorithm is to find most accurate signatures using back testing procedure applied to technical indicators for improving their performance. The problem is then to determine the signatures which, combined with an indicator, outperform this indicator alone. To do this, the FP-Tree algorithm has been preferred, as it appears to be the most efficient algorithm to perform this task.

Keywords: quantitative analysis, back-testing, computational models, apriori algorithm, pattern recognition, data mining, FP-tree

Procedia PDF Downloads 361
24769 Power Asymmetry and Major Corporate Social Responsibility Projects in Mhondoro-Ngezi District, Zimbabwe

Authors: A. T. Muruviwa

Abstract:

Empirical studies of the current CSR agenda have been dominated by literature from the North at the expense of the nations from the South where most TNCs are located. Therefore, owing to the limitations of the current discourse that is dominated by Western ideas such as voluntarism, philanthropy, business case and economic gains, scholars have been calling for a new CSR agenda that is South-centred and addresses the needs of developing nations. The development theme has dominated in the recent literature as scholars concerned with the relationship between business and society have tried to understand its relationship with CSR. Despite a plethora of literature on the roles of corporations in local communities and the impact of CSR initiatives, there is lack of adequate empirical evidence to help us understand the nexus between CSR and development. For all the claims made about the positive and negative consequences of CSR, there is surprisingly little information about the outcomes it delivers. This study is a response to these claims made about the developmental aspect of CSR in developing countries. It offers some empirical bases for assessing the major CSR projects that have been fulfilled by a major mining company, Zimplats in Mhondoro-Ngezi Zimbabwe. The neo-liberal idea of capitalism and market dominations has empowered TNCs to stamp their authority in the developing countries. TNCs have made their mark in developing nations as they stamp their global private authority, rivalling or implicitly challenging the state in many functions. This dominance of corporate power raises great concerns over their tendencies of abuses in terms of environmental, social and human rights concerns as well as how to make them increasingly accountable. The hegemonic power of TNCs in the developing countries has had a tremendous impact on the overall CSR practices. While TNCs are key drivers of globalization they may be acting responsibly in their Global Northern home countries where there is a combination of legal mechanisms and the fear of civil society activism associated with corporate scandals. Using a triangulated approach in which both qualitative and quantitative methods were used the study found out that most CSR projects in Zimbabwe are dominated and directed by Zimplats because of the power it possesses. Most of the major CSR projects are beneficial to the mining company as they serve the business plans of the mining company. What was deduced from the study is that the infrastructural development initiatives by Zimplats confirm that CSR is a tool to advance business obligations. This shows that although proponents of CSR might claim that business has a mandate for social obligations to society, we need not to forget the dominant idea that the primary function of CSR is to enhance the firm’s profitability.

Keywords: hegemonic power, projects, reciprocity, stakeholders

Procedia PDF Downloads 252
24768 Allele Mining for Rice Sheath Blight Resistance by Whole-Genome Association Mapping in a Tail-End Population

Authors: Naoki Yamamoto, Hidenobu Ozaki, Taiichiro Ookawa, Youming Liu, Kazunori Okada, Aiping Zheng

Abstract:

Rice sheath blight is one of the destructive fungal diseases in rice. We have thought that rice sheath blight resistance is a polygenic trait. Host-pathogen interactions and secondary metabolites such as lignin and phytoalexins are likely to be involved in defense against R. solani. However, to our knowledge, it is still unknown how sheath blight resistance can be enhanced in rice breeding. To seek for an alternative genetic factor that contribute to sheath blight resistance, we mined relevant allelic variations from rice core collections created in Japan. Based on disease lesion length on detached leaf sheath, we selected 30 varieties of the top tail-end and the bottom tail-end, respectively, from the core collections to perform genome-wide association mapping. Re-sequencing reads for these varieties were used for calling single nucleotide polymorphisms among the 60 varieties to create a SNP panel, which contained 1,137,131 homozygous variant sites after filitering. Association mapping highlighted a locus on the long arm of chromosome 11, which is co-localized with three sheath blight QTLs, qShB11-2-TX, qShB11, and qSBR-11-2. Based on the localization of the trait-associated alleles, we identified an ankyryn repeat-containing protein gene (ANK-M) as an uncharacterized candidate factor for rice sheath blight resistance. Allelic distributions for ANK-M in the whole rice population supported the reliability of trait-allele associations. Gene expression characteristics were checked to evaluiate the functionality of ANK-M. Since an ANK-M homolog (OsPIANK1) in rice seems a basal defense regulator against rice blast and bacterial leaf blight, ANK-M may also play a role in the rice immune system.

Keywords: allele mining, GWAS, QTL, rice sheath blight

Procedia PDF Downloads 76