Search results for: database replication
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1784

1484 Analysis of Brownfield Soil Contamination Using Local Government Planning Data

Authors: Emma E. Hellawell, Susan J. Hughes

Abstract:

Brownfield sites are currently being redeveloped for residential use. Information on soil contamination on these former industrial sites is collected as part of the planning process by the local government. This research project analyses this untapped resource of environmental data, using site investigation data submitted to a local Borough Council in Surrey, UK. Over 150 site investigation reports were collected and interrogated to extract relevant information. This study involved three phases. Phase 1 was the development of a database for soil contamination information from local government reports. This database contained information on the source, history, and quality of the data together with the chemical information on the soil that was sampled. Phase 2 involved obtaining site investigation reports for development within the study area and extracting the required information for the database. Phase 3 was the data analysis and interpretation of key contaminants to evaluate typical levels of contaminants and their distribution within the study area, and to relate these results to current guideline levels of risk for future site users. Preliminary results for a pilot study using a sample of the dataset have been obtained. This pilot study showed there is some inconsistency in the quality of the reports and measured data, and careful interpretation of the data is required. Analysis of the information has found high levels of lead in shallow soil samples, with mean and median levels exceeding the current guidance for residential use. The data also showed elevated (but below guidance) levels of potentially carcinogenic polyaromatic hydrocarbons. Of particular concern from the data was the high detection rate for asbestos fibers. These were found at low concentrations in 25% of the soil samples tested (however, the sample set was small). Contamination levels of the remaining chemicals tested were all below the guidance level for residential site use.
These preliminary pilot study results will be expanded, and results for the whole local government area will be presented at the conference. The pilot study has demonstrated the potential for this extensive dataset to provide greater information on local contamination levels. This can help inform regulators and developers and lead to more targeted site investigations, improving risk assessments, and brownfield development.

Keywords: Brownfield development, contaminated land, local government planning data, site investigation

Procedia PDF Downloads 112
1483 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English

Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista

Abstract:

The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English and to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented on Filemaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words.
In its present state, the lemmatiser Norna can automatically assign a lemma to around 80% of textual forms by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation stress the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is left for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.

Keywords: corpus linguistics, historical linguistics, old English, parallel corpus

Procedia PDF Downloads 177
1482 Development of a Social Assistive Robot for Elderly Care

Authors: Edwin Foo, Woei Wen, Lui, Meijun Zhao, Shigeru Kuchii, Chin Sai Wong, Chung Sern Goh, Yi Hao He

Abstract:

This presentation describes the development of a social assistive robot for elderly care. We named this robot JOS, and he is restricted to table-top operation. JOS is designed to have a maximum volume of 3600 cm³ with his base restricted to 250 mm, and his mission is to provide companionship and assistance to the elderly. In order for JOS to accomplish his mission, he will be equipped with perception, reaction and cognition capabilities. His appearance will not be human-like but rather cute and approachable. JOS will also be designed to be gender neutral. However, the robot will still have eyes, eyelids and a mouth. His eyes and eyelids will be built entirely with Robotis Dynamixel AX18 motors. To realize this complex task, JOS will also be equipped with a microphone array, a vision camera and an Intel i5 NUC computer, and will be powered by a 12 V lithium battery that will be self-charging. His face is constructed using 1 motor each for the eyelids, 2 motors for the eyeballs, 3 motors for the neck mechanism and 1 motor for the lip movement. The vision sensor will be housed on JOS's forehead and the microphone array will be somewhere below the mouth. For the vision system, Omron's latest OKAO vision sensor is used. It is a compact and versatile sensor that is only 60 mm by 40 mm in size and operates with only a 5 V supply. In addition, the OKAO vision sensor is capable of identifying the user and recognizing the expression of the user. With these functions, JOS is able to track and identify the user. If he cannot recognize the user, JOS will ask whether the user wants to be remembered. If yes, JOS will store the user information together with the captured face image in a database. This will allow JOS to recognize the user the next time the user is with JOS. In addition, JOS is also able to interpret the mood of the user through the user's facial expression. This will allow the robot to understand the user's mood and behavior and react accordingly.
Machine learning will later be incorporated to learn the behavior of the user so as to better understand the user's mood and requirements. For the speech system, the Microsoft speech and grammar engine is used for speech recognition. In order to use the speech engine, we need to build up a speech grammar database that captures the words commonly used by the elderly. This database is built from research journals and literature on elderly speech and from interviews asking the elderly what they want the robot to assist them with. Using the results from the interviews and the journal research, we are able to derive a set of common words the elderly frequently use to request help. It is from this set that we build up our grammar database. In situations where there is more than one person near JOS, he is able to identify the person who is talking to him through an in-house developed microphone array structure. In order to make the robot more interactive, we have also included the capability for the robot to express his emotions to the user through facial expressions, by changing the position and movement of the eyelids and mouth. All robot emotions will be in response to the user's mood and requests. Lastly, we expect to complete this phase of the project and test it with the elderly and also with delirium patients by Feb 2015.

Keywords: social robot, vision, elderly care, machine learning

Procedia PDF Downloads 420
1481 Antiviral Activity of Interleukin-11 in Response to Porcine Epidemic Diarrhea Virus Infection

Authors: Li Yuchen, Wu Qingxin, Jin Yuxing, Yang Qian

Abstract:

Interleukin-11 (IL-11), a well-known anti-inflammatory factor, helps to protect against intestinal epithelium damage caused by physical or chemical factors. However, little is known about the role of IL-11 during viral infection. Herein, high mRNA and protein levels of IL-11 were found in epithelial cells and jejunum of piglets during porcine epidemic diarrhea virus (PEDV) infection, and IL-11 expression was positively correlated with the level of viral infection. Pretreatment with recombinant porcine IL-11 (pIL-11) suppressed PEDV replication in Vero E6 cells, while IL-11 knockdown promoted viral infection. Furthermore, pIL-11 inhibited viral infection by preventing PEDV-mediated apoptosis of cells through activating the IL-11/STAT3 signal pathway. Conversely, application of a STAT3 phosphorylation inhibitor significantly antagonized the anti-apoptosis function of pIL-11 and counteracted its inhibition of PEDV. Our data suggest that IL-11 is a novel PEDV-inducible cytokine, and its production enhances the anti-apoptosis ability of epithelial cells against PEDV infection. The potential uses of IL-11 as a novel therapeutic against devastating viral diarrhea in piglets deserve more attention and study.

Keywords: Interleukin-11, Porcine epidemic diarrhea virus, STAT3, anti-apoptosis

Procedia PDF Downloads 109
1480 Implicit and Explicit Mechanisms of Emotional Contagion

Authors: Andres Pinilla Palacios, Ricardo Tamayo

Abstract:

Emotional contagion is characterized as an automatic tendency to synchronize behaviors that facilitate emotional convergence among humans. It might thus play a pivotal role in understanding the dynamics of key social interactions. However, little research has investigated its potential mechanisms. We suggest two complementary but independent processes that may underlie emotional contagion. The first is the efficient contagion hypothesis, based on fast and implicit bottom-up processes, modulated by familiarity and spread of activation in the emotional associative networks of memory. The second is the emotional contrast hypothesis, based on slow and explicit top-down processes guided by deliberate appraisal and hypothesis-testing. In order to assess these two hypotheses, an experiment with 39 participants was conducted. In the first phase, participants were induced (between-groups) to an emotional state (positive, neutral or negative) using a standardized video taken from the FilmStim database. In the second phase, participants classified and rated (within-subject) the emotional state of 15 faces (5 for each emotional state) taken from the POFA database. In the third phase, all participants were returned to a baseline emotional state using the same neutral video used in the first phase. In the fourth phase, participants classified and rated a new set of 15 faces. The accuracy in the identification and rating of emotions was partially explained by the efficient contagion hypothesis, but the speed with which these judgments were made was partially explained by the emotional contrast hypothesis. However, results are ambiguous, so a follow-up experiment is proposed in which emotional expressions and activation of the sympathetic system will be measured using EMG and EDA respectively.

Keywords: electromyography, emotional contagion, emotional valence, identification of emotions, imitation

Procedia PDF Downloads 285
1479 Wave Pressure Metering with the Specific Instrument and Measure Description Determined by the Shape and Surface of the Instrument including the Number of Sensors and Angle between Them

Authors: Branimir Jurun, Elza Jurun

Abstract:

The focus of this paper is the description and operating principle of an instrument for wave pressure metering. Moreover, an essential component of this paper is the proposal of a metering unit for direct wave pressure measurement, determined by the shape and surface of the instrument including the number of sensors and the angle between them. Instruments applied so far determine wave pressure on a particular area indirectly, by means of height, length, direction, wave time period and other components. This instrument allows the direct measurement, i.e. measurement without additional calculation, of the wave pressure expressed in a standardized unit of measure. To that end, the instrument has a standardized form, surface, number of sensors and angle between them. In addition, it is built so that it follows the wave and always remains on the water surface. The quality of the database recorded by the instrument is made possible by using an Arduino chip. This chip is programmed to receive two readings from each of the sensors every second. From these data, a unique representative value is estimated in a pre-defined manner. By this procedure all relevant wave pressure measurement results are directly and immediately registered. The final goal of establishing such a rich database is a comprehensive statistical analysis that ranges from multi-criteria analysis, across different modeling and parameter testing, to the acceptance of hypotheses relating to the widest variety of man-made activities such as filling of beaches, security cages for aquaculture, and bridge construction.

Keywords: instrument, metering, water, waves

Procedia PDF Downloads 236
1478 Using Analytical Hierarchy Process and TOPSIS Approaches in Designing a Finite Element Analysis Automation Program

Authors: Ming Wen, Nasim Nezamoddini

Abstract:

Sophisticated numerical simulations like finite element analysis (FEA) involve a complicated process from model setup to post-processing tasks that require replication of time-consuming steps. Utilizing an FEA automation program simplifies the steps involved while minimizing human errors in analysis setup, calculations, and results processing. One of the main challenges in designing FEA automation programs is to identify user requirements and link them to possible design alternatives. This paper presents a decision-making framework to design a Python-based FEA automation program for modal analysis, frequency response analysis, and random vibration fatigue (RVF) analysis procedures. Analytical hierarchy process (AHP) and technique for order preference by similarity to ideal solution (TOPSIS) are applied to evaluate design alternatives considering the feedback received from experts and program users.
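
As a rough illustration of how AHP weights can feed a TOPSIS ranking, the sketch below uses the row geometric mean approximation for the AHP priorities and vector normalisation for TOPSIS. The abstract does not state which variants the authors used, so both choices are assumptions, as are the function names, the three criteria and every number in the usage note.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    `pairwise` is a square reciprocal comparison matrix (hypothetical input).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def topsis(matrix, weights, benefit):
    """Return one closeness score per alternative (higher is better).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True where larger values are better, False for cost criteria
    Assumes no criterion column is all zeros and the alternatives are not
    all identical (either case would divide by zero).
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

A usage example with invented data: `w = ahp_weights([[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]])` gives weights for three hypothetical criteria (ease of use, run time, development effort), and `topsis([[9, 2, 3], [5, 5, 5], [3, 8, 8]], w, [True, False, False])` scores three hypothetical design alternatives, with the last two criteria treated as costs.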

Keywords: finite element analysis, FEA, random vibration fatigue, process automation, analytical hierarchy process, AHP, TOPSIS, multiple-criteria decision-making, MCDM

Procedia PDF Downloads 86
1477 Laying Performance of Itik Pinas (Anas platyrynchos Linnaeus) as Affected by Garlic (Allium sativum) Powder in Drinking Water

Authors: Gianne Bianca P. Manalo, Ernesto A. Martin, Vanessa V. Velasco

Abstract:

The laying performance, egg quality, egg classification, and income over feed cost of Improved Philippine Mallard duck (Itik Pinas) were examined as influenced by garlic powder in drinking water. A total of 48 ducks (42 females and 6 males) were used in the study. The ducks were allocated into two treatments - with garlic powder (GP) and without garlic powder (control) in drinking water. Each treatment had three replicates with eight ducks (7 females and 1 male) per replication. The results showed that there was a significant (P = 0.03) difference in average egg weight where higher values were attained by ducks with GP (77.67 g ± 0.64) than the control (75.64 g ± 0.43). The supplementation of garlic powder in drinking water, however, did not affect the egg production, feed intake, FCR, egg mass, livability, egg quality and egg classification. The Itik Pinas with GP in drinking water had numerically higher income over feed cost than those without. GP in drinking water can be considered in raising Itik Pinas. Further studies on increasing level of GP and long feeding duration also merit consideration to substantiate the findings.

Keywords: phytogenic, garlic powder, Itik-Pinas, egg weight, egg production

Procedia PDF Downloads 56
1476 Stress-Strain Relation for Hybrid Fiber Reinforced Concrete at Elevated Temperature

Authors: Josef Novák, Alena Kohoutková

Abstract:

The performance of concrete structures in fire depends on several factors which include, among others, the change in material properties due to the fire. Today, fiber reinforced concrete (FRC) belongs to materials which have been widely used for various structures and elements. While the knowledge and experience with FRC behavior under ambient temperature is well-known, the effect of elevated temperature on its behavior has to be deeply investigated. This paper deals with an experimental investigation and stress‑strain relations for hybrid fiber reinforced concrete (HFRC) which contains siliceous aggregates, polypropylene and steel fibers. The main objective of the experimental investigation is to enhance a database of mechanical properties of concrete composites with addition of fibers subject to elevated temperature as well as to validate existing stress-strain relations for HFRC. Within the investigation, a unique heat transport test, compressive test and splitting tensile test were performed on 150 mm cubes heated up to 200, 400, and 600 °C with the aim to determine a time period for uniform heat distribution in test specimens and the mechanical properties of the investigated concrete composite, respectively. Both findings obtained from the presented experimental test as well as experimental data collected from scientific papers so far served for validating the computational accuracy of investigated stress-strain relations for HFRC which have been developed during last few years. Owing to the presence of steel and polypropylene fibers, HFRC becomes a unique material whose structural performance differs from conventional plain concrete when exposed to elevated temperature. Polypropylene fibers in HFRC lower the risk of concrete spalling as the fibers burn out shortly with increasing temperature due to low ignition point and as a consequence pore pressure decreases. 
On the other hand, the increase in the concrete porosity might affect the mechanical properties of the material. Validating this assumption requires enhancing the existing result database, which is very limited and does not contain enough data. As a result of the poor database, only a few stress-strain relations have been developed so far to describe the structural performance of HFRC at elevated temperature. Moreover, many of them are inconsistent and need to be refined. Most of them also do not take into account the effect of both fiber type and fiber content. Such an approach might be vague, especially when a high amount of polypropylene fibers is used. Therefore, the existing relations should be validated in detail based on other experimental results.

Keywords: elevated temperature, fiber reinforced concrete, mechanical properties, stress strain relation

Procedia PDF Downloads 310
1475 Multi-Objective Optimization for the Green Vehicle Routing Problem: Approach to Case Study of the Newspaper Distribution Problem

Authors: Julio C. Ferreira, Maria T. A. Steiner

Abstract:

The aim of this work is to present a solution procedure referred to here as the Multi-objective Optimization for Green Vehicle Routing Problem (MOOGVRP) to provide solutions for a case study. The proposed methodology consists of three stages to resolve Scenario A. Stage 1 consists of the “treatment” of data; Stage 2 consists of applying mathematical models of the p-Median Capacitated Problem (with the objectives of minimization of distances and homogenization of demands between groups) and the Asymmetric Traveling Salesman Problem (with the objectives of minimizing distances and minimizing time). The weighted method was used as the multi-objective procedure. In Stage 3, an analysis of the results is conducted, taking into consideration the environmental aspects related to the case study, more specifically with regard to fuel consumption and air pollutant emission. This methodology was applied to a (partial) database that addresses newspaper distribution in the municipality of Curitiba, Paraná State, Brazil. The preliminary findings for Scenario A showed that it was possible to improve the distribution of the load, reduce the mileage and the greenhouse gas by 17.32% and the journey time by 22.58% in comparison with the current scenario. The intention for future works is to use other multi-objective techniques and an expanded version of the database and explore the triple bottom line of sustainability.
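
The weighted method mentioned above can be sketched as a scalarisation of the two routing objectives (distance and time). This is only a minimal illustration: the min-max normalisation, the function name, the dictionary keys and the candidate routes are assumptions and are not taken from the paper.

```python
def weighted_sum_choice(candidates, w_distance, w_time):
    """Pick the candidate with the lowest weighted sum of both objectives.

    Each candidate is a dict with hypothetical keys "distance_km" and
    "time_min". Objectives are min-max normalised so that the weights are
    comparable across units.
    """
    dists = [c["distance_km"] for c in candidates]
    times = [c["time_min"] for c in candidates]

    def scale(x, values):
        lo, hi = min(values), max(values)
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    def cost(c):
        return (w_distance * scale(c["distance_km"], dists)
                + w_time * scale(c["time_min"], times))

    return min(candidates, key=cost)


# Invented candidate routes, for illustration only.
routes = [
    {"name": "r1", "distance_km": 40, "time_min": 90},
    {"name": "r2", "distance_km": 50, "time_min": 70},
    {"name": "r3", "distance_km": 45, "time_min": 75},
]
best = weighted_sum_choice(routes, w_distance=0.5, w_time=0.5)
```

Sweeping the weights from (1, 0) to (0, 1) traces out different trade-offs between the two objectives. A known limitation of the weighted method is that it can miss solutions lying on non-convex parts of the Pareto front.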

Keywords: Asymmetric Traveling Salesman Problem, Green Vehicle Routing Problem, Multi-objective Optimization, p-Median Capacitated Problem

Procedia PDF Downloads 89
1474 Visualization of Flow Behaviour in Micro-Cavities during Micro Injection Moulding

Authors: Reza Gheisari, Paulo J. Bartolo, Nicholas Goddard

Abstract:

Polymeric micro-cantilevers (Cs) are rapidly becoming popular for MEMS applications such as chemo- and bio-sensing as well as purely electromechanical applications such as microrelays. Polymer materials present suitable physical and chemical properties combined with low-cost mass production. Hence, micro-cantilevers made of polymers offer greater biocompatibility and suitability for rapid prototyping along with their mechanical properties. This research studies the effects of three process factors and one size factor on the filling behaviour in micro cavities, and the role of each in the replication of micro parts using different polymer materials, i.e. polypropylene (PP) SABIC 56M10 and acrylonitrile butadiene styrene (ABS) Magnum 8434. In particular, the following factors are considered: barrel temperature, mould temperature, injection speed and the thickness of micro features. The study revealed that the barrel temperature and the injection speed are the key factors affecting the flow length of micro features replicated in PP and ABS. For both materials, an increase in feature size improves the melt flow. However, the melt fill of micro features does not increase linearly with the increase of their thickness.

Keywords: flow length, micro cantilevers, micro injection moulding, microfabrication

Procedia PDF Downloads 365
1473 SPBAC: A Semantic Policy-Based Access Control for Database Query

Authors: Aaron Zhang, Alimire Kahaer, Gerald Weber, Nalin Arachchilage

Abstract:

Access control is an essential safeguard for the security of enterprise data, which controls users’ access to information resources and ensures the confidentiality and integrity of information resources [1]. Research shows that the more common types of access control now have shortcomings [2]. To improve existing access control, we have studied the current technologies in the field of data security, investigated previous data access control policies and their problems in depth, identified the existing deficiencies, and proposed a new extension structure of SPBAC. The SPBAC extension proposed in this paper aims to combine Policy-Based Access Control (PBAC) with semantics to provide logically connected, real-time data access functionality by establishing associations between enterprise data through semantics. Our design combines policies with linked data through semantics to create a "Semantic link", so that access control is no longer per-database and the access granted to users in each role is determined by the instance policy. It also improves the SPBAC implementation by constructing policies and defined attributes through the XACML specification, extending the original XACML model. While providing relevant design solutions, this paper leaves the study of the feasibility and subsequent implementation of related work to a later stage.

Keywords: access control, semantic policy-based access control, semantic link, access control model, instance policy, XACML

Procedia PDF Downloads 63
1472 Correlation between Resistance to Non-Specific Inhibitor and Mammalian Pathogenicity of an Egg Adapted H9N2 Virus

Authors: Chung-Young Lee, Se-Hee Ahn, Jun-Gu Choi, Youn-Jeong Lee, Hyuk-Joon Kwon, Jae-Hong Kim

Abstract:

A/chicken/Korea/01310/2001 (H9N2) (01310) was passaged through embryonated chicken eggs (ECEs) 20 times (01310-E20), and it has been used for an inactivated oil emulsion vaccine in Korea. After sequential passages, 01310-E20 showed higher pathogenicity in ECEs and acquired multiple mutations including a potential N-glycosylation at position 133 (H3 numbering) in HA and an 18aa-deletion in the NA stalk. To evaluate the effect of these mutations on the mammalian pathogenicity and resistance to non-specific inhibitors, we generated four PR8-derived recombinant viruses with different combinations of HA and NA from 01310-E2 and 01310-E20 (rH2N2, rH2N20, rH20N2, and rH20N20). According to our results, recombinant viruses containing 01310-E20 HA showed better growth in MDCK cells and higher virulence in mice than those containing 01310-E2 HA regardless of NA. The hemagglutination activity of rH20N20 was less inhibited by egg white and mouse lung extract than that of other recombinant viruses. Thus, the increased pathogenicity of 01310-E20 may be related to both higher replication efficiency and resistance to non-specific inhibitors in mice.

Keywords: avian influenza virus, egg adaptation, H9N2, N-glycosylation, stalk deletion of neuraminidase

Procedia PDF Downloads 267
1471 Teaching about Justice With Justice: How Using Experiential, Learner Centered Literacy Methodology Enhances Learning of Justice Related Competencies for Young Children

Authors: Bruna Azzari Puga, Richard Roe, Andre Pagani de Souza

Abstract:

This abstract outlines a proposed study to examine how and to what extent interactive, experiential, learner centered methodology develops learning of basic civic and democratic competencies among young children. It stems from the Literacy and Law course taught at Georgetown University Law Center in Washington, DC, since 1998. Law students, trained in best literacy practices and legal cases affecting literacy development, read “law related” children’s books and engage in interactive and extension activities with emerging readers. The law students write a monthly journal describing their experiences and a final paper: a conventional paper or a children’s book illuminating some aspect of literacy and law. This proposal is based on the recent adaptation of Literacy and Law to Brazil at Mackenzie Presbyterian University in São Paulo in three forms: first, a course similar to the US model, often conducted jointly online with Brazilian and US law students; second, a similar course that combines readings of children’s literature with activity based learning, with law students from a satellite Mackenzie campus, for young children from a vulnerable community near the city; and third, a course taught by law students at the main Mackenzie campus for 4th grade students at the Mackenzie elementary school, that is wholly activity and discourse based. The workings and outcomes of these courses are well documented by photographs, reports, lesson plans, and law student journals. The authors, faculty who teach the above courses at Mackenzie and Georgetown, observe that literacy, broadly defined as cognitive and expressive development through reading and discourse-based activities, can be influential in developing democratic civic skills, identifiable by explicit civic competencies. For example, children experience justice in the classroom through cooperation, creativity, diversity, fairness, systemic thinking, and appreciation for rules and their purposes.
Moreover, the learning of civic skills as well as literacy skills is enhanced through interactive, learner centered practices in which the learners experience literacy and civic development. This study will develop rubrics for individual and classroom teaching and supervision by examining 1) the children’s books and student diaries of participating law students, 2) the collection of photos and videos of classroom activities, and 3) faculty and supervisor observations and reports. These rubrics, and the lesson plans and activities which are employed to advance the higher levels of performance outcomes, will be useful in training and supervision and in further replication and promotion of this form of teaching and learning. Examples of outcomes include helping, cooperating and participating; appreciation of viewpoint diversity; knowledge and utilization of democratic processes, including due process, advocacy, individual and shared decision making, consensus building, and voting; and establishing and valuing appropriate rules and a reasoned approach to conflict resolution. In conclusion, further development and replication of the learner centered literacy and law practices outlined here can lead to improved qualities of democratic teaching and learning supporting mutual respect, positivity, deep learning, and the common good, which are foundation qualities of a sustainable world.

Keywords: democracy, law, learner-centered, literacy

Procedia PDF Downloads 91
1470 Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier

Authors: Saurabh Farkya, Govinda Surampudi

Abstract:

Optical Character Recognition is one of the current major research areas. This paper focuses on the recognition of Devanagari script and its sound generation. The paper consists of two parts: first, Optical Character Recognition of handwritten Devanagari script; second, speech synthesis of the recognized text. This paper shows an implementation of support vector machines for the purpose of Devanagari script recognition. The support vector machine was trained with multi-domain features: Transform Domain features and Spatial Domain (or Structural Domain) features. The Transform Domain includes the wavelet feature of the character. The Structural Domain consists of the Distance Profile feature and the Gradient feature. The segmentation of the text document has been done at 3 levels: line segmentation, word segmentation, and character segmentation. The pre-processing of the characters has been done with the help of Otsu's algorithm and various morphological operations: erosion, dilation, filtration and thinning. The algorithm was tested on a self-prepared database, a collection of various handwriting samples. Further, Unicode was used to convert the recognized Devanagari text into a computer-readable document. The document so obtained is an array of codes which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using a concatenation technique.
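
The binarisation step of such a pre-processing pipeline is commonly done with Otsu's method, which picks the grey-level threshold that maximises the between-class variance. The following is a minimal pure-Python sketch over a flat list of 8-bit pixel values. It is not the authors' implementation, and the function name and the toy pixel data are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises the between-class variance.

    Pixels with value <= t form one class and pixels with value > t the
    other. `pixels` is a flat list of integers in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_low = 0       # number of pixels at or below the candidate threshold
    sum_low = 0     # sum of grey levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue            # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break               # every pixel is already in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy "image": dark ink strokes (value 10) on bright paper (value 200).
toy_pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(toy_pixels)
binary = [1 if p > threshold else 0 for p in toy_pixels]
```

In practice an image library would normally supply this step. The sketch only makes the criterion explicit before the morphological operations are applied.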

Keywords: Character Recognition (OCR), Text to Speech (TTS), Support Vector Machines (SVM), Library of Support Vector Machines (LIBSVM)

Procedia PDF Downloads 470
1469 Local Binary Patterns-Based Statistical Data Analysis for Accurate Soccer Match Prediction

Authors: Mohammad Ghahramani, Fahimeh Saei Manesh

Abstract:

Winning a soccer game is based on thorough and deep analysis of the ongoing match. At the same time, large gambling companies are in vital need of such analysis to reduce their losses to their customers. In this research work, we perform deep, real-time analysis of every soccer match around the world, which distinguishes our work from others that focus on particular seasons, teams and partial analytics. Our contributions are presented in a platform called “Analyst Masters.” First, we introduce the various sources of information available for soccer analysis for teams around the world, which helped us record live statistical data and information from more than 50,000 soccer matches a year. Our second and main contribution is our proposed in-play performance evaluation. The third contribution is the development of new features from stable soccer matches. The statistics of soccer matches and their odds, before the match and in-play, are represented as an image versus time, including halftime. Local Binary Patterns (LBP) are then employed to extract features from the image. Our analyses reveal interesting features and rules once a soccer match has reached enough stability. For example, our “8-minute rule” implies that if 'Team A' scores a goal and maintains the result for at least 8 minutes, then a stable match would end in their favor. We could also make accurate pre-match predictions of whether fewer or more than 2.5 goals would be scored. We use Gradient Boosting Trees (GBT) to extract highly related features. Once the features are selected from this pool of data, decision trees decide whether the match is stable. A stable match is then passed to a post-processing stage that checks its properties, such as bettors’ and punters’ behavior and its statistical data, to issue the prediction. The proposed method was trained using 140,000 soccer matches and tested on more than 100,000 samples, achieving 98% accuracy in selecting stable matches.
Our database of 240,000 matches shows that one can get over 20% betting profit per month using Analyst Masters. Such consistent profit outperforms human experts and shows the inefficiency of the betting market. Top soccer tipsters achieve 50% accuracy and 8% monthly profit on average, and only on regional matches. Both our collected database of more than 240,000 soccer matches from 2012 and our algorithm would greatly benefit coaches and punters seeking accurate analysis.
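
The Local Binary Patterns step mentioned in the abstract can be illustrated with a minimal sketch of the basic 3x3 LBP operator. This is the textbook form of the operator, not the authors' implementation; the function names, the clockwise neighbour ordering, and the use of plain nested lists for the image are assumptions made for illustration.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner
    # (the ordering is a convention chosen for this sketch).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The histogram is the usual feature vector derived from LBP codes; how the statistics-versus-time image is built and which LBP variant is used are not specified in the abstract.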

Keywords: soccer, analytics, machine learning, database

Procedia PDF Downloads 214
1468 Sex Differentiation of Elm Nymphalid (Nymphalis polychloros Linnaeus, 1758) on Pupal Stage

Authors: Hanife Genç

Abstract:

This study was conducted to determine the sex of laboratory-reared elm nymphalid (Nymphalis polychloros Linnaeus, 1758) by examining the morphological structure of the pupal stage. A laboratory colony of elm nymphalid, reared on pear leaves, was used to set up the experiments, which were performed with 5 replications of 8 pupae each. The dorsal, ventral and lateral external morphological structures of the pupae were examined under an Olympus SZX9 microscope and photographed. When fully grown, mature larvae wandered to the highest part of the rearing cage, and pupae were formed hanging by the cremaster. After completing the prepupal stage in about 1.5±0.3 days, they all pupated. The pupal stage was completed in about 4.38±1.20 days at 25±1°C. Pupal weights were 0.483±0.05 g in females and 0.392±0.08 g (n=40) in males, respectively. The pupal emergence rate was 95%, with 22 females and 16 males. Examination of the ventral parts of the 8th, 9th, and 10th abdominal segments revealed that the anal opening was located on the 10th abdominal segment in both sexes and 3 lumbs were found on the 9th abdominal segment, whereas the specific opening structure on the 8th segment was found only on female pupae.

Keywords: sex differentiation, Nymphalis polychloros, pupa, Linnaeus

Procedia PDF Downloads 208
1467 Relationship Between Health Coverage and Emergency Disease Burden

Authors: Karim Hajjar, Luis Lillo, Diego Martinez, Manuel Hermosilla, Nicholas Risko

Abstract:

Objectives: This study examines the relationship between universal health coverage (UHC) and the burden of emergency diseases at a global level. Methods: Data on Disability-Adjusted Life Years (DALYs) from emergency conditions were extracted from the Institute for Health Metrics and Evaluation (IHME) database for the years 2015 and 2019. Data on UHC, measured using two variables, 1) coverage of essential health services and 2) proportion of population spending more than 10% of household income on out-of-pocket health care expenditure, were extracted from the World Bank Database for years preceding our outcome of interest. Linear regression was performed, analyzing the effect of the UHC variables on the DALYs of emergency diseases, controlling for other variables. Results: A total of 133 countries were included. 44.4% of the analyzed countries had a coverage of essential health services index of at least 70/100, and 35.3% had at least 10% of their population spending more than 10% of their household income on healthcare. For every point increase in the coverage of essential health services index, there was a 13-point reduction in DALYs of emergency medical diseases (95% CI -16, -11). Conversely, for every percent decrease in the population with large household expenditure on healthcare, there was a 0.48 increase in DALYs of emergency medical diseases (95% CI -5.6, 4.7). Conclusions: After adjusting for multiple variables, an increase in coverage of essential health services was significantly associated with improvement in DALYs for emergency conditions. There was, however, no association between catastrophic health expenditure and DALYs.

Keywords: emergency medicine, universal healthcare, global health, health economics

Procedia PDF Downloads 63
1466 Efficacy of Plant and Mushroom Based Bio-Products against the Red Poultry Mite, Dermanyssus gallinae (Mesostigmata: Dermanyssidae)

Authors: Muhammad Asif Qayyoum, Bilal Saeed Khan

Abstract:

Poultry red mites (Dermanyssus gallinae De Geer) are economically damaging parasites of hens in the poultry industry all over the world. Owing to a lack of proper control management and poor application of commercial products, D. gallinae has developed resistance and causes severe infestations in poultry birds. A laboratory experiment was planned for the control of D. gallinae using different mushroom and plant extracts. We used a control treatment (100 ml distilled water) and nine treatments (10 g Lentinula edodes, Ganoderma lucidum and Pleurotus eryngii with 100 ml methanol, 1% and 2% Neemazal, 1.5% Gamma-T-ol, Echinacea leaf, 1.5% Fungatol with neem spray, and methanol) with five replications of five mites each. Data were collected after 12 and 24 hours every day until the mites were found dead in every treatment. Significant differences among the mean values were assessed with Duncan's multiple range test. The efficacy (%) of each treatment was determined with the Abbott formula. All statistical analyses were conducted with SPSS version 12. Lentinula edodes (80%), Ganoderma lucidum (76%) and Fungatol + neem spray (1.5%) (80%) were significantly effective against D. gallinae within 3 days.
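
For reference, the Abbott correction named in the abstract is commonly written as efficacy (%) = (1 − survivors in the treated group / survivors in the control group) × 100. The sketch below assumes that form; the function name and the mite counts in the usage line are hypothetical and are not data from the study.

```python
def abbott_efficacy(control_survivors, treated_survivors):
    """Abbott-corrected efficacy in percent.

    control_survivors: live individuals remaining in the untreated control.
    treated_survivors: live individuals remaining in the treated group.
    """
    if control_survivors <= 0:
        raise ValueError("control group must have at least one survivor")
    return (1 - treated_survivors / control_survivors) * 100


# Hypothetical counts: 25 mites alive in the control, 5 alive after treatment.
example = abbott_efficacy(25, 5)
```

Dividing by the control survivors corrects for mortality that would have occurred without any treatment, which is why the control count must be non-zero.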

Keywords: mushroom extracts, plant extracts, D. gallinae, control

Procedia PDF Downloads 280
1465 Improve B-Tree Index’s Performance Using Lock-Free Hash Table

Authors: Zhanfeng Ma, Zhiping Xiong, Hu Yin, Zhengwei She, Aditya P. Gurajada, Tianlun Chen, Ying Li

Abstract:

Many RDBMS vendors use B-tree indexes to achieve high performance for point queries and range queries, and some also employ hash indexes to further enhance performance, since a hash table is more efficient for point queries. However, maintaining a separate hash index incurs extra overhead: hash mappings for all data records must always be maintained, which consumes more memory, and locking, logging and other mechanisms are needed to guarantee ACID, which affects the concurrency and scalability of the system. To relieve this overhead, this paper proposes the Hash Cached B-tree (HCB) index, which consists of a standard disk-based B-tree index and an additional in-memory lock-free hash table. Initially, only the B-tree index is constructed for all data records. The hash table is built on the fly based on the runtime workload, and only data records accessed by point queries are indexed in the hash table, which helps reduce the memory footprint. Changes to the hash table are made using compare-and-swap (CAS) without locking or logging, which helps improve concurrency and avoid contention. The hash table is also optimized to be cache conscious. The HCB index is implemented in the SAP ASE database; compared with the standard B-tree index, early experiments and customer adoptions show significant performance improvement. This paper provides an overview of the design of the HCB index and reports the experimental results.
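
The idea of a hash table that is populated lazily by point queries, in front of an ordered index that serves range queries, can be sketched as follows. This is a single-threaded toy and not the HCB design itself: a sorted list stands in for the disk-based B-tree, a Python dict stands in for the hash table, and the lock-free CAS updates, logging behaviour, and cache-conscious layout described in the abstract are not modelled. The class and method names are hypothetical.

```python
import bisect


class HashCachedIndex:
    """Toy index: sorted list as the ordered index, dict as the hash cache."""

    def __init__(self, records):
        items = sorted(records.items())
        self._keys = [k for k, _ in items]
        self._values = [v for _, v in items]
        self._cache = {}  # filled only for keys touched by point queries

    def point_query(self, key):
        # Fast path: the key was already served by an earlier point query.
        if key in self._cache:
            return self._cache[key]
        # Slow path: binary search in the ordered index.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            return None
        value = self._values[i]
        self._cache[key] = value  # index this record in the hash table
        return value

    def range_query(self, low, high):
        # Range queries always use the ordered index (bounds are inclusive).
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))

    def cached_keys(self):
        return set(self._cache)
```

Because only keys hit by point queries enter the dict, the cache stays proportional to the hot set rather than to the whole table, which is the memory argument the abstract makes.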

Keywords: B-tree, compare-and-swap, lock-free hash table, point queries, range queries, SAP ASE database

Procedia PDF Downloads 261
1464 A Comparative Study of the Impact of Membership in International Climate Change Treaties and the Environmental Kuznets Curve (EKC) in Line with Sustainable Development Theories

Authors: Mojtaba Taheri, Saied Reza Ameli

Abstract:

In this research, we calculate the effect of membership in international climate change treaties for 20 developed countries based on the human development index (HDI) and compare this effect with the process of pollutant reduction described by the Environmental Kuznets Curve (EKC) theory. For this purpose, real GDP per capita at constant 2010 prices is taken from the World Development Indicators (WDI) database. Ecological Footprint (ECOFP) is the amount of biologically productive land needed to meet human needs and absorb carbon dioxide emissions. It is measured in global hectares (gha), and the data are retrieved from the Global Ecological Footprint (2021) database. We proceed step by step, performing several series of targeted statistical regressions, and examine the effects of different control variables. Energy Consumption Structure (ECS) is measured as the share of fossil fuel consumption in total energy consumption and is extracted from the United States Energy Information Administration (EIA) (2021) database. Energy Production (EP) refers to the total production of primary energy by all energy-producing enterprises in a country at a specific time. It is a comprehensive indicator of a country's energy production capacity, and its data, like those for Energy Consumption Structure, are obtained from the EIA (2021 version). Financial development (FND) is defined as the ratio of private credit to GDP and, to some extent, the stock market value as a ratio to GDP; it is taken from the WDI (2021 version). Trade Openness (TRD) is the sum of exports and imports of goods and services measured as a share of GDP, taken from the WDI (2021 version). Urbanization (URB) is defined as the share of the urban population in the total population, also taken from the WDI (2021 version).
The descriptive statistics of all the investigated variables are presented in the results section. In relation to the theories of sustainable development, the Environmental Kuznets Curve (EKC) is more significant in the period of study. In this research, we use more than fourteen targeted statistical regressions to isolate the net effect of each approach and examine the results.

Keywords: climate change, globalization, environmental economics, sustainable development, international climate treaty

Procedia PDF Downloads 44
1463 A Robust Spatial Feature Extraction Method for Facial Expression Recognition

Authors: H. G. C. P. Dinesh, G. Tharshini, M. P. B. Ekanayake, G. M. R. I. Godaliyadda

Abstract:

This paper presents a new spatial feature extraction method based on principal component analysis (PCA) and Fisher Discernment Analysis (FDA) for facial expression recognition. It not only extracts reliable features for classification but also reduces the feature space dimensions of pattern samples. In this method, each grayscale image is first considered in its entirety as the measurement matrix. Then, the principal components (PCs) of the row vectors of this matrix and the variance of these row vectors along the PCs are estimated. This ensures that the spatial information of the facial image is preserved. Afterwards, by incorporating the spectral information of the eigen-filters derived from the PCs, a feature vector is constructed for a given image. Finally, FDA is used to define a set of basis vectors in a reduced-dimension subspace such that optimal clustering is achieved. FDA defines an inter-class scatter matrix and an intra-class scatter matrix to enhance the compactness of each cluster while maximizing the distance between cluster marginal points. To match a test image with the training set, a cosine similarity based Bayesian classification is used. The proposed method was tested on the Cohn-Kanade database and the JAFFE database. The proposed method, which incorporates spatial information to construct an optimal feature space, was observed to outperform the standard PCA and FDA based methods.
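
The matching step relies on cosine similarity between feature vectors. The sketch below shows that measure together with a simple nearest-class-mean decision. It is a simplification: the Bayesian classification described in the abstract is not reproduced here, and the function names and example labels are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def nearest_class(test_vector, class_means):
    """Return the label whose mean vector is most similar to test_vector.

    class_means maps a label to the mean feature vector of its training
    samples (a stand-in for the reduced-dimension class clusters).
    """
    return max(
        class_means,
        key=lambda label: cosine_similarity(test_vector, class_means[label]),
    )
```

Cosine similarity ignores vector length and compares direction only, so two images whose feature vectors differ by a uniform scale factor are treated as identical.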

Keywords: facial expression recognition, principal component analysis (PCA), fisher discernment analysis (FDA), eigen-filter, cosine similarity, bayesian classifier, f-measure

Procedia PDF Downloads 405
1462 Epigenetic Mechanisms Involved in the Occurrence and Development of Infectious Diseases

Authors: Frank Boris Feutmba Keutchou, Saurelle Fabienne Bieghan Same, Verelle Elsa Fogang Pokam, Charles Ursula Metapi Meikeu, Angel Marilyne Messop Nzomo, Ousman Tamgue

Abstract:

Infectious diseases are among the most important causes of morbidity and mortality worldwide. These diseases are caused by micro-pathogenic organisms, such as bacteria, viruses, parasites, and fungi. Epigenetics refers to heritable changes in gene expression that do not involve changes to the underlying DNA sequence. Emerging evidence suggests that epigenetic mechanisms are important in the emergence and progression of infectious diseases. Pathogens can manipulate host epigenetic machinery to promote their own replication and evade immune responses. The Human Genome Project has provided new opportunities for developing better tools for the diagnosis and identification of target genes. Several epigenetic modifications, such as DNA methylation, histone modifications, and non-coding RNA expression, have been shown to influence infectious disease outcomes. Understanding the epigenetic mechanisms underlying infectious diseases may lead to the development of new therapeutic approaches focusing on host-pathogen interactions. The goal of this study is to show how different infectious agents interact with host cells after infection.

Keywords: epigenetic, infectious disease, micro-pathogenic organism, phenotype

Procedia PDF Downloads 53
1461 A Framework for Secure Information Flow Analysis in Web Applications

Authors: Ralph Adaimy, Wassim El-Hajj, Ghassen Ben Brahim, Hazem Hajj, Haidar Safa

Abstract:

Huge amounts of data and personal information are sent to and retrieved from web applications on a daily basis. Every application has its own confidentiality and integrity policies. Violating these policies can have a broad negative impact on the involved company’s financial status, while enforcing them is very hard even for developers with a good security background. In this paper, we propose a framework that enforces security-by-construction in web applications. Minimal developer effort is required, in the sense that the developer only needs to annotate database attributes with a security class. The web application code is then converted into an intermediate representation called the Extended Program Dependence Graph (EPDG). Using the EPDG, the provided annotations are propagated to the application code and checked against generic security enforcement rules that were carefully designed to detect insecure information flows as soon as they occur. As a result, any violation of the data’s confidentiality or integrity policies is reported. As a proof of concept, two PHP web applications, Hotel Reservation and Auction, were used for testing and validation. The proposed system was able to catch all the existing insecure information flows at their source. Moreover, to highlight the simplicity of the suggested approach compared with existing approaches, two professional web developers assessed the annotation tasks needed in the presented case studies and provided very positive feedback on the simplicity of the annotation task.

Keywords: web applications security, secure information flow, program dependence graph, database annotation

Procedia PDF Downloads 444
1460 TARF: Web Toolkit for Annotating RNA-Related Genomic Features

Authors: Jialin Ma, Jia Meng

Abstract:

Genomic features, expressed as genome-based coordinates, are commonly used to represent biological features such as genes, RNA transcripts and transcription factor binding sites. For the analysis of RNA-related genomic features, such as RNA modification sites, a common task is to correlate these features with transcript components (5'UTR, CDS, 3'UTR) to explore their distribution characteristics in terms of transcriptomic coordinates, e.g., to examine whether a specific type of biological feature is enriched near transcription start sites. Existing approaches to these tasks involve the manipulation of a gene database, conversion from genome-based coordinates to transcript-based coordinates, and visualization methods capable of showing RNA transcript components and the distribution of the features. These steps are complicated and time consuming, especially for researchers who are not familiar with the relevant tools. To overcome this obstacle, we developed a dedicated web app, TARF, which stands for web toolkit for annotating RNA-related genomic features. The TARF web tool is intended to provide a web-based way to easily annotate and visualize RNA-related genomic features. Once a user has uploaded the features in BED format and specified a built-in transcript database or uploaded a customized gene database in GTF format, the tool can perform its three main functions. First, it adds annotation on gene and RNA transcript components. For every feature provided by the user, the overlaps with RNA transcript components are identified, and the information is combined in one table, which is available for copy and download. Summary statistics on ambiguous assignments are also computed. Second, the tool provides a convenient method for visualizing the features at the level of a single gene or transcript.
For the selected gene, the tool shows the features with the gene model in a genome-based view, and also maps the features to transcript-based coordinates and shows their distribution along a single spliced RNA transcript. Third, a global transcriptomic view of the genomic features is generated using the Guitar R/Bioconductor package. The distribution of features on RNA transcripts is normalized with respect to RNA transcript landmarks, and the enrichment of the features on different RNA transcript components is shown. We tested the newly developed TARF toolkit with 3 different types of genomic features related to chromatin H3K4me3, RNA N6-methyladenosine (m6A) and RNA 5-methylcytosine (m5C), obtained from ChIP-Seq, MeRIP-Seq and RNA BS-Seq data, respectively. TARF successfully revealed their respective distribution characteristics, i.e. H3K4me3, m6A and m5C are enriched near transcription start sites, stop codons and 5’UTRs, respectively. Overall, TARF is a useful web toolkit for annotation and visualization of RNA-related genomic features, and should help simplify the analysis of various RNA-related genomic features, especially those related to RNA modifications.
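
The conversion from genome-based to transcript-based coordinates mentioned in the abstract can be sketched for a single spliced transcript as follows. This is an illustrative helper, not TARF code: the function name is hypothetical, exons are assumed to be non-overlapping 0-based half-open intervals as in BED, and positions that fall in introns or outside the transcript return None.

```python
def genome_to_transcript(pos, exons, strand="+"):
    """Map a 0-based genomic position to a 0-based transcript offset.

    exons: list of (start, end) half-open genomic intervals of one transcript.
    strand: "+" or "-"; on the minus strand the transcript runs from the
    highest genomic coordinate downwards.
    """
    # Walk the exons in transcription order.
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # transcript length covered by the exons already passed
    for start, end in ordered:
        if start <= pos < end:
            within = pos - start if strand == "+" else end - 1 - pos
            return offset + within
        offset += end - start
    return None  # intronic or outside the transcript
```

For example, with exons (100, 200) and (300, 400) on the plus strand, a feature at genomic position 350 sits 150 bases into the spliced transcript, because the 100-base intron between the exons is skipped.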

Keywords: RNA-related genomic features, annotation, visualization, web server

Procedia PDF Downloads 183
1459 Iris Cancer Detection System Using Image Processing and Neural Classifier

Authors: Abdulkader Helwan

Abstract:

Iris cancer, also called intraocular melanoma, is a cancer that starts in the iris, the colored part of the eye that surrounds the pupil. There is a need for an accurate and cost-effective iris cancer detection system, since the techniques currently available are still not efficient. Combining image processing with artificial neural networks is highly effective for the diagnosis and detection of iris cancer. Image processing techniques improve diagnosis by enhancing the quality of the images so that physicians can diagnose properly, while neural networks help decide whether the eye is cancerous or not. This paper aims to develop an intelligent system that simulates human visual detection of intraocular melanoma. The suggested system combines image processing techniques and neural networks. The images are first converted to grayscale, filtered, and then segmented using the Prewitt edge detection algorithm to detect the iris, the sclera circles and the cancer. Principal component analysis is used to reduce the image size and to extract features. These features are then used as inputs to a neural network, which learns to decide whether the eye is cancerous through many training iterations on different normal and abnormal eye images during the training phase. Normal images are obtained from a public database available on the internet, “Mile Research”, while the abnormal ones are obtained from another database, “eyecancer”. The experimental results for the proposed system show high accuracy (100%) in detecting cancer and making the right decision.
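
The Prewitt edge detection step can be illustrated with a minimal sketch that applies the two standard 3x3 Prewitt kernels and returns the gradient magnitude. The function name, the nested-list image representation, and the choice to leave border pixels at zero are assumptions for illustration; the filtering and thresholding used in the paper are not specified in the abstract.

```python
# Standard Prewitt kernels for horizontal and vertical intensity changes.
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def prewitt_magnitude(image):
    """Return the Prewitt gradient magnitude of a grayscale image.

    image: 2D list of numbers. Border pixels are left at 0.0 because the
    3x3 window does not fit there.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += PREWITT_X[i][j] * pixel
                    gy += PREWITT_Y[i][j] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out
```

Pixels with a large magnitude lie on strong intensity transitions, such as the iris boundary; a threshold on this map would typically follow to obtain a binary edge image.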

Keywords: iris cancer, intraocular melanoma, cancerous, prewitt edge detection algorithm, sclera

Procedia PDF Downloads 477
1458 Effects of Nitrogen and Arsenic on Antioxidant Enzyme Activities and Photosynthetic Pigments in Safflower (Carthamus tinctorius L.)

Authors: Mostafa Heidari

Abstract:

Nitrogen fertilization has played a significant role in increasing crop yield and solving problems of hunger and malnutrition worldwide. However, excessive levels of heavy metals such as arsenic can interfere with growth and reduce grain yield. To investigate the effects of different concentrations of arsenic and nitrogen fertilizer on photosynthetic pigments and antioxidant enzyme activities in safflower (cv. Goldasht), a factorial plot experiment in a randomized complete block design with three replications was conducted at the University of Zabol. The arsenic treatments were A1 = control (0), A2 = 30, A3 = 60 and A4 = 90 mg kg-1 soil from the Na2HAsO4 source, and the three nitrogen levels were N1 = 75, N2 = 150 and N3 = 225 kg ha-1 from the urea source. The results showed that arsenic had a significant effect on the activity of antioxidant enzymes. As arsenic levels increased from A1 to A4, the activity of ascorbate peroxidase (APX) and guaiacol peroxidase (GPX) increased and that of catalase (CAT) decreased. In this study, arsenic had no significant effect on chlorophyll a, chlorophyll b or carotenoid content. Nitrogen and the interaction between arsenic and nitrogen had a significant effect on CAT and GPX, but not on APX. The highest GPX activity was obtained in the A4N3 treatment. Nitrogen increased the content of chlorophyll a, chlorophyll b and carotenoids.

Keywords: arsenic, physiological parameters, oxidative enzymes, nitrogen

Procedia PDF Downloads 416
1457 Reducing Flood Risk in a Megacity: Using Mobile Application and Value Capture for Flood Risk Prevention and Risk Reduction Financing

Authors: Dedjo Yao Simon, Takahiro Saito, Norikazu Inuzuka, Ikuo Sugiyama

Abstract:

The megacity of Abidjan is a coastal urban area where the number of reported floods and their associated impacts are increasing rapidly due to climate change, uncontrolled urbanization, rapid population growth, and a lack of flood disaster mitigation and citizen awareness. The objective of this research is to reduce, in the short and long term, the human and socio-economic impact of floods. Hydrological simulation is applied to free global spatial data (digital elevation model, satellite-based rainfall estimates, land use) to identify the flood-prone area and to map the flood risk. Direct interviews with a sample of residents are used to validate the simulation results. A mobile application (Flood Locator) is then prototyped to disseminate the risk information to citizens. In addition, a value capture strategy is proposed to mobilize financial resources for disaster risk reduction (DRRf) to reduce the impact of floods. The town of Cocody in Abidjan is selected as the case study area. The flood risk mapping reveals that the population living in the study area is highly vulnerable. For a 5-year flood, more than 60% of the floodplain is affected by a water depth of at least 0.5 meters, and more than 1000 ha with at least 5000 buildings are directly exposed. The risk is higher for 50-year and 100-year floods. The interviews also reveal that the majority of citizens are not aware of the risk and severity of flooding in their community. This shortage of information is addressed by Flood Locator and by an urban flood database that we prototyped to accumulate flood data. The Flood Locator app allows users to view the floodplain and water depth on a digital map; a user can activate the GPS sensor of the mobile device to see their location on the map. Additional features allow citizens to capture flood events and damage information and send them remotely to the database.
Also, the disclosure of the risk information could result in a decrease (-14%) in the value of properties located inside the floodplain and an increase (+19%) in the value of properties in the suburban area. The tax increment resulting from the higher property values in the safer area should be captured to constitute the DRRf. The fund should be allocated to reducing flood risk for the benefit of people living in flood-prone areas. The flood prevention system discussed in this research will minimize, in the short and long term, direct damage in the risky area through effective citizen awareness and the availability of the DRRf. It will also contribute to the growth of the urban area in the safer zone and reduce human settlement in the risky area in the long term. Data accumulated in the urban flood database through the warning app will help transform Abidjan into a more resilient city by means of risk-avoiding land use in the master plan.

Keywords: abidjan, database, flood, geospatial techniques, risk communication, smartphone, value capture

Procedia PDF Downloads 257
1456 The Impact of Prior Cancer History on the Prognosis of Salivary Gland Cancer Patients: A Population-based Study from the Surveillance, Epidemiology, and End Results (SEER) Database

Authors: Junhong Li, Danni Cheng, Yaxin Luo, Xiaowei Yi, Ke Qiu, Wendu Pang, Minzi Mao, Yufang Rao, Yao Song, Jianjun Ren, Yu Zhao

Abstract:

Background: The number of patients with multiple cancers is increasing, and the impact of prior cancer history on salivary gland cancer patients remains unclear. Methods: Clinical, demographic and pathological information on salivary gland cancer patients was retrospectively collected from the Surveillance, Epidemiology, and End Results (SEER) database from 2004 to 2017, and the characteristics and prognosis of patients with a prior cancer and those without a prior cancer were compared. Univariate and multivariate Cox proportional hazards regression models were used for the analysis of prognosis. A risk score model was established to examine the impact of treatment on patients with a prior cancer in different risk groups. Results: A total of 9098 salivary gland cancer patients were identified, and 1635 of them had a prior cancer history. Salivary gland cancer patients with a prior cancer had worse survival than those without a prior cancer (p<0.001). Patients with different types of first cancer had distinct prognoses (p<0.001), and longer latent time was associated with better survival (p=0.006) in the univariate model, although both became nonsignificant in the multivariate model. Salivary gland cancer patients with a prior cancer were divided into low-risk (n=321), intermediate-risk (n=223), and high-risk (n=62) groups, and the results showed that patients at high risk could benefit from surgery, radiation therapy, and chemotherapy, and those at intermediate risk could benefit from surgery. Conclusion: Prior cancer history had an adverse impact on the survival of salivary gland cancer patients, and individualized treatment should be seriously considered for them.

Keywords: prior cancer history, prognosis, salivary gland cancer, SEER

Procedia PDF Downloads 121
1455 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in the biofilm field has become very significant, as biofilms show high mechanical resilience and resistance to antibiotic treatment and constitute a significant problem in healthcare and in other industries involving microorganisms. The information, both stated and hidden, in the biofilm literature is growing exponentially, so it is not feasible for researchers and practitioners to manually extract and relate information from different written resources. The current work therefore proposes and discusses the use of text mining techniques for extracting information from a biofilm literature corpus containing 34306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature because the literature is unstructured, i.e. free text. We therefore adopted an unsupervised approach, in which no annotated training data are necessary, and used it to develop a system that classifies text on the basis of growth and development, drug effects, radiation effects, classification and physiology of biofilms. A two-step structure was used: the first step extracts keywords from the biofilm literature using a metathesaurus and standard natural language processing tools such as Rapid Miner_v5.3, and the second step discovers relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR_v1.0.11. Unsupervised learning, the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, was applied to the extracted datasets to develop classifiers using the WinPython-64 bit_v3.5.4.0Qt5 and R studio_v0.99.467 packages; these classifiers automatically classify the text using the mentioned sets.
The developed classifiers were tested on a large data set of biofilm literature, which showed that the proposed unsupervised approach is promising and well suited to semi-automatic labeling of the extracted relations. All the information was stored in a relational database hosted locally on a server. The generated biofilm vocabulary and gene relations will be significant for researchers dealing with biofilm research, making their search easy and efficient, as the keywords and genes can be mapped directly to the documents used for database development.

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 144