Search results for: semantic data profiling
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25674

Search results for: semantic data profiling

24954 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data

Authors: Adarsh Shroff

Abstract:

Big data is a collection of dataset so large and complex that it becomes difficult to process using data base management tools. To perform operations like search, analysis, visualization on big data by using data mining; which is the process of extraction of patterns or knowledge from large data set. In recent years, the data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to Map Reduce, the most widely used framework for mining big data. I2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics for efficient mining.

Keywords: big data, map reduce, incremental processing, iterative computation

Procedia PDF Downloads 353
24953 Exploiting Identity Grievances: Al-Shabaab Propaganda Targeting Individuals Abroad

Authors: Mustafa Mabruk

Abstract:

Groups such as Al-Shabaab have managed to radicalize many individuals abroad, including the first American citizen to ever be radicalized. Yet the pathways of radicalization for these foreign individuals are understudied. Moreover, current measures to prevent foreign radicalization are ineffective, with privacy, screening and profiling implications that render current counter-radicalization efforts counterproductive. Such measures exhibit strictness, political bias, and harshness. As confirmed by recent studies, such counter-radicalization issues exacerbate existing grievances and channel fresh recruits to Al-Shabaab. Addressing these challenges is paramount, requiring alternative strategies to effectively reduce radicalization without triggering further harm. The development of counter-narratives emerges as a potential measure with minimal risk of exacerbating grievances, yet the development of such counter-narratives necessitates a thorough understanding of the radicalization pathways of foreign individuals that are understudied. This study investigates the success of Al-Shabaab in recruiting individuals abroad by analyzing their propaganda in conjunction with analyzing identity-focused theories of radicalization, including Framing Theory and Social Identity Theory. Qualitative content analysis is used to analyze various propaganda material, including tweets, speeches, and webpages. The analysis reveals that issues of identity are of major significance in the radicalization patterns identified and that grievances of Muslims worldwide are used to exploit identity-related grievances. Based on these findings, the paper argues that such evidence enhances our understanding of potential deradicalization pathways and present counter-narratives based on Islamic scripture.

Keywords: counter-narratives, foreign radicalization, identity grievances, propaganda analysis

Procedia PDF Downloads 43
24952 Analyzing Large Scale Recurrent Event Data with a Divide-And-Conquer Approach

Authors: Jerry Q. Cheng

Abstract:

Currently, in analyzing large-scale recurrent event data, there are many challenges such as memory limitations, unscalable computing time, etc. In this research, a divide-and-conquer method is proposed using parametric frailty models. Specifically, the data is randomly divided into many subsets, and the maximum likelihood estimator from each individual data set is obtained. Then a weighted method is proposed to combine these individual estimators as the final estimator. It is shown that this divide-and-conquer estimator is asymptotically equivalent to the estimator based on the full data. Simulation studies are conducted to demonstrate the performance of this proposed method. This approach is applied to a large real dataset of repeated heart failure hospitalizations.

Keywords: big data analytics, divide-and-conquer, recurrent event data, statistical computing

Procedia PDF Downloads 167
24951 Feature-Based Summarizing and Ranking from Customer Reviews

Authors: Dim En Nyaung, Thin Lai Lai Thein

Abstract:

Due to the rapid increase of Internet, web opinion sources dynamically emerge which is useful for both potential customers and product manufacturers for prediction and decision purposes. These are the user generated contents written in natural languages and are unstructured-free-texts scheme. Therefore, opinion mining techniques become popular to automatically process customer reviews for extracting product features and user opinions expressed over them. Since customer reviews may contain both opinionated and factual sentences, a supervised machine learning technique applies for subjectivity classification to improve the mining performance. In this paper, we dedicate our work is the task of opinion summarization. Therefore, product feature and opinion extraction is critical to opinion summarization, because its effectiveness significantly affects the identification of semantic relationships. The polarity and numeric score of all the features are determined by Senti-WordNet Lexicon. The problem of opinion summarization refers how to relate the opinion words with respect to a certain feature. Probabilistic based model of supervised learning will improve the result that is more flexible and effective.

Keywords: opinion mining, opinion summarization, sentiment analysis, text mining

Procedia PDF Downloads 332
24950 Mutation Profiling of Paediatric Solid Tumours in a Cohort of South African Patients

Authors: L. Lamola, E. Manolas, A. Krause

Abstract:

Background: The incidence of childhood cancer incidence is increasing gradually in low-middle income countries, such as South Africa. Globally, there is an extensive range of familial- and hereditary-cancer syndromes, where underlying germline variants increase the likelihood of developing cancer in childhood. Next-Generation Sequencing (NGS) technologies have been key in determining the occurrence and genetic contribution of germline variants to paediatric cancer development. We aimed to design and evaluate a candidate gene panel specific to inherited cancer-predisposing genes to provide a comprehensive insight into the contribution of germline variants to childhood cancer. Methods: 32 paediatric patients (aged 0-18 years) diagnosed with a malignant tumour were recruited, and biological samples were obtained. After quality control, DNA was sequenced using an ion Ampliseq 50 candidate gene panel design and Ion Torrent S5 technologies. Sequencing variants were called using Ion Torrent Suite software and were subsequently annotated using Ion Reporter and Ensembl's VEP. High priority variants were manually analysed using tools such as MutationTaster, SIFT-INDEL and VarSome. Putative identified candidates were validated via Sanger Sequencing. Results: The patients studied had a variety of cancers, the most common being nephroblastoma (13), followed by osteosarcoma (4) and astrocytoma (3). We identified 10 pathogenic / likely pathogenic variants in 10 patients, most of which were novel. Conclusions: According to the literature, we expected ~10% of our patient population to harbour pathogenic or likely pathogenic germline variants, however, we reported about 3 times (~30%) more than we expected. Majority of the identified variants are novel; this may be because this is the first study of its kind in an understudied South African population.

Keywords: Africa, genetics, germline-variants, paediatric-cancer

Procedia PDF Downloads 139
24949 Adoption of Big Data by Global Chemical Industries

Authors: Ashiff Khan, A. Seetharaman, Abhijit Dasgupta

Abstract:

The new era of big data (BD) is influencing chemical industries tremendously, providing several opportunities to reshape the way they operate and help them shift towards intelligent manufacturing. Given the availability of free software and the large amount of real-time data generated and stored in process plants, chemical industries are still in the early stages of big data adoption. The industry is just starting to realize the importance of the large amount of data it owns to make the right decisions and support its strategies. This article explores the importance of professional competencies and data science that influence BD in chemical industries to help it move towards intelligent manufacturing fast and reliable. This article utilizes a literature review and identifies potential applications in the chemical industry to move from conventional methods to a data-driven approach. The scope of this document is limited to the adoption of BD in chemical industries and the variables identified in this article. To achieve this objective, government, academia, and industry must work together to overcome all present and future challenges.

Keywords: chemical engineering, big data analytics, industrial revolution, professional competence, data science

Procedia PDF Downloads 86
24948 Secure Multiparty Computations for Privacy Preserving Classifiers

Authors: M. Sumana, K. S. Hareesha

Abstract:

Secure computations are essential while performing privacy preserving data mining. Distributed privacy preserving data mining involve two to more sites that cannot pool in their data to a third party due to the violation of law regarding the individual. Hence in order to model the private data without compromising privacy and information loss, secure multiparty computations are used. Secure computations of product, mean, variance, dot product, sigmoid function using the additive and multiplicative homomorphic property is discussed. The computations are performed on vertically partitioned data with a single site holding the class value.

Keywords: homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data

Procedia PDF Downloads 412
24947 Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, greatly suffer from vanishing gradient and low-efficiency problem when they sequentially process past states and compresses contextual information into a bottleneck with long input sequences. In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations of previous approaches, a transformer model was built in this work as a variation to the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with attention mechanism. In a previous work on genetic circuit design, the traditional approaches to classification and regression, such as Random Forrest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R2 accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: transformers, generative ai, gene expression design, classification

Procedia PDF Downloads 60
24946 Exploring the Neural Mechanisms of Communication and Cooperation in Children and Adults

Authors: Sara Mosteller, Larissa K. Samuelson, Sobanawartiny Wijeakumar, John P. Spencer

Abstract:

This study was designed to examine how humans are able to teach and learn semantic information as well as cooperate in order to jointly achieve sophisticated goals. Specifically, we are measuring individual differences in how these abilities develop from foundational building blocks in early childhood. The current study adopts a paradigm for novel noun learning developed by Samuelson, Smith, Perry, and Spencer (2011) to a hyperscanning paradigm [Cui, Bryant and Reiss, 2012]. This project measures coordinated brain activity between a parent and child using simultaneous functional near infrared spectroscopy (fNIRS) in pairs of 2.5, 3.5 and 4.5-year-old children and their parents. We are also separately testing pairs of adult friends. Children and parents, or adult friends, are seated across from one another at a table. The parent (in the developmental study) then teaches their child the names of novel toys. An experimenter then tests the child by presenting the objects in pairs and asking the child to retrieve one object by name. Children are asked to choose from both pairs of familiar objects and pairs of novel objects. In order to explore individual differences in cooperation with the same participants, each dyad plays a cooperative game of Jenga, in which their joint score is based on how many blocks they can remove from the tower as a team. A preliminary analysis of the noun-learning task showed that, when presented with 6 word-object mappings, children learned an average of 3 new words (50%) and that the number of objects learned by each child ranged from 2-4. Adults initially learned all of the new words but were variable in their later retention of the mappings, which ranged from 50-100%. We are currently examining differences in cooperative behavior during the Jenga playing game, including time spent discussing each move before it is made. Ongoing analyses are examining the social dynamics that might underlie the differences between words that were successfully learned and unlearned words for each dyad, as well as the developmental differences observed in the study. Additionally, the Jenga game is being used to better understand individual and developmental differences in social coordination during a cooperative task. At a behavioral level, the analysis maps periods of joint visual attention between participants during the word learning and the Jenga game, using head-mounted eye trackers to assess each participant’s first-person viewpoint during the session. We are also analyzing the coherence in brain activity between participants during novel word-learning and Jenga playing. The first hypothesis is that visual joint attention during the session will be positively correlated with both the number of words learned and with the number of blocks moved during Jenga before the tower falls. The next hypothesis is that successful communication of new words and success in the game will each be positively correlated with synchronized brain activity between the parent and child/the adult friends in cortical regions underlying social cognition, semantic processing, and visual processing. This study probes both the neural and behavioral mechanisms of learning and cooperation in a naturalistic, interactive and developmental context.

Keywords: communication, cooperation, development, interaction, neuroscience

Procedia PDF Downloads 253
24945 Contextual SenSe Model: Word Sense Disambiguation using Sense and Sense Value of Context Surrounding the Target

Authors: Vishal Raj, Noorhan Abbas

Abstract:

Ambiguity in NLP (Natural language processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguities such as lexical, syntactic, semantic, anaphoric and referential am-biguities. This study is focused mainly on solving the issue of Lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words for training and testing. Lemma adds generality and POS adds properties of word into token. We have designed a novel method to create an affinity matrix to calculate the affinity be-tween any pair of lemma_POS (a token where lemma and POS of word are joined by underscore) of given training set. Additionally, we have devised an al-gorithm to create the sense clusters of tokens using affinity matrix under hierar-chy of POS of lemma. Furthermore, three different mechanisms to predict the sense of target word using the affinity/similarity value are devised. Each contex-tual token contributes to the sense of target word with some value and whichever sense gets higher value becomes the sense of target word. So, contextual tokens play a key role in creating sense clusters and predicting the sense of target word, hence, the model is named Contextual SenSe Model (CSM). CSM exhibits a noteworthy simplicity and explication lucidity in contrast to contemporary deep learning models characterized by intricacy, time-intensive processes, and chal-lenging explication. CSM is trained on SemCor training data and evaluated on SemEval test dataset. The results indicate that despite the naivety of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) model.

Keywords: word sense disambiguation (wsd), contextual sense model (csm), most frequent sense (mfs), part of speech (pos), natural language processing (nlp), oov (out of vocabulary), lemma_pos (a token where lemma and pos of word are joined by underscore), information retrieval (ir), machine translation (mt)

Procedia PDF Downloads 109
24944 Cross Project Software Fault Prediction at Design Phase

Authors: Pradeep Singh, Shrish Verma

Abstract:

Software fault prediction models are created by using the source code, processed metrics from the same or previous version of code and related fault data. Some company do not store and keep track of all artifacts which are required for software fault prediction. To construct fault prediction model for such company, the training data from the other projects can be one potential solution. The earlier we predict the fault the less cost it requires to correct. The training data consists of metrics data and related fault data at function/module level. This paper investigates fault predictions at early stage using the cross-project data focusing on the design metrics. In this study, empirical analysis is carried out to validate design metrics for cross project fault prediction. The machine learning techniques used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as initial guideline for the projects where no previous fault data is available. We analyze seven data sets from NASA Metrics Data Program which offer design as well as code metrics. Overall, the results of cross project is comparable to the within company data learning.

Keywords: software metrics, fault prediction, cross project, within project.

Procedia PDF Downloads 344
24943 Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features

Authors: Vesna Kirandziska, Nevena Ackovska, Ana Madevska Bogdanova

Abstract:

The problem of emotion recognition is a challenging problem. It is still an open problem from the aspect of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. A Support Vector Machine classifiers are built by using raw data from video recordings. In this paper, the results obtained for the emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison between the classifiers build from facial data only, voice data only and from the combination of both data is made here. The need for a better combination of the information from facial expression and voice data is argued.

Keywords: emotion recognition, facial recognition, signal processing, machine learning

Procedia PDF Downloads 317
24942 Cryptosystems in Asymmetric Cryptography for Securing Data on Cloud at Various Critical Levels

Authors: Sartaj Singh, Amar Singh, Ashok Sharma, Sandeep Kaur

Abstract:

With upcoming threats in a digital world, we need to work continuously in the area of security in all aspects, from hardware to software as well as data modelling. The rise in social media activities and hunger for data by various entities leads to cybercrime and more attack on the privacy and security of persons. Cryptography has always been employed to avoid access to important data by using many processes. Symmetric key and asymmetric key cryptography have been used for keeping data secrets at rest as well in transmission mode. Various cryptosystems have evolved from time to time to make the data more secure. In this research article, we are studying various cryptosystems in asymmetric cryptography and their application with usefulness, and much emphasis is given to Elliptic curve cryptography involving algebraic mathematics.

Keywords: cryptography, symmetric key cryptography, asymmetric key cryptography

Procedia PDF Downloads 124
24941 Frequency of the English Phrasal Verbs Used by Iranian Learners as a Reference to the Style of Writing Adopted by the Learners

Authors: Hamzeh Mazaherylaghab, Mehrangiz Vahabian, Seyyedeh Zahra Asghari

Abstract:

The present study initially focused on the frequency of phrasal verbs used by Iranian learners of English. The results then needed to be compared to the findings from native speaker corpora. After the extraction of phrasal verbs from learner and native-speaker corpora the findings were analysed. The results showed that Iranian learners avoided using phrasal verbs in many cases. Some of the findings proved to be significant. It was also found that the learners used the single-word counterparts of the avoided phrasal verbs to compensate for their lack of knowledge in many cases. Semantic complexity and Lack of L1 counterpart may have been the main reasons for avoidance, but despite the avoidance phenomenon, the learners displayed a tendency to use many other phrasal verbs which may have been due to the increase in the number of multi-word verbs in Persian. The overall scores confirmed the fact that the language produced by the learners illustrates signs of more formal style in comparison with the native speakers of English by using less phrasal verbs and more formal single word verbs instead.

Keywords: corpus, corpora, LOCNESS, phrasal verbs, single-word verb

Procedia PDF Downloads 203
24940 Forming Form, Motivation and Their Biolinguistic Hypothesis: The Case of Consonant Iconicity in Tashelhiyt Amazigh and English

Authors: Noury Bakrim

Abstract:

When dealing with motivation/arbitrariness, forming form (Forma Formans) and morphodynamics are to be grasped as relevant implications of enunciation/enactment, schematization within the specificity of language as sound/meaning articulation. Thus, the fact that a language is a form does not contradict stasis/dynamic enunciation (reflexivity vs double articulation). Moreover, some languages exemplify the role of the forming form, uttering, and schematization (roots in Semitic languages, the Chinese case). Beyond the evolutionary biosemiotic process (form/substance bifurcation, the split between realization/representation), non-isomorphism/asymmetry between linguistic form/norm and linguistic realization (phonetics for instance) opens up a new horizon problematizing the role of Brain – sensorimotor contribution in the continuous forming form. Therefore, we hypothesize biotization as both process/trace co-constructing motivation/forming form. Henceforth, referring to our findings concerning distribution and motivation patterns within Berber written texts (pulse based obstruents and nasal-lateral levels in poetry) and oral storytelling (consonant intensity clustering in quantitative and semantic/prosodic motivation), we understand consonant clustering, motivation and schematization as a complex phenomenon partaking in patterns of oral/written iconic prosody and reflexive metalinguistic representation opening the stable form. We focus our inquiry on both Amazigh and English clusters (/spl/, /spr/) and iconic consonant iteration in [gnunnuy] (to roll/tumble), [smummuy] (to moan sadly or crankily). For instance, the syllabic structures of /splaeʃ/ and /splaet/ imply an anamorphic representation of the state of the world: splash, impact on aquatic surfaces/splat impact on the ground. The pair has stridency and distribution as distinctive features which specify its phonetic realization (and a part of its meaning) /ʃ/ is [+ strident] and /t/ is [+ distributed] on the vocal tract. Schematization is then a process relating both physiology/code as an arthron vocal/bodily, vocal/practical shaping of the motor-articulatory system, leading to syntactic/semantic thematization (agent/patient roles in /spl/, /sm/ and other clusters or the tense uvular /qq/ at the initial position in Berber). Furthermore, the productivity of serial syllable sequencing in Berber points out different expressivity forms. We postulate two Components of motivated formalization: i) the process of memory paradigmatization relating to sequence modeling under sensorimotor/verbal specific categories (production/perception), ii) the process of phonotactic selection - prosodic unconscious/subconscious distribution by virtue of iconicity. Basing on multiple tests including a questionnaire, phonotactic/visual recognition and oral/written reproduction, we aim at patterning/conceptualizing consonant schematization and motivation among EFL and Amazigh (Berber) learners and speakers integrating biolinguistic hypotheses.

Keywords: consonant motivation and prosody, language and order of life, anamorphic representation, represented representation, biotization, sensori-motor and brain representation, form, formalization and schematization

Procedia PDF Downloads 146
24939 Data Recording for Remote Monitoring of Autonomous Vehicles

Authors: Rong-Terng Juang

Abstract:

Autonomous vehicles offer the possibility of significant benefits to social welfare. However, fully automated cars might not be going to happen in the near further. To speed the adoption of the self-driving technologies, many governments worldwide are passing laws requiring data recorders for the testing of autonomous vehicles. Currently, the self-driving vehicle, (e.g., shuttle bus) has to be monitored from a remote control center. When an autonomous vehicle encounters an unexpected driving environment, such as road construction or an obstruction, it should request assistance from a remote operator. Nevertheless, large amounts of data, including images, radar and lidar data, etc., have to be transmitted from the vehicle to the remote center. Therefore, this paper proposes a data compression method of in-vehicle networks for remote monitoring of autonomous vehicles. Firstly, the time-series data are rearranged into a multi-dimensional signal space. Upon the arrival, for controller area networks (CAN), the new data are mapped onto a time-data two-dimensional space associated with the specific CAN identity. Secondly, the data are sampled based on differential sampling. Finally, the whole set of data are encoded using existing algorithms such as Huffman, arithmetic and codebook encoding methods. To evaluate system performance, the proposed method was deployed on an in-house built autonomous vehicle. The testing results show that the amount of data can be reduced as much as 1/7 compared to the raw data.

Keywords: autonomous vehicle, data compression, remote monitoring, controller area networks (CAN), Lidar

Procedia PDF Downloads 164
24938 Next Generation Sequencing Analysis of Circulating MiRNAs in Rheumatoid Arthritis and Osteoarthritis

Authors: Khalda Amr, Noha Eltaweel, Sherif Ismail, Hala Raslan

Abstract:

Introduction: Osteoarthritis is the most common form of arthritis that involves the wearing away of the cartilage that caps the bones in the joints. While rheumatoid arthritis is an autoimmune disease in which the immune system attacks the joints, beginning with the lining of joints. In this study, we aimed to study the top deregulated miRNAs that might be the cause of pathogenesis in both diseases. Methods: Eight cases were recruited in this study: 4 rheumatoid arthritis (RA), 2 osteoarthritis (OA) patients, as well as 2 healthy controls. Total RNA was isolated from plasma to be subjected to miRNA profiling by NGS. Sequencing libraries were constructed and generated using the NEBNextR UltraTM small RNA Sample Prep Kit for Illumina R (NEB, USA), according to the manufacturer’s instructions. The quality of samples were checked using fastqc and multiQC. Results were compared RA vs Controls and OA vs. Controls. Target gene prediction and functional annotation of the deregulated miRNAs were done using Mienturnet. The top deregulated miRNAs in each disease were selected for further validation using qRT-PCR. Results: The average number of sequencing reads per sample exceeded 2.2 million, of which approximately 57% were mapped to the human reference genome. The top DEMs in RA vs controls were miR-6724-5p, miR-1469, miR-194-3p (up), miR-1468-5p, miR-486-3p (down). In comparison, the top DEMs in OA vs controls were miR-1908-3p, miR-122b-3p, miR-3960 (up), miR-1468-5p, miR-15b-3p (down). The functional enrichment of the selected top deregulated miRNAs revealed the highly enriched KEGG pathways and GO terms. Six of the deregulated miRNAs (miR-15b, -128, -194, -328, -542 and -3180) had multiple target genes in the RA pathway, so they are more likely to affect the RA pathogenesis. Conclusion: Six of our studied deregulated miRNAs (miR-15b, -128, -194, -328, -542 and -3180) might be highly involved in the disease pathogenesis. Further functional studies are crucial to assess their functions and actual target genes.

Keywords: next generation sequencing, mirnas, rheumatoid arthritis, osteoarthritis

Procedia PDF Downloads 98
24937 Multimedia Data Fusion for Event Detection in Twitter by Using Dempster-Shafer Evidence Theory

Authors: Samar M. Alqhtani, Suhuai Luo, Brian Regan

Abstract:

Data fusion technology can be the best way to extract useful information from multiple sources of data. It has been widely applied in various applications. This paper presents a data fusion approach in multimedia data for event detection in twitter by using Dempster-Shafer evidence theory. The methodology applies a mining algorithm to detect the event. There are two types of data in the fusion. The first is features extracted from text by using the bag-ofwords method which is calculated using the term frequency-inverse document frequency (TF-IDF). The second is the visual features extracted by applying scale-invariant feature transform (SIFT). The Dempster - Shafer theory of evidence is applied in order to fuse the information from these two sources. Our experiments have indicated that comparing to the approaches using individual data source, the proposed data fusion approach can increase the prediction accuracy for event detection. The experimental result showed that the proposed method achieved a high accuracy of 0.97, comparing with 0.93 with texts only, and 0.86 with images only.

Keywords: data fusion, Dempster-Shafer theory, data mining, event detection

Procedia PDF Downloads 411
24936 Legal Issues of Collecting and Processing Big Health Data in the Light of European Regulation 679/2016

Authors: Ioannis Iglezakis, Theodoros D. Trokanas, Panagiota Kiortsi

Abstract:

This paper aims to explore major legal issues arising from the collection and processing of Health Big Data in the light of the new European secondary legislation for the protection of personal data of natural persons, placing emphasis on the General Data Protection Regulation 679/2016. Whether Big Health Data can be characterised as ‘personal data’ or not is really the crux of the matter. The legal ambiguity is compounded by the fact that, even though the processing of Big Health Data is premised on the de-identification of the data subject, the possibility of a combination of Big Health Data with other data circulating freely on the web or from other data files cannot be excluded. Another key point is that the application of some provisions of GPDR to Big Health Data may both absolve the data controller of his legal obligations and deprive the data subject of his rights (e.g., the right to be informed), ultimately undermining the fundamental right to the protection of personal data of natural persons. Moreover, data subject’s rights (e.g., the right not to be subject to a decision based solely on automated processing) are heavily impacted by the use of AI, algorithms, and technologies that reclaim health data for further use, resulting in sometimes ambiguous results that have a substantial impact on individuals. On the other hand, as the COVID-19 pandemic has revealed, Big Data analytics can offer crucial sources of information. In this respect, this paper identifies and systematises the legal provisions concerned, offering interpretative solutions that tackle dangers concerning data subject’s rights while embracing the opportunities that Big Health Data has to offer. In addition, particular attention is attached to the scope of ‘consent’ as a legal basis in the collection and processing of Big Health Data, as the application of data analytics in Big Health Data signals the construction of new data and subject’s profiles. Finally, the paper addresses the knotty problem of role assignment (i.e., distinguishing between controller and processor/joint controllers and joint processors) in an era of extensive Big Health data sharing. The findings are the fruit of a current research project conducted by a three-member research team at the Faculty of Law of the Aristotle University of Thessaloniki and funded by the Greek Ministry of Education and Religious Affairs.

Keywords: big health data, data subject rights, GDPR, pandemic

Procedia PDF Downloads 129
24935 A Web-Based Self-Learning Grammar for Spoken Language Understanding

Authors: S. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno

Abstract:

One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user utters. In the SDS domain, the Spoken Language Understanding (SLU) Module classifies user utterances by means of a pre-definite conceptual knowledge. The SLU module is able to recognize only the meaning previously included in its knowledge base. Due the vastity of that knowledge, the information storing is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities like proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Name Entity Recognition (NER) applied to a SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern based Annotation through Knowledge On the Web) method at runtime. The method being proposed extracts information from the Web to increase the SLU knowledge module and reduces the development effort. In particular, the Google Search Engine is used to extract information from the Facebook social network.

Keywords: spoken dialog system, spoken language understanding, web semantic, name entity recognition

Procedia PDF Downloads 338
24934 Adaptive Data Approximations Codec (ADAC) for AI/ML-based Cyber-Physical Systems

Authors: Yong-Kyu Jung

Abstract:

The fast growth in information technology has led to de-mands to access/process data. CPSs heavily depend on the time of hardware/software operations and communication over the network (i.e., real-time/parallel operations in CPSs (e.g., autonomous vehicles). Since data processing is an im-portant means to overcome the issue confronting data management, reducing the gap between the technological-growth and the data-complexity and channel-bandwidth. An adaptive perpetual data approximation method is intro-duced to manage the actual entropy of the digital spectrum. An ADAC implemented as an accelerator and/or apps for servers/smart-connected devices adaptively rescales digital contents (avg.62.8%), data processing/access time/energy, encryption/decryption overheads in AI/ML applications (facial ID/recognition).

Keywords: adaptive codec, AI, ML, HPC, cyber-physical, cybersecurity

Procedia PDF Downloads 79
24933 Hyper-Production of Lysine through Fermentation and Its Biological Evaluation on Broiler Chicks

Authors: Shagufta Gulraiz, Abu Saeed Hashmi, Muhammad Mohsin Javed

Abstract:

Lysine required for poultry feed is imported in Pakistan to fulfil the desired dietary needs. Present study was designed to produce maximum lysine by utilizing cheap sources to save the foreign exchange. To achieve the goal of lysine production through fermentation, large scale production of lysine was carried out in 7.5 L stirred glass vessel fermenter with wild and mutant Brevibacterium flavum (B. flavum) using all pre-optimized conditions. The identification of produced lysine was carried out by TLC and amino acid analyzer. Toxicity evaluation of produced lysine was performed before feeding to broiler chicks. During biological trial concentrated fermented broth having 8% lysine was used in poultry rations as a source of Lysine for test birds. Fermenter scale studies showed that the maximum lysine (20.8 g/L) was produced at 250 rpm, 1.5 vvm aeration, 6.0% inoculum under controlled pH conditions after 56 h of fermentation with wild culture but mutant (BFENU2) gave maximum yield of lysine 36.3 g/L under optimized condition after 48 h. Amino acid profiling showed 1.826% Lysine in fermented broth by wild B. flavum and 2.644% by mutant strain (BFENU2). Toxicity evaluation report showed that the produced lysine is safe for consumption by broilers. Biological evaluation results showed that produced lysine was equally good as commercial lysine in terms of weight gain, feed intake and feed conversion ratio. A cheap and practical bioprocess of Lysine production was concluded, that can be exploited commercially in Pakistan to save foreign exchange.

Keywords: lysine, fermentation, broiler chicks, biological evaluation

Procedia PDF Downloads 548
24932 The Noun-Phrase Elements on the Usage of the Zero Article

Authors: Wen Zhen

Abstract:

Compared to content words, function words have been relatively overlooked by English learners especially articles. The article system, to a certain extent, becomes a resistance to know English better, driven by different elements. Three principal factors can be summarized in term of the nature of the articles when referring to the difficulty of the English article system. However, making the article system more complex are difficulties in the second acquisition process, for [-ART] learners have to create another category, causing even most non-native speakers at proficiency level to make errors. According to the sequences of acquisition of the English article, it is showed that the zero article is first acquired and in high inaccuracy. The zero article is often overused in the early stages of L2 acquisition. Although learners at the intermediate level move to underuse the zero article for they realize that the zero article does not cover any case, overproduction of the zero article even occurs among advanced L2 learners. The aim of the study is to investigate noun-phrase factors which give rise to incorrect usage or overuse of the zero article, thus providing suggestions for L2 English acquisition. Moreover, it enables teachers to carry out effective instruction that activate conscious learning of students. The research question will be answered through a corpus-based, data- driven approach to analyze the noun-phrase elements from the semantic context and countability of noun-phrases. Based on the analysis of the International Thurber Thesis corpus, the results show that: (1) Although context of [-definite,-specific] favored the zero article, both[-definite,+specific] and [+definite,-specific] showed less influence. When we reflect on the frequency order of the zero article , prototypicality plays a vital role in it .(2)EFL learners in this study have trouble classifying abstract nouns as countable. We can find that it will bring about overuse of the zero article when learners can not make clear judgements on countability altered from (+definite ) to (-definite).Once a noun is perceived as uncountable by learners, the choice would fall back on the zero article. These findings suggest that learners should be engaged in recognition of the countability of new vocabulary by explaining nouns in lexical phrases and explore more complex aspects such as analysis dependent on discourse.

Keywords: noun phrase, zero article, corpus, second language acquisition

Procedia PDF Downloads 253
24931 On the Utility of Bidirectional Transformers in Gene Expression-Based Classification

Authors: Babak Forouraghi

Abstract:

A genetic circuit is a collection of interacting genes and proteins that enable individual cells to implement and perform vital biological functions such as cell division, growth, death, and signaling. In cell engineering, synthetic gene circuits are engineered networks of genes specifically designed to implement functionalities that are not evolved by nature. These engineered networks enable scientists to tackle complex problems such as engineering cells to produce therapeutics within the patient's body, altering T cells to target cancer-related antigens for treatment, improving antibody production using engineered cells, tissue engineering, and production of genetically modified plants and livestock. Construction of computational models to realize genetic circuits is an especially challenging task since it requires the discovery of the flow of genetic information in complex biological systems. Building synthetic biological models is also a time-consuming process with relatively low prediction accuracy for highly complex genetic circuits. The primary goal of this study was to investigate the utility of a pre-trained bidirectional encoder transformer that can accurately predict gene expressions in genetic circuit designs. The main reason behind using transformers is their innate ability (attention mechanism) to take account of the semantic context present in long DNA chains that are heavily dependent on the spatial representation of their constituent genes. Previous approaches to gene circuit design, such as CNN and RNN architectures, are unable to capture semantic dependencies in long contexts, as required in most real-world applications of synthetic biology. For instance, RNN models (LSTM, GRU), although able to learn long-term dependencies, greatly suffer from vanishing gradient and low-efficiency problem when they sequentially process past states and compresses contextual information into a bottleneck with long input sequences. In other words, these architectures are not equipped with the necessary attention mechanisms to follow a long chain of genes with thousands of tokens. To address the above-mentioned limitations, a transformer model was built in this work as a variation to the existing DNA Bidirectional Encoder Representations from Transformers (DNABERT) model. It is shown that the proposed transformer is capable of capturing contextual information from long input sequences with an attention mechanism. In previous works on genetic circuit design, the traditional approaches to classification and regression, such as Random Forrest, Support Vector Machine, and Artificial Neural Networks, were able to achieve reasonably high R2 accuracy levels of 0.95 to 0.97. However, the transformer model utilized in this work, with its attention-based mechanism, was able to achieve a perfect accuracy level of 100%. Further, it is demonstrated that the efficiency of the transformer-based gene expression classifier is not dependent on the presence of large amounts of training examples, which may be difficult to compile in many real-world gene circuit designs.

Keywords: machine learning, classification and regression, gene circuit design, bidirectional transformers

Procedia PDF Downloads 62
24930 Gestural Pragmatic Inference among Primates: An Experimental Approach

Authors: Siddharth Satishchandran, Brian Khumalo

Abstract:

Humans are able to derive semantic content from syntactic and pragmatic sources. Multimodal evidence from signaling theory, which examines communication between individuals within and across species, suggests that non-human primates possess similar syntactic and pragmatic capabilities. However, the extent remains unknown because primate pragmatics are relatively under-examined. Our paper reviews research within communication theory amongst non-human primates to understand current theoretical trends. We examine evidence for primate pragmatic capacities through observational, experimental, and theoretical work on gestures. Given fragmented theoretical perspectives, we provide a unified framework of communication for future research that contextualizes the available research under code biology. To achieve this, we rely on biological semiotics (biosemiotics), the philosophy of biology investigating prelinguistic meaning-making as a function of signs and codes. We close by discussing areas of potential research for studying gestural pragmatics amongst non-human primates, particularly chimpanzees (Pan troglodytes), Diana monkeys (Cercopithecus diana), and other potential candidates.

Keywords: pragmatics, non-human primates, gestural communication, biological semiotics

Procedia PDF Downloads 42
24929 Real-Time Visualization Using GPU-Accelerated Filtering of LiDAR Data

Authors: Sašo Pečnik, Borut Žalik

Abstract:

This paper presents a real-time visualization technique and filtering of classified LiDAR point clouds. The visualization is capable of displaying filtered information organized in layers by the classification attribute saved within LiDAR data sets. We explain the used data structure and data management, which enables real-time presentation of layered LiDAR data. Real-time visualization is achieved with LOD optimization based on the distance from the observer without loss of quality. The filtering process is done in two steps and is entirely executed on the GPU and implemented using programmable shaders.

Keywords: filtering, graphics, level-of-details, LiDAR, real-time visualization

Procedia PDF Downloads 309
24928 Simulating Human Behavior in (Un)Built Environments: Using an Actor Profiling Method

Authors: Hadas Sopher, Davide Schaumann, Yehuda E. Kalay

Abstract:

This paper addresses the shortcomings of architectural computation tools in representing human behavior in built environments, prior to construction and occupancy of those environments. Evaluating whether a design fits the needs of its future users is currently done solely post construction, or is based on the knowledge and intuition of the designer. This issue is of high importance when designing complex buildings such as hospitals, where the quality of treatment as well as patient and staff satisfaction are of major concern. Existing computational pre-occupancy human behavior evaluation methods are geared mainly to test ergonomic issues, such as wheelchair accessibility, emergency egress, etc. As such, they rely on Agent Based Modeling (ABM) techniques, which emphasize the individual user. Yet we know that most human activities are social, and involve a number of actors working together, which ABM methods cannot handle. Therefore, we present an event-based model that manages the interaction between multiple Actors, Spaces, and Activities, to describe dynamically how people use spaces. This approach requires expanding the computational representation of Actors beyond their physical description, to include psychological, social, cultural, and other parameters. The model presented in this paper includes cognitive abilities and rules that describe the response of actors to their physical and social surroundings, based on the actors’ internal status. The model has been applied in a simulation of hospital wards, and showed adaptability to a wide variety of situated behaviors and interactions.

Keywords: agent based modeling, architectural design evaluation, event modeling, human behavior simulation, spatial cognition

Procedia PDF Downloads 264
24927 Personal Knowledge Management: Systematic Review and Future Direction

Authors: Kuribachew Gizaw Tohiye, Monica Garfield

Abstract:

Personal knowledge management is the aspect of knowledge management that relates to the way in which individuals organize and manage their own set of knowledge. While in that respect, there has been research in this area for the past 25 years, it is at present necessary to speculate upon what research has been done and what we have discovered about this arena of knowledge management. In contrast to organizational knowledge management, which focuses on a firm’s profitability and competitiveness, personal knowledge management (PKM) is concerned with the person’s self-effectiveness, competence and success. People are concerned in managing their knowledge in order to become more efficient in a variety of personal and organizational interests. This study presents a systematic review of PKM studies. Articles with PKM concepts are reviewed with the objective of clearly defining PKM, identifying the benefits of PKM, classifying the tools that enable PKM and finding the research gaps to indicate future research directions in the area. Consequently, we have developed a definition of PKM and identified the benefits of PKM, including an understanding of who seeks PKM and for what. Tools enabling PKM are identified and classified under three categories Web 1.0, 2.0 and 3.0 and finally the research gap and future directions are suggested. Research which facilitates collaboration by using semantic technologies is suggested to be studied further to improve PKM effectiveness.

Keywords: personal knowledge management, knowledge management, organizational knowledge management, systematic review

Procedia PDF Downloads 332
24926 Mirna Expression Profile is Different in Human Amniotic Mesenchymal Stem Cells Isolated from Obese Respect to Normal Weight Women

Authors: Carmela Nardelli, Laura Iaffaldano, Valentina Capobianco, Antonietta Tafuto, Maddalena Ferrigno, Angela Capone, Giuseppe Maria Maruotti, Maddalena Raia, Rosa Di Noto, Luigi Del Vecchio, Pasquale Martinelli, Lucio Pastore, Lucia Sacchetti

Abstract:

Maternal obesity and nutrient excess in utero increase the risk of future metabolic diseases in the adult life. The mechanisms underlying this process are probably based on genetic, epigenetic alterations and changes in foetal nutrient supply. In mammals, the placenta is the main interface between foetus and mother, it regulates intrauterine development, modulates adaptive responses to sub optimal in uterus conditions and it is also an important source of human amniotic mesenchymal stem cells (hA-MSCs). We previously highlighted a specific microRNA (miRNA) profiling in amnion from obese (Ob) pregnant women, here we compared the miRNA expression profile of hA-MSCs isolated from (Ob) and control (Co) women, aimed to search for any alterations in metabolic pathways that could predispose the new-born to the obese phenotype. Methods: We isolated, at delivery, hA-MSCs from amnion of 16 Ob- and 7 Co-women with pre-pregnancy body mass index (mean/SEM) 40.3/1.8 and 22.4/1.0 kg/m2, respectively. hA-MSCs were phenotyped by flow cytometry. Globally, 384 miRNAs were evaluated by the TaqMan Array Human MicroRNA Panel v 1.0 (Applied Biosystems). By the TargetScan program we selected the target genes of the miRNAs differently expressed in Ob- vs Co-hA-MSCs; further, by KEGG database, we selected the statistical significant biological pathways. Results: The immunophenotype characterization confirmed the mesenchymal origin of the isolated hA-MSCs. A large percentage of the tested miRNAs, about 61.4% (232/378), was expressed in hA-MSCs, whereas 38.6% (146/378) was not. Most of the expressed miRNAs (89.2%, 207/232) did not differ between Ob- and Co-hA-MSCs and were not further investigated. Conversely, 4.8% of miRNAs (11/232) was higher and 6.0% (14/232) was lower in Ob- vs Co-hA-MSCs. Interestingly, 7/232 miRNAs were obesity-specific, being expressed only in hA-MSCs isolated from obese women. Bioinformatics showed that these miRNAs significantly regulated (P<0.001) genes belonging to several metabolic pathways, i.e. MAPK signalling, actin cytoskeleton, focal adhesion, axon guidance, insulin signaling, etc. Conclusions: Our preliminary data highlight an altered miRNA profile in Ob- vs Co-hA-MSCs and suggest that an epigenetic miRNA-based mechanism of gene regulation could affect pathways involved in placental growth and function, thereby potentially increasing the newborn’s risk of metabolic diseases in the adult life.

Keywords: hA-MSCs, obesity, miRNA, biosystem

Procedia PDF Downloads 528
24925 Estimating Destinations of Bus Passengers Using Smart Card Data

Authors: Hasik Lee, Seung-Young Kho

Abstract:

Nowadays, automatic fare collection (AFC) system is widely used in many countries. However, smart card data from many of cities does not contain alighting information which is necessary to build OD matrices. Therefore, in order to utilize smart card data, destinations of passengers should be estimated. In this paper, kernel density estimation was used to forecast probabilities of alighting stations of bus passengers and applied to smart card data in Seoul, Korea which contains boarding and alighting information. This method was also validated with actual data. In some cases, stochastic method was more accurate than deterministic method. Therefore, it is sufficiently accurate to be used to build OD matrices.

Keywords: destination estimation, Kernel density estimation, smart card data, validation

Procedia PDF Downloads 352