Search results for: text annotation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1357

Search results for: text annotation

967 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 254
966 Possibilities and Challenges of Using Machine Translation in Foreign Language Education

Authors: Miho Yamashita

Abstract:

In recent years, there have been attempts to introduce Machine Translation (MT) into foreign language teaching, especially in writing instructions. This is because the performance of neural machine translation has improved dramatically since 2016, and some university instructors started to introduce MT translations to their students as a "good model" to learn from. However, MT is still not perfect, and there are many incorrect translations. In order to translate the intended text into a foreign language, it is necessary to edit the original manuscript written in the native language (pre-edit) and revise the translated foreign language text (post-edit). The latter is considered especially difficult for users without a high proficiency level of foreign language. Therefore, the author allowed her students to use MT in her writing class in one of the private universities in Japan and investigated 1) how groups of students with different English proficiency levels revised MT translations when translating Japanese manuscripts into English and 2) whether the post-edit process differed when the students revised alone or in pairs. The results showed that in 1), certain non-post-edited grammatical errors were found regardless of their proficiency levels, indicating the need for teacher intervention, and in 2), more appropriate corrections were found in pairs, and their frequent use of a dictionary was also observed. In this presentation, the author will discuss how MT writing instruction can be integrated effectively in an aim to achieve multimodal foreign language education.

Keywords: machine translation, writing instruction, pre-edit, post-edit

Procedia PDF Downloads 41
965 How Unicode Glyphs Revolutionized the Way We Communicate

Authors: Levi Corallo

Abstract:

Typed language made by humans on computers and cell phones has made a significant distinction from previous modes of written language exchanges. While acronyms remain one of the most predominant markings of typed language, another and perhaps more recent revolution in the way humans communicate has been with the use of symbols or glyphs, primarily Emojis—globally introduced on the iPhone keyboard by Apple in 2008. This paper seeks to analyze the use of symbols in typed communication from both a linguistic and machine learning perspective. The Unicode system will be explored and methods of encoding will be juxtaposed with the current machine and human perception. Topics in how typed symbol usage exists in conversation will be explored as well as topics across current research methods dealing with Emojis like sentiment analysis, predictive text models, and so on. This study proposes that sequential analysis is a significant feature for analyzing unicode characters in a corpus with machine learning. Current models that are trying to learn or translate the meaning of Emojis should be starting to learn using bi- and tri-grams of Emoji, as well as observing the relationship between combinations of different Emoji in tandem. The sociolinguistics of an entire new vernacular of language referred to here as ‘typed language’ will also be delineated across my analysis with unicode glyphs from both a semantic and technical perspective.

Keywords: unicode, text symbols, emojis, glyphs, communication

Procedia PDF Downloads 172
964 Braille Lab: A New Design Approach for Social Entrepreneurship and Innovation in Assistive Tools for the Visually Impaired

Authors: Claudio Loconsole, Daniele Leonardis, Antonio Brunetti, Gianpaolo Francesco Trotta, Nicholas Caporusso, Vitoantonio Bevilacqua

Abstract:

Unfortunately, many people still do not have access to communication, with specific regard to reading and writing. Among them, people who are blind or visually impaired, have several difficulties in getting access to the world, compared to the sighted. Indeed, despite technology advancement and cost reduction, nowadays assistive devices are still expensive such as Braille-based input/output systems which enable reading and writing texts (e.g., personal notes, documents). As a consequence, assistive technology affordability is fundamental in supporting the visually impaired in communication, learning, and social inclusion. This, in turn, has serious consequences in terms of equal access to opportunities, freedom of expression, and actual and independent participation to a society designed for the sighted. Moreover, the visually impaired experience difficulties in recognizing objects and interacting with devices in any activities of daily living. It is not a case that Braille indications are commonly reported only on medicine boxes and elevator keypads. Several software applications for the automatic translation of written text into speech (e.g., Text-To-Speech - TTS) enable reading pieces of documents. However, apart from simple tasks, in many circumstances TTS software is not suitable for understanding very complicated pieces of text requiring to dwell more on specific portions (e.g., mathematical formulas or Greek text). In addition, the experience of reading\writing text is completely different both in terms of engagement, and from an educational perspective. Statistics on the employment rate of blind people show that learning to read and write provides the visually impaired with up to 80% more opportunities of finding a job. Especially in higher educational levels, where the ability to digest very complex text is key, accessibility and availability of Braille plays a fundamental role in reducing drop-out rate of the visually impaired, thus affecting the effectiveness of the constitutional right to get access to education. In this context, the Braille Lab project aims at overcoming these social needs by including affordability in designing and developing assistive tools for visually impaired people. In detail, our awarded project focuses on a technology innovation of the operation principle of existing assistive tools for the visually impaired leaving the Human-Machine Interface unchanged. This can result in a significant reduction of the production costs and consequently of tool selling prices, thus representing an important opportunity for social entrepreneurship. The first two assistive tools designed within the Braille Lab project following the proposed approach aims to provide the possibility to personally print documents and handouts and to read texts written in Braille using refreshable Braille display, respectively. The former, named ‘Braille Cartridge’, represents an alternative solution for printing in Braille and consists in the realization of an electronic-controlled dispenser printing (cartridge) which can be integrated within traditional ink-jet printers, in order to leverage the efficiency and cost of the device mechanical structure which are already being used. The latter, named ‘Braille Cursor’, is an innovative Braille display featuring a substantial technology innovation by means of a unique cursor virtualizing Braille cells, thus limiting the number of active pins needed for Braille characters.

Keywords: Human rights, social challenges and technology innovations, visually impaired, affordability, assistive tools

Procedia PDF Downloads 242
963 Photoleap: An AI-Powered Photo Editing App with Advanced Features and User Satisfaction Analysis

Authors: Joud Basyouni, Rama Zagzoog, Mashael Al Faleh, Jana Alireza

Abstract:

AI is changing many fields and speeding up tasks that used to take a long time. It used to take too long to edit photos. However, many AI-powered apps make photo editing, automatic effects, and animations much easier than other manual editing apps with no AI. The mobile app Photoleap edits photos and creates digital art using AI. Editing photos with text prompts is also becoming a standard these days with the help of apps like Photoleap. Now, users can change backgrounds, add animations, turn text into images, and create scenes with AI. This project report discusses the photo editing app's history and popularity. Photoleap resembles Photoshop, Canva, Photos, and Pixlr. The report includes survey questions to assess Photoleap user satisfaction. The report describes Photoleap's features and functions with screenshots. Photoleap uses AI well. Charts and graphs show Photoleap user ratings and comments from the survey. This project found that most Photoleap users liked how well it worked, was made, and was easy to use. People liked changing photos and adding backgrounds. Users can create stunning photo animations. A few users dislike the app's animations, AI art, and photo effects. The project report discusses the app's pros and cons and offers improvements.

Keywords: artificial intelligence, photoleap, images, background, photo editing

Procedia PDF Downloads 35
962 Exploring Social Impact of Emerging Technologies from Futuristic Data

Authors: Heeyeul Kwon, Yongtae Park

Abstract:

Despite the highly touted benefits, emerging technologies have unleashed pervasive concerns regarding unintended and unforeseen social impacts. Thus, those wishing to create safe and socially acceptable products need to identify such side effects and mitigate them prior to the market proliferation. Various methodologies in the field of technology assessment (TA), namely Delphi, impact assessment, and scenario planning, have been widely incorporated in such a circumstance. However, literatures face a major limitation in terms of sole reliance on participatory workshop activities. They unfortunately missed out the availability of a massive untapped data source of futuristic information flooding through the Internet. This research thus seeks to gain insights into utilization of futuristic data, future-oriented documents from the Internet, as a supplementary method to generate social impact scenarios whilst capturing perspectives of experts from a wide variety of disciplines. To this end, network analysis is conducted based on the social keywords extracted from the futuristic documents by text mining, which is then used as a guide to produce a comprehensive set of detailed scenarios. Our proposed approach facilitates harmonized depictions of possible hazardous consequences of emerging technologies and thereby makes decision makers more aware of, and responsive to, broad qualitative uncertainties.

Keywords: emerging technologies, futuristic data, scenario, text mining

Procedia PDF Downloads 471
961 Provenance in Scholarly Publications: Introducing the provCite Ontology

Authors: Maria Joseph Israel, Ahmed Amer

Abstract:

Our work aims to broaden the application of provenance technology beyond its traditional domains of scientific workflow management and database systems by offering a general provenance framework to capture richer and extensible metadata in unstructured textual data sources such as literary texts, commentaries, translations, and digital humanities. Specifically, we demonstrate the feasibility of capturing and representing expressive provenance metadata, including more of the context for citing scholarly works (e.g., the authors’ explicit or inferred intentions at the time of developing his/her research content for publication), while also supporting subsequent augmentation with similar additional metadata (by third parties, be they human or automated). To better capture the nature and types of possible citations, in our proposed provenance scheme metaScribe, we extend standard provenance conceptual models to form our proposed provCite ontology. This provides a conceptual framework which can accurately capture and describe more of the functional and rhetorical properties of a citation than can be achieved with any current models.

Keywords: knowledge representation, provenance architecture, ontology, metadata, bibliographic citation, semantic web annotation

Procedia PDF Downloads 92
960 JaCoText: A Pretrained Model for Java Code-Text Generation

Authors: Jessica Lopez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri

Abstract:

Pretrained transformer-based models have shown high performance in natural language generation tasks. However, a new wave of interest has surged: automatic programming language code generation. This task consists of translating natural language instructions to a source code. Despite the fact that well-known pre-trained models on language generation have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on Transformer neural network. It aims to generate java source code from natural language text. JaCoText leverages the advantages of both natural language and code generation models. More specifically, we study some findings from state of the art and use them to (1) initialize our model from powerful pre-trained models, (2) explore additional pretraining on our java dataset, (3) lead experiments combining the unimodal and bimodal data in training, and (4) scale the input and output length during the fine-tuning of the model. Conducted experiments on CONCODE dataset show that JaCoText achieves new state-of-the-art results.

Keywords: java code generation, natural language processing, sequence-to-sequence models, transformer neural networks

Procedia PDF Downloads 242
959 Disowning of ‘Our Lady of Alice Bhatti’ by Mohammad Hanif Through Gendered and Religious Discourse

Authors: Abrar Ajmal

Abstract:

The language used in literature reveals the culture and social gestalt of any society in which it has been constructed and consumed. This paper carries the same rationale, which aims to track certain socio-religious and cultural-economic disparities and discrepancies towards minorities, particularly Christians, in an Islamic re(public) where there is a clear majority of Muslims with the help of analysis of instances of language used in the narratives “Our Lady of Alice Bhatt” by Mohammad Hanif. It would highlight social inequalities practiced deeply in sociocultural discourse. Moreover, this research would also touch upon the question of gender discrimination and gender construction as a female entity in a male-chauvinistic scenic turnout using language since the novel revolves around communicative forfeits of Alice Bhatti’s life where she is fraying in fisticuffs to befit herself in a miss-fitted society. It would employ using Fairclough's framework for analysis to conduct a critical discourse analysis of the text at three axiom levels namely textual analysis, discursive practices, and socio-cultural analysis. Thus, the results would reveal textual findings in linguistic analysis, a range of embedded discourses in discursive practices, and consumption of the text into socio-cultural explications with the use of language and lexicalization employed in the selected excerpts.

Keywords: gendered discourse, socio-economic disparities minorities, Islamization, analytical framework

Procedia PDF Downloads 27
958 Probing Language Models for Multiple Linguistic Information

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, large-scale pre-trained language models have achieved state-of-the-art performance on a variety of natural language processing tasks. The word vectors produced by these language models can be viewed as dense encoded presentations of natural language that in text form. However, it is unknown how much linguistic information is encoded and how. In this paper, we construct several corresponding probing tasks for multiple linguistic information to clarify the encoding capabilities of different language models and performed a visual display. We firstly obtain word presentations in vector form from different language models, including BERT, ELMo, RoBERTa and GPT. Classifiers with a small scale of parameters and unsupervised tasks are then applied on these word vectors to discriminate their capability to encode corresponding linguistic information. The constructed probe tasks contain both semantic and syntactic aspects. The semantic aspect includes the ability of the model to understand semantic entities such as numbers, time, and characters, and the grammatical aspect includes the ability of the language model to understand grammatical structures such as dependency relationships and reference relationships. We also compare encoding capabilities of different layers in the same language model to infer how linguistic information is encoded in the model.

Keywords: language models, probing task, text presentation, linguistic information

Procedia PDF Downloads 78
957 On ‘Freaks’ and the Feminine in Margaret Atwood’s ‘Lusus Naturae’

Authors: Shahd Alshammari

Abstract:

This paper considers one of Margaret Atwood’s short stories ‘Lusus Naturae'. Through a critical lens that makes use of Julia Kristeva’s work on Powers of Horror and abjection, this paper suggests that the monstrous girl is the disabled woman, the abject in society. The monster is used as a metaphor for the unknown, the misunderstood, and the ‘different’ woman. Culturally Relevant Teaching (CRT) is a pedagogy that calls for making course material accessible and relevant to students. Through the study of literary texts, we are able to help create agency inside and outside the classroom. Stories are a necessary part of establishing connections across borders and boundaries. Stories are meant to raise awareness both inside and outside the classroom. The discussion is equally important, and the text is meant to facilitate relevant questions that the students need to consider when it comes to identity. Questions to consider are: what does it mean to be a ‘girl’ today, and what implications and consequences are at hand when you fail to perform this gendered identity? Gender is sometimes a fatal bond in the Middle East, and even more so, is the disability. In the case of our unnamed protagonist, she undergoes a process of un-becoming, a non-linear process of growing up. In a sense, it is a counter-Bildungsroman. The reading of this text emphasizes that a non-linear narrative is sometimes necessary for the female protagonist’s self-awareness and development. Discussion in class facilitates this sense of agency and questioning of gender and disability.

Keywords: disability, gender, literature, pedagogy

Procedia PDF Downloads 629
956 Social Data-Based Users Profiles' Enrichment

Authors: Amel Hannech, Mehdi Adda, Hamid Mcheick

Abstract:

In this paper, we propose a generic model of user profile integrating several elements that may positively impact the research process. We exploit the classical behavior of users and integrate a delimitation process of their research activities into several research sessions enriched with contextual and temporal information, which allows reflecting the current interests of these users in every period of time and infer data freshness. We argue that the annotation of resources gives more transparency on users' needs. It also strengthens social links among resources and users, and can so increase the scope of the user profile. Based on this idea, we integrate the social tagging practice in order to exploit the social users' behavior to enrich their profiles. These profiles are then integrated into a recommendation system in order to predict the interesting personalized items of users allowing to assist them in their researches and further enrich their profiles. In this recommendation, we provide users new research experiences.

Keywords: user profiles, topical ontology, contextual information, folksonomies, tags' clusters, data freshness, association rules, data recommendation

Procedia PDF Downloads 242
955 The Use of Neuter in Oedipus Lines to Refer to Antigone in Phoenissae of Seneca

Authors: Cíntia Martins Sanches

Abstract:

In the first part of Phoenissae of Seneca, Antigone is a guide to Oedipus, and they leave Thebes: he is blind searching for death (inflicting the punishment himself wished on the killer of Laius, ie exile and death); she is trying to convince him to give up such punishment and bring him back to Thebes. Concerning Oedipus lines, we observed a high frequency of Latin neuter in the treatment the protagonist gave to his daughter Antigone. We considered in this study that such frequency may be related to the sanctification of the daughter, who is seen by him as an enlightened being and without defects, free of the human condition (which takes on the existence of failures by essence). This study, thus, puts forward an analysis of the passages the said feature is present, relating them to the effect of meaning found in each occurrence. As part of a doctorate, this study investigates the stylistic idiom of Seneca in the Oedipus and Phoenissae tragedies, aiming at translating both tragedies expressively. The concept of stylistic idiom concerns the stylistic affinity required for a translation to be equivalent to the source text. In this wise, this study inquires into how the Latin text is organized poetically, pointing out the expressive features frequently appearing in both dramas. The method we used is based on the Semiotics theory — observing how connotation, ie a language use in which prevails the poetic function, naturally polysemous, acts to achieve each expressive effect.

Keywords: antigone, neuter, Oedipus, Phoenissae, Seneca

Procedia PDF Downloads 263
954 Ranking Priorities for Digital Health in Portugal: Aligning Health Managers’ Perceptions with Official Policy Perspectives

Authors: Pedro G. Rodrigues, Maria J. Bárrios, Sara A. Ambrósio

Abstract:

The digitalisation of health is a profoundly transformative economic, political, and social process. As is often the case, such processes need to be carefully managed if misunderstandings, policy misalignments, or outright conflicts between the government and a wide gamut of stakeholders with competing interests are to be avoided. Thus, ensuring open lines of communication where all parties know what each other’s concerns are is key to good governance, as well as efficient and effective policymaking. This project aims to make a small but still significant contribution in this regard in that we seek to determine the extent to which health managers’ perceptions of what is a priority for digital health in Portugal are aligned with official policy perspectives. By applying state-of-the-art artificial intelligence technology first to the indexed literature on digital health and then to a set of official policy documents on the same topic, followed by a survey directed at health managers working in public and private hospitals in Portugal, we obtain two priority rankings that, when compared, will allow us to produce a synthesis and toolkit on digital health policy in Portugal, with a view to identifying areas of policy convergence and divergence. This project is also particularly peculiar in the sense that sophisticated digital methods related to text analytics are employed to study good governance aspects of digitalisation applied to health care.

Keywords: digital health, health informatics, text analytics, governance, natural language understanding

Procedia PDF Downloads 43
953 The Role of Digital Text in School and Vernacular Literacies: Students Digital Practices at Cybercafés in Mexico

Authors: Guadalupe López-Bonilla

Abstract:

Students of all educational levels participate in literacy practices that may involve print or digital media. Scholars from the New Literacy Studies distinguish practices that fulfill institutional purposes such as those established at schools from literate practices aimed at doing other kinds of activities, such as reading instructions in order to play a video game; the first are known as institutional practices while the latter are considered vernacular literacies. When students perform these kinds of activities they engage with print and digital media according to the demands of the task. In this paper, it is aimed to discuss the results of a research project focusing on literacy practices of high school students at 10 urban cybercafés in Mexico. The main objective was to analyze the literacy practices of students performing both school tasks and vernacular literacies. The methodology included a focused ethnography with online and face to face observations of 10 high school students (5 male and 5 female) and interviews after performing each task. In the results, it is presented how students treat texts as open, dynamic and relational artifacts when engaging in vernacular literacies; while texts are conceived as closed, authoritarian and fixed documents when performing school activities. Samples of each type of activity are shown followed by a discussion of the pedagogical implications for improving school literacy.

Keywords: digital literacy, text, school literacy, vernacular practices

Procedia PDF Downloads 245
952 A Web-Based Self-Learning Grammar for Spoken Language Understanding

Authors: S. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno

Abstract:

One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user utters. In the SDS domain, the Spoken Language Understanding (SLU) Module classifies user utterances by means of a pre-definite conceptual knowledge. The SLU module is able to recognize only the meaning previously included in its knowledge base. Due the vastity of that knowledge, the information storing is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities like proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Name Entity Recognition (NER) applied to a SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern based Annotation through Knowledge On the Web) method at runtime. The method being proposed extracts information from the Web to increase the SLU knowledge module and reduces the development effort. In particular, the Google Search Engine is used to extract information from the Facebook social network.

Keywords: spoken dialog system, spoken language understanding, web semantic, name entity recognition

Procedia PDF Downloads 313
951 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu

Abstract:

Character recognition is the process of converting a text image file into editable and searchable text file. Feature Extraction is the heart of any character recognition system. The character recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features for one character are used in character recognition. Basically, there are three steps of character recognition such as character segmentation, feature extraction and classification. In segmentation step, horizontal cropping method is used for line segmentation and vertical cropping method is used for character segmentation. In the Feature extraction step, features are extracted in two ways. The first way is that the 8 features are extracted from the entire input character using eight direction chain code frequency extraction. The second way is that the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as a feature for one block. Therefore, 16 features are extracted from that 16 blocks in the second way. We use the number of holes feature to cluster the similar characters. We can recognize the almost Myanmar common characters with various font sizes by using these features. All these 25 features are used in both training part and testing part. In the classification step, the characters are classified by matching the all features of input character with already trained features of characters.

Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation

Procedia PDF Downloads 294
950 Teaching Tolerance in the Language Classroom through a Text

Authors: Natalia Kasatkina

Abstract:

In an ever-increasing globalization, one’s grasp of diversity and tolerance has never been more indispensable, and it is a vital duty for all those in the field of foreign language teaching to help children cultivate such values. The present study explores the role of DIVERSITY and TOLERANCE in the language classroom and elementary, middle, and high school students’ perceptions of these two concepts. It draws on several theoretical domains of language acquisition, cultural awareness, and school psychology. Relying on these frameworks, the major findings are synthesized, and a paradigm of teaching tolerance through language-teaching is formulated. Upon analysing how tolerant our children are with ‘others’ in and outside the classroom, we have concluded that intolerance and aggression towards the ‘other’ increase with age, and that a feeling of supremacy over migrants and a sense of fear towards them begin to manifest more apparently when the students are in high school. In addition, we have also found that children in elementary school do not exhibit such prejudiced thoughts and behavior, which leads us to the believe that tolerance as well as intolerance are learned. Therefore, it is within our reach to teach our children to be open-minded and accepting. We have used the novel ‘Uncle Tom’s Cabin’ by Harriet Beecher Stowe as a springboard for lessons which are not only targeted at shedding light on the role of language in the modern world, but also aim to stimulate an awareness of cultural diversity. We equally strive to conduct further cross-cultural research in order to solidify the theory behind this study, and thus devise a language-based curriculum which would encourage tolerance through the examination of various literary texts.

Keywords: literary text, tolerance, EFL classroom, word-association test

Procedia PDF Downloads 266
949 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of ​​EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 352
948 Multimodal Content: Fostering Students’ Language and Communication Competences

Authors: Victoria L. Malakhova

Abstract:

The research is devoted to multimodal content and its effectiveness in developing students’ linguistic and intercultural communicative competences as an indefeasible constituent of their future professional activity. Description of multimodal content both as a linguistic and didactic phenomenon makes the study relevant. The objective of the article is the analysis of creolized texts and the effect they have on fostering higher education students’ skills and their productivity. The main methods used are linguistic text analysis, qualitative and quantitative methods, deduction, generalization. The author studies texts with full and partial creolization, their features and role in composing multimodal textual space. The main verbal and non-verbal markers and paralinguistic means that enhance the linguo-pragmatic potential of creolized texts are covered. To reveal the efficiency of multimodal content application in English teaching, the author conducts an experiment among both undergraduate students and teachers. This allows specifying main functions of creolized texts in the process of language learning, detecting ways of enhancing students’ competences, and increasing their motivation. The described stages of using creolized texts can serve as an algorithm for work with multimodal content in teaching English as a foreign language. The findings contribute to improving the efficiency of the academic process.

Keywords: creolized text, English language learning, higher education, language and communication competences, multimodal content

Procedia PDF Downloads 93
947 A Postmodern Framework for Quranic Hermeneutics

Authors: Christiane Paulus

Abstract:

Post-Islamism assumes that the Quran should not be viewed in terms of what Lyotard identifies as a ‘meta-narrative'. However, its socio-ethical content can be viewed as critical of power discourse (Foucault). Practicing religion seems to be limited to rites and individual spirituality, taqwa. Alternatively, can we build on Muhammad Abduh's classic-modern reform and develop it through a postmodernist frame? This is the main question of this study. Through his general and vague remarks on the context of the Quran, Abduh was the first to refer to the historical and cultural distance of the text as an obstacle for interpretation. His application, however, corresponded to the modern absolute idea of authentic sharia. He was followed by Amin al-Khuli, who hermeneutically linked the content of the Quran to the theory of evolution. Fazlur Rahman and Nasr Hamid abu Zeid remain reluctant to go beyond the general level in terms of context. The hermeneutic circle, therefore, persists in challenging, how to get out to overcome one’s own assumptions. The insight into and the acceptance of the lasting ambivalence of understanding can be grasped as a postmodern approach; it is documented in Derrida's discovery of the shift in text meanings, difference, also in Lyotard's theory of différend. The resulting mixture of meanings (Wolfgang Welsch) can be read together with the classic ambiguity of the premodern interpreters of the Quran (Thomas Bauer). Confronting hermeneutic difficulties in general, Niklas Luhmann proves every description an attribution, tautology, i.e., remaining in the circle. ‘De-tautologization’ is possible, namely by analyzing the distinctions in the sense of objective, temporal and social information that every text contains. This could be expanded with the Kantian aesthetic dimension of reason (critique of pure judgment) corresponding to the iʽgaz of the Coran. Luhmann asks, ‘What distinction does the observer/author make?’ Quran as a speech from God to the first listeners could be seen as a discourse responding to the problems of everyday life of that time, which can be viewed as the general goal of the entire Qoran. Through reconstructing koranic Lifeworlds (Alfred Schütz) in detail, the social structure crystallizes the socio-economic differences, the enormous poverty. The koranic instruction to provide the basic needs for the neglected groups, which often intersect (old, poor, slaves, women, children), can be seen immediately in the text. First, the references to lifeworlds/social problems and discourses in longer koranic passages should be hypothesized. Subsequently, information from the classic commentaries could be extracted, the classical Tafseer, in particular, contains rich narrative material for reconstructing. By selecting and assigning suitable, specific context information, the meaning of the description becomes condensed (Clifford Geertz). In this manner, the text gets necessarily an alienation and is newly accessible. The socio-ethical implications can thus be grasped from the difference of the original problem and the revealed/improved order/procedure; this small step can be materialized as such, not as an absolute solution but as offering plausible patterns for today’s challenges as the Agenda 2030.

Keywords: postmodern hermeneutics, condensed description, sociological approach, small steps of reform

Procedia PDF Downloads 190
946 Alphabet Recognition Using Pixel Probability Distribution

Authors: Vaidehi Murarka, Sneha Mehta, Dishant Upadhyay

Abstract:

Our project topic is “Alphabet Recognition using pixel probability distribution”. The project uses techniques of Image Processing and Machine Learning in Computer Vision. Alphabet recognition is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files etc. Alphabet Recognition based OCR application is sometimes used in signature recognition which is used in bank and other high security buildings. One of the popular mobile applications includes reading a visiting card and directly storing it to the contacts. OCR's are known to be used in radar systems for reading speeders license plates and lots of other things. The implementation of our project has been done using Visual Studio and Open CV (Open Source Computer Vision). Our algorithm is based on Neural Networks (machine learning). The project was implemented in three modules: (1) Training: This module aims “Database Generation”. Database was generated using two methods: (a) Run-time generation included database generation at compilation time using inbuilt fonts of OpenCV library. Human intervention is not necessary for generating this database. (b) Contour–detection: ‘jpeg’ template containing different fonts of an alphabet is converted to the weighted matrix using specialized functions (contour detection and blob detection) of OpenCV. The main advantage of this type of database generation is that the algorithm becomes self-learning and the final database requires little memory to be stored (119kb precisely). (2) Preprocessing: Input image is pre-processed using image processing concepts such as adaptive thresholding, binarizing, dilating etc. and is made ready for segmentation. “Segmentation” includes extraction of lines, words, and letters from the processed text image. (3) Testing and prediction: The extracted letters are classified and predicted using the neural networks algorithm. The algorithm recognizes an alphabet based on certain mathematical parameters calculated using the database and weight matrix of the segmented image.

Keywords: contour-detection, neural networks, pre-processing, recognition coefficient, runtime-template generation, segmentation, weight matrix

Procedia PDF Downloads 363
945 Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Authors: Jiangbo Shi, Zeyu Gao, Chen Li

Abstract:

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks by using a small amount of clean meta-data. Then the corrected masks are used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclear segmentation datasets show that our method achieves the state-of-the-art result. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.

Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations

Procedia PDF Downloads 120
944 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources to understand broadcasting contents. Since broadcast media wields lots of influence over the public, tools for understanding broadcasting contents are more required. Topic modeling is the method to get the summary of the broadcasting contents from its scripts. Generally, scripts represent contents descriptively with directions and speeches. Scripts also provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scripts consist of speeches mainly, however, relatively small co-occurrences among words in the scene segments are observed. This causes inevitably the bad quality of topics based on statistical learning method. To tackle this problem, we propose a method of learning with additional word co-occurrence information obtained using scene similarities. The main idea of improving topic quality is that the information that two or more texts are topically related can be useful to learn high quality of topics. In addition, by using high quality of topics, we can get information more accurate whether two texts are related or not. In this paper, we regard two scene segments are related if their topical similarity is high enough. We also consider that words are co-occurred if they are in topically related scene segments together. In the experiments, we showed the proposed method generates a higher quality of topics from Korean drama scripts than the baselines.

Keywords: broadcasting contents, scripts, text similarity, topic model

Procedia PDF Downloads 293
943 Treating Voxels as Words: Word-to-Vector Methods for fMRI Meta-Analyses

Authors: Matthew Baucum

Abstract:

With the increasing popularity of fMRI as an experimental method, psychology and neuroscience can greatly benefit from advanced techniques for summarizing and synthesizing large amounts of data from brain imaging studies. One promising avenue is automated meta-analyses, in which natural language processing methods are used to identify the brain regions consistently associated with certain semantic concepts (e.g. “social”, “reward’) across large corpora of studies. This study builds on this approach by demonstrating how, in fMRI meta-analyses, individual voxels can be treated as vectors in a semantic space and evaluated for their “proximity” to terms of interest. In this technique, a low-dimensional semantic space is built from brain imaging study texts, allowing words in each text to be represented as vectors (where words that frequently appear together are near each other in the semantic space). Consequently, each voxel in a brain mask can be represented as a normalized vector sum of all of the words in the studies that showed activation in that voxel. The entire brain mask can then be visualized in terms of each voxel’s proximity to a given term of interest (e.g., “vision”, “decision making”) or collection of terms (e.g., “theory of mind”, “social”, “agent”), as measured by the cosine similarity between the voxel’s vector and the term vector (or the average of multiple term vectors). Analysis can also proceed in the opposite direction, allowing word cloud visualizations of the nearest semantic neighbors for a given brain region. This approach allows for continuous, fine-grained metrics of voxel-term associations, and relies on state-of-the-art “open vocabulary” methods that go beyond mere word-counts. An analysis of over 11,000 neuroimaging studies from an existing meta-analytic fMRI database demonstrates that this technique can be used to recover known neural bases for multiple psychological functions, suggesting this method’s utility for efficient, high-level meta-analyses of localized brain function. While automated text analytic methods are no replacement for deliberate, manual meta-analyses, they seem to show promise for the efficient aggregation of large bodies of scientific knowledge, at least on a relatively general level.

Keywords: FMRI, machine learning, meta-analysis, text analysis

Procedia PDF Downloads 424
942 De Novo Assembly and Characterization of the Transcriptome during Seed Development, and Generation of Genic-SSR Markers in Pomegranate (Punica granatum L.)

Authors: Ozhan Simsek, Dicle Donmez, Burhanettin Imrak, Ahsen Isik Ozguven, Yildiz Aka Kacar

Abstract:

Pomegranate (Punica granatum L.) is known to be one of the oldest edible fruit tree species, with a wide geographical global distribution. Fruits from the two defined varieties (Hicaznar and 33N26) were taken at intervals after pollination and fertilization at different sizes. Seed samples were used for transcriptome sequencing. Primary sequencing was produced by Illumina Hi-Seq™ 2000. Firstly, we had raw reads, and it was subjected to quality control (QC). Raw reads were filtered into clean reads and aligned to the reference sequences. De novo analysis was performed to detect genes expressed in seeds of pomegranate varieties. We performed downstream analysis to determine differentially expressed genes. We generated about 27.09 gb bases in total after Illumina Hi-Seq sequencing. All samples were assembled together, we got 59,264 Unigenes, the total length, average length, N50, and GC content of Unigenes are 84.547.276 bp, 1.426 bp, 2,137 bp, and 46.20 %, respectively. Unigenes were annotated with 7 functional databases, finally, 42.681(NR: 72.02%), 39.660 (NT: 66.92%), 30.790 (Swissprot: 51.95%), 20.212 (COG: 34.11%), 27.689 (KEGG: 46.72%), 12.328 (GO: 20.80%), and 33,833 (Interpro: 57.09%) Unigenes were annotated. With functional annotation results, we detected 42.376 CDS, and 4.999 SSR distribute on 16.143 Unigenes.

Keywords: next generation sequencing, SSR, RNA-Seq, Illumina

Procedia PDF Downloads 214
941 A Systematic Review: Prevalence and Risk Factors of Low Back Pain among Waste Collection Workers

Authors: Benedicta Asante, Brenna Bath, Olugbenga Adebayo, Catherine Trask

Abstract:

Background: Waste Collection Workers’ (WCWs) activities contribute greatly to the recycling sector and are an important component of the waste management industry. As the recycling sector evolves, reports of injuries and fatal accidents in the industry demand notice particularly common and debilitating musculoskeletal disorders such as low back pain (LBP). WCWs are likely exposed to diverse work-related hazards that could contribute to LBP. However, to our knowledge there has never been a systematic review or other synthesis of LBP findings within this workforce. The aim of this systematic review was to determine the prevalence and risk factors of LBP among WCWs. Method: A comprehensive search was conducted in Ovid Medline, EMBASE, and Global Health e-publications with search term categories ‘low back pain’ and ‘waste collection workers’. Articles were screened at title, abstract, and full-text stages by two reviewers. Data were extracted on study design, sampling strategy, socio-demographic, geographical region, and exposure definition, definition of LBP, risk factors, response rate, statistical techniques, and LBP prevalence. Risk of bias (ROB) was assessed based on Hoy Damien’s ROB scale. Results: The search of three databases generated 79 studies. Thirty-two studies met the study inclusion criteria for both title and abstract; thirteen full-text articles met the study criteria at the full-text stage. Seven articles (54%) reported prevalence within 12 months of LBP between 42-82% among WCW. The major risk factors for LBP among WCW included: awkward posture; lifting; pulling; pushing; repetitive motions; work duration; and physical loads. Summary data and syntheses of findings was presented in trend-lines and tables to establish the several prevalence periods based on age and region distribution. Public health implications: LBP is a major occupational hazard among WCWs. In light of these risks and future growth in this industry, further research should focus on more detail ergonomic exposure assessment and LBP prevention efforts.

Keywords: low back pain, scavenger, waste collection workers, waste pickers

Procedia PDF Downloads 298
940 Engagement Analysis Using DAiSEE Dataset

Authors: Naman Solanki, Souraj Mondal

Abstract:

With the world moving towards online communication, the video datastore has exploded in the past few years. Consequently, it has become crucial to analyse participant’s engagement levels in online communication videos. Engagement prediction of people in videos can be useful in many domains, like education, client meetings, dating, etc. Video-level or frame-level prediction of engagement for a user involves the development of robust models that can capture facial micro-emotions efficiently. For the development of an engagement prediction model, it is necessary to have a widely-accepted standard dataset for engagement analysis. DAiSEE is one of the datasets which consist of in-the-wild data and has a gold standard annotation for engagement prediction. Earlier research done using the DAiSEE dataset involved training and testing standard models like CNN-based models, but the results were not satisfactory according to industry standards. In this paper, a multi-level classification approach has been introduced to create a more robust model for engagement analysis using the DAiSEE dataset. This approach has recorded testing accuracies of 0.638, 0.7728, 0.8195, and 0.866 for predicting boredom level, engagement level, confusion level, and frustration level, respectively.

Keywords: computer vision, engagement prediction, deep learning, multi-level classification

Procedia PDF Downloads 95
939 A BERT-Based Model for Financial Social Media Sentiment Analysis

Authors: Josiel Delgadillo, Johnson Kinyua, Charles Mutigwe

Abstract:

The purpose of sentiment analysis is to determine the sentiment strength (e.g., positive, negative, neutral) from a textual source for good decision-making. Natural language processing in domains such as financial markets requires knowledge of domain ontology, and pre-trained language models, such as BERT, have made significant breakthroughs in various NLP tasks by training on large-scale un-labeled generic corpora such as Wikipedia. However, sentiment analysis is a strong domain-dependent task. The rapid growth of social media has given users a platform to share their experiences and views about products, services, and processes, including financial markets. StockTwits and Twitter are social networks that allow the public to express their sentiments in real time. Hence, leveraging the success of unsupervised pre-training and a large amount of financial text available on social media platforms could potentially benefit a wide range of financial applications. This work is focused on sentiment analysis using social media text on platforms such as StockTwits and Twitter. To meet this need, SkyBERT, a domain-specific language model pre-trained and fine-tuned on financial corpora, has been developed. The results show that SkyBERT outperforms current state-of-the-art models in financial sentiment analysis. Extensive experimental results demonstrate the effectiveness and robustness of SkyBERT.

Keywords: BERT, financial markets, Twitter, sentiment analysis

Procedia PDF Downloads 128
938 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Kyung Bae Park, Sung Ho Ha

Abstract:

Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.

Keywords: latent dirichlet allocation, R program, text mining, topic model, user generated contents, visualization

Procedia PDF Downloads 164