Search results for: Corpus-Based Collocations Dictionary
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 109

49 On the Comprehension of English Compound Nouns by Arabic-Speaking EFL Learners

Authors: Abdel Rahman Altakhaineh, Mohamma Alaghawat, Hiba Alhendi

Abstract:

This paper reports an investigation of the comprehension of English compound nouns by sixty Arabic-speaking English as a Foreign Language (EFL) learners majoring in English at the University of Jordan, Amman. The investigation focused on the problems that these learners may encounter in understanding certain types of compounds and on their ability to use their L1 compound noun knowledge to derive the meaning of L2 compound nouns. Participants, whose English proficiency level was advanced, took a test in which they identified the meaning of an underlined compound without using a dictionary. The responses to the three different types of compounds were analyzed using a two-way repeated measures ANOVA, and the results showed different endocentric and exocentric compound responses within subordinative compounds, with a statistically significant difference between the two in favor of endocentric compounds. We argue that endocentric compounds, especially subordinative endocentric compounds, were more easily understood due to their representative nature, i.e., because the head represents the meaning of the whole compound. The study concludes with pedagogical implications for teaching compound nouns.
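For illustration only, the statistical step named above (a two-way repeated measures ANOVA) can be sketched with statsmodels; the data, column names and factor labels below are invented, not the study's:

```python
# A hedged sketch of a two-way repeated measures ANOVA in statsmodels.
# All data and factor names are placeholders, not the study's materials.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
subjects = np.repeat(np.arange(1, 11), 4)          # 10 subjects, 4 cells each
headedness = np.tile(["endocentric", "endocentric",
                      "exocentric", "exocentric"], 10)
structure = np.tile(["subordinative", "attributive"], 20)
scores = rng.normal(0.7, 0.1, 40)                  # invented comprehension scores

df = pd.DataFrame({"subject": subjects, "headedness": headedness,
                   "structure": structure, "score": scores})

# Balanced design: each subject sees every factor combination exactly once.
res = AnovaRM(df, depvar="score", subject="subject",
              within=["headedness", "structure"]).fit()
print(res)
```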

Keywords: morphology, compounding, SLA, Arabic-speaking EFL learners

Procedia PDF Downloads 85
48 Meaningful Habit for EFL Learners

Authors: Ana Maghfiroh

Abstract:

Learning a foreign language requires great effort from learners themselves to make their language ability grow better day by day. They also need support from everyone around them, including teachers and friends, as well as activities that encourage them to speak the language. When such activities are developed into habits practised regularly, they help improve students' language competence. This qualitative study aimed to identify and describe the activities implemented in Pesantren Al Mawaddah, Ponorogo, to teach the students a foreign language. To collect the data, the researcher used interviews, questionnaires, and documentation. The study found that Pesantren Al Mawaddah had successfully built a language habit that led students to speak the target language. For more than 15 hours a day, students were required to speak a foreign language, Arabic or English, in turn. This aimed to habituate the students to keep in touch with the target language. The habit was developed through daily language activities, such as dawn vocabulary sessions, dictionary handling, daily language use, speech training and intensive language courses, daily language input, and night vocabulary memorization. This habit then developed the students' awareness of the language learned as well as promoting their language mastery.

Keywords: habit, communicative competence, daily language activities, Pesantren

Procedia PDF Downloads 511
47 Sparsity-Based Unsupervised Unmixing of Hyperspectral Imaging Data Using Basis Pursuit

Authors: Ahmed Elrewainy

Abstract:

Mixing in hyperspectral imaging occurs due to the low spatial resolution of the cameras used. The pure materials present in the scene, the “endmembers”, share the pixels' spectra in different proportions called “abundances”. Unmixing the data cube is an important task for identifying the endmembers present in the cube when analyzing these images. Unsupervised unmixing is done with no prior information about the given data cube. Sparsity is one of the recent approaches used in source recovery and unmixing techniques. The l1-norm optimization problem “basis pursuit” can be used as a sparsity-based approach to solve this unmixing problem, where the endmembers are assumed to be sparse in an appropriate domain known as a dictionary. This optimization problem is solved using the proximal method “iterative thresholding”. The l1-norm basis pursuit optimization problem was used as a sparsity-based unmixing technique to unmix real and synthetic hyperspectral data cubes.
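As a rough illustration of the basis pursuit step described above (not the authors' code; the dictionary, regularisation weight and data are invented), a minimal iterative soft-thresholding (ISTA) sketch:

```python
# A minimal ISTA sketch for l1-regularised unmixing with a known
# spectral dictionary D. Parameters and data are placeholders.
import numpy as np

def ista_unmix(D, y, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||y - D a||_2^2 + lam*||a||_1 by ISTA.

    D : (n_bands, n_endmembers) spectral dictionary
    y : (n_bands,) observed pixel spectrum
    Returns the sparse abundance vector a.
    """
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)             # gradient of the data-fit term
        z = a - grad / L                     # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

# Toy usage: 50 spectral bands, 8 candidate endmembers, 2 truly present.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 8))
a_true = np.zeros(8); a_true[[1, 5]] = [0.7, 0.3]
y = D @ a_true + 0.01 * rng.standard_normal(50)
print(np.round(ista_unmix(D, y), 2))         # recovers a mostly-zero vector
```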

Keywords: basis pursuit, blind source separation, hyperspectral imaging, spectral unmixing, wavelets

Procedia PDF Downloads 177
46 The Effect of Using LDOCE on Iranian EFL Learners’ Pronunciation Accuracy

Authors: Mohammad Hadi Mahmoodi, Elahe Saedpanah

Abstract:

Since pronunciation is among the factors that can strongly affect EFL learners' successful communication, instructional programs targeting accurate pronunciation seem to be a necessity in any L2 teaching context. The widespread use of smartphones brings with it various educational applications that can assist foreign language learners in learning and speaking a language other than their L1. In line with this supportive innovation, the present study investigated the role of LDOCE (Longman Dictionary of Contemporary English), a mobile application, in improving Iranian EFL learners' pronunciation accuracy. To this aim, 40 EFL learners studying English at the intermediate level participated in the current study. This was an experimental study with two groups of 20 students each, an experimental and a control group. The data were collected through the administration of a pronunciation pretest before the instruction and a post-test after the treatment. The assessment was based on the pupils' recorded voices while reading the selected words. The results of the independent samples t-test indicated that using LDOCE significantly affected Iranian EFL learners' pronunciation accuracy, with those in the experimental group outperforming their control group counterparts.
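For illustration, the reported comparison amounts to an independent samples t-test on post-test scores; a minimal sketch with invented scores:

```python
# A minimal sketch of an independent samples t-test (scores are invented,
# not the study's data).
from scipy import stats

experimental = [82, 78, 88, 91, 75, 84, 80, 86, 89, 77]   # LDOCE group
control      = [70, 72, 68, 75, 71, 69, 74, 66, 73, 70]

t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.2f}, p = {p:.4f}")   # p < .05 => significant group difference
```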

Keywords: LDOCE, EFL learners, pronunciation accuracy, CALL, MALL

Procedia PDF Downloads 524
45 A Sparse Representation Speech Denoising Method Based on Adapted Stopping Residue Error

Authors: Qianhua He, Weili Zhou, Aiwu Chen

Abstract:

A sparse representation speech denoising method based on an adapted stopping residue error is presented in this paper. Firstly, the cross-correlation between the clean speech spectrum and the noise spectrum was analyzed, and an estimation method was proposed. In the denoising method, an over-complete dictionary of the clean speech power spectrum was learned with the K-singular value decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error was adaptively set according to the estimated cross-correlation and the adjusted noise spectrum, and the orthogonal matching pursuit (OMP) approach was applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech was re-synthesised via the inverse Fourier transform with the reconstructed speech spectrum and the noisy speech phase. The experimental results show that the proposed method outperforms the conventional methods in terms of subjective and objective measures.
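The reconstruction stage can be sketched as follows; this is a simplified illustration of OMP with a residue-error stopping rule, not the paper's implementation (the adaptive threshold itself, set from the estimated cross-correlation, is not reproduced):

```python
# A minimal OMP sketch: greedily pick dictionary atoms until the residual
# energy falls below a stopping threshold. D's columns are unit-norm atoms.
import numpy as np

def omp(D, y, stop_err, max_atoms=None):
    """Reconstruct y over dictionary D, stopping at residue error stop_err."""
    n_atoms = D.shape[1]
    max_atoms = max_atoms or n_atoms
    residual, support, coef = y.copy(), [], np.zeros(0)
    while np.linalg.norm(residual) > stop_err and len(support) < max_atoms:
        k = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if k in support:
            break
        support.append(k)
        # re-fit all selected atoms jointly (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(n_atoms)
    x[support] = coef
    return x
```

In the paper, the threshold corresponding to `stop_err` would be set adaptively per frame from the estimated cross-correlation and adjusted noise spectrum, and D would be the K-SVD-trained spectral dictionary.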

Keywords: speech denoising, sparse representation, k-singular value decomposition, orthogonal matching pursuit

Procedia PDF Downloads 477
44 A Comparative Study of Central Board of Secondary Education (CBSE) and Board of Secondary Education Madhya Pradesh, Bhopal (BSEMPB) Hindi Textbooks of Class-VI

Authors: Shri Krishna Mishra, Badri Yadav

Abstract:

Proficient persons should be involved in formulating the structure of the textbooks so that the topics selected in the Hindi textbooks for Class VII contribute towards the linguistic and literary development of the child and the language of the textbook matches the comprehension level of the student. The topics of the textbooks should provide good illustrations and suitable exercises. Topics of a variety of tastes can be included in the textbook to satisfy inquisitive children. There could be abstracts/hints at the beginning of each lesson. Meanings of difficult words must be given at the end of each topic for the convenience of parents and children, as most of them find it difficult and time-consuming to use a Hindi dictionary. Exercises should be relevant, covering the whole topic, and the difficulty level should match the maturity level of the students in respect of the CBSE Board. The stitching and binding of CBSE-prescribed books may be improved to increase durability.

Keywords: comparative study of CBSE and BSEMPB, Central Board of Secondary Education, Board of Secondary Education, Bhopal

Procedia PDF Downloads 380
43 Deep Learning-Based Object-Classes Semantic Classification of Arabic Texts

Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel

Abstract:

We propose in this paper a deep learning-based approach to classify text in order to enrich an Arabic ontology based on the object classes of Gaston Gross. These object classes are defined by taking into account the syntactic and semantic features of the treated language. Our proposed approach is thus a hybrid one: on the one hand, it is based on the object classes, which represent a knowledge-based approach to text classification, and on the other hand, it uses a deep learning approach that relies on word embeddings to classify text. We applied our proposed approach to a corpus constructed from an Arabic dictionary. The obtained semantic classification of text will enrich the Arabic object-classes ontology: new classes can be added to the ontology, or the features that characterize each object class can be expanded and updated. The obtained results are compared to a similar work that treats the same object with a classical linguistic approach to the semantic classification of text. This comparison highlights that our hybrid proposed approach can be improved by broadening the dataset used in the deep learning process.

Keywords: deep-learning approach, object-classes, semantic classification, Arabic

Procedia PDF Downloads 50
42 Particular Features of the First Romanian Multilingual Dictionaries

Authors: Mihaela Mocanu

Abstract:

The Romanian multilingual dictionaries, also named polyglot, plurilingual or polylingual dictionaries, have known a slow yet constant development starting from the end of the 17th century, when the first such work is attested, to the present time, when we witness a considerable increase in the number of polyglot dictionaries, especially terminological ones. This paper aims at analyzing the context in which the first Romanian multilingual dictionaries were issued, as well as the organization and structural particularities of the first lexicographic works of this type. The irretrievable loss of some of these works, as well as the partial conservation of others, renders the attempt to retrace the beginnings of Romanian lexicography extremely difficult. The research methodology follows a descriptive and analytical approach based on two types of sources, subject to contrastive analysis: the notes made by the initiators of lexicographic projects and the testimonies of their contemporaries, respectively, along with specialized studies regarding the history of old Romanian lexicography. The analysis of the contents indicated that these dictionaries lacked a scientific apparatus in the true sense of the phrase and failed to obey unitary organizational criteria, being limited, most of the time, to mere inventories of words in which the Romanian term was assigned its correspondent in other languages. Motivated by practical reasons, the first multilingual dictionaries were aimed at clerics, their purpose being to ensure the translators' fidelity to the original religious texts, regarded as sacred.

Keywords: Romanian lexicography, multilingual dictionary, terminology, language

Procedia PDF Downloads 271
41 A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit

Authors: Thomas Bryan, Veton Kepuska, Ivica Kostanic

Abstract:

A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for high Gaussian noise environments. Matching pursuit is used for atomic decomposition and is shown to achieve optimal speech detection capability at high data compression rates for low signal-to-noise ratios. The most active dictionary elements found by matching pursuit are used for the signal reconstruction, so that the algorithm adapts to the individual speaker's dominant time-frequency characteristics. Speech has a high peak-to-average ratio, enabling matching pursuit's greedy heuristic of highest inner products to isolate high-energy speech components in high-noise environments. Gabor and gammatone atoms are both investigated, with identical logarithmically spaced center frequencies and similar bandwidths. The algorithm performs equally well for both Gabor and gammatone atoms, with no significant statistical differences. The algorithm achieves 70% accuracy at a 0 dB SNR, 90% accuracy at a 5 dB SNR and 98% accuracy at a 20 dB SNR, using 30 dB SNR as a reference for voice activity.
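A minimal sketch of the decomposition step (our illustration; atom parameters and the VAD thresholding are invented) using plain matching pursuit over a small Gabor dictionary:

```python
# A hedged matching-pursuit sketch over unit-norm Gabor atoms with
# log-spaced centre frequencies. Parameters are placeholders.
import numpy as np

def gabor_atom(n, f, sigma, t0):
    t = np.arange(n)
    g = np.exp(-0.5 * ((t - t0) / sigma) ** 2) * np.cos(2 * np.pi * f * t)
    return g / np.linalg.norm(g)

def matching_pursuit(x, atoms, n_terms=10):
    """Greedy decomposition: repeatedly subtract the best-matching atom."""
    residual = x.copy()
    decomposition = []
    for _ in range(n_terms):
        inner = atoms @ residual                  # inner products <residual, g_k>
        k = int(np.argmax(np.abs(inner)))         # greedy pick: largest |inner|
        decomposition.append((k, inner[k]))
        residual = residual - inner[k] * atoms[k]
    return decomposition, residual

n = 256
freqs = np.geomspace(0.01, 0.4, 16)               # log-spaced centre frequencies
atoms = np.stack([gabor_atom(n, f, sigma=32, t0=n // 2) for f in freqs])

# Speech frames concentrate energy in a few large early coefficients; the
# ratio of captured energy to frame energy can be thresholded as a VAD score.
```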

Keywords: atomic decomposition, Gabor, gammatone, matching pursuit, voice activity detection

Procedia PDF Downloads 271
40 Possibilities and Challenges of Using Machine Translation in Foreign Language Education

Authors: Miho Yamashita

Abstract:

In recent years, there have been attempts to introduce machine translation (MT) into foreign language teaching, especially in writing instruction. This is because the performance of neural machine translation has improved dramatically since 2016, and some university instructors have started to introduce MT translations to their students as a "good model" to learn from. However, MT is still not perfect, and there are many incorrect translations. In order to translate the intended text into a foreign language, it is necessary to edit the original manuscript written in the native language (pre-editing) and to revise the translated foreign-language text (post-editing). The latter is considered especially difficult for users without a high level of foreign language proficiency. Therefore, the author allowed her students to use MT in her writing class at a private university in Japan and investigated 1) how groups of students with different English proficiency levels revised MT translations when translating Japanese manuscripts into English and 2) whether the post-editing process differed when the students revised alone or in pairs. The results showed that in 1), certain non-post-edited grammatical errors were found regardless of proficiency level, indicating the need for teacher intervention, and in 2), more appropriate corrections were found in pairs, and frequent use of a dictionary was also observed. In this presentation, the author will discuss how MT writing instruction can be integrated effectively with the aim of achieving multimodal foreign language education.

Keywords: machine translation, writing instruction, pre-edit, post-edit

Procedia PDF Downloads 41
39 Antimicrobial and Antioxidant Activities of Actinobacteria Isolated from the Pollen of Pinus sylvestris Grown on the Lake Baikal Shore

Authors: Denis V. Axenov-Gribanov, Irina V. Voytsekhovskaya, Evgenii S. Protasov, Maxim A. Timofeyev

Abstract:

Isolated ecosystems existing under specific environmental conditions have been shown to be promising sources of new strains of actinobacteria. The taiga forest of Baikal Siberia has not been well studied, and its actinobacterial population remains uncharacterized. The proximity between the huge water mass of Lake Baikal and high mountain ranges influences the structure and diversity of the plant world in Siberia. Here, we report the isolation of eighteen actinobacterial strains from male cones of Pinus sylvestris trees growing on the shore of the ancient Lake Baikal in Siberia. The actinobacterial strains were isolated on solid nutrient MS media and Czapek agar supplemented with cycloheximide and phosphomycin. Identification of actinobacteria was carried out by 16S rRNA gene sequencing and further analysis of the evolutionary history. Four different liquid and solid media (NL19, DNPM, SG and ISP) were tested for metabolite production. The metabolite extracts produced by the isolated strains were tested for antibacterial and antifungal activities, and the antiradical activity of the crude extracts was also assayed. Strain Streptomyces sp. IB 2014 I 74-3, which is active against Gram-negative bacteria, was selected for dereplication analysis using liquid chromatography coupled with mass spectrometry. Mass detection was performed in both positive and negative modes, with the detection range set to 160–2500 m/z. Data were collected and analyzed using Bruker Compass Data Analysis software, version 4.1. Dereplication was performed using the Dictionary of Natural Products (DNP) database, version 6.1, with the following search parameters: accurate molecular mass, absorption spectra and source of compound isolation. Thus, in addition to more common representative strains of Streptomyces, several species belonging to the genera Rhodococcus, Amycolatopsis, and Micromonospora were isolated. Several of the selected strains were deposited in the Russian Collection of Agricultural Microorganisms (RCAM), St. Petersburg, Russia. All isolated strains exhibited antibacterial and antifungal activities. We identified several strains that inhibited the growth of the pathogen Candida albicans but did not hinder the growth of Saccharomyces cerevisiae. Several isolates were active against Gram-positive and Gram-negative bacteria. Moreover, extracts of several strains demonstrated high antioxidant activity. The high proportion of biologically active strains producing antibacterial and specific antifungal compounds may reflect their role in protecting pollen against phytopathogens. Dereplication of the secondary metabolites of strain Streptomyces sp. IB 2014 I 74-3 revealed a total of 59 major compounds in the culture liquid extract of the strain cultivated in ISP medium. Eight compounds were preliminarily identified based on characteristics described in the Dictionary of Natural Products database using the search parameters above: Streptomyces sp. IB 2014 I 74-3 was found to produce saframycins A, Y3 and S; 2-amino-3-oxo-3H-phenoxazine-1,8-dicarboxylic acid; galtamycinone; platencins A4-13R and A4-4S; ganefromycin δ1; the antibiotic SS 8201B; and 4′-decarbamoyl-6′-carbamoyl streptothricin D. Moreover, forty-nine of the 59 compounds detected in the extract examined in the present study did not return any positive hits when searched within the DNP database and could not be identified based on the available mass-spec data. Thus, these compounds might represent new findings.

Keywords: actinobacteria, Baikal Lake, biodiversity, male cones, Pinus sylvestris

Procedia PDF Downloads 207
38 Information Disclosure and Financial Sentiment Index Using a Machine Learning Approach

Authors: Alev Atak

Abstract:

In this paper, we aim to create a financial sentiment index by investigating companies' voluntary information disclosures. We retrieve structured content from BIST 100 companies' financial reports for the period 1998-2018 and extract relevant financial information for sentiment analysis through Natural Language Processing. We measure strategy-related disclosures and their cross-sectional variation and classify report content into generic sections using synonym lists divided into four main categories according to their liquidity risk profile, risk positions, intra-annual information, and exposure to risk. We use Word Error Rate and Cosine Similarity for comparing and measuring text similarity and derivation in sets of texts. In addition to performing text extraction, we provide a range of text analysis options, such as readability metrics, word counts using pre-determined lists (e.g., forward-looking, uncertainty, tone, etc.), and comparison with a reference corpus (at the word, parts-of-speech and semantic levels). We thereby create an adequate analytical tool and a financial dictionary that depict the importance of granular financial disclosure for investors to correctly identify risk-taking behavior and hence make the aggregated effects traceable.
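For illustration, the two similarity measures named above can be sketched as follows (toy sentences, not the BIST 100 data):

```python
# A minimal sketch of word error rate (word-level edit distance) and
# cosine similarity over TF-IDF vectors. Inputs are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def word_error_rate(ref, hyp):
    """Levenshtein distance between word sequences, normalised by |ref|."""
    r, h = ref.split(), hyp.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0] = np.arange(len(r) + 1)
    d[0, :] = np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,        # deletion
                          d[i, j - 1] + 1,        # insertion
                          d[i - 1, j - 1] + cost) # substitution
    return d[-1, -1] / len(r)

docs = ["liquidity risk rose sharply", "liquidity risk increased sharply"]
tfidf = TfidfVectorizer().fit_transform(docs)
print(word_error_rate(docs[0], docs[1]), cosine_similarity(tfidf)[0, 1])
```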

Keywords: financial sentiment, machine learning, information disclosure, risk

Procedia PDF Downloads 71
37 Enhancing Word Meaning Retrieval Using FastText and Natural Language Processing Techniques

Authors: Sankalp Devanand, Prateek Agasimani, Shamith V. S., Rohith Neeraje

Abstract:

Machine translation has witnessed significant advancements in recent years, but the translation of languages with distinct linguistic characteristics, such as English and Sanskrit, remains a challenging task. This research presents the development of a dedicated English-to-Sanskrit machine translation model, aiming to bridge the linguistic and cultural gap between these two languages. Using a variety of natural language processing (NLP) approaches, including FastText embeddings, this research proposes a thorough method for improving word meaning retrieval. The methodology includes data preparation, part-of-speech tagging, dictionary searches, and transliteration. The study also addresses the implementation of an interpreter pattern and uses a word similarity task to assess the quality of the word embeddings. The experimental outcomes show how the suggested approach can be used to perform word meaning retrieval tasks with greater efficacy, accuracy, and adaptability. Evaluation of the model's performance is conducted through rigorous testing, comparing its output against existing machine translation systems. The assessment includes quantitative metrics such as BLEU scores, METEOR scores, Jaccard similarity, etc.
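A hedged sketch of the embedding and word-similarity step using gensim's FastText implementation (the toy corpus and hyper-parameters are placeholders, not the authors' setup):

```python
# A minimal FastText sketch via gensim. Corpus and parameters are invented.
from gensim.models import FastText

sentences = [["the", "king", "rules"], ["the", "queen", "rules"],
             ["a", "dictionary", "maps", "words"]]   # toy corpus
model = FastText(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Subword n-grams let FastText embed rare or unseen surface forms, which
# helps dictionary lookups for a morphologically rich language like Sanskrit.
print(model.wv.most_similar("king", topn=2))   # word similarity task
print(model.wv.similarity("king", "queen"))
```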

Keywords: machine translation, English to Sanskrit, natural language processing, word meaning retrieval, fastText embeddings

Procedia PDF Downloads 13
36 A Corpus-Linguistic Analysis of Online Iranian News Coverage on Syrian Revolution

Authors: Amaal Ali Al-Gamde

Abstract:

The Syrian revolution is a major issue in the Middle East, which draws in world powers and has received great focus in international mass media since 2011. The heavy global reliance on cyber news and digital sources plays a key role in conveying a sense of bias to a wide range of online readers. Thus, based on the assumption that media discourse possesses ideological implications, this study investigates the representation of the Syrian revolution in online media. The paper explores the discursive constructions of anti- and pro-government powers in the Syrian revolution in a 1,000,000-word corpus of Fars online reports (an Iranian news agency) issued between 2013 and 2015. Taking a corpus-assisted discourse analysis approach, the analysis investigates three types of lexico-semantic relations: the semantic macrostructures within which the two social actors are framed, the lexical collocations characterizing the news discourse, and the discourse prosodies they convey about the two sides of the conflict. The study utilizes computer-based approaches, the Sketch Engine and AntConc software, to minimize the bias of subjective analysis. The analysis moves from the insights of lexical frequencies and keyness scores to examine themes and collocational patterns. The findings reveal the Fars agency's ideological mode of representation in reporting events of the Syrian revolution in two ways. The first is stereotyping the opposition groups under the umbrella of terrorism, using words such as 'law breakers', 'foreign-backed groups', 'militant groups' and 'terrorists' to legitimize the atrocities of security forces against protesters and heighten horror among civilians. The second is emphasizing the power of the government and depicting it as the defender of the Arab land by foregrounding the discourse of an international conspiracy against Syria. The paper concludes by discussing the potential importance of triangulating corpus linguistic tools with critical discourse analysis to elucidate more about discourses and reality.
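For illustration, keyness scores of the kind mentioned above are typically computed with Dunning's log-likelihood (G2) statistic; a minimal sketch with invented frequencies:

```python
# A minimal log-likelihood (G2) keyness sketch, following the standard
# Rayson & Garside formulation. All frequencies below are invented.
import math

def log_likelihood(a, b, c, d):
    """a, b: word freq in study/reference corpus; c, d: corpus sizes."""
    e1 = c * (a + b) / (c + d)        # expected frequency, study corpus
    e2 = d * (a + b) / (c + d)        # expected frequency, reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

# e.g. a word occurring 420 times in a 1,000,000-word study corpus vs.
# 12 times in a 2,000,000-word reference corpus (figures invented):
print(round(log_likelihood(420, 12, 1_000_000, 2_000_000), 1))
```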

Keywords: discourse prosody, ideology, keyness, semantic macrostructure

Procedia PDF Downloads 110
35 Collocation Errors in English as Second Language (ESL) Essay Writing

Authors: Fatima Muhammad Shitu

Abstract:

In language learning, second language learners, like their native speaker counterparts, commit errors in their attempt to achieve competence in the target language. The realm of collocation has to do with meaning relations between lexical items. In all human languages, there is a kind of 'natural order' in which words are arranged or relate to one another in sentences, so much so that when a word occurs in a given context, the related or naturally co-occurring word will automatically come to mind. It becomes an error, therefore, if students inappropriately pair or arrange such 'naturally' co-occurring lexical items in a text. It has been observed that most of the second language learners in this research group commit collocational errors. A study of this kind is very significant, as it gives insight into the kinds of errors committed by learners. This will help the language teacher to identify the sources and causes of such errors as well as correct them, thereby guiding, helping and leading the learners towards achieving some level of competence in the language. The aim of the study is to understand the nature of these errors as stumbling blocks to effective essay writing. The objective of the study is to identify the errors and analyse their structural compositions so as to determine whether there are similarities between students in this regard, and to find out whether there are patterns to these kinds of errors which will enable the researcher to understand their sources and causes. In this descriptive study, the researcher sampled nine hundred essays collected from three hundred undergraduate learners of English as a second language at the Federal College of Education, Kano, North-West Nigeria, i.e., three essays per student. The essays, which were written during the lecture hour on three different occasions, had similar thematic preoccupations (i.e., the same topics) and length (i.e., the same number of words). The errors were identified in a systematic manner, whereby each identified error was recorded only once even if it occurred several times in a student's essays. The data were collated using percentages, in which the identified numbers of occurrences were converted to percentages. The findings indicate that there are similarities as well as regular and repeated errors which provide a pattern. Based on the pattern identified, the conclusion is that students' collocational errors are attributable to poor teaching and learning, which results in the wrong generalisation of rules.

Keywords: collocations, errors, second language learning, ESL students

Procedia PDF Downloads 310
34 Using Audit Tools to Maintain Data Quality for ACC/NCDR PCI Registry Abstraction

Authors: Vikrum Malhotra, Manpreet Kaur, Ayesha Ghotto

Abstract:

Background: Cardiac registries such as the ACC Percutaneous Coronary Intervention Registry require high-quality data to be abstracted, including data elements such as nuclear cardiology, diagnostic coronary angiography, and PCI. Introduction: The audit tool created is used by data abstractors to provide data audits and assess the accuracy and inter-rater reliability of abstraction performed for a health system. This audit tool solution has been developed across 13 registries, including the ACC/NCDR PCI, STS, and Get With The Guidelines registries. Methodology: The data audit tool was used to audit internal registry abstraction for all data elements, including stress test performed, type of stress test, date of stress test, results of stress test, risk/extent of ischemia, diagnostic catheterization detail, and PCI data elements for the ACC/NCDR PCI registries. It is being used internally across 20 hospital systems, providing abstraction and audit services for them. Results: The data audit tool showed inter-rater reliability and data accuracy greater than 95% for the PCI registry in 50 PCI registry cases in 2021. Conclusion: The tool is being used internally for surgical societies and across hospital systems. The audit tool enables the abstractor to be assessed by an external abstractor and includes all of the data dictionary fields for each registry.

Keywords: abstraction, cardiac registry, cardiovascular registry, registry, data

Procedia PDF Downloads 82
33 An Investigation into Problems Confronting Pre-Service Teachers of French in South-West Nigeria

Authors: Modupe Beatrice Adeyinka

Abstract:

French, as a foreign language in Nigeria, has been pronounced the second official language and a compulsory subject at the primary school level; hence, colleges of education across the nation are saddled with the responsibility of training teachers of the subject. However, it has been observed that this policy has not been fully implemented, as French teachers in training face many challenges, of which translation is chief. In a bid to investigate the major cause of the perceived translation problem, this study examined the French translation problems of pre-service teachers in selected colleges of education in southwest Nigeria. The study adopted a descriptive survey research design. The simple random sampling technique was used to select four colleges of education in the southwest, where 100 French students were randomly selected, 25 from each school. The Pre-service Teachers' French Translation Problems Questionnaire (PTFTPQ) was used as the instrument, while four research questions were answered and three null hypotheses were tested. Among others, the findings revealed that students do have problems with false friends, mainly in their interpretation when attempting French-English translation and vice versa; the majority of the students make use of a French dictionary as a way out and found the material very useful for their understanding of false friends. Teachers were, therefore, urged to attend in-service training where they would be exposed to new and emerging strategies, approaches and methodologies of French language teaching that will help students overcome the challenge of translation in learning French.

Keywords: false friends, French language, pre-service teachers, source language, target language, translation

Procedia PDF Downloads 132
32 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of entity relation discovery, a well-covered topic in the literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These entities can be a whole dictionary or a specific collection of named items. In many cases, machine learning and/or text mining techniques are used for this goal. Such approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations: a cooccurrence graph. Indeed, each cooccurrence highlights some degree of semantic correlation between the words, because related words are more commonly found close to each other than at opposite ends of the text. Some authors have used sliding windows for this problem: they count all the cooccurrences within a sliding window running over the whole text. In this paper, we generalise this technique, arriving at a weighted-distance sliding window, where each occurrence of two named items within the window is counted with a weight depending on the distance between the items: a closer distance implies stronger evidence of a relationship. We developed an experiment to support this intuition by applying the technique to a data set consisting of the text of the Bible, split into verses.
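A minimal sketch of the weighted-distance sliding window as we read the abstract (the 1/distance weighting is an assumed kernel; the paper may use a different decay):

```python
# A hedged sketch: accumulate edge weights between named items whose
# occurrences fall inside a sliding window, weighting by 1/distance.
from collections import defaultdict

def cooccurrence_graph(tokens, entities, window=10):
    entities = set(entities)
    graph = defaultdict(float)                 # edge (u, v) -> accumulated weight
    positions = [(i, t) for i, t in enumerate(tokens) if t in entities]
    for a in range(len(positions)):
        i, u = positions[a]
        for b in range(a + 1, len(positions)):
            j, v = positions[b]
            if j - i > window:
                break                          # beyond the sliding window
            if u != v:
                graph[tuple(sorted((u, v)))] += 1.0 / (j - i)  # closer => stronger
    return graph

tokens = "moses spoke to aaron and moses went up".split()
print(dict(cooccurrence_graph(tokens, {"moses", "aaron"}, window=5)))
```

One pass over the token stream suffices, which is what makes the approach viable on massive streams where full text-mining pipelines are not.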

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 125
31 Sentiment Analysis of Chinese Microblog Comments: Comparison between Support Vector Machine and Long Short-Term Memory

Authors: Xu Jiaqiao

Abstract:

Text sentiment analysis is an important branch of natural language processing. The technology is widely used in public opinion analysis and web browsing recommendations. At present, mainstream sentiment analysis methods fall into three categories: methods based on a sentiment dictionary, on traditional machine learning, and on deep learning. This paper analyzes and compares the advantages and disadvantages of the SVM method of traditional machine learning and the Long Short-Term Memory (LSTM) method of deep learning in the field of Chinese sentiment analysis, using Chinese comments on Sina Microblog as the data set. Firstly, this paper classifies and labels the original comment dataset obtained by a web crawler, and then uses Jieba word segmentation to tokenize the dataset and remove stop words. After that, text feature vectors are extracted and document word vectors are built to facilitate the training of the models. Finally, SVM and LSTM models are trained respectively. The accuracy of the LSTM model is 85.80%, while the accuracy of the SVM is 91.07%; at the same time, the LSTM needs only 2.57 seconds to run, while the SVM model needs 6.06 seconds. Therefore, this paper concludes that, compared with the SVM model, the LSTM model is worse in accuracy but faster in processing speed.
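A hedged sketch of the SVM branch described above: Jieba segmentation, TF-IDF document vectors, and a linear SVM (the two comments are placeholders, not the Sina Microblog data set):

```python
# A minimal sketch of the traditional machine-learning pipeline: jieba
# tokenization + TF-IDF + linear SVM. Data below is a toy placeholder.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

comments = ["这部电影太好看了", "糟糕的体验，再也不来了"]   # toy microblog comments
labels = [1, 0]                                            # 1 = positive, 0 = negative

pipeline = make_pipeline(
    TfidfVectorizer(tokenizer=jieba.lcut, token_pattern=None),  # segment Chinese
    LinearSVC(),
)
pipeline.fit(comments, labels)
print(pipeline.predict(["体验很好"]))          # classify an unseen comment
```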

Keywords: sentiment analysis, support vector machine, long short-term memory, Chinese microblog comments

Procedia PDF Downloads 62
30 A Cost Effective Approach to Develop Mid-Size Enterprise Software Adopted the Waterfall Model

Authors: Mohammad Nehal Hasnine, Md Kamrul Hasan Chayon, Md Mobasswer Rahman

Abstract:

Organizational tendencies towards computer-based information processing have been observed noticeably in third-world countries. Many enterprises are taking major initiatives towards a computerized working environment because of the massive benefits of computer-based information processing. However, designing and developing information resource management software for small and mid-size enterprises under budget constraints and strict deadlines is always challenging for software engineers. Therefore, we introduce an approach to designing mid-size enterprise software in a cost-effective way using the Waterfall model, one of the Software Development Life Cycle (SDLC) models. To fulfill the research objectives, in this study we developed mid-sized enterprise software named "BSK Management System" that assists enterprise software clients with information resource management and performs complex organizational tasks. The Waterfall model phases were applied to ensure that all functions, user requirements, strategic goals, and objectives were met. In addition, a Rich Picture, Structured English, and a Data Dictionary were implemented and investigated properly in an engineering manner. Furthermore, an assessment survey with 20 participants was conducted to investigate the usability and performance of the proposed software. The survey results indicated that our system features simple interfaces, easy operation and maintenance, quick processing, and reliable and accurate transactions.

Keywords: end-user application development, enterprise software design, information resource management, usability

Procedia PDF Downloads 413
29 Developing an Exhaustive and Objective Definition of Social Enterprise through Computer Aided Text Analysis

Authors: Deepika Verma, Runa Sarkar

Abstract:

One of the prominent debates in the social entrepreneurship literature has been whether entrepreneurial work for social well-being by for-profit organizations can be classified as social entrepreneurship. Of late, the scholarship has reached a consensus: there seems little sense in confining social entrepreneurship to non-profit organizations. Boosted by this research, a growing number of businesses engaged in filling the social infrastructure gaps in developing countries are calling themselves social enterprises. These organizations are diverse in their ownership, size, objectives, operations and business models. The lack of a comprehensive definition of social enterprise leads to three issues. Firstly, researchers may face difficulty in creating a database of social enterprises, because the choice of an entity as a social enterprise becomes subjective, or is based on pre-defined parameters chosen by the researcher, and is thus not replicable. Secondly, practitioners who use 'social enterprise' in their vision/mission statement(s) may find it difficult to adjust their business models accordingly, especially when they face the dilemma of choosing social well-being over business viability. Thirdly, social enterprise and social entrepreneurship attract a lot of donor funding and venture capital; in the absence of a comprehensive definitional guide, donors or investors may find assigning grants and investments difficult. It therefore becomes necessary to develop an exhaustive and objective definition of social enterprise and to examine whether the understandings of academicians and practitioners about social enterprise match. This paper develops a dictionary of words often associated with social enterprise and (or) social entrepreneurship. It further compares two lexicographic definitions of social enterprise imputed from the abstracts of academic journal papers and trade publications extracted from the EBSCO database, using the 'tm' package in R.
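The authors work in R with the 'tm' package; as a rough Python analogue of the core step (toy abstracts, not the EBSCO data), one might rank the terms most associated with social enterprise texts to impute a lexicographic definition:

```python
# A hedged Python analogue of the term-ranking step: mean TF-IDF per term
# across a corpus of abstracts. The two abstracts below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "social enterprise blends commercial revenue with a social mission",
    "a social enterprise reinvests profit to serve community well-being",
]
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)

scores = X.mean(axis=0).A1                      # mean TF-IDF per term
terms = vec.get_feature_names_out()
top = sorted(zip(scores, terms), reverse=True)[:8]
print([t for _, t in top])                      # candidate defining terms
```

Running the same ranking separately on academic abstracts and on trade publications yields the two lexicographic definitions the paper compares.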

Keywords: EBSCO database, lexicographic definition, social enterprise, text mining

Procedia PDF Downloads 364
28 Omni-Modeler: Dynamic Learning for Pedestrian Redetection

Authors: Michael Karnes, Alper Yilmaz

Abstract:

This paper presents the application of the Omni-Modeler to pedestrian redetection. The pedestrian redetection task creates several challenges when applying deep neural networks (DNNs) due to the variation of pedestrian appearance with camera position, the variety of environmental conditions, and the specificity required to recognize one pedestrian from another. DNNs require significant training sets and are not easily adapted to changes in class appearance or changes in the set of classes held in their knowledge domain. Pedestrian redetection requires an algorithm that can actively manage its knowledge domain as individuals move in and out of the scene, as well as learn individual appearances from a few frames of a video. The Omni-Modeler is a dynamically learning few-shot visual recognition algorithm developed for tasks with limited training data availability. The Omni-Modeler adapts the knowledge domain of pre-trained deep neural networks to novel concepts with a calculated localized language encoder. The Omni-Modeler knowledge domain is generated by creating a dynamic dictionary of concept definitions, which is directly updatable as new information becomes available. Query images are identified through nearest neighbor comparison to the learned object definitions. The study presented in this paper evaluates its performance in re-identifying individuals as they move through a scene in both single-camera and multi-camera tracking applications. The results demonstrate that the Omni-Modeler shows potential for cross-camera pedestrian redetection and is highly effective for single-camera redetection, with 93% accuracy across 30 individuals using 64 example images per individual.
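A simplified sketch of the dynamic dictionary idea as we read the abstract (the embeddings would come from a pre-trained DNN encoder; the class structure and threshold are our invention):

```python
# A hedged sketch of a dynamic dictionary with nearest-neighbour queries.
# Embeddings are assumed to come from a pre-trained encoder (not shown).
import numpy as np

class DynamicDictionary:
    def __init__(self):
        self.entries = {}                        # identity -> list of embeddings

    def learn(self, identity, embedding):
        """Add one example embedding for an identity (few-shot update)."""
        v = embedding / np.linalg.norm(embedding)
        self.entries.setdefault(identity, []).append(v)

    def forget(self, identity):
        self.entries.pop(identity, None)         # identity left the scene

    def query(self, embedding, threshold=0.7):
        """Label a query by its nearest stored neighbour (cosine similarity)."""
        q = embedding / np.linalg.norm(embedding)
        best_id, best_sim = None, threshold
        for identity, vecs in self.entries.items():
            sim = max(float(q @ v) for v in vecs)
            if sim > best_sim:
                best_id, best_sim = identity, sim
        return best_id                           # None => unknown pedestrian
```

Because entries can be added or dropped per frame, the knowledge domain tracks individuals entering and leaving the scene without retraining the encoder.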

Keywords: dynamic learning, few-shot learning, pedestrian redetection, visual recognition

Procedia PDF Downloads 48
27 Environmental Degradation and Globalization with Special Reference to Developing Economies

Authors: Indira Sinha

Abstract:

According to the Oxford Advanced Learner's Dictionary of Current English, the environment is the complex of physical, chemical and biotic factors that act upon an organism or an ecological community and ultimately determine its form and survival. It is defined as the conditions and circumstances affecting people's lives. Environmental degradation means the degradation of the environment through the depletion of resources such as air, water and soil, the destruction of ecosystems and the extinction of wildlife. Globalization is a significant feature of recent world history. The aim of this phenomenon is to integrate societies, economies and cultures through a common link of trading policies, technology and communication. Undoubtedly it has opened up the world economy at a very high speed, but at the same time it has an adverse impact on the environment. The purpose of the present study is to investigate the impact of globalization on environmental conditions. The paper gives an overview of what the forces of globalization have in store for the environment, with the construction of large numbers of industries and the destruction of large forest lands. The forces of globalization have created many serious environmental problems, such as high temperatures, the extinction of many species of plants and animals, and the outflow of poisonous chemicals from industries. This study reveals that in the case of developing economies these problems are more critical. In developing countries like India, many factories are built with fewer environmental regulations, while developed economies maintain positive environmental practices. The present study is a micro-level study which employs a combination of theoretical, descriptive, empirical and analytical approaches in addition to the time-tested case method.

Keywords: globalization, trade policies, environmental degradation, developing economies, large industries

Procedia PDF Downloads 215
26 Designing a Corpus Database to Enhance the Learning of Old English Language

Authors: Raquel Mateo Mendaza, Carmen Novo Urraca

Abstract:

The current paper presents the elaboration of a corpus database that aligns two different corpora in order to simplify the search for information both for researchers and for students of Old English. This database comprises the information contained in two main reference corpora, namely the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The first provides information on all surviving texts written in the Old English language. The latter offers the syntactic and morphological annotation of several texts included in the DOEC. Although both corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem detected is that there is no alignment of texts that allows for the search of whole fragments to be further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments has been done in an automated way. However, some problems emerged during the creation process, particularly related to the lack of correspondence in the division of fragments. For this reason, it was necessary to revise all the entries manually to obtain a truthful, high-quality product and to carefully indicate the gaps encountered in these corpora. All in all, this database contains more than 60,000 entries corresponding to the DOE fragments annotated in the YCOE. The main strength of the resulting product is its research and teaching implications for the study of Old English. The use of this database will help researchers and students in the study of different aspects of the language, such as inflectional morphology, the syntactic behaviour of given words, or translation studies, among others. By means of a search for words or fragments, the annotated information on morphology and syntax is automatically displayed, automating and speeding up the search for data.
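A minimal sketch of the automated alignment step as we read the abstract (field names and fragments are invented): join YCOE annotations to DOEC texts through the shared DOE source identifier, flagging gaps for manual revision:

```python
# A hedged sketch of aligning two corpora on a shared source identifier.
# All identifiers, fields and text below are placeholders.
doec = {("coaelive", 17): "Her onginneð seo boc ..."}        # id -> DOEC fragment
ycoe = [{"doe_id": ("coaelive", 17), "parse": "(IP-MAT (NP-NOM ...))"}]

aligned, gaps = [], []
for row in ycoe:
    if row["doe_id"] in doec:
        aligned.append({"doe_id": row["doe_id"],
                        "text": doec[row["doe_id"]],
                        "parse": row["parse"]})
    else:
        gaps.append(row["doe_id"])     # mismatched fragment division: revise by hand

print(aligned[0]["text"], "| gaps:", gaps)
```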

Keywords: alignment, corpus database, morphosyntactic analysis, Old English

Procedia PDF Downloads 107
25 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0

Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini

Abstract:

Nowadays, companies face many challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which can create challenges around data management and governance. Furthermore, it is also challenging to integrate data from multiple systems and technologies. Despite these pains, companies still pursue digitalization because, by embracing advanced technologies, they can improve efficiency, quality, decision-making, and customer experience while also creating different business models and revenue streams. This paper focuses on the issue that data is stored in data silos with different schemas and structures. Conventional approaches to addressing this issue involve data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily focus on the grammar and structure of the data and neglect the importance of semantic modeling and semantic standardization, which are essential for achieving data interoperability. Here, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models, an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates Asset Administration Shell technology to model and map the company's data and utilizes a knowledge graph for data storage and exploration.

Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling

Procedia PDF Downloads 70
24 Expert System: Debugging Using MD5 Process Firewall

Authors: C. U. Om Kumar, S. Kishore, A. Geetha

Abstract:

An operating system (OS) is software that manages computer hardware and software resources by providing services to computer programs. One important user expectation of the operating system is to defend information from unauthorized access, disclosure, modification, inspection, recording or destruction. An operating system is always vulnerable to attacks by malware such as computer viruses, worms, Trojan horses, backdoors, ransomware, spyware, adware, scareware and more. Anti-virus software was therefore created to ensure security against prominent computer viruses by applying a dictionary-based approach. However, anti-virus programs are not guaranteed to provide security against the new viruses proliferating every day. To address this issue and to secure the computer system, our proposed expert system concentrates on authorizing processes, as wanted or unwanted by the administrator, for execution. The expert system maintains a database consisting of the hash codes of the processes that are to be allowed. These hash codes are generated using the MD5 message-digest algorithm, a widely used cryptographic hash function. The administrator approves the wanted processes that are to be executed on clients in a Local Area Network by implementing a client-server architecture, and only the processes that match the processes in the database table will be executed, by which many malicious processes are restricted from infecting the operating system. An add-on advantage of the proposed expert system is that it limits CPU usage and minimizes resource utilization. Thus, data and information security is ensured by our system, along with increased performance of the operating system.
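A minimal sketch of the core check (not the authors' system; the digest shown is a placeholder): hash an executable with MD5 and allow it only if the digest appears in the administrator's table:

```python
# A hedged sketch of an MD5 whitelist check using Python's hashlib.
import hashlib

def md5_of_file(path, chunk=1 << 20):
    """Stream the file through MD5 in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Digests approved by the administrator (placeholder value shown).
ALLOWED = {"5d41402abc4b2a76b9719d911017c592"}

def may_execute(path):
    """True only if the executable's digest is in the approved table."""
    return md5_of_file(path) in ALLOWED
```

In the paper's client-server setup, the `ALLOWED` table would live on the server and be consulted by clients before any process launch.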

Keywords: virus, worm, Trojan horse, backdoors, ransomware, spyware, adware, scareware, sticky software, process table, MD5, CPU usage, resource utilization

Procedia PDF Downloads 391
23 Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine

Authors: C. Brierley, H. El-Farahaty, A. Farhan

Abstract:

The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via corpus linguistics techniques is an important development. In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool, and legal scholarship in the States is now attuned to the potential for text analytics over vast quantities of text-based legal material, following the business and medical industries. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse legal interpretations, legal discourse, the comprehensibility of legal texts, conflict resolution, and linguistic human rights. This paper focuses on 'dignity' as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We have compiled a parallel, Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as the World Intellectual Property Organisation and CONSTITUTE, and uploaded our corpus to Sketch Engine and queried it there. Our most challenging task was sentence-level alignment of the Arabic-English data. This entailed manual intervention to ensure correspondence on a one-to-many basis, since Arabic sentences differ from English in length and punctuation. We have searched for morphological variants of 'dignity' (كرامة, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances) and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is 'human', as in 'human dignity'.
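For illustration, the logDice statistic reported from Sketch Engine follows Rychlý's formula, logDice = 14 + log2(2·f(x,y) / (f(x) + f(y))); a minimal sketch with invented frequencies:

```python
# A minimal sketch of the logDice collocation statistic (Rychlý's formula).
# Frequencies below are invented for illustration only.
import math

def log_dice(f_xy, f_x, f_y):
    """f_xy: cooccurrence freq; f_x, f_y: marginal freqs of the two words."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

# e.g. a collocate pair like 'human' + 'dignity':
print(round(log_dice(f_xy=9, f_x=120, f_y=10), 2))
```

logDice has a theoretical maximum of 14 (perfect mutual cooccurrence) and, unlike raw frequency, is not biased by corpus size, which is why Sketch Engine uses it to rank collocates.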

Keywords: Arabic constitution, corpus-based legal linguistics, human rights, parallel Arabic-English legal corpora

Procedia PDF Downloads 153
22 Embracing Diverse Learners: A Way Towards Effective Learning

Authors: Mona Kamel Hassan

Abstract:

Teaching a class of diverse learners poses a great challenge not only for foreign and second language teachers but also for teachers in other disciplines as well as for curriculum designers. Thus, to contribute to previous research tackling learner diversity, the current paper shares the experience of teaching a reading, writing and vocabulary building course to diverse Arabic as a Foreign Language learners at the advanced proficiency level. Diversity is represented in students' motivation, their prior knowledge, their various needs and interests, their level of anxiety, and their different learning styles and skills. While teaching this course, the researcher adopted the Universal Design for Learning (UDL) framework, a means of meeting the various needs of diverse learners. UDL stresses the importance of enabling all students to gain skills, knowledge, and enthusiasm for learning through teaching methods that respond to students' individual differences. Accordingly, the educational curriculum developed for this course and the teaching methods employed were modified. First, the researcher made the language curriculum vivid and attractive to inspire students' learning and keep them engaged in the learning process. From the first day, the researcher encouraged all students to suggest topics of interest to them: political, social, cultural, etc. The authentic Arabic texts chosen were those that best met students' needs, interests, lives, and sociolinguistic issues, together with the linguistic and cultural components. In class, and under the researcher's guidance, students dug into these topics to find solutions for the issues tackled while working with their peers. Second, to give students equal opportunities to demonstrate learning, role-playing was encouraged, offering students the opportunity to perform different linguistic tasks and to reflect on and share their diverse interests and cultural backgrounds with their peers. Third, to bring UDL into the classroom, students were encouraged to work on interactive, collaborative activities through technology to improve their reading and writing skills and reinforce their mastery of the accumulated vocabulary, idiomatic expressions, and collocations. These interactive, collaborative activities help facilitate student-student and student-teacher communication and increase comfort in a class of diverse learners. Detailed samples of the educational curriculum and the interactive, collaborative activities developed, accompanied by the teaching methods employed with these diverse learners, are presented for illustration. Results revealed that students were responsive to the educational materials developed for the course; they engaged effectively in the learning process and in classroom activities and discussions. They also appreciated their instructor's willingness to differentiate the teaching methods to suit students of diverse background knowledge, learning styles, levels of anxiety, etc. Finally, the researcher believes that sharing this experience of teaching diverse learners will help both language teachers and teachers in other disciplines develop a better understanding of how to meet their students' diverse needs. The results will also pave the way for curriculum designers to develop educational material that meets the needs of diverse learners.

Keywords: teaching, language, diverse learners

Procedia PDF Downloads 76
21 Identifying Necessary Words for Understanding Academic Articles in English as a Second or a Foreign Language

Authors: Stephen Wagman

Abstract:

This paper identifies three common structures in English sentences that are important for understanding academic texts, regardless of the characteristics or background of the readers or whether they are reading English as a second or a foreign language. Adapting a model from the humanities, the explication of texts used in literary studies, the paper analyses sample sentences to reveal structures that enable the reader not only to decide which words are necessary for understanding the main ideas but to make that decision without knowing the meaning of the words. By their very syntax, noun structures point to the key word for understanding them. As a rule, the key noun is followed by easily identifiable prepositions, relative pronouns, or verbs and preceded by single adjectives. With few exceptions, the modifiers are unnecessary for understanding the idea of the sentence. In addition, sentences are often structured by lists in which the items frequently consist of parallel groups of words. The principle of a list is that all the items are similar in meaning, and it is not necessary to understand all of the items to understand the point of the list. This principle is especially important when the items are long or there is more than one list in the same sentence. The similarity in meaning of these items enables readers to reduce sentences that are hard to grasp to an understandable core without excessive use of a dictionary. Finally, the idea of subordination and the identification of the subordinate parts of sentences through connecting words make it possible for readers to focus on main ideas without having to sift through the less important and more numerous secondary structures. Sometimes a main idea requires a subordinate one to complete its meaning, but usually subordinate ideas are unnecessary for understanding the main point of the sentence and its part in the development of the argument from sentence to sentence. Moreover, the connecting words themselves indicate the functions of the subordinate structures; these most frequently show similarity and difference or reasons and results. Recognition of all of these structures can enable students not only to read more efficiently but to focus their attention on the development of the argument, and this, rather than a multitude of unknown vocabulary items, the repetition in lists, or the subordination in sentences, is the one element necessary for comprehension of academic articles.

Keywords: development of the argument, lists, noun structures, subordination

Procedia PDF Downloads 230
20 A Corpus-based Study of Adjuncts in Colombian English as a Second Language (ESL) Argumentative Essays

Authors: E. Velasco

Abstract:

Meeting high standards of writing in a second language (L2) is extremely important for many students who wish to undertake studies at universities in both English- and non-English-speaking countries. University lecturers in English-speaking countries continue to express dissatisfaction with the apparent poor quality of essay writing skills displayed by English as a Second Language (ESL) students, whose essays are often criticised for their lack of cohesion and coherence. These critiques have extended to contexts such as Colombia, where many ESL students are criticised for their inability to write high-quality academic texts in L2 English, particularly at the tertiary level. If Colombian ESL students are expected to meet high standards of writing when studying locally and abroad, it makes sense to carry out specific research that can lead to recommendations supporting their quest to improve argumentative strategies. Employing corpus linguistics methods within a Learner Corpus Research framework, and a combination of log-likelihood and Bayes factor measures, this paper investigated argumentative essays written by Colombian ESL students. The study specifically aimed to analyse conjunctive adjuncts in argumentative essays to find out how Colombian ESL students connect their ideas in discourse. Results suggest that a) Colombian ESL learners need explicit instruction on specific areas of conjunctive adjuncts to counteract overuse, underuse and misuse; b) underuse of endophoric and evidential adjuncts highlights gaps between IELTS-like essays and good-quality tertiary-level essays and published papers, and these gaps are linked to the prior knowledge brought into the writing task, rhetorical functions in writing, and research processes before writing takes place; c) both Colombian ESL learners and L1-English writers (in a reference corpus) overuse some adjuncts and underuse endophoric and evidential adjuncts when compared to skilled L1-English and L2-English writers, so differences in the frequencies of adjuncts have little to do with the writers' L1 and are rather linked to the types of essays writers produce (e.g. ESL vs. university essays). The pedagogical recommendations deriving from the study are that: a) Colombian ESL learners need to be shown that overuse is not the only way of giving cohesion to argumentative essays and that there are other alternatives (e.g., implicit adjuncts, lexical chains and collocations); b) syllabi and classroom input need to raise awareness of the gaps in writing skills between IELTS-like and tertiary-level argumentative essays, and of how endophoric and evidential adjuncts are used to refer to anaphoric and cataphoric sections of essays and to other people's work or ideas; c) syllabi and classroom input need to include essay-writing tasks based on previous research/reading which learners need to incorporate into their arguments, and tasks that raise awareness of referencing systems (e.g., APA); d) classroom input needs to include explicit instruction on the use of punctuation, functions and/or syntax with specific conjunctive adjuncts such as 'for example', 'for that reason', 'although', 'despite' and 'nevertheless'.

Keywords: argumentative essays, Colombian English as a Second Language (ESL) learners, conjunctive adjuncts, corpus linguistics

Procedia PDF Downloads 52