Search results for: Lexis

6 Identification of Most Frequently Occurring Lexis in Body-enhancement Medicinal Unsolicited Bulk e-mails

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

e-mail has become an important means of electronic communication but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE consists of many types like pornographic, virus infected and 'cry-for-help' messages as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to usage of e-mails. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of more than 2700 body enhancement medicinal UBE. Technically, this is an application of Text Parsing and Tokenization for an un-structured textual document and we approach it using Bag Of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the UBE documents that advertise various products for body enhancement. The analysis of such top 100 lexis is also presented. We exhibit the relationship between occurrence of a word from the identified lexis-set in the given UBE and the probability that the given UBE will be the one advertising for fake medicinal product. To the best of our knowledge and survey of related literature, this is the first formal attempt for identification of most frequently occurring lexis in such UBE by its textual analysis. Finally, this is a sincere attempt to bring about alertness against and mitigate the threat of such luring but fake UBE.

Keywords: Body Enhancement, Lexis, Medicinal, Unsolicited Bulk e-mail (UBE), Vector Space Document Model, Viagra

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3498

5 Identification of Most Frequently Occurring Lexis in Winnings-announcing Unsolicited Bulke-mails

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

e-mail has become an important means of electronic communication but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE consists of many types like pornographic, virus infected and 'cry-for-help' messages as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to usage of e-mails. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of nearly 3000 winnings-announcing UBE. Technically, this is an application of Text Parsing and Tokenization for an un-structured textual document and we approach it using Bag Of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the winnings-announcing UBE documents. The analysis of such top 100 lexis is also presented. We exhibit the relationship between occurrence of a word from the identified lexisset in the given UBE and the probability that the given UBE will be the one announcing fake winnings. To the best of our knowledge and survey of related literature, this is the first formal attempt for identification of most frequently occurring lexis in winningsannouncing UBE by its textual analysis. Finally, this is a sincere attempt to bring about alertness against and mitigate the threat of such luring but fake UBE.

Keywords: Lexis, Unsolicited Bulk e-mail (UBE), Vector SpaceDocument Model, Winnings, Lottery

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1531

4 Functioning of Turkic Elements in Modern Hindi

Authors: B. S. Bokuleva, R. A. Avakova, A. A. Sultangubieva, U. Schamiloglu

Abstract:

It is discussed about modern usage of adopted words and their vocabularies, Turkism usage fields, phonetic, grammatical and lexis-semantic assimilation of the typological-morphological structures of entering to different Hindi languages in comparative typological aspects in this scientific article. The lexis vocabulary is rich, the prevalence area is wide and it has researched the entering process of vocabulary into the great languages of Turkic elements from the speakers- numbers. The research work has worked on the base of Hindi vocabulary.

Keywords: Adopted words, language communications, Turkism, Turkic languages.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2161

3 The Study of Formal and Semantic Errors of Lexis by Persian EFL Learners

Authors: Mohammad J. Rezai, Fereshteh Davarpanah

Abstract:

Producing a text in a language which is not one’s mother tongue can be a demanding task for language learners. Examining lexical errors committed by EFL learners is a challenging area of investigation which can shed light on the process of second language acquisition. Despite the considerable number of investigations into grammatical errors, few studies have tackled formal and semantic errors of lexis committed by EFL learners. The current study aimed at examining Persian learners’ formal and semantic errors of lexis in English. To this end, 60 students at three different proficiency levels were asked to write on 10 different topics in 10 separate sessions. Finally, 600 essays written by Persian EFL learners were collected, acting as the corpus of the study. An error taxonomy comprising formal and semantic errors was selected to analyze the corpus. The formal category covered misselection and misformation errors, while the semantic errors were classified into lexical, collocational and lexicogrammatical categories. Each category was further classified into subcategories depending on the identified errors. The results showed that there were 2583 errors in the corpus of 9600 words, among which, 2030 formal errors and 553 semantic errors were identified. The most frequent errors in the corpus included formal error commitment (78.6%), which were more prevalent at the advanced level (42.4%). The semantic errors (21.4%) were more frequent at the low intermediate level (40.5%). Among formal errors of lexis, the highest number of errors was devoted to misformation errors (98%), while misselection errors constituted 2% of the errors. Additionally, no significant differences were observed among the three semantic error subcategories, namely collocational, lexical choice and lexicogrammatical. The results of the study can shed light on the challenges faced by EFL learners in the second language acquisition process.

Keywords: Collocational errors, lexical errors, Persian EFL learners, semantic errors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1220

2 Peer Corrective Feedback on Written Errors in Computer-Mediated Communication

Authors: S. H. J. Liu

Abstract:

This paper aims to explore the role of peer Corrective Feedback (CF) in improving written productions by English-as-a- foreign-language (EFL) learners who work together via Wikispaces. It attempted to determine the effect of peer CF on form accuracy in English, such as grammar and lexis. Thirty-four EFL learners at the tertiary level were randomly assigned into the experimental (with peer feedback) or the control (without peer feedback) group; each group was subdivided into small groups of two or three. This resulted in six and seven small groups in the experimental and control groups, respectively. In the experimental group, each learner played a role as an assessor (providing feedback to others), as well as an assessee (receiving feedback from others). Each participant was asked to compose his/her written work and revise it based on the feedback. In the control group, on the other hand, learners neither provided nor received feedback but composed and revised their written work on their own. Data collected from learners’ compositions and post-task interviews were analyzed and reported in this study. Following the completeness of three writing tasks, 10 participants were selected and interviewed individually regarding their perception of collaborative learning in the Computer-Mediated Communication (CMC) environment. Language aspects to be analyzed included lexis (e.g., appropriate use of words), verb tenses (e.g., present and past simple), prepositions (e.g., in, on, and between), nouns, and articles (e.g., a/an). Feedback types consisted of CF, affective, suggestive, and didactic. Frequencies of feedback types and the accuracy of the language aspects were calculated. The results first suggested that accurate items were found more in the experimental group than in the control group. Such results entail that those who worked collaboratively outperformed those who worked non-collaboratively on the accuracy of linguistic aspects. Furthermore, the first type of CF (e.g., corrections directly related to linguistic errors) was found to be the most frequently employed type, whereas affective and didactic were the least used by the experimental group. The results further indicated that most participants perceived that peer CF was helpful in improving the language accuracy, and they demonstrated a favorable attitude toward working with others in the CMC environment. Moreover, some participants stated that when they provided feedback to their peers, they tended to pay attention to linguistic errors in their peers’ work but overlook their own errors (e.g., past simple tense) when writing. Finally, L2 or FL teachers or practitioners are encouraged to employ CMC technologies to train their students to give each other feedback in writing to improve the accuracy of the language and to motivate them to attend to the language system.

Keywords: Peer corrective feedback, computer-mediated communication, second or foreign language learning, Wikispaces.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1443

1 Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in dictionary, nor is a slang term. The motives of the paper are many fold. This is an attempt to analyze spamming behaviour and employment of wordmutation technique. On the side-lines of the paper, we have attempted to better understand the spam, the slang and their interplay. The problem has been addressed by employing Tokenization technique and Unigram BOW model. We found that the non-lexicon words constitute nearly 66% of total number of lexis of corpus whereas non-slang words constitute nearly 2.4% of non-lexicon words. Further, non-lexicon non-slang unigrams composed of 2 lexicon words, form more than 71% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze usage of non-lexicon non-slang unigrams in any kind of UBE.

Keywords: Body Enhancement, Lexicon, Medicinal, Slang, Unigram, Unsolicited Bulk e-mail (UBE)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1812