Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 764

Search results for: word embedding

764 A Word-to-Vector Formulation for Word Representation

Authors: Sandra Rizkallah, Amir F. Atiya


This work presents a novel word to vector representation that is based on embedding the words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into consideration the antonymity between words, not only the synonymity, because of the suitability to handle the polarity nature of words. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The obtained results show that the proposed approach can capture the essence of the language, and can be generalized to estimate a correct similarity of any new pair of words.

Keywords: natural language processing, word to vector, text similarity, text mining

Procedia PDF Downloads 168
763 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye


When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is chosen either ad-hoc or is calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. Unlike in natural language, in genomics, especially in genome sequence processing, unlike in natural language processing, there is no notion of a “word,” but rather, there are sequence substrings of length k called k-mers. K-mers sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using the matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and their embedding dimension. This is completed by studying the scaling capability of three embedding algorithms, namely Latent Semantic analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurate computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 15
762 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari


Lots of efforts have been made in order to measure the semantic similarity between the text corpora in the documents. Techniques have been evolved to measure the similarity of two documents. One such state-of-art technique in the field of Natural Language Processing (NLP) is word to vector models, which converts the words into their word-embedding and measures the similarity between the vectors. We found this to be quite useful for the task of resume ranking. So, this research paper is the implementation of the word2vec model along with other Natural Language Processing techniques in order to rank the resumes for the particular job description so as to automate the process of hiring. The research paper proposes the system and the findings that were made during the process of building the system.

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

Procedia PDF Downloads 49
761 Application of Vector Representation for Revealing the Richness of Meaning of Facial Expressions

Authors: Carmel Sofer, Dan Vilenchik, Ron Dotsch, Galia Avidan


Studies investigating emotional facial expressions typically reveal consensus among observes regarding the meaning of basic expressions, whose number ranges between 6 to 15 emotional states. Given this limited number of discrete expressions, how is it that the human vocabulary of emotional states is so rich? The present study argues that perceivers use sequences of these discrete expressions as the basis for a much richer vocabulary of emotional states. Such mechanisms, in which a relatively small number of basic components is expanded to a much larger number of possible combinations of meanings, exist in other human communications modalities, such as spoken language and music. In these modalities, letters and notes, which serve as basic components of spoken language and music respectively, are temporally linked, resulting in the richness of expressions. In the current study, in each trial participants were presented with sequences of two images containing facial expression in different combinations sampled out of the eight static basic expressions (total 64; 8X8). In each trial, using single word participants were required to judge the 'state of mind' portrayed by the person whose face was presented. Utilizing word embedding methods (Global Vectors for Word Representation), employed in the field of Natural Language Processing, and relying on machine learning computational methods, it was found that the perceived meanings of the sequences of facial expressions were a weighted average of the single expressions comprising them, resulting in 22 new emotional states, in addition to the eight, classic basic expressions. An interaction between the first and the second expression in each sequence indicated that every single facial expression modulated the effect of the other facial expression thus leading to a different interpretation ascribed to the sequence as a whole. These findings suggest that the vocabulary of emotional states conveyed by facial expressions is not restricted to the (small) number of discrete facial expressions. Rather, the vocabulary is rich, as it results from combinations of these expressions. In addition, present research suggests that using word embedding in social perception studies, can be a powerful, accurate and efficient tool, to capture explicit and implicit perceptions and intentions. Acknowledgment: The study was supported by a grant from the Ministry of Defense in Israel to GA and CS. CS is also supported by the ABC initiative in Ben-Gurion University of the Negev.

Keywords: Glove, face perception, facial expression perception. , facial expression production, machine learning, word embedding, word2vec

Procedia PDF Downloads 80
760 Secure Text Steganography for Microsoft Word Document

Authors: Khan Farhan Rafat, M. Junaid Hussain


Seamless modification of an entity for the purpose of hiding a message of significance inside its substance in a manner that the embedding remains oblivious to an observer is known as steganography. Together with today's pervasive registering frameworks, steganography has developed into a science that offers an assortment of strategies for stealth correspondence over the globe that must, however, need a critical appraisal from security breach standpoint. Microsoft Word is amongst the preferably used word processing software, which comes as a part of the Microsoft Office suite. With a user-friendly graphical interface, the richness of text editing, and formatting topographies, the documents produced through this software are also most suitable for stealth communication. This research aimed not only to epitomize the fundamental concepts of steganography but also to expound on the utilization of Microsoft Word document as a carrier for furtive message exchange. The exertion is to examine contemporary message hiding schemes from security aspect so as to present the explorative discoveries and suggest enhancements which may serve a wellspring of information to encourage such futuristic research endeavors.

Keywords: hiding information in plain sight, stealth communication, oblivious information exchange, conceal, steganography

Procedia PDF Downloads 169
759 Language Development and Growing Spanning Trees in Children Semantic Network

Authors: Somayeh Sadat Hashemi Kamangar, Fatemeh Bakouie, Shahriar Gharibzadeh


In this study, we target to exploit Maximum Spanning Trees (MST) of children's semantic networks to investigate their language development. To do so, we examine the graph-theoretic properties of word-embedding networks. The networks are made of words children learn prior to the age of 30 months as the nodes and the links which are built from the cosine vector similarity of words normatively acquired by children prior to two and a half years of age. These networks are weighted graphs and the strength of each link is determined by the numerical similarities of the two words (nodes) on the sides of the link. To avoid changing the weighted networks to the binaries by setting a threshold, constructing MSTs might present a solution. MST is a unique sub-graph that connects all the nodes in such a way that the sum of all the link weights is maximized without forming cycles. MSTs as the backbone of the semantic networks are suitable to examine developmental changes in semantic network topology in children. From these trees, several parameters were calculated to characterize the developmental change in network organization. We showed that MSTs provides an elegant method sensitive to capture subtle developmental changes in semantic network organization.

Keywords: maximum spanning trees, word-embedding, semantic networks, language development

Procedia PDF Downloads 58
758 Word of Mouth and Its Impact on Marketing

Authors: Fatima Naz, Ayesha Tariq


In view of growing of the internet users for e-commerce and taking into account, the emergent impact of word of mouth phenomenon this research has different aims. The aims of this study were built following dissimilar discussion with teachers and colleagues enlightening that word of mouth information for online purchasing do not have the same effect for everybody. Then they were born following dissimilar researchers together with what was already done in previous researches and what was completed. As a result different aims were drawn; the initial aim of this research is to study the attention of the customers in the word of mouth to power their online purchasing activities. The next aim is to analyze the people influenced by the interest of word of mouth. The following aim is to examine the marketing behavior bearing in mind the internet progress and word of mouth, their consideration for word of mouth marketing. In the form of research questions the aims of the study are: 1) How community utilizes and multiplies word of mouth information about online purchasing experience? 2) How communities perceive the word of mouth marketing? 3) How marketers take the word of mouth phenomenon and how they handle it?

Keywords: belief, power, inspiration, self-expression, positive attitude to online marketing, forwarding of contents, purchasing decision, standard marketing

Procedia PDF Downloads 330
757 Secured Embedding of Patient’s Confidential Data in Electrocardiogram Using Chaotic Maps

Authors: Butta Singh


This paper presents a chaotic map based approach for secured embedding of patient’s confidential data in electrocardiogram (ECG) signal. The chaotic map generates predefined locations through the use of selective control parameters. The sample value difference method effectually hides the confidential data in ECG sample pairs at these predefined locations. Evaluation of proposed method on all 48 records of MIT-BIH arrhythmia ECG database demonstrates that the embedding does not alter the diagnostic features of cover ECG. The secret data imperceptibility in stego-ECG is evident through various statistical and clinical performance measures. Statistical metrics comprise of Percentage Root Mean Square Difference (PRD) and Peak Signal to Noise Ratio (PSNR). Further, a comparative analysis between proposed method and existing approaches was also performed. The results clearly demonstrated the superiority of proposed method.

Keywords: chaotic maps, ECG steganography, data embedding, electrocardiogram

Procedia PDF Downloads 97
756 Math Word Problems: Context and Achievement

Authors: Irena Smetackova


The important part of school mathematics are word problems which represent the connection between school knowledge and life reality. To find the reasons why students consider word problems to be difficult, it is necessary to take into consideration the motivational settings, besides mathematical knowledge and reading skills. Our goal is to identify whether the familiar or unfamiliar context of math word problem influences solving success rate and if so, whether the reasons are motivational or cognitive. For this purpose, we conducted three steps study in group of fifty pupils 9-10 years old. In the first step, we asked pupils to create ‘the best’ word problems for entered numerical formula. The set of 19 word problems with different contexts were selected. In the second step, pupils were asked to evaluate (without solving) how they like each item and how easy it is for them. The 6 word problems with low preference and low estimated success rate were selected and combined with other 6 problems with high preference and success rate. In the third step, the same pupils were asked to solve the word problems. The analysis showed that pupils attitudes and solving toward word problems varied by the context. The strong gender patterns both in preferred contexts and in estimated success rates were identified however the real success rate did not differ so strongly. The success gap between word problems with and without preferred contexts were stronger than the gap between problems with and without real experience with the context. The hypothesis that motivational factors are more important than cognitive factors was confirmed.

Keywords: mathematics, context of reality, motivation, cognition, word problems

Procedia PDF Downloads 121
755 Secure Watermarking not at the Cost of Low Robustness

Authors: Jian Cao


This paper describes a novel watermarking technique which we call the random direction embedding (RDE) watermarking. Unlike traditional watermarking techniques, the watermark energy after the RDE embedding does not focus on a fixed direction, leading to the security against the traditional unauthorized watermark removal attack. In addition, the experimental results show that when compared with the existing secure watermarking, namely natural watermarking (NW), the RDE watermarking gains significant improvement in terms of robustness. In fact, the security of the RDE watermarking is not at the cost of low robustness, and it can even achieve more robust than the traditional spread spectrum watermarking, which has been shown to be very insecure.

Keywords: robustness, spread spectrum watermarking, watermarking security, random direction embedding (RDE)

Procedia PDF Downloads 310
754 The Use of Videos: Effects on Children's Language and Literacy Skills

Authors: Rahimah Saimin


Previous research has shown that young children can learn from educational television programmes, videos or other technological media. However, the blending of any of these with traditional printed-based text appears to be omitted. Repeated viewing is an important factor in children's ability to comprehend the content or plot. The present study combined videos with traditional printed-based text and required repeated viewing and is original and distinctive. The first study was a pilot study to explore whether the intervention is implementable in ordinary classrooms. The second study explored whether the curricular embedding is important or whether the video with curricular embedding is effective. The third study explored the effect of “dosage”, i.e. whether a longer/ more intense intervention has a proportionately greater effect on outcomes. Both measured outcomes (comprehension, word sounds, and early word recognition) and unmeasured outcomes (engagement to reading traditional printed-based texts or/and multimodal texts) were obtained from this study. Observation indicated degree of engagement in reading. The theoretical framework was multimodality theory combined with Piaget’s and Vygotsky’s learning theories. An experimental design was used with 4-5-year-old children in nursery schools and primary schools. Six links to video clips exploring non-fiction science content were provided to teachers. The first session is whole-class and subsequent sessions small-group. The teacher then engaged the children in dialogue using supplementary materials. About half of each class was selected randomly for pre-post assessments. Two assessments were used the British Picture Vocabulary Scale (BPVSIII) and the York Assessment of Reading for Comprehension (YARC): Early Reading. Different programme fidelity means were deployed- observations, teacher self-reports attendance logs and post-delivery interviews. Data collection is in progress and results will be available shortly. If this multiphase study show effectiveness in one or other application, then teachers will have other tools which they can use to enhance vocabulary, letter knowledge and word reading. This would be a valuable addition to their repertoire.

Keywords: language skills, literacy skills, multimodality, video

Procedia PDF Downloads 267
753 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: Gaelle Candel, David Naccache


t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n²) to O(n²=k), and the memory requirement from n² to 2(n=k)², which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution, and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning

Procedia PDF Downloads 65
752 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman


Since the last two decades’ Arabic natural language processing (ANLP) has become increasingly much more important. One of the key issues related to ANLP is ambiguity. In Arabic language different pronunciation of one word may have a different meaning. Furthermore, ambiguity also has an impact on the effectiveness and efficiency of Machine Translation (MT). The issue of ambiguity has limited the usefulness and accuracy of the translation from Arabic to English. The lack of Arabic resources makes ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of the word. This paper looked at the diacritics of Arabic language and used them to disambiguate a word. The proposed approach of word sense disambiguation used Diacritizer application to Diacritize Arabic text then found the most accurate sense of an ambiguous word using Naïve Bayes Classifier. Our Experimental study proves that using Arabic Diacritics with Naïve Bayes Classifier enhances the accuracy of choosing the appropriate sense by 23% and also decreases the ambiguity in machine translation.

Keywords: Arabic natural language processing, machine learning, machine translation, Naive bayes classifier, word sense disambiguation

Procedia PDF Downloads 272
751 Improved Processing Speed for Text Watermarking Algorithm in Color Images

Authors: Hamza A. Al-Sewadi, Akram N. A. Aldakari


Copyright protection and ownership proof of digital multimedia are achieved nowadays by digital watermarking techniques. A text watermarking algorithm for protecting the property rights and ownership judgment of color images is proposed in this paper. Embedding is achieved by inserting texts elements randomly into the color image as noise. The YIQ image processing model is found to be faster than other image processing methods, and hence, it is adopted for the embedding process. An optional choice of encrypting the text watermark before embedding is also suggested (in case required by some applications), where, the text can is encrypted using any enciphering technique adding more difficulty to hackers. Experiments resulted in embedding speed improvement of more than double the speed of other considered systems (such as least significant bit method, and separate color code methods), and a fairly acceptable level of peak signal to noise ratio (PSNR) with low mean square error values for watermarking purposes.

Keywords: steganography, watermarking, time complexity measurements, private keys

Procedia PDF Downloads 83
750 Expressivity of Word-Formation in English and Russian Advertising Lexicon

Authors: Voronina Ekaterina Borisovna


The problem of expressivity of advertising lexicon is studied in the article. The comparison of English and Russian advertising lexicons is done. The objects of the analysis were English and Russian advertising texts, both printed advertising texts and texts extracted from the commercials. Some conclusions concerning the expressivity of advertising lexicon were made. Expressivity can be included in the semantic structure of words or created by word-formation means. Expressivity caused by morphological derivatives includes such facilities as derivational affixes, models and types of word formation.

Keywords: advertising lexicon, expressivity, word-formation means, linguistics

Procedia PDF Downloads 267
749 Accounting as Addressed in the Qur’aan

Authors: Shahriar M. Saadullah, Abdul-Quddoos Abdul-Basith, Zaki K. Abushawish


As a part of academic research in Islamic Accounting it is important to know how the word Accounting is discussed in the Qur’aan. This paper identifies and analyzes the word Accounting in the Qur’aan, which is significant to know and understand. The paper uses a methodology of identifying the root word of Accounting Hasaba (حسب) in the Qur’aan with the help of Islam 360 software and analyzes the use of the relevant words derived from the root word. Then the paper attempts to connect the findings to the contemporary Accounting issues. The paper finds that the root word of Accounting Hasaba (حسب) appears in the Qur’aan 109 times but it is only used in the sense Account, Accountable, or Accounting 45 times. These words appear in 44 different verses in the Qur’aan, appearing twice in one of the verses. The paper divides these verses into 8 different themes namely, Day of Accounting, without any Accounting, Accounting of Time, Self-Accounting, Swift in Accounting, Accounting is only with God, Awareness and the Good Accounting, and Heedlessness and the Bad Accounting. The way the words Account, Accounting, and Accountable is discussed in the Qur’aan links to the contemporary accounting issues including Ethics, Agency Theory, and Internal Control. The links discovered in the paper clearly shows the timeless nature of the message of the Qur’aan.

Keywords: accounting, contemporary accounting issues, Qur'aan, root word of accounting hasaba

Procedia PDF Downloads 122
748 Pudhaiyal: A Maze-Based Treasure Hunt Game for Tamil Words

Authors: Aarthy Anandan, Anitha Narasimhan, Madhan Karky


Word-based games are popular in helping people to improve their vocabulary skills. Games like ‘word search’ and crosswords provide a smart way of increasing vocabulary skills. Word search games are fun to play, but also educational which actually helps to learn a language. Finding the words from word search puzzle helps the player to remember words in an easier way, and it also helps to learn the spellings of words. In this paper, we present a tile distribution algorithm for a Maze-Based Treasure Hunt Game 'Pudhaiyal’ for Tamil words, which describes how words can be distributed horizontally, vertically or diagonally in a 10 x 10 grid. Along with the tile distribution algorithm, we also present an algorithm for the scoring model of the game. The proposed game has been tested with 20,000 Tamil words.

Keywords: Pudhaiyal, Tamil word game, word search, scoring, maze, algorithm

Procedia PDF Downloads 288
747 Reversible Information Hitting in Encrypted JPEG Bitstream by LSB Based on Inherent Algorithm

Authors: Vaibhav Barve


Reversible information hiding has drawn a lot of interest as of late. Being reversible, we can restore unique computerized data totally. It is a plan where mystery data is put away in digital media like image, video, audio to maintain a strategic distance from unapproved access and security reason. By and large JPEG bit stream is utilized to store this key data, first JPEG bit stream is encrypted into all around sorted out structure and then this secret information or key data is implanted into this encrypted region by marginally changing the JPEG bit stream. Valuable pixels suitable for information implanting are computed and as indicated by this key subtle elements are implanted. In our proposed framework we are utilizing RC4 algorithm for encrypting JPEG bit stream. Encryption key is acknowledged by framework user which, likewise, will be used at the time of decryption. We are executing enhanced least significant bit supplanting steganography by utilizing genetic algorithm. At first, the quantity of bits that must be installed in a guaranteed coefficient is versatile. By utilizing proper parameters, we can get high capacity while ensuring high security. We are utilizing logistic map for shuffling of bits and utilization GA (Genetic Algorithm) to find right parameters for the logistic map. Information embedding key is utilized at the time of information embedding. By utilizing precise picture encryption and information embedding key, the beneficiary can, without much of a stretch, concentrate the incorporated secure data and totally recoup the first picture and also the original secret information. At the point when the embedding key is truant, the first picture can be recouped pretty nearly with sufficient quality without getting the embedding key of interest.

Keywords: data embedding, decryption, encryption, reversible data hiding, steganography

Procedia PDF Downloads 222
746 Challenges of Embedding Entrepreneurship in Modibbo Adama University of Technology Yola, Nigeria

Authors: Michael Ubale Cyril


Challenges of embedding entrepreneurship in tertiary institutions in Nigeria requires a consistent policy for equipping schools with necessary facilities like establishing incubating technology centre, the right calibres of human resources, appropriate pedagogical tools for teaching entrepreneurship education and exhibition grounds where products and services will be delivered and patronised by the customers. With the death of facilities in public schools in Nigeria, educators are clamouring for a way out. This study investigated the challenges of embedding entrepreneurship education in Modibbo Adama University of Technology Yola, Nigeria. The population for the study was 201 comprising 34 industrial entrepreneurs, 76 technical teachers and 91 final year undergraduates. The data was analysed using means of 3 groups, standard deviation, and analysis of variance. The study found out, that technical teachers have not been trained to teach entrepreneurship education, approaches to teaching methodology, were not varied and lack of infrastructural facilities like building was not a factor. It was recommended that technical teachers be retrained to teach entrepreneurship education, textbooks in entrepreneurship should be published with Nigerian outlook.

Keywords: challenges, embedding, entrepreneurship pedagogical, technology incubating centres

Procedia PDF Downloads 206
745 The Role of Reading Self-Efficacy and Perception of Difficulty in English Reading among Chinese ESL Learners

Authors: Kevin Chan, Kevin K. H. Chung, Patcy P. S. Yeung, H. L. Ip, Bill T. C. Chung, Karen M. K. Chung


Purpose: Recent evidence shows that reading self-efficacy and students perceived difficulty in reading are significantly associated with word reading and reading fluency. However, little is known about these relationships among students learning to read English as a second language, particularly in Chinese students. This study examined the contributions of reading self-efficacy, perception of difficulty in reading, and cognitive-linguistic skills to performance on English word reading and reading fluency in Chinese students. Method: A sample of 122 second-and third-grade students in Hong Kong, China, participated in this study. Students completed the measures of reading self-efficacy and perception of difficulty in reading. They were assessed on their English cognitive-linguistic and reading skills: rapid automatized naming, nonword reading, phonological awareness, word reading, and one-minute word reading. Results: Results of path analysis indicated that when students’ grades were controlled, reading self-efficacy was a significant correlate of word reading and reading fluency, whereas perception of difficulty in reading negatively predicted word reading. Conclusion: These findings underscore the importance of taking students’ reading self-efficacy and perception of difficulty in reading and their cognitive-linguistic skills into consideration when designing reading intervention and instructions for students learning English as a second language.

Keywords: self-efficacy, perception of difficulty in reading, english as a second language, word reading

Procedia PDF Downloads 87
744 Speech Recognition Performance by Adults: A Proposal for a Battery for Marathi

Authors: S. B. Rathna Kumar, Pranjali A Ujwane, Panchanan Mohanty


The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing was carried using the four word lists on a total of 150 native speakers of Marathi belonging to different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were further equally divided into five groups based on above mentioned regions. It was found that there was no significant difference (p > 0.05) in the speech recognition performance between groups for each word list and between word lists for each group. Hence, the four word lists developed were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve showed semi-linear function, and the groups’ mean slope of the linear portions of the curve indicated an average linear slope of 4.64%, 4.73%, 4.68%, and 4.85% increase in word recognition score per dB for list 1, list 2, list 3 and list 4 respectively. Although, there is no data available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with the findings of research reports on other languages. The four word lists, thus developed, were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi.

Keywords: speech recognition performance, phonemic balance, equivalence analysis, performance-intensity function testing, reliability, validity

Procedia PDF Downloads 286
743 Single-Cell Visualization with Minimum Volume Embedding

Authors: Zhenqiu Liu


Visualizing the heterogeneity within cell-populations for single-cell RNA-seq data is crucial for studying the functional diversity of a cell. However, because of the high level of noises, outlier, and dropouts, it is very challenging to measure the cell-to-cell similarity (distance), visualize and cluster the data in a low-dimension. Minimum volume embedding (MVE) projects the data into a lower-dimensional space and is a promising tool for data visualization. However, it is computationally inefficient to solve a semi-definite programming (SDP) when the sample size is large. Therefore, it is not applicable to single-cell RNA-seq data with thousands of samples. In this paper, we develop an efficient algorithm with an accelerated proximal gradient method and visualize the single-cell RNA-seq data efficiently. We demonstrate that the proposed approach separates known subpopulations more accurately in single-cell data sets than other existing dimension reduction methods.

Keywords: single-cell RNA-seq, minimum volume embedding, visualization, accelerated proximal gradient method

Procedia PDF Downloads 140
742 Sentence Structure for Free Word Order Languages in Context with Anaphora Resolution: A Case Study of Hindi

Authors: Pardeep Singh, Kamlesh Dutta


Many languages have fixed sentence structure and others are free word order. The accuracy of anaphora resolution of syntax based algorithm depends on structure of the sentence. So, it is important to analyze the structure of any language before implementing these algorithms. In this study, we analyzed the sentence structure exploiting the case marker in Hindi as well as some special tag for subject and object. We also investigated the word order for Hindi. Word order typology refers to the study of the order of the syntactic constituents of a language. We analyzed 165 news items of Ranchi Express from EMILEE corpus of plain text. It consisted of 1745 sentences. Eight file of dialogue based from the same corpus has been analyzed which will have 1521 sentences. The percentages of subject object verb structure (SOV) and object subject verb (OSV) are 66.90 and 33.10, respectively.

Keywords: anaphora resolution, free word order languages, SOV, OSV

Procedia PDF Downloads 380
741 Effects of Word Formation Dissimilarities on Youruba Learners of English

Authors: Pelumi Olowofoyeku


English as a language has great reach and influence; it is taught all over the world. For instance, in Nigeria, English language is been taught and learned as a second language; therefore second learners of English in Nigeria have certain problems they contend with. Because of the dissimilarities in word formation patterns of English and Yoruba languages, Yoruba learners of English mostly found in the south west of Nigeria, and some parts of Kwara, Kogi, and Edo states of Nigeria have problems with word formation patterns in English. The objectives of this paper therefore, are: to identify the levels of word formation dissimilarities in English and Yoruba languages and to examine the effects of these dissimilarities on the Yoruba learners of English. The data for this paper were graded words purposely selected and presented to selected students of Adeniran Ogunsanya College of Education, Oto-Ijanikin, Lagos, who are Yoruba learners of English. These respondents were randomly selected to form words which are purposively selected to test the effects of word formation dissimilarities between Yoruba (the respondent’s first language) and English language on the respondents. The dissimilarities are examined using contrastive analysis tools. This paper reveals that there are differences in the word formation patterns of Yoruba and English languages. The writer believes that there is need for language teachers to undertake comparative studies of the two languages involved for methodological reasons. The author then suggests that teachers should identify the problem areas and systematically teach their students. The paper concludes that although English and Yoruba word formation patterns differ very significantly in many respects, there exist language universals in all languages which language educators should take advantage of in teaching.

Keywords: word formation patterns, graded words, ESL, Yoruba learners

Procedia PDF Downloads 410
740 Transcription Skills and Written Composition in Chinese

Authors: Pui-sze Yeung, Connie Suk-han Ho, David Wai-ock Chan, Kevin Kien-hoa Chung


Background: Recent findings have shown that transcription skills play a unique and significant role in Chinese word reading and spelling (i.e. word dictation), and written composition development. The interrelationships among component skills of transcription, word reading, word spelling, and written composition in Chinese have rarely been examined in the literature. Is the contribution of component skills of transcription to Chinese written composition mediated by word level skills (i.e., word reading and spelling)? Methods: The participants in the study were 249 Chinese children in Grade 1, Grade 3, and Grade 5 in Hong Kong. They were administered measures of general reasoning ability, orthographic knowledge, stroke sequence knowledge, word spelling, handwriting fluency, word reading, and Chinese narrative writing. Orthographic knowledge- orthographic knowledge was assessed by a task modeled after the lexical decision subtest of the Hong Kong Test of Specific Learning Difficulties in Reading and Writing (HKT-SpLD). Stroke sequence knowledge: The participants’ performance in producing legitimate stroke sequences was measured by a stroke sequence knowledge task. Handwriting fluency- Handwriting fluency was assessed by a task modeled after the Chinese Handwriting Speed Test. Word spelling: The stimuli of the word spelling task consist of fourteen two-character Chinese words. Word reading: The stimuli of the word reading task consist of 120 two-character Chinese words. Written composition: A narrative writing task was used to assess the participants’ text writing skills. Results: Analysis of covariance results showed that there were significant between-grade differences in the performance of word reading, word spelling, handwriting fluency, and written composition. Preliminary hierarchical multiple regression analysis results showed that orthographic knowledge, word spelling, and handwriting fluency were unique predictors of Chinese written composition even after controlling for age, IQ, and word reading. The interaction effects between grade and each of these three skills (orthographic knowledge, word spelling, and handwriting fluency) were not significant. Path analysis results showed that orthographic knowledge contributed to written composition both directly and indirectly through word spelling, while handwriting fluency contributed to written composition directly and indirectly through both word reading and spelling. Stroke sequence knowledge only contributed to written composition indirectly through word spelling. Conclusions: Preliminary hierarchical regression results were consistent with previous findings about the significant role of transcription skills in Chinese word reading, spelling and written composition development. The fact that orthographic knowledge contributed both directly and indirectly to written composition through word reading and spelling may reflect the impact of the script-sound-meaning convergence of Chinese characters on the composing process. The significant contribution of word spelling and handwriting fluency to Chinese written composition across elementary grades highlighted the difficulty in attaining automaticity of transcription skills in Chinese, which limits the working memory resources available for other composing processes.

Keywords: orthographic knowledge, transcription skills, word reading, writing

Procedia PDF Downloads 321
739 The Greek Root Word ‘Kos’ and the Trade of Ancient Greek with Tamil Nadu, India

Authors: D. Pugazhendhi


The ancient Greeks were forerunners in many fields than other societies. So, the Greeks were well connected with all the countries which were well developed during that time through trade route. In this connection, trading of goods from the ancient Greece to Tamil Nadu which is presently in India, though they are geographically far away, played an important role. In that way, the word and the goods related with kos and kare got exchanged between these two societies. So, it is necessary to compare the phonology and the morphological occurrences of these words that are found common both in the ancient Greek and Tamil literatures of the contemporary period. The results show that there were many words derived from the root kos with the basic meaning of ‘arrange’ in the ancient Greek language, but this is not the case in the usage of the word kare. In the ancient Tamil literature, the word ‘kos’ does not have any root and also had rare occurrences. But it was just the opposite in the case of the word ‘kare’. One of all the meanings of the word, which was derived from the root ‘kos’ in ancient Greek literature, is related with costly ornaments. This meaning seems to have close resemblance with the usage of word ‘kos’ in ancient Tamil literature. Also, the meaning of the word ‘kare’ in ancient Tamil literature is related with spices whereas, in the ancient Greek literature, its meaning is related to that of the cooking of meat using spices. Hence, the similarity seen in the meanings of these words ‘kos’ and ‘kare’ in both these languages provides lead for further study. More than that, the ancient literary resources which are available in both these languages ensure the export and import of gold and spices from the ancient Greek land to Tamil land.

Keywords: arrange, kare, Kos, ornament, Tamil

Procedia PDF Downloads 65
738 Automatic Aggregation and Embedding of Microservices for Optimized Deployments

Authors: Pablo Chico De Guzman, Cesar Sanchez


Microservices are a software development methodology in which applications are built by composing a set of independently deploy-able, small, modular services. Each service runs a unique process and it gets instantiated and deployed in one or more machines (we assume that different microservices are deployed into different machines). Microservices are becoming the de facto standard for developing distributed cloud applications due to their reduced release cycles. In principle, the responsibility of a microservice can be as simple as implementing a single function, which can lead to the following issues: - Resource fragmentation due to the virtual machine boundary. - Poor communication performance between microservices. Two composition techniques can be used to optimize resource fragmentation and communication performance: aggregation and embedding of microservices. Aggregation allows the deployment of a set of microservices on the same machine using a proxy server. Aggregation helps to reduce resource fragmentation, and is particularly useful when the aggregated services have a similar scalability behavior. Embedding deals with communication performance by deploying on the same virtual machine those microservices that require a communication channel (localhost bandwidth is reported to be about 40 times faster than cloud vendor local networks and it offers better reliability). Embedding can also reduce dependencies on load balancer services since the communication takes place on a single virtual machine. For example, assume that microservice A has two instances, a1 and a2, and it communicates with microservice B, which also has two instances, b1 and b2. One embedding can deploy a1 and b1 on machine m1, and a2 and b2 are deployed on a different machine m2. This deployment configuration allows each pair (a1-b1), (a2-b2) to communicate using the localhost interface without the need of a load balancer between microservices A and B. Aggregation and embedding techniques are complex since different microservices might have incompatible runtime dependencies which forbid them from being installed on the same machine. There is also a security concern since the attack surface between microservices can be larger. Luckily, container technology allows to run several processes on the same machine in an isolated manner, solving the incompatibility of running dependencies and the previous security concern, thus greatly simplifying aggregation/embedding implementations by just deploying a microservice container on the same machine as the aggregated/embedded microservice container. Therefore, a wide variety of deployment configurations can be described by combining aggregation and embedding to create an efficient and robust microservice architecture. This paper presents a formal method that receives a declarative definition of a microservice architecture and proposes different optimized deployment configurations by aggregating/embedding microservices. The first prototype is based on i2kit, a deployment tool also submitted to ICWS 2018. The proposed prototype optimizes the following parameters: network/system performance, resource usage, resource costs and failure tolerance.

Keywords: aggregation, deployment, embedding, resource allocation

Procedia PDF Downloads 122
737 A Comparative Study on the Positive and Negative of Electronic Word-of-Mouth on the SERVQUAL Scale-Take A Certain Armed Forces General Hospital in Taiwan As An Example

Authors: Po-Chun Lee, Li-Lin Liang, Ching-Yuan Huang


Purpose: Research on electronic word-of-mouth (eWOM)& online review has been widely used in service industry management research in recent years. The SERVQUAL scale is the most commonly used method to measure service quality. Therefore, the purpose of this research is to combine electronic word of mouth & online review with the SERVQUAL scale. To explore the comparative study of positive and negative electronic word-of-mouth reviews of a certain armed force general hospital in Taiwan. Data sources: This research obtained online word-of-mouth comment data on google maps from a military hospital in Taiwan in the past ten years through Internet data mining technology. Research methods: This study uses the semantic content analysis method to classify word-of-mouth reviews according to the revised PZB SERVQUAL scale. Then carry out statistical analysis. Results of data synthesis: The results of this study disclosed that the negative reviews of this military hospital in Taiwan have been increasing year by year. Under the COVID-19 epidemic, positive word-of-mouth has a downward trend. Among the five determiners of SERVQUAL of PZB, positive word-of-mouth reviews performed best in “Assurance,” with a positive review rate of 58.89%, Followed by 43.33% of “Responsiveness.” In negative word-of-mouth reviews, “Assurance” performed the worst, with a positive rate of 70.99%, followed by responsive 29.01%. Conclusions: The important conclusions of this study disclosed that the total number of electronic word-of-mouth reviews of the military hospital has revealed positive growth in recent years, and the positive word-of-mouth growth has revealed negative growth after the epidemic of COVID-19, while the negative word-of-mouth has grown substantially. Regardless of the positive and negative comments, what patients care most about is “Assurance” of the professional attitude and skills of the medical staff, which needs to be strengthened most urgently. In addition, good “Reliability” will help build positive word-of-mouth. However, poor “Responsiveness” can easily lead to the spread of negative word-of-mouth. This study suggests that the hospital should focus on these few service-oriented quality management and audits.

Keywords: quality of medical service, electronic word-of-mouth, armed forces general hospital

Procedia PDF Downloads 105
736 Computable Difference Matrix for Synonyms in the Holy Quran

Authors: Mohamed Ali Al Shaari, Khalid M. El Fitori


In the field of Quran Studies known as Ghareeb A Quran (the study of the meanings of strange words and structures in Holy Quran), it is difficult to distinguish some pragmatic meanings from conceptual meanings. One who wants to study this subject may need to look for a common usage between any two words or more; to understand general meaning, and sometimes may need to look for common differences between them, even if there are synonyms (word sisters). Some of the distinguished scholars of Arabic linguistics believe that there are no synonym words, they believe in varieties of meaning and multi-context usage. Based on this viewpoint, our method was designed to look for synonyms of a word, then the differences that distinct the word and their synonyms. There are many available books that use such a method e.g. synonyms books, dictionaries, glossaries, and some books on the interpretations of strange vocabulary of the Holy Quran, but it is difficult to look up words in these written works. For that reason, we proposed a logical entity, which we called Differences Matrix (DM). DM groups the synonyms words to extract the relations between them and to know the general meaning, which defines the skeleton of all word synonyms; this meaning is expressed by a word of its sisters. In Differences Matrix, we used the sisters(words) as titles for rows and columns, and in the obtained cells we tried to define the row title (word) by using column title (her sister), so the relations between sisters appear, the expected result is well defined groups of sisters for each word. We represented the obtained results formally, and used the defined groups as a base for building the ontology of the Holy Quran synonyms.

Keywords: Quran, synonyms, differences matrix, ontology

Procedia PDF Downloads 329
735 Difference Expansion Based Reversible Data Hiding Scheme Using Edge Directions

Authors: Toshanlal Meenpal, Ankita Meenpal


A very important technique in reversible data hiding field is Difference expansion. Secret message as well as the cover image may be completely recovered without any distortion after data extraction process due to reversibility feature. In general, in any difference expansion scheme embedding is performed by integer transform in the difference image acquired by grouping two neighboring pixel values. This paper proposes an improved reversible difference expansion embedding scheme. We mainly consider edge direction for embedding by modifying the difference of two neighboring pixels values. In general, the larger difference tends to bring a degraded stego image quality than the smaller difference. Image quality in the range of 0.5 to 3.7 dB in average is achieved by the proposed scheme, which is shown through the experimental results. However payload wise it achieves almost similar capacity in comparisons with previous method.

Keywords: information hiding, wedge direction, difference expansion, integer transform

Procedia PDF Downloads 406