Search results for: Arabic Information extraction
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 12397

12337 Phonological and Syntactic Evidence from Arabic in Favor of Biolinguistics

Authors: Marwan Jarrah

Abstract:

This research paper provides two pieces of evidence from Arabic, one phonological and one syntactic, for a biolinguistic perspective on language processing. The first piece of evidence concerns the instances where a singular noun is converted to a plural noun in Arabic. Based on the findings of several research papers, this study shows that a singular word does not lose any of its moras when it is pluralized, either regularly or irregularly. This mora conservation principle complies with the general physical law of the conservation of mass, which states that mass is neither created nor destroyed but changed from one form into another. The second piece of evidence concerns the observation that when the object in some Arabic dialects, including Jordanian Arabic and Najdi Arabic, is a topic and positioned in situ (i.e., after the verb), the verb agrees with it, generating an inflection marker on the verb that agrees in Number, Person, and Gender with the in-situ topicalized object. This interaction between the verb and the object is invoked by the extra feature the object bears, i.e., the TOPIC feature. We suggest that such an interaction complies with the general natural law that elements become active when they, e.g., get an additional electron, when the mass number is not equal to the atomic number.

Keywords: biolinguistics, Arabic, physics, interaction

Procedia PDF Downloads 202
12336 Retranslation of Orientalism: Reading Said in Arabic

Authors: Fadil Elmenfi

Abstract:

Edward Said, in his book Culture and Imperialism, devotes the introduction to the Arabic translation. He claims that the fading echo of Orientalism in the Arab world is unlike the positive reflections of its counterpart elsewhere in the world. The probable reason behind his inquiry would be that the methodology Abu Deeb applied in translating Said's book contributed to the book having the limited impact which Said is referring to. The paper adds new insights to the body of theory and the effectiveness of the performance of translation from culture to culture. It presents a survey that can provide the reader with an overview of Said's Orientalism and the two Arabic translations of the book. It investigates some of the problems of translating cultural texts, more specifically translating features of Said's style.

Keywords: Orientalism, retranslation, Arabic Language, Muhammad Enani, Kamal Abu Deeb, Edward Said

Procedia PDF Downloads 489
12335 Defining Heritage Language Learners of Arabic: Linguistic and Cultural Factors

Authors: Rasha Elhawari

Abstract:

Heritage language learners (HLL) are part of the linguistic reality in Foreign Language Learning (FLL). These learners present several characteristics that are different from those of non-heritage language learners. They have a personal connection with the language, and their motivation to learn the language is partly because of this personal connection. In Canada there is a large diversity in the foreign language learning classroom; the Arabic language classroom is no exception. The Arabic HLL is unique for more than one reason. First, the Arabic language is spoken across the twenty-two countries of the Arab World, where a standard variety and a local dialect co-exist side by side, i.e., diglossia exists in a strong and unique way as a feature of Arabic. Second, Arabic is the language that all Muslims across the Muslim World use for their prayers. This raises a number of points when we consider Arabic as a Heritage Language, namely the role of diglossia, culture and religion. The fact that there is a group of learners who can be regarded as HLLs, who are not of Arabic-speaking background but are Muslims and use the language for religious purposes, is unique; thus course developers and language instructors need to take this into consideration. The paper takes a closer look at this distinction and establishes sub-groups of Arabic HLLs in a language- and/or culture-specific way. It looks at the learners in the beginners’ Arabic class at the undergraduate university level over a period of three years in order to define this learner. Learners belong to different groups and backgrounds, but they all share common characteristics. The paper presents a detailed look at the learner types present in this class in order to help prepare and develop material for this specific learner group.
The paper shows that offering separate HLL and non-HLL courses, especially at the introductory and intermediate levels, is successful in resolving some of the pedagogical problems that occur in the Arabic as a Foreign Language classroom. In conclusion, the paper recommends the development of HLL courses at the early levels of language learning. It calls for a change in pedagogical practices to overcome some of the challenges that learners in the introductory Arabic class can face.

Keywords: Arabic, Heritage Language, language learner, teaching

Procedia PDF Downloads 379
12334 Modern Nahwu's View about the Theory of Amil

Authors: Kisno Umbar

Abstract:

Arabic grammar (nahwu) is one of the most important disciplines for studying Islamic literature (kitab al-turats). In the last century, learning Arabic grammar was difficult for both Arab and non-Arab learners. Most of the traditional nahwu scholars viewed the theory of amil as a major problem. These views influenced a large number of modern nahwu scholars, and some of them reject the theory of amil in order to simplify Arabic grammar. The aim of the study is to compare the views of modern nahwu scholars on the theory of amil, including their reasons. In addition, the study seeks to reveal whether they follow the classical scholars or offer views of their own. The author uses a literature study approach, drawing data on modern nahwu scholars from their books as primary sources. As secondary sources, the author uses recent relevant journal research on the theory of amil. The author also draws on several works by traditional nahwu scholars to compare the views. The analysis showed contrasting views about the theory of amil. Most of the scholars reject the amil because it is not originally derived from the Arabic tradition but is influenced by Aristotelian philosophy. The others continue to use the amil inasmuch as it is one of the characteristics that distinguish the Arabic language from other languages.

Keywords: Arabic grammar, Amil, Arabic tradition, Aristotelian philosophy

Procedia PDF Downloads 122
12333 The Diglossia and the Bilingualism: Concept, Problems, and Solutions

Authors: Abdou Mahmoud Abdou Hussein

Abstract:

In this paper, we attempt to shed light on the difference between two concepts: diglossia and bilingualism. We present definitions of these two concepts from various perspectives. We also highlight diglossia in the Arabic language from a historical standpoint. Furthermore, we illustrate the factors behind diglossia, its impact on learners of Arabic (native and non-native speakers), and finally the suggested solutions for this issue.

Keywords: Arabic linguistics, diglossia, bilingualism, native and non-native speakers

Procedia PDF Downloads 366
12332 Reconciling the Modern Standard Arabic with the Local Dialects in Writing Literary Texts

Authors: Ahmed M. Ghaleb, Ehab S. Al-Nuzaili

Abstract:

This paper attempts to shed light on the question of the choice between standard Arabic and the vernacular in writing literary texts. Modern Standard Arabic (MSA) has long been the formal language of writing, education, administration, and media, shared across the Arab countries. In the mid-20th century, some writers began to write their literary works in local dialects, claiming that these can be more realistic. Other writers opposed this new trend as a threat to Standard Arabic, or MSA, which unifies all Arabs. However, some other writers, like Tawfiq al-Hakim, Hamed Damanhouri, Najib Mahfouz, and Hanna Mineh, attempted to solve this problem by using what W. M. Hutchins called a 'hybrid language', a middle language between the standard and the vernacular, also termed 'a third language'. The paper examines some of the literary texts in which a combination of the standard and the colloquial is employed. It then proposes a third language, a form that combines MSA and the colloquial, and considers the possibility of using it in writing literary texts. In this way, the paper aims to bridge the gap between the different levels of Arabic.

Keywords: modern standard arabic, dialect or vernacular, diglossia, third language

Procedia PDF Downloads 103
12331 Problems of Translating Technical Terms from English into Arabic

Authors: Nisreen Naji Al-Khawaldeh, Lara Ahmad Mansour El-Awar

Abstract:

The present study investigated the strategies MA translation students used for translating technical terms, the most common obstacles they encountered in translating such terms, and the motives behind using such terms in their original form despite their translatability into Arabic. To achieve these objectives, a translation test was administered to 100 MA students specialising in translation at both Hashemite University and The University of Jordan. It consisted of two parts: (a) 50 English technical terms to be translated, and (b) two questions concerning the challenges or problems encountered while translating those technical terms and the motives that drive students to use most of the English technical terms as they are despite their translatability into Arabic. The analysis of the results revealed that MA translation students faced problems in translating technical terms, namely the inability to find an equivalent form for the given technical terms, the use of literal translation, and the extensive use of loanwords. Besides, the students used different strategies to translate the technical terms, namely borrowing (i.e., loanwords), paraphrasing, synonymy, naturalization, equivalence, and literal translation. Moreover, most technical terms were used as they are in the source language despite their translatability into Arabic because these terms are easier to use in English than in Arabic. Also, when these terms were introduced to the Arab world, they were introduced in English, not in Arabic, so the brain links these objects to their English terms.

Keywords: arabic, english, technical terms, translation strategies, translation problems

Procedia PDF Downloads 242
12330 University Arabic/Foreign Language Teacher's Competences, Professionalism and the Challenges and Opportunities

Authors: Abeer Heider

Abstract:

The article considers definitions of teacher’s competences and professionalism from the perspectives of Arab and foreign scholars. Special attention is paid to the definition and classification of the stages and components of the university Arabic/foreign language teacher’s professionalism. The results of the survey are presented and recommendations are given. This paper addresses only some of the problems of defining the professional competence and professionalism of the university Arabic/foreign language teacher. The topic needs much more analysis and discussion, because the quality of training today’s competitive and mobile students with a good knowledge of foreign languages depends directly on teachers’ professional level.

Keywords: teacher’s professional competences, Arabic/ foreign language teacher’s professionalism, teacher evaluation, teacher quality

Procedia PDF Downloads 422
12329 Misconception on Multilingualism in Glorious Quran

Authors: Muhammed Unais

Abstract:

The holy Quran is a purely Arabic book, entirely free of non-Arabic terms. Had it been revealed in a multilingual form, including various foreign languages besides Arabic, it could easily be misunderstood that the Arabs were unable to compose such a work in response to the challenge of Allah because of their lack of knowledge of the other languages in which the Quran was composed. Based on the presence of some non-Arabic terms in the Quran, such as Istabrq, Saradiq, and Rabbaniyyoon, some oriental scholars have argued that the holy Quran is not a book revealed in Arabic. Some Muslim scholars support and others deny the presence of foreign terms in the Quran, but all of them agree that the roots of the words suspected of being non-Arabic come from foreign languages and were assimilated into Arabic, where they are used as they are in the foreign language. After this linguistic assimilation occurred and the assimilated non-Arabic words became familiar among the Arabs, the Quran was revealed using these words, so that all the words it contains are Arabic, either pure or assimilated. Hence both opinions about the etymology of these words are right. Those who argue for the presence of foreign words are right in that the roots of those words are foreign, and those who argue for their absence are right in that the words have been assimilated and changed into pure Arabic. The possibility of multilingualism in a monolingual book is logically negative, but its significance changes according to time and place. The problem of multilingualism in the Quran is the misconception raised by some oriental scholars that the Arabs were unable to compose a book equal to the Quran not because of their weakness in Arabic but because the Quran was revealed in languages of which they were ignorant. In fact, the Quran was revealed in pure Arabic, the most literate language of the Arabs, and all of its words and their meanings were familiar to them.
If one becomes aware of the linguistic and cultural assimilation found in all civilizations and cultures, one will have no question in this respect. In this paper, the researcher intends to shed light on the possibility of multilingualism in a monolingual book, the debates among scholars on this issue, foreign terms in the Quran, and the logical justifications, along with the exclusive features of the Quran.

Keywords: Quran, foreign Terms, multilingualism, language

Procedia PDF Downloads 359
12328 Conceptual Metaphors of Responsibility in Arabic to English Translation of Political Speeches: A Corpus-Based Study

Authors: Amr Anany

Abstract:

This study offers a corpus-based analysis of the conceptual metaphors of RESPONSIBILITY in the Arabic political speeches of King Abdulla II and their English translations rendered by the translators of the Royal Hashemite Court ("RHC translators"). In view of the Conceptual Metaphor Theory (CMT), the current study aims to uncover the extent to which the dominant ideology in the source Arabic speeches of King Abdulla II is conveyed in the target English translations. The study explores a bilingual corpus of eleven authentic Arabic speeches delivered by King Abdulla II and their English translations. The study finds that Arabic and English share several metaphorical expressions of RESPONSIBILITY that are based on bodily experience, such as RESPONSIBILITY IS UP, RESPONSIBILITY IS AN OBJECT, and RESPONSIBILITY IS AN HONOR. The study concludes that RHC translators succeed in conveying the dominant ideology from the source Arabic speeches to the English ones using specific translation strategies.

Keywords: cognitive linguistics, CDA, conceptual metaphor theory, ideology, responsibility

Procedia PDF Downloads 39
12327 Preserving Digital Arabic Text Integrity Using Blockchain Technology

Authors: Zineb Touati Hamad, Mohamed Ridda Laouar, Issam Bendib

Abstract:

With the massive development of technology today, the Arabic language has gained a prominent position among the languages most used for writing articles, expressing opinions, and also for citing in many websites, defying its growing sensitivity in terms of structure, language skills, diacritics, writing methods, etc. In the context of the spread of the Arabic language, the Holy Quran represents the most prevalent Arabic text today in many applications and websites for citation purposes or for the reading and learning rituals. The Quranic verses / surahs are published quickly and without cost, which may cause great concern to ensure the safety of the content from tampering and alteration. To protect the content of texts from distortion, it is necessary to refer to the original database and conduct a comparison process to extract the percentage of distortion. The disadvantage of this method is that it takes time, in addition to the lack of any guarantee on the integrity of the database itself as it belongs to one central party. Blockchain technology today represents the best way to maintain immutable content. Blockchain is a distributed database that stores information in blocks linked to each other through encryption, where the modification of each block can be easily known. To exploit these advantages, we seek in this paper to justify the use of this technique in preserving the integrity of Arabic texts sensitive to change by building a decentralized framework to authenticate and verify the integrity of the digital Quranic verses/surahs spread on websites.
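The block-linking idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' framework: the choice of SHA-256, the NFC normalization step, and the names `block_hash`, `build_chain`, and `first_tampered_block` are assumptions made for illustration, and the sample strings are placeholders rather than real verses. Each block stores the hash of its predecessor, so changing any text (even a single diacritic) invalidates that block on re-verification.

```python
import hashlib
import unicodedata

GENESIS = "0" * 64  # assumed placeholder hash for the first block


def normalize(text: str) -> str:
    # NFC normalization so equivalent Unicode encodings of the same text hash identically.
    return unicodedata.normalize("NFC", text).strip()


def block_hash(index: int, text: str, prev_hash: str) -> str:
    # Hash the block position, the previous hash, and the normalized text together.
    payload = f"{index}|{prev_hash}|{normalize(text)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(texts):
    # Link every text to its predecessor through the predecessor's hash.
    chain, prev = [], GENESIS
    for i, text in enumerate(texts):
        h = block_hash(i, text, prev)
        chain.append({"index": i, "text": text, "prev_hash": prev, "hash": h})
        prev = h
    return chain


def first_tampered_block(chain):
    # Recompute every hash; return the index of the first mismatch, or None.
    prev = GENESIS
    for block in chain:
        expected = block_hash(block["index"], block["text"], prev)
        if block["prev_hash"] != prev or block["hash"] != expected:
            return block["index"]
        prev = block["hash"]
    return None


if __name__ == "__main__":
    chain = build_chain(["نص أول", "نص ثان", "نص ثالث"])  # placeholder texts
    print(first_tampered_block(chain))
    chain[1]["text"] += "\u064B"  # add one diacritic to the second text
    print(first_tampered_block(chain))
```

A real deployment would also need the decentralization the abstract relies on (many parties holding and agreeing on the chain); this sketch only shows the tamper-evidence property of hash linking.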

Keywords: arabic text, authentication, blockchain, integrity, quran, verification

Procedia PDF Downloads 133
12326 Automatic Extraction of Water Bodies Using Whole-R Method

Authors: Nikhat Nawaz, S. Srinivasulu, P. Kesava Rao

Abstract:

Feature extraction plays an important role in many remote sensing applications. Automatic extraction of water bodies is of great significance in many remote sensing applications like change detection, image retrieval etc. This paper presents a procedure for automatic extraction of water information from remote sensing images. The algorithm uses the relative location of R-colour component of the chromaticity diagram. This method is then integrated with the effectiveness of the spatial scale transformation of whole method. The whole method is based on water index fitted from spectral library. Experimental results demonstrate the improved accuracy and effectiveness of the integrated method for automatic extraction of water bodies.

Keywords: feature extraction, remote sensing, image retrieval, chromaticity, water index, spectral library, integrated method

Procedia PDF Downloads 344
12325 English Loanwords in the Egyptian Variety of Arabic: Morphological and Phonological Changes

Authors: Mohamed Yacoub

Abstract:

This paper investigates the English loanwords in the Egyptian variety of Arabic and reaches three findings. Data, in the first finding, were collected from Egyptian movies and soap operas; over two hundred words have been borrowed from English, code-switching was not included. These words then have been put into eleven different categories according to their use and part of speech. Finding two addresses the morphological and phonological change that occurred to these words. Regarding the phonological change, eight categories were found in both consonant and vowel variation, five for consonants and three for vowels. Examples were given for each. Regarding the morphological change, five categories were found including the masculine, feminine, dual, broken, and non-pluralize-able nouns. The last finding is the answers to a four-question survey that addresses forty eight native speakers of Egyptian Arabic and found that most participants did not recognize English borrowed words and thought they were originally Arabic and could not give Arabic equivalents for the loanwords that they could recognize.

Keywords: sociolinguistics, loanwords, borrowing, morphology, phonology, variation, Egyptian dialect

Procedia PDF Downloads 357
12324 A Neural Approach for the Offline Recognition of the Arabic Handwritten Words of the Algerian Departments

Authors: Salim Ouchtati, Jean Sequeira, Mouldi Bedda

Abstract:

In this work we present an offline system for the recognition of the Arabic handwritten words of the Algerian departments. The study is based mainly on evaluating the performance of neural networks trained with the gradient backpropagation algorithm. The parameters forming the input vector of the neural network are extracted from the binary images of the handwritten word by several methods: the distribution parameters, the centered moments of the different projections, and the Barr features. These methods are applied to the segments obtained by dividing the binary image of the word into six segments. The classification is achieved by a multilayer perceptron. Detailed experiments are carried out and satisfactory recognition results are reported.
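The segment-based feature extraction mentioned in this abstract can be sketched as follows. The abstract does not say how the six segments are cut or exactly which distribution parameters are used, so this example assumes six vertical strips of equal width and uses the foreground-pixel density of each strip as a stand-in feature; the function names are hypothetical.

```python
def split_columns(image, parts=6):
    # Cut a binary image (list of rows of 0/1) into `parts` vertical strips.
    width = len(image[0])
    bounds = [round(k * width / parts) for k in range(parts + 1)]
    return [[row[bounds[k]:bounds[k + 1]] for row in image] for k in range(parts)]


def density(segment):
    # Fraction of foreground (1) pixels in a segment.
    total = sum(len(row) for row in segment)
    return sum(map(sum, segment)) / total if total else 0.0


def feature_vector(image, parts=6):
    # One density value per strip; such values could feed a classifier's input layer.
    return [density(seg) for seg in split_columns(image, parts)]


if __name__ == "__main__":
    word_image = [
        [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
        [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    ]
    print(feature_vector(word_image))
```

In a fuller pipeline the per-segment vector would be extended with the other descriptors the abstract lists (projection moments, Barr features) before being passed to the multilayer perceptron.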

Keywords: handwritten word recognition, neural networks, image processing, pattern recognition, features extraction

Procedia PDF Downloads 484
12323 Biomedical Definition Extraction Using Machine Learning with Synonymous Feature

Authors: Jian Qu, Akira Shimazu

Abstract:

OOV (Out Of Vocabulary) terms are terms that cannot be found in many dictionaries. Although it is possible to translate such OOV terms, the translations do not provide any real information for a user. We present an OOV term definition extraction method that uses information available from the Internet. We use features such as the occurrence of synonyms and location distances, and apply a machine learning method to find the correct definitions for OOV terms. We tested our method on both biomedical-type and name-type OOV terms; it outperforms existing work with an accuracy of 86.5%.

Keywords: information retrieval, definition retrieval, OOV (out of vocabulary), biomedical information retrieval

Procedia PDF Downloads 466
12322 Voice Commands Recognition of Mentor Robot in Noisy Environment Using HTK

Authors: Khenfer-Koummich Fatma, Hendel Fatiha, Mesbahi Larbi

Abstract:

This paper presents an approach based on Hidden Markov Models (HMM) using HTK tools. The goal is to create a man-machine interface with a voice recognition system that allows the operator to tele-operate a mentor robot to execute specific tasks such as rotate, raise, close, etc. This system should take into account different levels of environmental noise. The approach has been applied to isolated words representing the robot commands spoken in two languages: French and Arabic. The recognition rate obtained is the same for both languages, Arabic and French, for the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equal to 30 dB: the Arabic speech recognition rate is 69% and the French speech recognition rate is 80%. This can be explained by the phonetic context of each language when the noise is added.
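The noise condition in this abstract (Gaussian white noise at an SNR of 30 dB) can be reproduced with a short sketch. It uses the standard definition SNR_dB = 10·log10(P_signal / P_noise); the function names and the synthetic sine-wave "recording" are assumptions for illustration and have nothing to do with the authors' HTK setup.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    # Scale zero-mean Gaussian noise so that P_signal / P_noise matches the target SNR.
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    # Empirical SNR of the noisy signal relative to the clean one, in dB.
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz, standing in for a spoken command.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = add_white_noise(clean, 30.0, rng)
    print(round(measured_snr_db(clean, noisy), 2))
```

The measured value fluctuates slightly around the target because the noise is random; longer signals give a closer match.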

Keywords: voice command, HMM, TIMIT, noise, HTK, Arabic, speech recognition

Procedia PDF Downloads 353
12321 Altasreef: Automated System of Quran Verbs for Urdu Language

Authors: Haq Nawaz, Muhammad Amjad Iqbal, Kamran Malik

Abstract:

"Altasreef" is an automated system, available to Web and Android users, that helps users learn Quranic verbs. It lets users practice the learned material and take exams on Arabic verb variations, focusing on the Quran text. Arabic is a highly inflectional language. Almost all of its words connect to roots of three, four or five letters which convey the core meaning shared by all their inflectional forms. In Arabic, a verb is formed by inserting the root consonants into one of a set of verb patterns. Suffixes and prefixes are then added to express number, person, and gender. The active/passive voice, perfective aspect and other patterns are then generated. This application is designed for learners of Quranic Arabic who have already learned the basics of Arabic conjugation. The application also provides translations of the generated patterns. These translations are generated with a rule-based approach to give 100% results to the learners.

Keywords: NLP, Quran, Computational Linguistics, E Learning
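The root-and-pattern mechanism described in the abstract above can be sketched in a few lines. This is only an illustration on Latin transliteration, not the Altasreef system: the digit-slot template notation, the small `PATTERNS` table, and the function names are assumptions introduced here. Digits 1 to 3 in a template stand for the first, second, and third root consonants.

```python
# Hypothetical templates for a three-consonant root (Form I, transliterated).
PATTERNS = {
    "perfective active, 3rd masc. sing.": "1a2a3a",
    "perfective passive, 3rd masc. sing.": "1u2i3a",
    "imperfective active, 3rd masc. sing.": "ya12u3u",
    "perfective active, 1st sing.": "1a2a3tu",
}


def apply_pattern(root, pattern):
    # Replace each digit slot with the corresponding root consonant.
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


def conjugate(root):
    # Generate every form in the table for the given root.
    return {label: apply_pattern(root, tpl) for label, tpl in PATTERNS.items()}


if __name__ == "__main__":
    for label, form in conjugate(("k", "t", "b")).items():
        print(f"{label}: {form}")
```

Real conjugation needs more than this: stem vowels vary by verb, and weak or doubled roots follow modified patterns, so a table like this would have to be extended per verb class.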

Procedia PDF Downloads 134
12320 Analytical Study of Cobalt(II) and Nickel(II) Extraction with Salicylidene O-, M-, and P-Toluidine in Chloroform

Authors: Sana Almi, Djamel Barkat

Abstract:

The solvent extraction of cobalt(II) and nickel(II) from aqueous sulfate solutions was investigated by slope analysis using salicylidene aniline and the three isomeric o-, m- and p-salicylidene toluidines diluted with chloroform at 25°C. By a statistical analysis of the extraction data, it was concluded that the extracted species are CoL2 with CoL2(HL) and NiL2 (HL denotes HSA, HSOT, HSMT, and HSPT). The extraction efficiency of Co(II) was higher than that of Ni(II). This tendency is confirmed by the numerical extraction constants for each metal cation. The extraction efficiency followed the order HSMT > HSPT > HSOT > HSA for both Co2+ and Ni2+.

Keywords: solvent extraction, nickel(II), cobalt(II), salicylidene aniline, o-, m-, and p-salicylidene toluidine

Procedia PDF Downloads 455
12319 Towards an Analysis of Rhetoric of Digital Arabic Discourse

Authors: Gameel Abdelmageed

Abstract:

Arabs have a rhetorical heritage which has greatly contributed to the monitoring and analyzing of the rhetoric of the Holy Quran, Hadith, and Arabic texts on poetry and oratory. But Arab scholars - as far as the researcher knows – have not contributed to monitoring and analyzing the rhetoric of digital Arabic discourse although it has prominence, particularly in social media and has strong effectiveness in the political and social life of Arab society. This discourse has made its impact by using very new rhetorical techniques in language, voice, image, painting and video clips which are known as “Multimedia” and belong to “Digital Rhetoric”. This study suggests that it is time to draw the attention of Arab scholars and invite them to monitor and analyze the rhetoric of digital Arabic discourse.

Keywords: digital discourse, digital rhetoric, Facebook, social media

Procedia PDF Downloads 346
12318 The Conceptual Relationships in N+N Compounds in Arabic Compared to English

Authors: Abdel Rahman Altakhaineh

Abstract:

This paper has analysed the conceptual relations between the elements of NN compounds in Arabic and compared them to those found in English based on the framework of Conceptual Semantics and a modified version of Parallel Architecture referred to as Relational Morphology. The analysis revealed that the repertoire of possible semantic relations between the two nouns in Arabic NN compounds reproduces that in English NN compounds and that, therefore, the main difference is in headedness (right-headed in English, left-headed in Arabic). Adopting RM allows productive and idiosyncratic elements to interweave with each other naturally. Semantically transparent compounds can be stored in memory or produced and understood online, while compounds with different degrees of semantic idiosyncrasy are stored in memory. Furthermore, the predictable parts of idiosyncratic compounds are captured by general schemas. In compounds, such schemas pick out the range of possible semantic relations between the two nouns. Finally, conducting a cross-linguistic study of the systematic patterns of possible conceptual relationships between compound elements is an area worthy of further exploration. In addition, comparing and contrasting compounding in Arabic and Hebrew, especially as they are both Semitic languages, is another area that needs to be investigated thoroughly. It will help morphologists understand the extent to which Jackendoff’s repertoire of semantic relations in compounds is universal. That is, if a language as distant from English as Arabic displays a similar range of cases, this is evidence for a (relatively) universal set of relations from which individual languages may pick and choose.

Keywords: conceptual semantics, morphology, compounds, arabic, english

Procedia PDF Downloads 73
12317 Extraction of M. paradisiaca L. Inflorescences Using Compressed Propane

Authors: Michele C. Mesomo, Madeline de Souza Correa, Roberta L. Kruger, Luis R. S. Kanda, Marcos L. Corazza

Abstract:

Natural extracts of plants have been used for many years for different purposes, and recently they have been screened for their potential use as alternative remedies and food preservatives. Inflorescences of M. paradisiaca L., also known as the heart of the banana, have great economic interest due to its fruit. All parts of the banana are used for many different purposes, including use in folk medicine. The use of extraction via supercritical technology has grown in recent years, though it is still necessary to obtain experimental information for the construction of industrial plants. This work reports the extraction of Musa paradisiaca L. using compressed propane as solvent. The effects of the supercritical extraction conditions, pressure and temperature, on the yield were evaluated. The raw material, banana inflorescences, was dried at 313.15 K and milled. The particle size distribution used for packing the extraction cell was 12 mesh (23.5%), 16 mesh (23.5%), 32 mesh (34.5%), and 48 mesh (18.5%). The extractions were performed in a laboratory scale unit at pressures of 3.0 MPa, 6.5 MPa and 10.0 MPa and at 308.15 K, 323.15 K and 338.15 K. The operating conditions tested achieved a maximum yield of 2.94 wt% for the CO2 extraction at 10.0 MPa and 338.15 K, the higher pressure and temperature. The lowest yield, 2.29 wt%, was obtained at the lower pressure and higher temperature. Temperature had a significant and positive effect on the extraction yield with supercritical CO2, while pressure had no effect on the yield. The overall extraction curves showed the typical behavior of the supercritical extraction procedure and reached a constant extraction rate after about 80 to 100 min. The largest amount of extract was obtained at the beginning of the process, within 10 to 60 min.

Keywords: banana, natural products, supercritical extraction, temperature

Procedia PDF Downloads 577
12316 Information Extraction Based on Search Engine Results

Authors: Mohammed R. Elkobaisi, Abdelsalam Maatuk

Abstract:

Search engines are large-scale Web information retrieval tools that are currently freely available to all. This paper explains how to convert the raw result counts returned by search engines into useful information. This represents a new method of data gathering compared with traditional methods. Submitting queries for many keywords one at a time takes a long time and much effort; hence we developed a user interface program that searches automatically, taking multiple keywords at the same time and collecting the wanted data. The collected raw data is processed using mathematical and statistical theories to eliminate unwanted data and convert the rest into usable data.

Keywords: search engines, information extraction, agent system

Procedia PDF Downloads 399
12315 Extraction of Essential Oil From Orange Peels

Authors: Aayush Bhisikar, Neha Rajas, Aditya Bhingare, Samarth Bhandare, Amruta Amrurkar

Abstract:

Orange peels are currently thrown away as garbage in India after orange fruits' edible components are consumed. However, the nation depends on important essential oils for usage in companies that produce goods, including food, beverages, cosmetics, and medicines. This study was conducted to show how to effectively use it. By using various extraction techniques, orange peel is used in the creation of essential oils. Steam distillation, water distillation, and solvent extraction were the techniques taken into consideration in this paper. Due to its relative prevalence among the extraction techniques, Design Expert 7.0 was used to plan an experimental run for solvent extraction. Oil was examined to ascertain its physical and chemical characteristics after extraction. It was determined from the outcomes that the orange peels.

Keywords: orange peels, extraction, essential oil, distillation

Procedia PDF Downloads 46
12314 Extraction of Essential Oil from Orange Peels

Authors: Neha Rajas, Aayush Bhisikar, Samarth Bhandare, Aditya Bhingare, Amruta Amrutkar

Abstract:

Orange peels are currently thrown away as garbage in India after the edible parts of the fruit are consumed. However, the nation depends on important essential oils for use in industries that produce goods including food, beverages, cosmetics, and medicines. This study was conducted to show how orange peel can be used effectively. Using various extraction techniques, orange peel can be turned into essential oil. Steam distillation, water distillation, and solvent extraction were the techniques considered in this paper. Due to its relative prevalence among the extraction techniques, Design Expert 7.0 was used to plan an experimental run for solvent extraction. After extraction, the oil was examined to ascertain its physical and chemical characteristics. It was determined from the outcomes that the orange peels.

Keywords: orange peels, extraction, distillation, essential oil

Procedia PDF Downloads 43
12313 On Exploring Search Heuristics for improving the efficiency in Web Information Extraction

Authors: Patricia Jiménez, Rafael Corchuelo

Abstract:

Nowadays, the World Wide Web is the most popular source of information, comprising billions of on-line documents. Web mining is used to crawl through these documents, collect the information of interest, and process it with data mining tools so that the gathered information can be used in the best interest of a business, which helps companies promote themselves. Unfortunately, it is not easy to automatically extract the information a web site provides when the site lacks an API that transforms the user-friendly data in its web documents into a structured, machine-readable format. Rule-based information extractors are tools intended to extract the information of interest automatically and offer it in a structured format that allows mining tools to process it. However, the performance of an information extractor strongly depends on the search heuristic employed, since bad choices regarding how to learn a rule may easily result in a loss of effectiveness and/or efficiency. Improving the efficiency of search heuristics is of utmost importance in the field of Web Information Extraction, since typical datasets are very large. In this paper, we employ an information extractor based on a classical top-down algorithm that uses the so-called Information Gain heuristic introduced by Quinlan and Cameron-Jones. Unfortunately, Information Gain suffers from some well-known problems, so we analyse an intuitive alternative, Termini, that is clearly more efficient; we also analyse other proposals in the literature and conclude that none of them outperforms this alternative.
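
As a rough illustration of the Information Gain heuristic mentioned in the abstract above, the following Python sketch computes the gain in the form commonly given for FOIL-style top-down rule learners. It is a simplified, propositional reading in which the number of positives still covered after refinement stands in for FOIL's binding count; the function name and the example counts are assumptions, and the abstract does not specify the exact variant the paper uses.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL-style information gain for refining a rule (illustrative sketch).

    p0, n0: positive / negative examples covered before adding a condition.
    p1, n1: positive / negative examples covered after adding it.
    Assumes p0 > 0; returns 0.0 when the refined rule covers no positives.
    """
    if p1 == 0:
        return 0.0
    # Bits needed to signal that a covered example is positive.
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    # Weight the saving by the positives the refined rule still covers.
    return p1 * (info_before - info_after)


# Hypothetical counts: a condition that keeps 8 of 10 positives
# and excludes all 10 negatives.
print(foil_gain(10, 10, 8, 0))
```

A top-down learner would evaluate this score for every candidate condition at each refinement step, which is why the cost of the heuristic matters on large datasets.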

Keywords: information extraction, search heuristics, semi-structured documents, web mining

Procedia PDF Downloads 308
12312 The Formation of the Diminutive in Colloquial Jordanian Arabic

Authors: Yousef Barahmeh

Abstract:

This paper is a linguistic and pragmatic analysis of the use of the diminutive in Colloquial Jordanian Arabic (CJA). It demonstrates a peculiar form of the diminutive in CJA that is inflected with the feminine plural suffix -aat. The analysis shows that the pragmatic function(s) of the diminutive in CJA refer primarily to ‘littleness’, while the morphological inflection conveys the message of ‘the plethora’. Examples of this linguistic phenomenon are intelligible and often include a large number of words that are culture-specific to the rural dialect in the north of Jordan. In both cases, the diminutive in CJA is an adaptive strategy relative to its pragmatic and social contexts.

Keywords: Colloquial Jordanian Arabic, diminutive, morphology, pragmatics

Procedia PDF Downloads 238
12311 Convolutional Neural Networks-Optimized Text Recognition with Binary Embeddings for Arabic Expiry Date Recognition

Authors: Mohamed Lotfy, Ghada Soliman

Abstract:

Recognizing Arabic dot-matrix digits is a challenging problem due to the unique characteristics of dot-matrix fonts, such as irregular dot spacing and varying dot sizes. This paper presents an approach for recognizing Arabic digits printed in dot-matrix format. The proposed model is based on a Convolutional Neural Network (CNN) that takes the dot-matrix image as input and generates embeddings that are rounded to produce binary representations of the digits. The binary embeddings are then used to perform Optical Character Recognition (OCR) on the digit images. To overcome the limited availability of dotted Arabic expiration date images, we developed a TrueType Font (TTF) for generating synthetic images of Arabic dot-matrix characters. The model was trained on 3287 synthetic images and tested on 658 synthetic images, representing realistic expiration dates from 2019 to 2027 in the format yyyy/mm/dd. Our model achieved an accuracy of 98.94% on expiry date recognition in Arabic dot-matrix format while using fewer parameters and less computational resources than traditional CNN-based models. By investigating and presenting our findings comprehensively, we aim to contribute substantially to the field of OCR and pave the way for advancements in Arabic dot-matrix character recognition. Our proposed approach is not limited to Arabic dot-matrix digit recognition but can also be extended to text recognition tasks, such as text classification and sentiment analysis.
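
The rounding step described in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 4-bit codebook, the 0.5 threshold, the nearest-codeword decoding, and all names are hypothetical, since the abstract does not give the embedding width or the decoding rule the authors use.

```python
def binarize(embedding, threshold=0.5):
    """Round real-valued outputs (assumed to lie in [0, 1]) to bits."""
    return tuple(1 if value >= threshold else 0 for value in embedding)


# Hypothetical codebook: each digit 0-9 is assigned its 4-bit binary code.
CODEBOOK = {d: tuple(int(b) for b in format(d, "04b")) for d in range(10)}


def decode_digit(embedding):
    """Return the digit whose code is closest in Hamming distance."""
    bits = binarize(embedding)

    def hamming(digit):
        return sum(a != b for a, b in zip(CODEBOOK[digit], bits))

    return min(CODEBOOK, key=hamming)


# A made-up network output that rounds to 0101.
print(decode_digit([0.1, 0.9, 0.2, 0.8]))
```

Decoding to the nearest codeword, rather than requiring an exact match, is one simple way to tolerate a rounded bit pattern that does not correspond to any digit.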

Keywords: computer vision, pattern recognition, optical character recognition, deep learning

Procedia PDF Downloads 51
12310 Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes

Authors: Ashwag O. Maghraby, Nida N. Khan, Hosnia A. Ahmed, Ghufran N. Brohi, Hind F. Assouli, Jawaher S. Melibari

Abstract:

The Arabic language is one of the most important languages. Learning it matters to many people around the world because of its religious and economic importance, and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focuses on detecting and correcting syntactic mistakes in Arabic according to their position in the sentence, concentrating on two of the main syntactical rules in Arabic: dual and plural. It analyzes each sentence in the text using the Stanford CoreNLP morphological analyzer and a machine-learning approach in order to detect the syntactical mistakes and then correct them. A prototype of the proposed system was implemented and evaluated. It uses a support vector machine (SVM) algorithm to detect Arabic grammatical errors and corrects them using a rule-based approach. The prototype system has a fair accuracy of 81%. In general, it offers a set of useful grammatical suggestions that the user may overlook while writing, whether from lack of familiarity with grammar or from writing quickly, such as alerting the user when a plural term is used to refer to one person.

Keywords: Arabic language acquisition and learning, natural language processing, morphological analyzer, part-of-speech

Procedia PDF Downloads 125
12309 Current Trends in the Arabic Linguistics Development: Between National Tradition and Global Tendencies

Authors: Olga Bernikova, Oleg Redkin

Abstract:

Globalization is a process of worldwide economic, political and cultural integration. Obviously, this phenomenon has both positive and negative aspects. This article analyzes the impact of the modern process of globalization on the national traditions of language teaching and research. In this context, the problem of the ratio of local to global can be viewed from several sides. Firstly, since English is the language of over 80 percent of scientific and technical research worldwide, what should be the language of science in a given region? Secondly, language 'globality' is not always associated with English, because intercultural communications may have their regional peculiarities. For example, in the Arab world, Modern Standard Arabic can also be regarded as a 'global' phenomenon, since the mother tongues of the population are local Arabic dialects. In addition, the correlation of 'local' versus 'global' is manifested not only in the linguistic sphere but also in the methodology used in language acquisition and research. Thus, the major principles of the Arabic philological tradition, which goes back to the 7th century, are still widespread in the modern Arab world. At the same time, the terminology and methods of language research that are peculiar to this tradition are quite far from those of general linguistics, which underlies the description of all the languages of the world. The present research relies on a comparative analysis of sources in Arabic linguistics, including original works in Arabic dating back to the 12th-13th centuries. As a case study, the interaction of local and global is also considered using the example of Arabic teaching and research in Russia. Regarding the correlation between local and global, it is possible to forecast the development of two parallel tendencies: the spread of the phenomena of globalization on the one hand, and local implementation of a language policy aimed at preserving native languages, including Arabic, on the other.

Keywords: Arabic, global, language, local, tradition

Procedia PDF Downloads 228
12308 Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine

Authors: C. Brierley, H. El-Farahaty, A. Farhan

Abstract:

The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via Corpus Linguistics techniques is an important development. In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool, and legal scholarship in the States is now attuned to the potential for Text Analytics over vast quantities of text-based legal material, following the business and medical industries. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse: legal interpretations; legal discourse; the comprehensibility of legal texts; conflict resolution; and linguistic human rights. This paper focuses on ‘dignity’ as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We have compiled a parallel, Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as the World Intellectual Property Organisation and CONSTITUTE, and uploaded and queried our corpus in Sketch Engine. Our most challenging task was sentence-level alignment of Arabic-English data. This entailed manual intervention to ensure correspondence on a one-to-many basis since Arabic sentences differ from English in length and punctuation. We have searched for morphological variants of ‘dignity’ (كرامة, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances), and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is ‘human’ as in ‘human dignity’.
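
For readers unfamiliar with the logDice statistic named in the abstract above, the following Python sketch shows how the score is usually defined: 14 plus the base-2 logarithm of the Dice coefficient of the two words. The function name and the frequencies are hypothetical and are not taken from the corpus described.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of the node word and the collocate.
    f_x, f_y: individual frequencies of the node word and the collocate.
    Assumes f_xy > 0.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: the pair co-occurs 30 times; the node word
# occurs 40 times and the collocate 200 times.
print(round(log_dice(30, 40, 200), 2))
```

Because the score depends only on the frequencies of the two words and of the pair, and not on total corpus size, values are easier to compare across corpora of different sizes; the theoretical maximum is 14.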

Keywords: Arabic constitution, corpus-based legal linguistics, human rights, parallel Arabic-English legal corpora

Procedia PDF Downloads 152