Search results for: Arabic text.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 649

Search results for: Arabic text.

499 The Canonical Object and Other Objects in Arabic

Authors: Safiah A. Madkhali

Abstract:

The grammatical relation object has not attracted the same attention in the literature as subject has. Where there is a clearly monotransitive verb such as kick, the criteria for identifying the grammatical relation may converge. However, the term object is also used to refer to phenomena that do not subsume all, or even most, of the recognized properties of the canonical object. Instances of such phenomena include non-canonical objects such as the ones in the so-called double-object construction i.e., the indirect object and the direct object as in (He bought his dog a new collar). In this paper, it is demonstrated how criteria of identifying the grammatical relation object that are found in the theoretical and typological literature can be applied to Arabic. Also, further language-specific criteria are here derived from the regularities of the canonical object in the language. The criteria established in this way are then applied to the non-canonical objects to demonstrate how far they conform to, or diverge from, the canonical object. Contrary to the claim that the direct object is more similar to the canonical object than is the indirect object, it was found that it is, in fact, the indirect object rather than the direct object that shares most of the aspects of the canonical object in monotransitive clauses.

Keywords: Canonical objects, double-object constructions, direct object, indirect object, non-canonical objects.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 525
498 Bottom Up Text Mining through Hierarchical Document Representation

Authors: Y. Djouadi., F. Souam.

Abstract:

Most of the existing text mining approaches are proposed, keeping in mind, transaction databases model. Thus, the mined dataset is structured using just one concept: the “transaction", whereas the whole dataset is modeled using the “set" abstract type. In such cases, the structure of the whole dataset and the relationships among the transactions themselves are not modeled and consequently, not considered in the mining process. We believe that taking into account structure properties of hierarchically structured information (e.g. textual document, etc ...) in the mining process, can leads to best results. For this purpose, an hierarchical associations rule mining approach for textual documents is proposed in this paper and the classical set-oriented mining approach is reconsidered profits to a Direct Acyclic Graph (DAG) oriented approach. Natural languages processing techniques are used in order to obtain the DAG structure. Based on this graph model, an hierarchical bottom up algorithm is proposed. The main idea is that each node is mined with its parent node.

Keywords: Graph based association rules mining, Hierarchical document structure, Text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2010
497 A New Vector Quantization Front-End Process for Discrete HMM Speech Recognition System

Authors: M. Debyeche, J.P Haton, A. Houacine

Abstract:

The paper presents a complete discrete statistical framework, based on a novel vector quantization (VQ) front-end process. This new VQ approach performs an optimal distribution of VQ codebook components on HMM states. This technique that we named the distributed vector quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure, when the estimation of HMM parameters is performed. The DVQ technique is implemented through two variants. The first variant uses the K-means algorithm (K-means- DVQ) to optimize the VQ, while the second variant exploits the benefits of the classification behavior of neural networks (NN-DVQ) for the same purpose. The proposed variants are compared with the HMM-based baseline system by experiments of specific Arabic consonants recognition. The results show that the distributed vector quantization technique increase the performance of the discrete HMM system.

Keywords: Hidden Markov Model, Vector Quantization, Neural Network, Speech Recognition, Arabic Language

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2010
496 Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems

Authors: Tengku Mohd T. Sembok

Abstract:

Documents retrieval in Information Retrieval Systems (IRS) is generally about understanding of information in the documents concern. The more the system able to understand the contents of documents the more effective will be the retrieval outcomes. But understanding of the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through keywords approach using vector space model. Keywords may be unstemmed or stemmed. When keywords are stemmed and conflated in retrieving process, we are a step forwards in applying semantic technology in IRS. Word stemming is a process in morphological analysis under natural language processing, before syntactic and semantic analysis. We have developed algorithms for Malay and Arabic and incorporated stemming in our experimental systems in order to measure retrieval effectiveness. The results have shown that the retrieval effectiveness has increased when stemming is used in the systems.

Keywords: Information Retrieval, Natural Language Processing, Artificial Intelligence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2223
495 Analyzing Political Cartoons in Arabic-Language Media after Trump's Jerusalem Move: A Multimodal Discourse Perspective

Authors: Inas Hussein

Abstract:

Communication in the modern world is increasingly becoming multimodal due to globalization and the digital space we live in which have remarkably affected how people communicate. Accordingly, Multimodal Discourse Analysis (MDA) is an emerging paradigm in discourse studies with the underlying assumption that other semiotic resources such as images, colours, scientific symbolism, gestures, actions, music and sound, etc. combine with language in order to  communicate meaning. One of the effective multimodal media that combines both verbal and non-verbal elements to create meaning is political cartoons. Furthermore, since political and social issues are mirrored in political cartoons, these are regarded as potential objects of discourse analysis since they not only reflect the thoughts of the public but they also have the power to influence them. The aim of this paper is to analyze some selected cartoons on the recognition of Jerusalem as Israel's capital by the American President, Donald Trump, adopting a multimodal approach. More specifically, the present research examines how the various semiotic tools and resources utilized by the cartoonists function in projecting the intended meaning. Ten political cartoons, among a surge of editorial cartoons highlighted by the Anti-Defamation League (ADL) - an international Jewish non-governmental organization based in the United States - as publications in different Arabic-language newspapers in Egypt, Saudi Arabia, UAE, Oman, Iran and UK, were purposively selected for semiotic analysis. These editorial cartoons, all published during 6th–18th December 2017, invariably suggest one theme: Jewish and Israeli domination of the United States. The data were analyzed using the framework of Visual Social Semiotics. In accordance with this methodological framework, the selected visual compositions were analyzed in terms of three aspects of meaning: representational, interactive and compositional. In analyzing the selected cartoons, an interpretative approach is being adopted. This approach prioritizes depth to breadth and enables insightful analyses of the chosen cartoons. The findings of the study reveal that semiotic resources are key elements of political cartoons due to the inherent political communication they convey. It is proved that adequate interpretation of the three aspects of meaning is a prerequisite for understanding the intended meaning of political cartoons. It is recommended that further research should be conducted to provide more insightful analyses of political cartoons from a multimodal perspective.

Keywords: Multimodal discourse analysis, multimodal text, political cartoons, visual modality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1496
494 The Impact of Gender Differences on the Expressions of Refusal in Jordanian Arabic

Authors: Hanan Yousef, Nisreen Naji Al-Khawaldeh

Abstract:

The present study investigates the use of the expression of refusal by native speakers of Jordanian Arabic (NSsJA) in different social situations (i.e. invitations, suggestions, and offers). It also investigates the influence of gender on the refusal realization patterns within the Jordanian culture to provide a better insight into the relation between situations, strategies and gender in the Jordanian culture. To that end, a group of 70 participants, including 35 male and 35 female students from different departments at the Hashemite University (HU) participated in this study using mixed methods (i.e. Discourse Completion Test (DCT), interviews and naturally occurring data). Data were analyzed in light of a developed coding scheme. The results showed that NSsJA preferred indirect strategies which mitigate the interaction such as "excuse, reason and, explanation" strategy more than other strategies which aggravate the interaction such as "face-threatening" strategy. Moreover, the analysis of this study has revealed a considerable impact of gender on the use of linguistic forms expressing refusal among NSsJA. Significant differences in the results of the Chi-square test relating the effect of participants' gender indicate that both males and females were conscious of the gender of their interlocutors. The findings provide worthwhile insights into the relation amongst types of communicative acts and the rapport between people in social interaction. They assert that refusal should not be labeled as face threatening act since it does not always pose a threat in some cases especially where refusal is expressed among friends, relatives and family members. They highlight some distinctive culture-specific features of the communicative acts of refusal.

Keywords: Speech act, refusals, semantic formulas, politeness, Jordanian Arabic, mixed methodology, gender.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 869
493 A Prediction of Attractive Evaluation Objects Based On Complex Sequential Data

Authors: Shigeaki Sakurai, Makino Kyoko, Shigeru Matsumoto

Abstract:

This paper proposes a method that predicts attractive evaluation objects. In the learning phase, the method inductively acquires trend rules from complex sequential data. The data is composed of two types of data. One is numerical sequential data. Each evaluation object has respective numerical sequential data. The other is text sequential data. Each evaluation object is described in texts. The trend rules represent changes of numerical values related to evaluation objects. In the prediction phase, the method applies new text sequential data to the trend rules and evaluates which evaluation objects are attractive. This paper verifies the effect of the proposed method by using stock price sequences and news headline sequences. In these sequences, each stock brand corresponds to an evaluation object. This paper discusses validity of predicted attractive evaluation objects, the process time of each phase, and the possibility of application tasks.

Keywords: Trend rule, frequent pattern, numerical sequential data, text sequential data, evaluation object.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1190
492 Word Base Line Detection in Handwritten Text Recognition Systems

Authors: Kamil R. Aida-zade, Jamaladdin Z. Hasanov

Abstract:

An approach is offered for more precise definition of base lines- borders in handwritten cursive text and general problems of handwritten text segmentation have also been analyzed. An offered method tries to solve problems arose in handwritten recognition with specific slant or in other words, where the letters of the words are not on the same vertical line. As an informative features, some recognition systems use ascending and descending parts of the letters, found after the word-s baseline detection. In such recognition systems, problems in baseline detection, impacts the quality of the recognition and decreases the rate of the recognition. Despite other methods, here borders are found by small pieces containing segmentation elements and defined as a set of linear functions. In this method, separate borders for top and bottom border lines are found. At the end of the paper, as a result, azerbaijani cursive handwritten texts written in Latin alphabet by different authors has been analyzed.

Keywords: Azeri, azerbaijani, latin, segmentation, cursive, HWR, handwritten, recognition, baseline, ascender, descender, symbols.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2436
491 A Supervised Text-Independent Speaker Recognition Approach

Authors: Tudor Barbu

Abstract:

We provide a supervised speech-independent voice recognition technique in this paper. In the feature extraction stage we propose a mel-cepstral based approach. Our feature vector classification method uses a special nonlinear metric, derived from the Hausdorff distance for sets, and a minimum mean distance classifier.

Keywords: Text-independent speaker recognition, mel cepstral analysis, speech feature vector, Hausdorff-based metric, supervised classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1787
490 Concrete Recycling in Egypt for Construction Applications: A technical and Financial Feasibility Model

Authors: Omar Farahat Hassanein, A. Samer Ezeldin

Abstract:

The construction industry is a very dynamic field. Every day new technologies and methods are developed to fasten the process and increase its efficiency. Hence, if a project uses fewer resources it will be more efficient.

This paper examines the recycling of concrete construction and demolition (C&D) waste to reuse it as aggregates in on-site applications for construction projects in Egypt and possibly in the Middle East. The study focuses on a stationary plant setting. The machinery set-up used in the plant is analyzed technically and financially.

The findings are gathered and grouped to obtain a comprehensive cost-benefit financial model to demonstrate the feasibility of establishing and operating a concrete recycling plant. Furthermore, a detailed business plan including the time and hierarchy is proposed. 

Keywords: Construction wastes, recycling, sustainability, financial model, concrete recycling, concrete life cycle.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3265
489 Ontology-based Concept Weighting for Text Documents

Authors: Hmway Hmway Tar, Thi Thi Soe Nyaunt

Abstract:

Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.

Keywords: Clustering, Concept Weight, Document clustering, Feature Selection, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2362
488 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: Fake news detection, types of fake news, machine learning, natural language processing, classification techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1449
487 Teachers- Perceptions on the Use of E-Books as Textbooks in the Classroom

Authors: Abd Mutalib Embong, Azelin M Noor, Razol Mahari M Ali, Zulqarnain Abu Bakar, Abdur- Rahman Mohamed Amin

Abstract:

At the time where electronic books, or e-Books, offer students a fun way of learning , teachers who are used to the paper text books may find it as a new challenge to use it as a part of learning process. Precisely, there are various types of e-Books available to suit students- knowledge, characteristics, abilities, and interests. The paper discusses teachers- perceptions on the use of ebooks as a paper text book in the classroom. A survey was conducted on 72 teachers who use e-books as textbooks. It was discovered that a majority of these teachers had good perceptions on the use of ebooks. However, they had little problems using the devices. It can be overcome with some strategies and a suggested framework.

Keywords: Classroom, E-books, perception, teacher.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5695
486 Semi-Automatic Analyzer to Detect Authorial Intentions in Scientific Documents

Authors: Kanso Hassan, Elhore Ali, Soule-dupuy Chantal, Tazi Said

Abstract:

Information Retrieval has the objective of studying models and the realization of systems allowing a user to find the relevant documents adapted to his need of information. The information search is a problem which remains difficult because the difficulty in the representing and to treat the natural languages such as polysemia. Intentional Structures promise to be a new paradigm to extend the existing documents structures and to enhance the different phases of documents process such as creation, editing, search and retrieval. The intention recognition of the author-s of texts can reduce the largeness of this problem. In this article, we present intentions recognition system is based on a semi-automatic method of extraction the intentional information starting from a corpus of text. This system is also able to update the ontology of intentions for the enrichment of the knowledge base containing all possible intentions of a domain. This approach uses the construction of a semi-formal ontology which considered as the conceptualization of the intentional information contained in a text. An experiments on scientific publications in the field of computer science was considered to validate this approach.

Keywords: Information research, text analyzes, intentionalstructure, segmentation, ontology, natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1601
485 A Character Detection Method for Ancient Yi Books Based on Connected Components and Regressive Character Segmentation

Authors: Xu Han, Shanxiong Chen, Shiyu Zhu, Xiaoyu Lin, Fujia Zhao, Dingwang Wang

Abstract:

Character detection is an important issue for character recognition of ancient Yi books. The accuracy of detection directly affects the recognition effect of ancient Yi books. Considering the complex layout, the lack of standard typesetting and the mixed arrangement between images and texts, we propose a character detection method for ancient Yi books based on connected components and regressive character segmentation. First, the scanned images of ancient Yi books are preprocessed with nonlocal mean filtering, and then a modified local adaptive threshold binarization algorithm is used to obtain the binary images to segment the foreground and background for the images. Second, the non-text areas are removed by the method based on connected components. Finally, the single character in the ancient Yi books is segmented by our method. The experimental results show that the method can effectively separate the text areas and non-text areas for ancient Yi books and achieve higher accuracy and recall rate in the experiment of character detection, and effectively solve the problem of character detection and segmentation in character recognition of ancient books.

Keywords: Computing methodologies, interest point, salient region detections, image segmentation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 793
484 Industrial Waste Monitoring

Authors: Khairuddin Bin Osman, Ngo Boon Kiat, A. Hamid Bin hamidon, Khairul Azha Bin A. Aziz, Hazli Rafis Bin Abdul Rahman, Mazran Bin Esro

Abstract:

Conventional industrial monitoring systems are tedious, inefficient and the at times integrity of the data is unreliable. The objective of this system is to monitor industrial processes specifically the fluid level which will measure the instantaneous fluid level parameter and respond by text messaging the exact value of the parameter to the user when being enquired by a privileged access user. The development of the embedded program code and the circuit for fluid level measuring are discussed as well. Suggestions for future implementations and efficient remote monitoring works are included.

Keywords: Industrial monitoring system, text messaging, embedded programming.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1633
483 Optimal Document Archiving and Fast Information Retrieval

Authors: Hazem M. El-Bakry, Ahmed A. Mohammed

Abstract:

In this paper, an intelligent algorithm for optimal document archiving is presented. It is kown that electronic archives are very important for information system management. Minimizing the size of the stored data in electronic archive is a main issue to reduce the physical storage area. Here, the effect of different types of Arabic fonts on electronic archives size is discussed. Simulation results show that PDF is the best file format for storage of the Arabic documents in electronic archive. Furthermore, fast information detection in a given PDF file is introduced. Such approach uses fast neural networks (FNNs) implemented in the frequency domain. The operation of these networks relies on performing cross correlation in the frequency domain rather than spatial one. It is proved mathematically and practically that the number of computation steps required for the presented FNNs is less than that needed by conventional neural networks (CNNs). Simulation results using MATLAB confirm the theoretical computations.

Keywords: Information Storage and Retrieval, Electronic Archiving, Fast Information Detection, Cross Correlation, Frequency Domain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1532
482 AGHAZ : An Expert System Based approach for the Translation of English to Urdu

Authors: Uzair Muhammad, Kashif Bilal, Atif Khan, M. Nasir Khan

Abstract:

Machine Translation (MT 3) of English text to its Urdu equivalent is a difficult challenge. Lot of attempts has been made, but a few limited solutions are provided till now. We present a direct approach, using an expert system to translate English text into its equivalent Urdu, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1) Range: 0600–06FF. The expert system works with a knowledge base that contains grammatical patterns of English and Urdu, as well as a tense and gender-aware dictionary of Urdu words (with their English equivalents).

Keywords: Machine Translation, Multiword Expressions, Urdulanguage processing, POS12 Tagging for Urdu, Expert Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2309
481 National Image in the Age of Mass Self-Communication: An Analysis of Internet Users' Perception of Portugal

Authors: L. Godinho, N. Teixeira

Abstract:

Nowadays, massification of Internet access represents one of the major challenges to the traditional powers of the State, among which the power to control its external image. The virtual world has also sparked the interest of social sciences which consider it a new field of study, an immense open text where sense is expressed. In this paper, that immense text has been accessed to so as to understand the perception Internet users from all over the world have of Portugal. Ours is a quantitative and qualitative approach, as we have resorted to buzz, thematic and category analysis. The results confirm the predominance of sea stereotype in others' vision of the Portuguese people, and evidence that national image has adapted to network communication through processes of individuation and paganization.

Keywords: Internet, national image, perception, web analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1020
480 The Challenges of Hyper-Textual Learning Approach for Religious Education

Authors: Elham Shirvani–Ghadikolaei, Seyed Mahdi Sajjadi

Abstract:

State of the art technology has the tremendous impact on our life, in this situation education system have been influenced as well as. In this paper, tried to compare two space of learning text and hypertext with each other, and some challenges of using hypertext in religious education. Regarding the fact that, hypertext is an undeniable part of learning in this world and it has highly beneficial for the education process from class to office and home. In this paper tried to solve this question: the consequences and challenges of applying hypertext in religious education. Also, the consequences of this survey demonstrate the role of curriculum designer and planner of education to solve this problem.

Keywords: Hyper-textual, education, religious text, religious education.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1356
479 A Study of the Variability of Very Low Resolution Characters and the Feasibility of Their Discrimination Using Geometrical Features

Authors: Farshideh Einsele, Rolf Ingold

Abstract:

Current OCR technology does not allow to accurately recognizing small text images, such as those found in web images. Our goal is to investigate new approaches to recognize very low resolution text images containing antialiased character shapes. This paper presents a preliminary study on the variability of such characters and the feasibility to discriminate them by using geometrical features. In a first stage we analyze the distribution of these features. In a second stage we present a study on the discriminative power for recognizing isolated characters, using various rendering methods and font properties. Finally we present interesting results of our evaluation tests leading to our conclusion and future focus.

Keywords: World Wide Web, document analysis, pattern recognition, Optical Character Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1334
478 Emotions in Health Tweets: Analysis of American Government Official Accounts

Authors: García López

Abstract:

The Government Departments of Health have the task of informing and educating citizens about public health issues. For this, they use channels like Twitter, key in the search for health information and the propagation of content. The tweets, important in the virality of the content, may contain emotions that influence the contagion and exchange of knowledge. The goal of this study is to perform an analysis of the emotional projection of health information shared on Twitter by official American accounts: the disease control account CDCgov, National Institutes of Health, NIH, the government agency HHSGov, and the professional organization PublicHealth. For this, we used Tone Analyzer, an International Business Machines Corporation (IBM) tool specialized in emotion detection in text, corresponding to the categorical model of emotion representation. For 15 days, all tweets from these accounts were analyzed with the emotional analysis tool in text. The results showed that their tweets contain an important emotional load, a determining factor in the success of their communications. This exposes that official accounts also use subjective language and contain emotions. The predominance of emotion joy over sadness and the strong presence of emotions in their tweets stimulate the virality of content, a key in the work of informing that government health departments have.

Keywords: Emotions in tweets emotion detection in text, health information on Twitter, American health official accounts, emotions on Twitter, emotions and content.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 660
477 Techniques with Statistics for Web Page Watermarking

Authors: Mohamed Lahcen BenSaad, Sun XingMing

Abstract:

Information hiding, especially watermarking is a promising technique for the protection of intellectual property rights. This technology is mainly advanced for multimedia but the same has not been done for text. Web pages, like other documents, need a protection against piracy. In this paper, some techniques are proposed to show how to hide information in web pages using some features of the markup language used to describe these pages. Most of the techniques proposed here use the white space to hide information or some varieties of the language in representing elements. Experiments on a very small page and analysis of five thousands web pages show that these techniques have a wide bandwidth available for information hiding, and they might form a solid base to develop a robust algorithm for web page watermarking.

Keywords: Digital Watermarking, Information Hiding, Markup Language, Text watermarking, Software Watermarking.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1750
476 Compression of Semistructured Documents

Authors: Leo Galambos, Jan Lansky, Katsiaryna Chernik

Abstract:

EGOTHOR is a search engine that indexes the Web and allows us to search the Web documents. Its hit list contains URL and title of the hits, and also some snippet which tries to shortly show a match. The snippet can be almost always assembled by an algorithm that has a full knowledge of the original document (mostly HTML page). It implies that the search engine is required to store the full text of the documents as a part of the index. Such a requirement leads us to pick up an appropriate compression algorithm which would reduce the space demand. One of the solutions could be to use common compression methods, for instance gzip or bzip2, but it might be preferable if we develop a new method which would take advantage of the document structure, or rather, the textual character of the documents. There already exist a special compression text algorithms and methods for a compression of XML documents. The aim of this paper is an integration of the two approaches to achieve an optimal level of the compression ratio

Keywords: Compression, search engine, HTML, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1532
475 A Combined Cipher Text Policy Attribute-Based Encryption and Timed-Release Encryption Method for Securing Medical Data in Cloud

Authors: G. Shruthi, Purohit Shrinivasacharya

Abstract:

The biggest problem in cloud is securing an outsourcing data. A cloud environment cannot be considered to be trusted. It becomes more challenging when outsourced data sources are managed by multiple outsourcers with different access rights. Several methods have been proposed to protect data confidentiality against the cloud service provider to support fine-grained data access control. We propose a method with combined Cipher Text Policy Attribute-based Encryption (CP-ABE) and Timed-release encryption (TRE) secure method to control medical data storage in public cloud.

Keywords: Attribute, encryption, security, trapdoor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 691
474 Event Information Extraction System (EIEE): FSM vs HMM

Authors: Shaukat Wasi, Zubair A. Shaikh, Sajid Qasmi, Hussain Sachwani, Rehman Lalani, Aamir Chagani

Abstract:

Automatic Extraction of Event information from social text stream (emails, social network sites, blogs etc) is a vital requirement for many applications like Event Planning and Management systems and security applications. The key information components needed from Event related text are Event title, location, participants, date and time. Emails have very unique distinctions over other social text streams from the perspective of layout and format and conversation style and are the most commonly used communication channel for broadcasting and planning events. Therefore we have chosen emails as our dataset. In our work, we have employed two statistical NLP methods, named as Finite State Machines (FSM) and Hidden Markov Model (HMM) for the extraction of event related contextual information. An application has been developed providing a comparison among the two methods over the event extraction task. It comprises of two modules, one for each method, and works for both bulk as well as direct user input. The results are evaluated using Precision, Recall and F-Score. Experiments show that both methods produce high performance and accuracy, however HMM was good enough over Title extraction and FSM proved to be better for Venue, Date, and time.

Keywords: Emails, Event Extraction, Event Detection, Finite state machines, Hidden Markov Model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2273
473 Designing Ontology-Based Knowledge Integration for Preprocessing of Medical Data in Enhancing a Machine Learning System for Coding Assignment of a Multi-Label Medical Text

Authors: Phanu Waraporn

Abstract:

This paper discusses the designing of knowledge integration of clinical information extracted from distributed medical ontologies in order to ameliorate a machine learning-based multilabel coding assignment system. The proposed approach is implemented using a decision tree technique of the machine learning on the university hospital data for patients with Coronary Heart Disease (CHD). The preliminary results obtained show a satisfactory finding that the use of medical ontologies improves the overall system performance.

Keywords: Medical Ontology, Knowledge Integration, Machine Learning, Medical Coding, Text Assignment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1805
472 An Analysis of Language Borrowing among Algerian University Students Using Online Facebook Conversations

Authors: Messaouda Annab

Abstract:

The rapid development of technology has led to an important context in which different languages and structures are used in the same conversations. This paper investigates the practice of language borrowing within social media platform, namely, Facebook among Algerian Vernacular Arabic (AVA) students. In other words, this study will explore how Algerian students have incorporated lexical English borrowing in their online conversations. This paper will examine the relationships between language, culture and identity among a multilingual group. The main objective is to determine the cultural and linguistic functions that borrowing fulfills in social media and to explain the possible factors underlying English borrowing. The nature of the study entails the use of an online research method that includes ten online Facebook conversations in the form of private messages collected from Bachelor and Masters Algerian students recruited from the English department at the University of Oum El-Bouaghi. The analysis of data revealed that social media platform provided the users with opportunities to shift from one language to another. This practice was noticed in students’ online conversations. English borrowing was the most relevant language performance in accordance with Arabic which is the mother tongue of the chosen sample. The analysis has assumed that participants are skilled in more than one language.

Keywords: Borrowing, language performance, linguistic background, social media.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1231
471 Feature Selection Methods for an Improved SVM Classifier

Authors: Daniel Morariu, Lucian N. Vintan, Volker Tresp

Abstract:

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with SVM_FS method for a relatively small dimension of the feature vector. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).

Keywords: Feature Selection, Learning with Kernels, SupportVector Machine, and Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1778
470 Speaker Independent Quranic Recognizer Basedon Maximum Likelihood Linear Regression

Authors: Ehab Mourtaga, Ahmad Sharieh, Mousa Abdallah

Abstract:

An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic, it is spoken all over the world. In this research, an automatic speech recognizer for Quranic based speakerindependent was developed and tested. The system was developed based on the tri-phone Hidden Markov Model and Maximum Likelihood Linear Regression (MLLR). The MLLR computes a set of transformations which reduces the mismatch between an initial model set and the adaptation data. It uses the regression class tree, as well as, estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th Chapter of the Quran, with five of the most famous readers of the Quran, was used for the training and testing of the data. The chapter includes about 2000 distinct words. The advantages of using the Quranic verses as the database in this developed recognizer are the uniqueness of the words and the high level of orderliness between verses. The level of accuracy from the tested data ranged 68 to 85%.

Keywords: Hidden Markov Model (HMM), MaximumLikelihood Linear Regression (MLLR), Quran, Regression ClassTree, Speech Recognition, Speaker-independent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1873