Search results for: syntactic analysis
8719 Study of Syntactic Errors for Deep Parsing at Machine Translation
Authors: Yukiko Sasaki Alam, Shahid Alam
Abstract:
Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached the levels of reliable performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.
Keywords: Machine translation, error analysis, syntactic errors, knowledge required for parsing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12478718 Syntactic Recognition of Distorted Patterns
Authors: Marek Skomorowski
Abstract:
In syntactic pattern recognition a pattern can be represented by a graph. Given an unknown pattern represented by a graph g, the problem of recognition is to determine if the graph g belongs to a language L(G) generated by a graph grammar G. The so-called IE graphs have been defined in [1] for a description of patterns. The IE graphs are generated by so-called ETPL(k) graph grammars defined in [1]. An efficient, parsing algorithm for ETPL(k) graph grammars for syntactic recognition of patterns represented by IE graphs has been presented in [1]. In practice, structural descriptions may contain pattern distortions, so that the assignment of a graph g, representing an unknown pattern, to a graph language L(G) generated by an ETPL(k) graph grammar G is rejected by the ETPL(k) type parsing. Therefore, there is a need for constructing effective parsing algorithms for recognition of distorted patterns. The purpose of this paper is to present a new approach to syntactic recognition of distorted patterns. To take into account all variations of a distorted pattern under study, a probabilistic description of the pattern is needed. A random IE graph approach is proposed here for such a description ([2]).Keywords: Syntactic pattern recognition, Distorted patterns, Random graphs, Graph grammars.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13958717 An Automatic Model Transformation Methodology Based on Semantic and Syntactic Comparisons and the Granularity Issue Involved
Authors: Tiexin Wang, Sebastien Truptil, Frederick Benaben
Abstract:
Model transformation, as a pivotal aspect of Modeldriven engineering, attracts more and more attentions both from researchers and practitioners. Many domains (enterprise engineering, software engineering, knowledge engineering, etc.) use model transformation principles and practices to serve to their domain specific problems; furthermore, model transformation could also be used to fulfill the gap between different domains: by sharing and exchanging knowledge. Since model transformation has been widely used, there comes new requirement on it: effectively and efficiently define the transformation process and reduce manual effort that involved in. This paper presents an automatic model transformation methodology based on semantic and syntactic comparisons, and focuses particularly on granularity issue that existed in transformation process. Comparing to the traditional model transformation methodologies, this methodology serves to a general purpose: crossdomain methodology. Semantic and syntactic checking measurements are combined into a refined transformation process, which solves the granularity issue. Moreover, semantic and syntactic comparisons are supported by software tool; manual effort is replaced in this way.Keywords: Automatic model transformation, granularity issue, model-driven engineering, semantic and syntactic comparisons.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21918716 An Effective Framework for Chinese Syntactic Parsing
Authors: Xing Li, Chengqing Zong
Abstract:
This paper presents an effective framework for Chinesesyntactic parsing, which includes two parts. The first one is a parsing framework, which is based on an improved bottom-up chart parsingalgorithm, and integrates the idea of the beam search strategy of N bestalgorithm and heuristic function of A* algorithm for pruning, then get multiple parsing trees. The second is a novel evaluation model, which integrates contextual and partial lexical information into traditional PCFG model and defines a new score function. Using this model, the tree with the highest score is found out as the best parsing tree. Finally,the contrasting experiment results are given. Keywords?syntactic parsing, PCFG, pruning, evaluation model.
Keywords: syntactic parsing, PCFG, pruning, evaluation model.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12218715 A Corpus-Based Analysis on Code-Mixing Features in Mandarin-English Bilingual Children in Singapore
Authors: Xunan Huang, Caicai Zhang
Abstract:
This paper investigated the code-mixing features in Mandarin-English bilingual children in Singapore. First, it examined whether the code-mixing rate was different in Mandarin Chinese and English contexts. Second, it explored the syntactic categories of code-mixing in Singapore bilingual children. Moreover, this study investigated whether morphological information was preserved when inserting syntactic components into the matrix language. Data are derived from the Singapore Bilingual Corpus, in which the recordings and transcriptions of sixty English-Mandarin 5-to-6-year-old children were preserved for analysis. Results indicated that the rate of code-mixing was asymmetrical in the two language contexts, with the rate being significantly higher in the Mandarin context than that in the English context. The asymmetry is related to language dominance in that children are more likely to code-mix when using their nondominant language. Concerning the syntactic categories of code-mixing words in the Singaporean bilingual children, we found that noun-mixing, verb-mixing, and adjective-mixing are the three most frequently used categories in code-mixing in the Mandarin context. This pattern mirrors the syntactic categories of code-mixing in the Cantonese context in Cantonese-English bilingual children, and the general trend observed in lexical borrowing. Third, our results also indicated that English vocabularies that carry morphological information are embedded in bare forms in the Mandarin context. These findings shed light upon how bilingual children take advantage of the two languages in mixed utterances in a bilingual environment.
Keywords: Code-mixing, Mandarin Chinese, English, bilingual children.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11218714 The Role of Paraphrase in Interpreting Students’ Writing
Authors: Maya Lisa Aryanti, S. S. M. Hum
Abstract:
To improve students’ skill, writing is the most challenging skill to be developed. The reason is that besides helping the students to develop their skill, this activity also helps them to express themselves. This paper depicts how paraphrasing is very helpful to interpret students’ writing. Syntactic units, used tenses and meanings will indeed change once the writings were paraphrased. The objectives of this research are to reveal the inappropriate structure of syntactic units, to show what types of sentences the students often make, and to show how paraphrasing can help to infer the message. The methodology of this research is descriptive qualitative research. In addition, theories of linguistics are also included. This includes theory of Syntax to describe syntactic units and tenses and theory of Semantics to describe theories of meaning and how paraphrasing works. The theories of general linguistics, grammar and writing are also provided to support the theories of Syntax and Semantics. The results of this research are concerned with how the message is received in the end. The message written in the students’ essay is not clear because of the improper structure of syntactic units and use of incorrect of tenses. The students tend to use simple sentences, compound sentences and complex sentences with a few mistakes in their writing. In addition, they tend to create unnecessary phrases. The last point is that this research shows how paraphrase works to attain complete meaning of a sentence.
Keywords: Paraphrase, meanings, syntactic units and tenses.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10188713 Automatic Intelligent Analysis of Malware Behaviour
Authors: H. Dornhackl, K. Kadletz, R. Luh, P. Tavolato
Abstract:
In this paper, we describe the use of formal methods to model malware behaviour. The modelling of harmful behaviour rests upon syntactic structures that represent malicious procedures inside malware. The malicious activities are modelled by a formal grammar, where API calls’ components are the terminals and the set of API calls used in combination to achieve a goal are designated non-terminals. The combination of different non-terminals in various ways and tiers make up the attack vectors that are used by harmful software. Based on these syntactic structures a parser can be generated which takes execution traces as input for pattern recognition.
Keywords: Malware behaviour, modelling, parsing, search, pattern matching.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15248712 Aspect Oriented Software Architecture
Authors: Pradip Peter Dey, Ronald F. Gonzales, Gordon W. Romney, Mohammad Amin, Bhaskar Raj Sinha
Abstract:
Natural language processing systems pose a unique challenge for software architectural design as system complexity has increased continually and systems cannot be easily constructed from loosely coupled modules. Lexical, syntactic, semantic, and pragmatic aspects of linguistic information are tightly coupled in a manner that requires separation of concerns in a special way in design, implementation and maintenance. An aspect oriented software architecture is proposed in this paper after critically reviewing relevant architectural issues. For the purpose of this paper, the syntactic aspect is characterized by an augmented context-free grammar. The semantic aspect is composed of multiple perspectives including denotational, operational, axiomatic and case frame approaches. Case frame semantics matured in India from deep thematic analysis. It is argued that lexical, syntactic, semantic and pragmatic aspects work together in a mutually dependent way and their synergy is best represented in the aspect oriented approach. The software architecture is presented with an augmented Unified Modeling Language.Keywords: Language engineering, parsing, software design, user experience.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17438711 The Sign in the Communication Process
Authors: S. Pesina, T. Solonchak
Abstract:
In the process of information transmission (concept verbalization) we deal mostly with the substance (contents), and then pay attention to the form. Recalling events from the remote past, often we cannot exactly reproduce specific heard or pronounced words, as well as the syntactic structures. We remember events, feelings, images; we recall the general contents of the discourse. The thought gets a specific language form only during the concept verbalization phase. With minimum time for pondering, depending on the language competence level, the grammar and syntactic shaping often occurs automatically with the use of famous models and stereotypes. This means that the language form adapts itself to the consciousness, and not vice versa.
Keywords: Lexical eidos, phenomenology, noema, polysemantic word, semantic core.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19428710 Probability and Instruction Effects in Syllogistic Conditional Reasoning
Authors: Olimpia Matarazzo, Ivana Baldassarre
Abstract:
The main aim of this study was to examine whether people understand indicative conditionals on the basis of syntactic factors or on the basis of subjective conditional probability. The second aim was to investigate whether the conditional probability of q given p depends on the antecedent and consequent sizes or derives from inductive processes leading to establish a link of plausible cooccurrence between events semantically or experientially associated. These competing hypotheses have been tested through a 3 x 2 x 2 x 2 mixed design involving the manipulation of four variables: type of instructions (“Consider the following statement to be true", “Read the following statement" and condition with no conditional statement); antecedent size (high/low); consequent size (high/low); statement probability (high/low). The first variable was between-subjects, the others were within-subjects. The inferences investigated were Modus Ponens and Modus Tollens. Ninety undergraduates of the Second University of Naples, without any prior knowledge of logic or conditional reasoning, participated in this study. Results suggest that people understand conditionals in a syntactic way rather than in a probabilistic way, even though the perception of the conditional probability of q given p is at least partially involved in the conditionals- comprehension. They also showed that, in presence of a conditional syllogism, inferences are not affected by the antecedent or consequent sizes. From a theoretical point of view these findings suggest that it would be inappropriate to abandon the idea that conditionals are naturally understood in a syntactic way for the idea that they are understood in a probabilistic way.Keywords: Conditionals, conditional probability, conditional syllogism, inferential task.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15638709 Investigating Medical Students’ Perspectives toward University Teachers’ Talking Features in an English as a Foreign Language Context in Urmia, Iran
Authors: Ismail Baniadam, Nafisa Tadayyon, Javid Fereidoni
Abstract:
This study aimed to investigate medical students’ attitudes toward some teachers’ talking features regarding their gender in the Iranian context. To do so, 60 male and 60 female medical students of Urmia University of Medical Sciences (UMSU) participated in the research. A researcher made Likert-type questionnaire which was initially piloted and was used to gather the data. Comparing the four different factors regarding the features of teacher talk, it was revealed that visual and extra-linguistic information factor, Lexical and syntactic familiarity, Speed of speech, and the use of Persian language had the highest to the lowest mean score, respectively. It was also indicated that female students rather than male students were significantly more in favor of speed of speech and lexical and syntactic familiarity.
Keywords: Attitude, gender, medical student, teacher talk.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8018708 Natural Language Database Interface for Selection of Data Using Grammar and Parsing
Authors: N. D. Karande, G. A. Patil
Abstract:
Databases have become ubiquitous. Almost all IT applications are storing into and retrieving information from databases. Retrieving information from the database requires knowledge of technical languages such as Structured Query Language (SQL). However majority of the users who interact with the databases do not have a technical background and are intimidated by the idea of using languages such as SQL. This has led to the development of a few Natural Language Database Interfaces (NLDBIs). A NLDBI allows the user to query the database in a natural language. This paper highlights on architecture of new NLDBI system, its implementation and discusses on results obtained. In most of the typical NLDBI systems the natural language statement is converted into an internal representation based on the syntactic and semantic knowledge of the natural language. This representation is then converted into queries using a representation converter. A natural language query is translated to an equivalent SQL query after processing through various stages. The work has been experimented on primitive database queries with certain constraints.
Keywords: Natural language database interface, representation converter, syntactic and semantic knowledge
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27058707 Dynamic Analyze of Snake Robot
Authors: Seif Dalilsafaei
Abstract:
Crawling movement as a motive mode seen in nature of some animals such as snakes possesses a specific syntactic and dynamic analysis. Serpentine robot designed by inspiration from nature and snake-s crawling motion, is regarded as a crawling robot. In this paper, a serpentine robot with spiral motion model will be analyzed. The purpose of this analysis is to calculate the vertical and tangential forces along snake-s body and to determine the parameters affecting on these forces. Two types of serpentine robots have been designed in order to examine the achieved relations explained below.Keywords: Force, Dynamic analyze, Joint and Snake robot.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19438706 Author Profiling: Prediction of Learners’ Gender on a MOOC Platform Based on Learners’ Comments
Authors: Tahani Aljohani, Jialin Yu, Alexandra. I. Cristea
Abstract:
The more an educational system knows about a learner, the more personalised interaction it can provide, which leads to better learning. However, asking a learner directly is potentially disruptive, and often ignored by learners. Especially in the booming realm of MOOC Massive Online Learning platforms, only a very low percentage of users disclose demographic information about themselves. Thus, in this paper, we aim to predict learners’ demographic characteristics, by proposing an approach using linguistically motivated Deep Learning Architectures for Learner Profiling, particularly targeting gender prediction on a FutureLearn MOOC platform. Additionally, we tackle here the difficult problem of predicting the gender of learners based on their comments only – which are often available across MOOCs. The most common current approaches to text classification use the Long Short-Term Memory (LSTM) model, considering sentences as sequences. However, human language also has structures. In this research, rather than considering sentences as plain sequences, we hypothesise that higher semantic - and syntactic level sentence processing based on linguistics will render a richer representation. We thus evaluate, the traditional LSTM versus other bleeding edge models, which take into account syntactic structure, such as tree-structured LSTM, Stack-augmented Parser-Interpreter Neural Network (SPINN) and the Structure-Aware Tag Augmented model (SATA). Additionally, we explore using different word-level encoding functions. We have implemented these methods on Our MOOC dataset, which is the most performant one comparing with a public dataset on sentiment analysis that is further used as a cross-examining for the models' results.
Keywords: Deep learning, data mining, gender predication, MOOCs.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13638705 Assessing Semantic Consistency of Business Process Models
Authors: Bernhard G. Humm, Janina Fengel
Abstract:
Business process modeling has become an accepted means for designing and describing business operations. Thereby, consistency of business process models, i.e., the absence of modeling faults, is of upmost importance to organizations. This paper presents a concept and subsequent implementation for detecting faults in business process models and for computing a measure of their consistency. It incorporates not only syntactic consistency but also semantic consistency, i.e., consistency regarding the meaning of model elements from a business perspective.Keywords: Business process modeling, model analysis, semantic consistency, Semantic Web
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18538704 Behavioral and EEG Reactions in Native Turkic-Speaking Inhabitants of Siberia and Siberian Russians during Recognition of Syntactic Errors in Sentences in Native and Foreign Languages
Authors: Tatiana N. Astakhova, Alexander E. Saprygin, Tatiana A. Golovko, Alexander N. Savostyanov, Mikhail S. Vlasov, Natalia V. Borisova, Alexandera G. Karpova, Urana N. Kavai-ool, Elena Mokur-ool, Nikolay A. Kolchano, Lyubomir I. Aftanas
Abstract:
The aim of the study is to compare behavioral and EEG reactions in Turkic-speaking inhabitants of Siberia (Tuvinians and Yakuts) and Russians during the recognition of syntax errors in native and foreign languages. Sixty-three healthy aboriginals of the Tyva Republic, 29 inhabitants of the Sakha (Yakutia) Republic, and 55 Russians from Novosibirsk participated in the study. EEG were recorded during execution of error-recognition task in Russian and English language (in all participants) and in native languages (Tuvinian or Yakut Turkic-speaking inhabitants). Reaction time (RT) and quality of task execution were chosen as behavioral measures. Amplitude and cortical distribution of P300 and P600 peaks of ERP were used as a measure of speech-related brain activity. In Tuvinians, there were no differences in the P300 and P600 amplitudes as well as in cortical topology for Russian and Tuvinian languages, but there was a difference for English. In Yakuts, the P300 and P600 amplitudes and topology of ERP for Russian language were the same as Russians had for native language. In Yakuts, brain reactions during Yakut and English language comprehension had no difference, while the Russian language comprehension was differed from both Yakut and English. We found out that the Tuvinians recognized both Russian and Tuvinian as native languages, and English as a foreign language. The Yakuts recognized both English and Yakut as foreign languages, but Russian as a native language. According to the inquirer, both Tuvinians and Yakuts use the national language as a spoken language, whereas they do not use it for writing. It can well be a reason that Yakuts perceive the Yakut writing language as a foreign language while writing Russian as their native.Keywords: EEG, brain activity, syntactic analysis, native and foreign language.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20658703 Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems
Authors: Tengku Mohd T. Sembok
Abstract:
Documents retrieval in Information Retrieval Systems (IRS) is generally about understanding of information in the documents concern. The more the system able to understand the contents of documents the more effective will be the retrieval outcomes. But understanding of the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through keywords approach using vector space model. Keywords may be unstemmed or stemmed. When keywords are stemmed and conflated in retrieving process, we are a step forwards in applying semantic technology in IRS. Word stemming is a process in morphological analysis under natural language processing, before syntactic and semantic analysis. We have developed algorithms for Malay and Arabic and incorporated stemming in our experimental systems in order to measure retrieval effectiveness. The results have shown that the retrieval effectiveness has increased when stemming is used in the systems.Keywords: Information Retrieval, Natural Language Processing, Artificial Intelligence.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22588702 Chinese Event Detection Technique Based on Dependency Parsing and Rule Matching
Authors: Weitao Lin
Abstract:
To quickly extract adequate information from large-scale unstructured text data, this paper studies the representation of events in Chinese scenarios and performs the regularized abstraction. It proposes a Chinese event detection technique based on dependency parsing and rule matching. The method first performs dependency parsing on the original utterance, then performs pattern matching at the word or phrase granularity based on the results of dependent syntactic analysis, filters out the utterances with prominent non-event characteristics, and obtains the final results. The experimental results show the effectiveness of the method.
Keywords: Natural Language Processing, Chinese event detection, rules matching, dependency parsing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1758701 Use XML Format like a Model of Data Backup
Authors: Souleymane Oumtanaga, Kadjo Tanon Lambert, Koné Tiémoman, Tety Pierre, Dowa N’sreke Florent
Abstract:
Nowadays data backup format doesn-t cease to appear raising so the anxiety on their accessibility and their perpetuity. XML is one of the most promising formats to guarantee the integrity of data. This article suggests while showing one thing man can do with XML. Indeed XML will help to create a data backup model. The main task will consist in defining an application in JAVA able to convert information of a database in XML format and restore them later.
Keywords: Backup, Proprietary format, parser, syntactic tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17308700 5iD Viewer - Observation of Fish School Behaviour in Labyrinths and Use of Semantic and Syntactic Entropy for School Structure Definition
Authors: Dalibor Štys, Dalibor Štys Jr., Jana Pečenková, Kryštof M. Štys, Maryia Chkalova, Petr Kouba, Aliaksandr Pautsina, Denis Durniev, Tomáš Náhlík, Petr Císař
Abstract:
In this article is reported a construction and some properties of the 5iD viewer, the system recording simultaneously 5 views of a given experimental object. Properties of the system are demonstrated on the analysis of fish schooling behaviour. It is demonstrated the method of instrument calibration which allows inclusion of image distortion and it is proposed and partly tested also the method of distance assessment in the case that only two opposite cameras are available. Finally, we demonstrate how the state trajectory of the behaviour of the fish school may be constructed from the entropy of the system.
Keywords: 3D positioning, school behavior, distance calibration, space vision, space distortion.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19348699 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features
Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim
Abstract:
The knowledge of the relationship between characters can help readers to understand the overall story or plot of the literary fiction. In this paper, we present a method for extracting the specific relationship between characters from a Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters without considering Korean linguistic features. Furthermore, it is difficult to extract the relationship with direction from text, such as one-sided love, because they consider only the weight of relationship, without considering the direction of the relationship. Therefore, in order to identify specific relationships between characters, we propose a statistical method considering linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationship between the characters. Furthermore, we expect that proposed method could be applied to the relationship analysis between characters of other content like movie or TV drama.
Keywords: Data mining, Korean linguistic feature, literary fiction, relationship extraction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17958698 Street Network in Bandung City, Indonesia: Comparison between City Center and New Commercial Area
Authors: Siska Soesanti, Norihiro Nakai
Abstract:
Bandung city center can be deemed as economic, social and cultural center. However the city center suffers from deterioration. The retail activities tend to shift outward the city center. Numerous idyllic residences changed into business premises in two villages situated in the north part of the city during 1990s, especially after a new highway and flyover opened. According to space syntax theory, the pattern of spatial integration in the urban grid is a prime determinant of movement patterns in the system. The syntactic analysis results show the flyover has insignificant influence on street network in the city center. However the flyover has been generating a major difference in the new commercial area since it has become relatively as strategic as the city center. Besides street network, local government policy, rapid private motorization and particular condition of each site also played important roles in encouraging the current commercial areas to flourish.
Keywords: City center, commercial area, space syntax, street network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18188697 Detecting Email Forgery using Random Forests and Naïve Bayes Classifiers
Authors: Emad E Abdallah, A.F. Otoom, ArwaSaqer, Ola Abu-Aisheh, Diana Omari, Ghadeer Salem
Abstract:
As emails communications have no consistent authentication procedure to ensure the authenticity, we present an investigation analysis approach for detecting forged emails based on Random Forests and Naïve Bays classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for all the possible suspects. Our approach consists of four main steps: (1) The cybercrime investigator extract different effective features including structural, lexical, linguistic, and syntactic evidence from previous emails for all the possible suspects, (2) The extracted features vectors are normalized to increase the accuracy rate. (3) The normalized features are then used to train the learning engine, (4) upon receiving the anonymous email (M); we apply the feature extraction process to produce a feature vector. Finally, using the machine learning classifiers the email is assigned to one of the suspects- whose writing style closely matches M. Experimental results on real data sets show the improved performance of the proposed method and the ability of identifying the authors with a very limited number of features.Keywords: Digital investigation, cybercrimes, emails forensics, anonymous emails, writing style, and authorship analysis
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 52558696 Composite Kernels for Public Emotion Recognition from Twitter
Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang
Abstract:
The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.
Keywords: Public emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7738695 A Weighted-Profiling Using an Ontology Basefor Semantic-Based Search
Authors: Hikmat A. M. Abd-El-Jaber, Tengku M. T. Sembok
Abstract:
The information on the Web increases tremendously. A number of search engines have been developed for searching Web information and retrieving relevant documents that satisfy the inquirers needs. Search engines provide inquirers irrelevant documents among search results, since the search is text-based rather than semantic-based. Information retrieval research area has presented a number of approaches and methodologies such as profiling, feedback, query modification, human-computer interaction, etc for improving search results. Moreover, information retrieval has employed artificial intelligence techniques and strategies such as machine learning heuristics, tuning mechanisms, user and system vocabularies, logical theory, etc for capturing user's preferences and using them for guiding the search based on the semantic analysis rather than syntactic analysis. Although a valuable improvement has been recorded on search results, the survey has shown that still search engines users are not really satisfied with their search results. Using ontologies for semantic-based searching is likely the key solution. Adopting profiling approach and using ontology base characteristics, this work proposes a strategy for finding the exact meaning of the query terms in order to retrieve relevant information according to user needs. The evaluation of conducted experiments has shown the effectiveness of the suggested methodology and conclusion is presented.Keywords: information retrieval, user profiles, semantic Web, ontology, search engine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32198694 PIELG: A Protein Interaction Extraction Systemusing a Link Grammar Parser from Biomedical Abstracts
Authors: Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, Yasser M. Kadah
Abstract:
Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses linkage given by the Link Grammar Parser to start a case based analysis of contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verbs patterns to overcome preposition combinations problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.Keywords: Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22548693 Age and Second Language Acquisition: A Case Study from Maldives
Authors: Aaidha Hammad
Abstract:
The age a child to be exposed to a second language is a controversial issue in communities such as the Maldives where English is taught as a second language. It has been observed that different stakeholders have different viewpoints towards the issue. Some believe that the earlier children are exposed to a second language, the better they learn, while others disagree with the notion. Hence, this case study investigates whether children learn a second language better when they are exposed at an earlier age or not. The spoken and written data collected confirm that earlier exposure helps in mastering the sound pattern and speaking fluency with more native-like accent, while a later age is better for learning more abstract and concrete aspects such as grammar and syntactic rules.Keywords: Age, development of language skills, fluency, second language acquisition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36398692 Ontology Population via NLP Techniques in Risk Management
Authors: Jawad Makki, Anne-Marie Alquier, Violaine Prince
Abstract:
In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi automatic instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and uses a domain expert intervention for validation. The proposed approach relies on a set of Instances Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA1 project (supported by the European community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from Environmental Protection Agency2. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15368691 Syntax Sensitive and Language Independent Detection of Code Clones
Authors: Kazuaki Maeda
Abstract:
This paper proposes a new technique to detect code clones from the lexical and syntactic point of view, which is based on PALEX source code representation. The PALEX code contains the recorded parsing actions and also lexical formatting information including white spaces and comments. We can record a list of parsing actions (shift, reduce, and reading a token) during a compiling process after a compiler finishes analyzing the source code. The proposed technique has advantages for syntax sensitive approach and language independency.Keywords: Code Clones, Source Code Representation, XML, Parser, Parser Generator
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14618690 Morphological and Syntactic Meaning: An Interactive Crossword Puzzle Approach
Authors: Ibrahim Garba
Abstract:
This research involved the use of word distributions and morphological knowledge by speakers of Arabic learning English connected different allomorphs in order to realize how the morphology and syntax of English gives meaning through using interactive crossword puzzles (ICP). Fifteen chapters covered with a class of nine learners over an academic year of an intensive English program were reviewed using the ICP. Learners were questioned about how the use of this gaming element enhanced and motivated their learning of English. The findings were positive indicating a successful implementation of ICP both at creational and user levels. This indicated a positive role technology had when learning and teaching English through adopting an interactive gaming element for learning English.Keywords: Distribution, gaming, interactive-crossword-puzzle, morphology.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2390