Search results for: text mining analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28231

27931 Deep Learning Based-Object-classes Semantic Classification of Arabic Texts

Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel

Abstract:

In this paper, we propose a deep learning-based approach to classify text in order to enrich an Arabic ontology based on the object classes of Gaston Gross. These object classes are defined by taking into account the syntactic and semantic features of the language under study. Our proposed approach is therefore a hybrid one: on the one hand, it relies on the object classes, which represent a knowledge-based approach to text classification; on the other hand, it uses a deep learning approach based on word embeddings to classify text. We applied the approach to a corpus constructed from an Arabic dictionary. The resulting semantic classification of text will enrich the Arabic object-classes ontology: new classes can be added to the ontology, or the features that characterize each object class can be expanded. The results are compared with a similar work that addresses the same subject with a classical linguistic approach to the semantic classification of text. This comparison highlights our hybrid approach, which can be improved by broadening the dataset used in the deep learning process.

Keywords: deep-learning approach, object-classes, semantic classification, Arabic

Procedia PDF Downloads 50
27930 Structures and Analytical Crucibles in Nigerian Indigenous Art Music

Authors: Albert Oluwole Uzodimma Authority

Abstract:

Nigeria is a diverse nation with a rich cultural heritage that has produced numerous art musicians and a vast range of art songs. The compositional styles, tonal rhythm, text rhythm, word painting, and text-tone relationship vary extensively from one dialect to another, indicating the need for standardized tools for the structural and analytical deconstruction of Nigerian indigenous art music. The purpose of this research is to examine the structures of Nigerian indigenous art music and outline some crucibles for analyzing it, by investigating how dialectical inflection influences the choice of text tone, scale mode, tonal rhythm, and the general ambiance of Nigerian art music. The research used a structured questionnaire to collect data from 50 musicologists, out of which 41 responded. The study's focus was on the works of two prominent twentieth-century composers, Stephen Olusoji and Nwamara Alvan-Ikoku, titled "Oyigiyigi" and "O Chineke, Inozikwa omee," respectively. The data collected was presented in percentages using pie charts and tables. The study shows that, in Nigerian indigenous music, several aspects must be considered for proper analysis, such as linguistic sensitivity, the influence of dialectical inflection on the text-tone relationship, text rhythm, and tonal rhythm, all of which help to convey the proper meanings of messages in songs. It also highlights the lack of standardized rubrics for analysis, which necessitated the proposal of robust criteria for analyzing African music, known as Neo-Eclectic-Crucibles. Hinging on an eclectic approach, this research makes significant contributions to music scholarship by addressing the need for standardized tools and crucibles for the structural and analytical deconstruction of Nigerian indigenous art music. It provides a template for further studies leading to standardized rubrics for analyzing African music.
This research collected data through a structured questionnaire and analyzed it using pie charts and tables to present the findings accurately. The analysis focused on the respondents' perspectives on the research objectives and structural analysis of two indigenous music compositions by Olusoji and Nwamara. This research answers the questions on the structures and analytical crucibles used in Nigerian indigenous art music, how dialectical inflection influences text-tone relationship, scale mode, tonal rhythm, and the general ambiance of Nigerian art music. This paper demonstrates the need for standardized tools and crucibles for the structural and analytical deconstruction of Nigerian indigenous art music. It highlights several aspects that are crucial to analyzing Nigerian indigenous music and proposes the Neo-Eclectic-Crucibles criteria for analyzing African music. The contribution of this research to music scholarship is significant, providing a template for further studies and research in the field.

Keywords: art-music, crucibles, dialectical inflections, indigenous, text-tone, tonal rhythm, word-painting

Procedia PDF Downloads 67
27929 Systemic Functional Grammar Analysis of Barack Obama's Second Term Inaugural Speech

Authors: Sadiq Aminu, Ahmed Lamido

Abstract:

This research studies Barack Obama’s second inaugural speech using Halliday’s Systemic Functional Grammar (SFG). SFG is a text grammar which describes how language is used, so that the meaning of the text can be better understood. The primary source of data in this research work is Barack Obama’s second inaugural speech which was obtained from the internet. The analysis of the speech was based on the ideational and textual metafunctions of Systemic Functional Grammar. Specifically, the researcher analyses the Process Types and Participants (ideational) and the Theme/Rheme (textual). It was found that material process (process of doing) was the most frequently used ‘Process type’ and ‘We’ which refers to the people of America was the frequently used ‘Theme’. Application of the SFG theory, therefore, gives a better meaning to Barack Obama’s speech.

Keywords: ideational, metafunction, rheme, textual, theme

Procedia PDF Downloads 131
27928 Agriculture Water Quality Evaluation in Mining Basin

Authors: Ben Salah Nahla

Abstract:

The water problem in Tunisia concerns both quality and quantity. Tunisia is in a situation of water shortage. It was estimated at 4.6 Mm3/year. Moreover, the quality of water in Tunisia is mediocre: 50% of the water has a high salinity (> 1.5 g/l). Several parameters affect water quality, such as sodium and fluoride, and an excess of these parameters may cause human health problems. Furthermore, the mining basin area has a problem of industrial waste, which may affect the quality of the groundwater. Therefore, the purpose of this work is to assess the water quality in the mining basin and the impact of fluorine. For this research, water samples were collected in the field and specific water analyses were carried out in the laboratory. Sampling was carried out at eight boreholes in the mining region. We then examine the water in terms of its composition and its physical and chemical quality. A physical-chemical analysis of water from a survey of the mining area of Tunisia was performed and showed an excess of the following elements: fluorine, sodium, and sulfate. Many chemicals may be present in water; however, only a small number of them are of immediate health concern in all circumstances. Fluorine (F) is one chemical that is considered necessary for the human body, but an excess of it causes serious diseases. Sodium fluoride and sodium silicofluoride are more soluble and may spread in animals and plants, where they are more toxic to organisms. More complex particles such as cryolite and fluorite, which are almost insoluble, are more stable and less toxic. We then study the problem of excess fluorine in the water. Water intended for human consumption must always comply with the limits for microbiological and physical-chemical quality parameters defined by European (1.5 mg/l) and Tunisian (2 mg/l) standards.

Keywords: water, mining basin, fluorine, silicofluoride

Procedia PDF Downloads 556
27927 A Study on the Nostalgia Contents Analysis of Hometown Alumni in the Online Community

Authors: Heejin Yun, Juanjuan Zang

Abstract:

This study aims to analyze the terms in texts posted on an online community of people from the same hometown and to understand the topics and trends of nostalgia expressed online. For this purpose, the study collected 144 posts written by natives of Yeongjong Island, Incheon, South Korea, in an online community and analyzed the association relations among their terms. The results show, first, that defining nostalgia merely as ‘a mind longing for hometown’ is not a sufficient explanation. Second, texts composed online tend to be abstract rather than to recount individuals’ personal stories. Through literature research and association rules concerning nostalgia, this study identified the relationships with the most critical and closest mutual association among the terms that constitute nostalgia. The contribution of this study is that it summarizes the core terms and emotions related to nostalgia.
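
As a rough illustration of how association relations among terms can be quantified, the sketch below computes support, confidence, and lift for pairs of terms that co-occur in posts. The function name, the threshold, and the four toy posts are hypothetical; they are not the study's data or procedure, which the abstract does not describe in detail.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(documents, min_support=0.5):
    """Return (antecedent, consequent, support, confidence, lift) for term pairs."""
    n = len(documents)
    term_sets = [set(doc) for doc in documents]
    # Number of documents containing each term and each unordered term pair.
    item_counts = Counter(term for s in term_sets for term in s)
    pair_counts = Counter(
        pair for s in term_sets for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return rules


# Hypothetical toy posts, each already reduced to a list of terms.
posts = [
    ["hometown", "sea", "childhood"],
    ["hometown", "sea"],
    ["hometown", "school"],
    ["sea", "childhood"],
]
rules = pairwise_rules(posts, min_support=0.5)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that two terms appear together more often than their individual frequencies would predict, which is one common way to rank the closest mutual associations.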

Keywords: nostalgia, cultural memory, data mining, association rule

Procedia PDF Downloads 212
27926 Making Sense of Places: A Comparative Study of Three Contexts in Thailand

Authors: Thirayu Jumsai Na Ayudhya

Abstract:

The study of what architecture means to people in their everyday lives has not adequately addressed a contextualized and holistic theoretical framework. This article succinctly presents a theoretical framework obtained from a comparative study of how people experience the everyday architecture in three different contexts: 1) Bangkok CBD, 2) Phuket island old-town, and 3) Nan province old-town. The way people make sense of the everyday architecture can be addressed in four super-ordinate themes: (1) building in urban (text), (2) building in (text), (3) building in human (text), and (4) building in time (text). In this article, these super-ordinate themes were examined to verify whether they recur in the three studied contexts. In each studied context, the participants were divided into two groups: 1) local people and 2) visitors. Participants were asked to take photographs of the everyday architecture during their everyday routine and to participate in an elicit-interview using the photographs they had produced. Interpretative phenomenological analysis (IPA) was adopted to interpret the elicit-interview data. Sub-themes emerging in each studied context were brought into a cross-comparison among the three studied contexts. It is found that the four super-ordinate themes recur, with additional distinctive sub-themes. Further studies in other contexts, such as those with socio-political, economic, and cultural differences, are recommended to complete the theoretical framework.

Keywords: sense of place, the everyday architecture, architectural experience, the everyday

Procedia PDF Downloads 133
27925 Emotion Classification Using Recurrent Neural Network and Scalable Pattern Mining

Authors: Jaishree Ranganathan, MuthuPriya Shanmugakani Velsamy, Shamika Kulkarni, Angelina Tzacheva

Abstract:

Emotions play an important role in everyday life. Analyzing these emotions or feelings from social media platforms like Twitter, Facebook, blogs, and forums, based on user comments and reviews, is important for various purposes, including brand monitoring, marketing strategies, reputation, and competitor analysis. The opinions or sentiments mined from such data help in understanding the current state of the user. However, they do not directly provide intuitive insights into what actions should be taken to benefit the end user or business. The Actionable Pattern Mining method provides suggestions or actionable recommendations on what changes or actions need to be taken in order to benefit the end user. In this paper, we propose automatic classification of emotions in Twitter data using a Recurrent Neural Network - Gated Recurrent Unit. We achieve a training accuracy of 87.58% and a validation accuracy of 86.16%. We also extract action rules with respect to the user emotion that help to provide actionable suggestions.

Keywords: emotion mining, twitter, recurrent neural network, gated recurrent unit, actionable pattern mining

Procedia PDF Downloads 142
27924 Towards a Deconstructive Text: Beyond Language and the Politics of Absences in Samuel Beckett’s Waiting for Godot

Authors: Afia Shahid

Abstract:

The writing of Samuel Beckett is associated with meaning in the meaninglessness and the production of what he calls ‘literature of unword’. The casual escape from the world of words in the form of silences and pauses, in his play Waiting for Godot, urges us to question their existence and ultimately leads us to investigate the theory behind their use in the play. This paper proposes that these absences (silence and pause) in Beckett’s play force us to think ‘beyond’ language. This paper asks how silence and pause in Beckett’s text speak for the emergence of the poststructuralist text. It aims to identify the significant features of the philosophy of deconstruction in Beckett’s play to demystify the hostile complicity between literature and philosophy. With the interpretive paradigm of poststructuralism, this research focuses on the text as research data. It attempts to delineate the relationship between poststructuralist theoretical concerns and Beckett’s text. Keeping in view the theoretical concerns of the poststructuralist theorist Jacques Derrida, the main concern of the discussion is directed towards the notion of ‘beyond’ language into the absences that are aimed at silencing the existing discourse with the ‘radical irony’ of this anti-formal art that contains its own denial and thus represents the idea of ceaseless questioning and radical contradiction in art and any text. This article asks how Beckett’s text vibrates with loud silence and disrupts language to demonstrate the emptiness of words, thus exploring the limitless void of absences. Beckett’s text resonates with silence and pause that is neither negation nor affirmation but rather a poststructuralist’s suspension of reality that is ever changing with the undecidability of all meanings. Within the theoretical notion of Derrida’s Différance, this study interprets silence and pause in Beckett’s art.
The silence and pause behave like Derrida’s Différance and have questioned their own existence in the text to deconstruct any definiteness and finality of reality to extend an undecidable threshold of poststructuralists that aims to evade the ‘labyrinth of language’.

Keywords: Différance, language, pause, poststructuralism, silence, text

Procedia PDF Downloads 180
27923 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of the internet, which generates a tremendous volume of data every day, and because of the immense advances in technologies that facilitate the analysis of these data. In particular, classification techniques are a subdomain of data mining that determines to which group each data instance belongs within a given dataset. They are used to classify data into different classes according to desired criteria. Generally, a classification technique is based on either statistics or machine learning, and each type of technique has its own limits. Data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach that differs from existing algorithms in its reliability computations. The proposed approach outperformed the most common classification techniques, with an f-measure exceeding 97% on the IRIS dataset.

Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification

Procedia PDF Downloads 442
27922 Business-Intelligence Mining of Large Decentralized Multimedia Datasets with a Distributed Multi-Agent System

Authors: Karima Qayumi, Alex Norta

Abstract:

The rapid generation of a high volume and broad variety of data from the application of new technologies poses challenges for the generation of business intelligence. Most organizations and business owners need to extract data from multiple sources and apply analytical methods for the purposes of developing their business. Therefore, the recently decentralized data management environment relies on a distributed computing paradigm. While data are stored in highly distributed systems, the implementation of distributed data-mining techniques is a challenge. The aim of these techniques is to gather knowledge from every domain and all the datasets stemming from distributed resources. As agent technologies offer significant contributions for managing the complexity of distributed systems, we consider them for next-generation data-mining processes. To demonstrate agent-based business intelligence operations, we use agent-oriented modeling techniques to develop a new artifact for mining massive datasets.

Keywords: agent-oriented modeling (AOM), business intelligence model (BIM), distributed data mining (DDM), multi-agent system (MAS)

Procedia PDF Downloads 400
27921 Q-Map: Clinical Concept Mining from Clinical Documents

Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala

Abstract:

Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.
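
To make the idea of string matching against an indexed vocabulary concrete, the sketch below indexes a small term list by first token and scans a note for the longest matching term at each position. This is only an illustration of the general technique under stated assumptions: it is not the Q-Map algorithm, and the function names, the vocabulary, and the concept identifiers (C-001 and so on) are hypothetical placeholders rather than entries from any real knowledge source.

```python
import re


def build_index(vocabulary):
    """Index lower-cased (possibly multi-word) terms by their first token."""
    index = {}
    for term, concept_id in vocabulary.items():
        tokens = tuple(term.lower().split())
        index.setdefault(tokens[0], []).append((tokens, concept_id))
    # Try longer terms first so that the longest match wins.
    for candidates in index.values():
        candidates.sort(key=lambda item: len(item[0]), reverse=True)
    return index


def find_concepts(text, index):
    """Return (matched term, concept id) pairs in order of appearance."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        for term_tokens, concept_id in index.get(tokens[i], []):
            if tuple(tokens[i:i + len(term_tokens)]) == term_tokens:
                matches.append((" ".join(term_tokens), concept_id))
                i += len(term_tokens)  # skip past the matched term
                break
        else:
            i += 1  # no term starts at this token
    return matches


# Hypothetical vocabulary with placeholder concept identifiers.
vocabulary = {
    "chest pain": "C-001",
    "pain": "C-002",
    "type 2 diabetes": "C-003",
}
index = build_index(vocabulary)
note = "Patient with type 2 diabetes reports chest pain; pain worsens at night."
found = find_concepts(note, index)
print(found)
```

Indexing by first token keeps each lookup proportional to the number of terms sharing that token rather than to the size of the whole vocabulary, which is one simple way to keep such matching fast.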

Keywords: information retrieval, unified medical language system, syntax based analysis, natural language processing, medical informatics

Procedia PDF Downloads 110
27920 The Platform for Digitization of Georgian Documents

Authors: Erekle Magradze, Davit Soselia, Levan Shughliashvili, Irakli Koberidze, Shota Tsiskaridze, Victor Kakhniashvili, Tamar Chaghiashvili

Abstract:

Since the beginning of active publishing activity in Georgia, voluminous printed material has been accumulated, the digitization of which is an important task. Digitized materials will be available to the audience, and it will be possible to find text in them and conduct various factual research. Digitizing scanned documents means scanning documents, extracting text from the scanned documents, and processing the text into a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable, and automated platform, where the digital service developed for each stage will perform the task assigned to it; at the same time, it will be possible to develop these services dynamically so that there is no interruption in the work of the platform.

Keywords: NLP, OCR, BERT, Kubernetes, transformers

Procedia PDF Downloads 120
27919 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study

Authors: Faisal Aburub, Wael Hadi

Abstract:

Data mining is the process of extracting useful or hidden information from a large database. The extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships, or to assign unseen objects to one of the predefined groups. In this paper, we investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed the other algorithms in terms of the classification accuracy, precision, and F1 evaluation measures on the datasets of groundwater areas collected from the Jordanian Ministry of Water and Irrigation.
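
The evaluation measures named above (classification accuracy, precision, and F1) can be computed from true and predicted labels as in the following sketch. The function name and the six toy labels are hypothetical; this is not the authors' experimental code or data.

```python
def evaluate(y_true, y_pred, positive):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: "yes" marks an area predicted to hold groundwater.
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes", "no"]
report = evaluate(y_true, y_pred, positive="yes")
print(report)
```

Reporting precision and F1 alongside accuracy matters when the classes are imbalanced, because accuracy alone can look high for a classifier that rarely predicts the minority class.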

Keywords: classification, data mining, evaluation measures, groundwater

Procedia PDF Downloads 254
27918 Disowning of ‘Our Lady of Alice Bhatti’ by Mohammad Hanif Through Gendered and Religious Discourse

Authors: Abrar Ajmal

Abstract:

The language used in literature reveals the culture and social gestalt of any society in which it has been constructed and consumed. This paper follows the same rationale and aims to track certain socio-religious and cultural-economic disparities and discrepancies towards minorities, particularly Christians, in an Islamic re(public) where there is a clear majority of Muslims, through an analysis of instances of language used in the narrative “Our Lady of Alice Bhatti” by Mohammad Hanif. It would highlight social inequalities practiced deeply in sociocultural discourse. Moreover, this research would also touch upon the question of gender discrimination and gender construction as a female entity in a male-chauvinistic scenic turnout using language since the novel revolves around communicative forfeits of Alice Bhatti’s life where she is fraying in fisticuffs to befit herself in a miss-fitted society. It would employ Fairclough's framework to conduct a critical discourse analysis of the text at three levels, namely textual analysis, discursive practices, and socio-cultural analysis. Thus, the results would reveal textual findings in linguistic analysis, a range of embedded discourses in discursive practices, and consumption of the text into socio-cultural explications with the use of language and lexicalization employed in the selected excerpts.

Keywords: gendered discourse, socio-economic disparities, minorities, Islamization, analytical framework

Procedia PDF Downloads 27
27917 Analyzing Tools and Techniques for Classification In Educational Data Mining: A Survey

Authors: D. I. George Amalarethinam, A. Emima

Abstract:

Educational Data Mining (EDM) is one of the newest topics to emerge in recent years, and it is concerned with developing methods for analyzing various types of data gathered from educational settings. EDM methods and techniques, together with machine learning algorithms, are used to extract meaningful and usable information from huge databases. For scientists and researchers, realistic applications of machine learning in EDM offer new frontiers and present new problems. One of the most important research areas in EDM is predicting student success. Prediction algorithms and techniques must be developed to forecast students' performance, which helps tutors and institutions improve it. This paper examines various classification techniques in prediction methods and data mining tools used in EDM.

Keywords: classification technique, data mining, EDM methods, prediction methods

Procedia PDF Downloads 101
27916 ExactData Smart Tool For Marketing Analysis

Authors: Aleksandra Jonas, Aleksandra Gronowska, Maciej Ścigacz, Szymon Jadczak

Abstract:

Exact Data is a smart tool which helps with meaningful marketing content creation. It helps marketers achieve this by analyzing the text of an advertisement before and after its publication on social media sites like Facebook or Instagram. In our research we focus on four areas of natural language processing (NLP): grammar correction, sentiment analysis, irony detection and advertisement interpretation. Our research has identified a considerable lack of NLP tools for the Polish language, which specifically aid online marketers. In light of this, our research team has set out to create a robust and versatile NLP tool for the Polish language. The primary objective of our research is to develop a tool that can perform a range of language processing tasks in this language, such as sentiment analysis, text classification, text correction and text interpretation. Our team has been working diligently to create a tool that is accurate, reliable, and adaptable to the specific linguistic features of Polish, and that can provide valuable insights for a wide range of marketers needs. In addition to the Polish language version, we are also developing an English version of the tool, which will enable us to expand the reach and impact of our research to a wider audience. Another area of focus in our research involves tackling the challenge of the limited availability of linguistically diverse corpora for non-English languages, which presents a significant barrier in the development of NLP applications. One approach we have been pursuing is the translation of existing English corpora, which would enable us to use the wealth of linguistic resources available in English for other languages. Furthermore, we are looking into other methods, such as gathering language samples from social media platforms. 
By analyzing the language used in social media posts, we can collect a wide range of data that reflects the unique linguistic characteristics of specific regions and communities, which can then be used to enhance the accuracy and performance of NLP algorithms for non-English languages. In doing so, we hope to broaden the scope and capabilities of NLP applications. Our research focuses on several key NLP techniques including sentiment analysis, text classification, text interpretation and text correction. To ensure that we can achieve the best possible performance for these techniques, we are evaluating and comparing different approaches and strategies for implementing them. We are exploring a range of different methods, including transformers and convolutional neural networks (CNNs), to determine which ones are most effective for different types of NLP tasks. By analyzing the strengths and weaknesses of each approach, we can identify the most effective techniques for specific use cases, and further enhance the performance of our tool. Our research aims to create a tool, which can provide a comprehensive analysis of advertising effectiveness, allowing marketers to identify areas for improvement and optimize their advertising strategies. The results of this study suggest that a smart tool for advertisement analysis can provide valuable insights for businesses seeking to create effective advertising campaigns.

Keywords: NLP, AI, IT, language, marketing, analysis

Procedia PDF Downloads 58
27915 Hydro Geochemistry and Water Quality in a River Affected by Lead Mining in Southern Spain

Authors: Rosendo Mendoza, María Carmen Hidalgo, María José Campos-Suñol, Julián Martínez, Javier Rey

Abstract:

The impact of mining environmental liabilities and mine drainage on surface water quality has been investigated in the hydrographic basin of the La Carolina mining district (southern Spain). This abandoned mining district is characterized by the existence of important mineralizations of sulfoantimonides of Pb-Ag and sulfides of Cu-Fe. All surface waters reach the main river of this mining area, the Grande River, which ends its course in the Rumblar reservoir. This waterbody is intended to supply 89,000 inhabitants, as well as irrigation and livestock. Therefore, the analysis and control of the metal(loid) concentration in these surface waters is an important issue because of the potential pollution derived from metallic mining. A hydrogeochemical campaign consisting of 20 water sampling points was carried out in the hydrographic network of the Grande River, as well as two sampling points in the Rumblar reservoir and at the main tailings impoundment draining to the river. Although acid mine drainage (pH below 4) is discharged into the Grande River from some mine adits, the pH values in the river water are always neutral or slightly alkaline. This is mainly the result of the dilution of the small volumes of mine waters by the net alkaline waters of the river. However, during the dry season, the surface waters present high mineralization due to a constant discharge from the abandoned flooded mines and a decrease in the contribution of surface runoff. The concentrations of dissolved Cd and Pb in the water reach values of 2 and 81 µg/l, respectively, exceeding the limit established by the Environmental Quality Standard for surface water. In addition, the concentrations of dissolved As, Cu, and Pb in the waters of the Rumblar reservoir reached values of 10, 20, and 11 µg/l, respectively. These values are higher than the maximum allowable concentration for human consumption, a circumstance that is especially alarming.

Keywords: environmental quality, hydrogeochemistry, metal mining, surface water

Procedia PDF Downloads 118
27914 A New Approach for Improving Accuracy of Multi Label Stream Data

Authors: Kunal Shah, Swati Patel

Abstract:

Many real-world problems involve data that can be considered multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. Classification is used to predict the class of an unseen instance as accurately as possible. Multi-label classification is a variant of single-label classification in which a set of labels is associated with a single instance. It is used by modern applications such as text classification, functional genomics, image classification, and music categorization. This paper introduces the task of multi-label classification, methods for multi-label classification, and evaluation measures for multi-label classification. A comparative analysis of multi-label classification methods was also carried out, first on the basis of a theoretical study and then on the basis of simulations on various data sets.
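
The sketch below illustrates two ideas from this abstract and its keywords under stated assumptions: the binary relevance transformation, which splits a multi-label target into one binary target per label, and two common multi-label evaluation measures (Hamming loss and subset accuracy). The function names and the three toy instances are hypothetical, and no streaming or concept-drift handling is shown.

```python
def binary_relevance_targets(label_sets, labels):
    """Split a multi-label target into one 0/1 target vector per label."""
    return {label: [int(label in s) for s in label_sets] for label in labels}


def hamming_loss(true_sets, pred_sets, labels):
    """Fraction of (instance, label) decisions that are wrong."""
    # The symmetric difference counts labels that are missing or spurious.
    errors = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * len(labels))


def subset_accuracy(true_sets, pred_sets):
    """Fraction of instances whose predicted label set is exactly right."""
    return sum(t == p for t, p in zip(true_sets, pred_sets)) / len(true_sets)


# Hypothetical label space and toy predictions.
labels = ["news", "sports", "music"]
y_true = [{"news"}, {"sports", "music"}, {"news", "music"}]
y_pred = [{"news"}, {"sports"}, {"news", "sports"}]

targets = binary_relevance_targets(y_true, labels)
loss = hamming_loss(y_true, y_pred, labels)
exact = subset_accuracy(y_true, y_pred)
print(targets, loss, exact)
```

Hamming loss gives partial credit for partly correct label sets, whereas subset accuracy counts an instance as correct only when every label matches, so the two measures are usually reported together.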

Keywords: binary relevance, concept drift, data stream mining, MLSC, multiple window with buffer

Procedia PDF Downloads 563
27913 Identify Users Behavior from Mobile Web Access Logs Using Automated Log Analyzer

Authors: Bharat P. Modi, Jayesh M. Patel

Abstract:

The mobile Internet is a major source of data. As the number of web pages continues to grow, the mobile web provides data miners with just the right ingredients for extracting information. To cater to this growing need, the term Mobile Web mining was coined. Mobile Web mining makes use of data mining techniques to decipher potentially useful information from web data. Web usage mining deals with understanding the behavior of users by making use of the mobile web access logs that are generated on the server while the user is accessing the website. A web access log comprises various entries, such as the name of the user, the user's IP address, the number of bytes transferred, and a timestamp. A variety of log analyzer tools exist that help in analyzing aspects such as users' navigational patterns and the parts of the website in which users are most interested. The present paper makes use of one such log analyzer tool, Mobile Web Log Expert, to ascertain the behavior of users who access an astrology website. It also provides a comparative study of a few available log analyzer tools.
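
A minimal sketch of what a log analyzer does with such entries is given below. It assumes the server writes lines in the widely used Common Log Format; the abstract does not state the actual log format, and the sample lines, paths, and user names are hypothetical. It does not reproduce the Mobile Web Log Expert tool.

```python
import re
from collections import Counter

# Assumed Common Log Format: ip ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


# Hypothetical sample lines, including one malformed line that is skipped.
sample_log = [
    '203.0.113.5 - alice [10/Oct/2023:13:55:36 +0000] "GET /horoscope/daily HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /horoscope/daily HTTP/1.1" 200 2326',
    '203.0.113.5 - alice [10/Oct/2023:13:57:12 +0000] "GET /compatibility HTTP/1.1" 200 5120',
    'malformed line',
]

entries = [e for e in map(parse_line, sample_log) if e is not None]
page_hits = Counter(e["path"] for e in entries)  # which pages attract visits
bytes_per_ip = Counter()                         # traffic volume per client
for e in entries:
    bytes_per_ip[e["ip"]] += e["bytes"]

print(page_hits.most_common())
print(bytes_per_ip.most_common())
```

Counting hits per path gives a first view of which parts of a site users are most interested in; reconstructing navigational patterns would additionally require grouping entries into sessions by client and timestamp.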

Keywords: mobile web access logs, web usage mining, web server, log analyzer

Procedia PDF Downloads 340
27912 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document Image Analysis recognizes text and graphics in documents acquired as images. An approach without Optical Character Recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document images to obtain an original document with complete information. If a captured degraded document image is skewed, it has to be straightened (deskewed) before further processing. An image storage format known as YCbCr is used as a tool to convert the grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents such as printed documents, handwritten documents, old script documents, and handwritten image sketches in documents. The purpose of this research is to obtain an original document from a given set of degraded documents of the same source.

Keywords: grayscale image format, image fusing, RGB image format, SURF detection, YCbCr image format

Procedia PDF Downloads 353
27911 Reduction of Plants Biodiversity in Hyrcanian Forest by Coal Mining Activities

Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch

Abstract:

Coal mining is an important industrial activity, but it may damage the environment. To the best of the authors' knowledge, the effect of traditional coal mining activities on plant biodiversity has not been investigated in the Hyrcanian forests. Therefore, in this study, the effect of coal mining activities on vegetation and tree diversity was investigated in the Hyrcanian forest, North Iran. After a field visit to locate the mine, 16 plots (20×20 m) were established in a systematic-random design (60×60 m grid) in an area of 4 ha (200×200 m, with the mine entrance at the center). An area adjacent to the mine that was not affected by the mining activity was taken as the control area. In each plot, data about trees, such as the number and type of species, were recorded. The biodiversity of vegetation cover was assessed in 5 square sub-plots (1 m2) in each plot. PAST software and Ecological Methodology were used to calculate biodiversity indices. The values of the Shannon-Wiener and Simpson diversity indices for tree cover in the control area (1.04±0.34 and 0.62±0.20) were significantly higher than in the mining area (0.78±0.27 and 0.45±0.14). The value of the evenness indices for tree cover in the mining area was significantly lower than that of the control area. The values of the Shannon-Wiener and Simpson diversity indices for vegetation cover in the control area (1.37±0.06 and 0.69±0.02) were significantly higher than in the mining area (1.02±0.13 and 0.50±0.07). The value of the evenness index in the control area was significantly higher than in the mining area. Plant communities are a good indicator of changes in the site. Studying changes in vegetation biodiversity and plant dynamics in degraded land can provide necessary information for forest management and reforestation of these areas.
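For readers unfamiliar with the two diversity indices reported above, the following minimal sketch shows how they are commonly computed from species counts in a plot. It assumes the natural-log form of the Shannon-Wiener index, H' = -Σ p_i ln p_i, and the 1 - Σ p_i² form of the Simpson index; the abstract does not state which variants the authors used, and the species counts below are hypothetical.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener index H' = -sum(p_i * ln(p_i)) over observed species."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_diversity(counts):
    """Simpson diversity in the 1 - sum(p_i^2) form (higher means more diverse)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical tree counts per species in one plot (not data from the study).
PLOT_COUNTS = [12, 5, 3]

if __name__ == "__main__":
    print(shannon_wiener(PLOT_COUNTS))
    print(simpson_diversity(PLOT_COUNTS))
```

Both indices increase with the number of species and with how evenly individuals are spread among them, which is consistent with the control area scoring higher than the mined area.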

Keywords: vegetation biodiversity, species composition, traditional coal mining, Caspian forest

Procedia PDF Downloads 156
27910 Encryption and Decryption of Nucleic Acid Using Deoxyribonucleic Acid Algorithm

Authors: Iftikhar A. Tayubi, Aabdulrahman Alsubhi, Abdullah Althrwi

Abstract:

The deoxyribonucleic acid (DNA) sequence text provides a single source of high-quality cryptography for DNA sequences for structural biologists. We will provide an intuitive, well-organized, and user-friendly web interface that allows users to encrypt and decrypt DNA sequence text. It secures the sequence by using an algorithm to encrypt and decrypt it. The utility of this DNA sequence text is that it provides a user-friendly interface for users to encrypt, decrypt, and store information about DNA sequences. The interfaces created in this project aim to satisfy the demands of the scientific community by providing full encryption of DNA sequences through this website. We have adopted a methodology using C# and ASP.NET (Active Server Pages .NET) for programming, which is smart and secure. The DNA sequence text is a useful instrument for encrypting large quantities of data efficiently. Users can thus navigate from one encoding and store orange text, depending on the field of the user's interest. Algorithm classification allows a user to protect the DNA sequence from change, whether an alteration or an error occurred during the DNA sequence data transfer. It will also check the integrity of the DNA sequence data during access.

Keywords: algorithm, ASP.NET, DNA, encrypt, decrypt

Procedia PDF Downloads 208
27909 An Emphasis on Creativity-Speak Words Increases Crowdfunding Success

Authors: Trayan Kushev, E. Shaunn Mattingly, Andrew S. Manikas

Abstract:

This study utilizes computer-aided text analysis (CATA) on the descriptions of 248,614 Kickstarter crowdfunding campaigns to reveal that backers are more likely to provide funding to projects that contain a higher percentage of creativity-speak words. Further, this relationship is observed to be stronger for product-based campaigns (e.g., games, technology, design) and weaker for content-based campaigns (e.g., film, music, publishing). In addition, both positive linguistic tone and the use of words expressing gratitude in the text of the campaign strengthen the positive effect of creativity-speak on campaign success.
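The core quantity in this kind of computer-aided text analysis is the share of words in a description that belong to a given dictionary. The sketch below shows one simple way to compute it. The word list, tokenization rule, and function name are hypothetical and are not the dictionary or software used in the study.

```python
import re

# Hypothetical mini-dictionary; the study's actual creativity-speak word list is not given here.
CREATIVITY_WORDS = {"creative", "innovative", "original", "novel", "imaginative"}


def dictionary_share(text, lexicon=CREATIVITY_WORDS):
    """Return the percentage of tokens in `text` that appear in `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    print(dictionary_share("An innovative and original board game"))
```

Using a percentage rather than a raw count keeps long and short campaign descriptions comparable.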

Keywords: creativity-speak, crowdfunding, entrepreneurship, gratitude, tone

Procedia PDF Downloads 47
27908 Linguistic Analysis of Argumentation Structures in Georgian Political Speeches

Authors: Mariam Matiashvili

Abstract:

Argumentation is an integral part of our daily communications - formal or informal. Argumentative reasoning, techniques, and language tools are used both in personal conversations and in the business environment. Verbalization of opinions requires the use of extraordinary syntactic-pragmatic structural quantities - arguments that add credibility to the statement. The study of argumentative structures allows us to identify the linguistic features that make the text argumentative. Knowing what elements make up an argumentative text in a particular language helps the users of that language improve their skills. Also, natural language processing (NLP) has become especially relevant recently. In this context, one of the main emphases is on the computational processing of argumentative texts, which will enable the automatic recognition and analysis of large volumes of textual data. The research deals with the linguistic analysis of the argumentative structures of Georgian political speeches - particularly the linguistic structure, characteristics, and functions of the parts of the argumentative text - claims, support, and attack statements. The research aims to describe the linguistic cues that give the sentence a judgmental/controversial character and help to identify the reasoning parts of the argumentative text. The empirical data comes from the Georgian Political Corpus, particularly TV debates. Consequently, the texts are of a dialogical nature, representing a discussion between two or more people (most often between a journalist and a politician).
The research uses the following approaches to identify and analyze the argumentative structures: (1) Lexical Classification and Analysis, which identifies lexical items that are relevant to the creation of argumentative texts and builds a lexicon of argumentation (groups of words gathered from a semantic point of view); (2) Grammatical Analysis and Classification, that is, grammatical analysis of the words and phrases identified on the basis of the argumentation lexicon; (3) Argumentation Schemas, which describes and identifies the argumentation schemes that are most likely used in Georgian political speeches. As a final step, we analyzed the relations between the above-mentioned components. For example, if an identified argument scheme is “Argument from Analogy”, the identified lexical items semantically express analogy too, and they are most likely adverbs in Georgian. As a result, we created a lexicon of the words that play a significant role in creating Georgian argumentative structures. Linguistic analysis has shown that verbs play a crucial role in creating argumentative structures.

Keywords: Georgian, argumentation schemas, argumentation structures, argumentation lexicon

Procedia PDF Downloads 51
27907 ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian

Authors: Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak

Abstract:

The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most works on Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we propose the ViraPart framework, which uses embedded ParsBERT at its core for text clarification. First, we used the BERT variant for Persian followed by a classifier layer for the classification procedures. Next, we combined the models' outputs to produce clear text. In the end, the proposed models for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction achieve averaged F1 macro scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.

Keywords: Persian Ezafe, punctuation, ZWNJ, NLP, ParsBERT, transformers

Procedia PDF Downloads 182
27906 Applying Dictogloss Technique to Improve Auditory Learners’ Writing Skills in Second Language Learning

Authors: Aji Budi Rinekso

Abstract:

There are some common problems that students often face in writing. The problems are related to macro and micro skills of writing, such as incorrect spellings, inappropriate diction, grammatical errors, random ideas, and irrelevant supporting sentences. Therefore, a teaching technique that can solve those problems is needed. The dictogloss technique is a teaching technique that involves listening practice, so it is suitable for students with an auditory learning style. The dictogloss technique comprises four basic steps: (1) warm up, (2) dictation, (3) reconstruction, and (4) analysis and correction. Warm up is when students find out about topics and do some preparatory vocabulary work. Then, dictation is when the students listen to texts read at normal speed by a teacher. The text is read by the teacher twice: at the first reading the students only listen to the teacher, and at the second reading the students listen again and take notes. Next, reconstruction is when the students discuss the information from the text read by the teacher and start to write a text. Lastly, analysis and correction is when the students check their writing and revise it. Dictogloss offers some advantages for improving writing skills. Through the use of the dictogloss technique, students can solve their problems with both macro skills and micro skills. Easier generation of ideas and better writing mechanics are the benefits of dictogloss.

Keywords: auditory learners, writing skills, dictogloss technique, second language learning

Procedia PDF Downloads 123
27905 Feature Selection for Production Schedule Optimization in Transition Mines

Authors: Angelina Anani, Ignacio Ortiz Flores, Haitao Li

Abstract:

The use of underground mining methods has increased significantly over the past decades. This increase has also been spurred on by several mines transitioning from surface to underground mining. However, determining the transition depth can be a challenging task, especially when coupled with production schedule optimization. Several researchers have simplified the problem by excluding operational features relevant to production schedule optimization. Our research objective is to investigate the extent to which accounting for the operational features of transition mines affects the optimal production schedule. We also provide a framework of factors to consider in production schedule optimization for transition mines. An integrated mixed-integer linear programming (MILP) model is developed that maximizes the NPV as a function of production schedule and transition depth. A case study is performed to validate the model, with a comparative sensitivity analysis to obtain operational insights.

Keywords: underground mining, transition mines, mixed-integer linear programming, production schedule

Procedia PDF Downloads 140
27904 Comparative Analysis of Classification Methods in Determining Non-Active Student Characteristics in Indonesia Open University

Authors: Dewi Juliah Ratnaningsih, Imas Sukaesih Sitanggang

Abstract:

Classification is a data mining technique that aims to discover a model from training data that assigns records to the appropriate category or class. Data mining classification methods can be applied in education, for example, to classify non-active students at Indonesia Open University. This paper presents a comparison of three classification methods: Naïve Bayes, Bagging, and C4.5. The criteria used to evaluate the performance of the three methods are stratified cross-validation, the confusion matrix, the area under the ROC curve (AUC), recall, precision, and F-measure. The data used for this paper are from non-active Indonesia Open University students in the registration periods 2004.1 to 2012.2. For the target analysis, non-active students were divided into 3 groups: C1, C2, and C3. Data from 4173 students were analyzed. The results of the study show that: (1) the Bagging method gave a higher degree of classification accuracy than Naïve Bayes and C4.5, (2) the Bagging classification accuracy rate is 82.99 %, while those of Naïve Bayes and C4.5 are 80.04 % and 82.74 % respectively, (3) the classification tree produced by the Bagging method has a large number of nodes, so it is quite difficult to use in decision making, (4) the classification of non-active Indonesia Open University student characteristics uses the C4.5 algorithm, (5) based on the C4.5 algorithm, there are 5 interesting rules which can describe the characteristics of non-active Indonesia Open University students.
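The evaluation criteria listed above can all be derived from a confusion matrix. The following minimal sketch shows the arithmetic for a three-class problem such as C1, C2, and C3. The matrix values are hypothetical and are not the study's results; rows are assumed to be true classes and columns predicted classes.

```python
def accuracy(matrix):
    """Share of all instances that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f_measure)} from a square confusion matrix."""
    metrics = {}
    size = len(labels)
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                             # true class i, predicted as another
        fp = sum(matrix[r][i] for r in range(size)) - tp     # another class, predicted as i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
LABELS = ["C1", "C2", "C3"]
MATRIX = [
    [50, 5, 5],
    [10, 30, 0],
    [0, 10, 40],
]

if __name__ == "__main__":
    print(accuracy(MATRIX))
    for label, values in per_class_metrics(MATRIX, LABELS).items():
        print(label, values)
```

Reporting precision and recall per class alongside overall accuracy matters when the groups differ in size, because a classifier can reach high accuracy while performing poorly on a small class.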

Keywords: comparative analysis, data mining, classification, Bagging, Naïve Bayes, C4.5, non-active students, Indonesia Open University

Procedia PDF Downloads 296
27903 Understanding the Challenges of Lawbook Translation via the Framework of Functional Theory of Language

Authors: Tengku Sepora Tengku Mahadi

Abstract:

Where the speed of book writing lags behind the high need for such material for tertiary studies, translation offers a way to enhance the equilibrium in this demand-supply equation. Nevertheless, translation is confronted by obstacles that threaten its effectiveness. The primary challenge to the production of efficient translations may well be related to the text-type and in terms of its complexity. A text that is intricately written with unique rhetorical devices, subject-matter foundation and cultural references will undoubtedly challenge the translator. Longer time and greater effort would be the consequence. To understand these text-related challenges, the present paper set out to analyze a lawbook entitled Learning the Law by David Melinkoff. The book is chosen because it has often been used as a textbook or for reference in many law courses in the United Kingdom and has seen over thirteen editions; therefore, it can be said to be a worthy book for studies in law. Another reason is the existence of a ready translation in Malay. Reference to this translation enables confirmation to some extent of the potential problems that might occur in its translation. Understanding the organization and the language of the book will help translators to prepare themselves better for the task. They can anticipate the research and time that may be needed to produce an effective translation. Another premise here is that this text-type implies certain ways of writing and organization. Accordingly, it seems practicable to adopt the functional theory of language as suggested by Michael Halliday as its theoretical framework. Concepts of the context of culture, the context of situation and measures of the field, tenor and mode form the instruments for analysis. Additional examples from similar materials can also be used to validate the findings. 
Some interesting findings include the presence of several other text-types or sub-text-types in the book and the dependence on literary discourse and devices to capture the meanings better or add color to the dry field of law. In addition, many elements of culture can be seen, for example, the use of familiar alternatives, allusions, and even terminology and references that date back to various periods of time and languages. Also found are parts which discuss origins of words and terms that may be relevant to readers within the United Kingdom but make little sense to readers of the book in other languages. In conclusion, the textual analysis in terms of its functions and the linguistic and textual devices used to achieve them can then be applied as a guide to determine the effectiveness of the translation that is produced.

Keywords: functional theory of language, lawbook text-type, rhetorical devices, culture

Procedia PDF Downloads 125
27902 Improved Processing Speed for Text Watermarking Algorithm in Color Images

Authors: Hamza A. Al-Sewadi, Akram N. A. Aldakari

Abstract:

Copyright protection and ownership proof of digital multimedia are achieved nowadays by digital watermarking techniques. A text watermarking algorithm for protecting the property rights and ownership judgment of color images is proposed in this paper. Embedding is achieved by inserting text elements randomly into the color image as noise. The YIQ image processing model is found to be faster than other image processing methods, and hence, it is adopted for the embedding process. An optional choice of encrypting the text watermark before embedding is also suggested (in case it is required by some applications), where the text can be encrypted using any enciphering technique, adding more difficulty for hackers. Experiments resulted in an embedding speed more than double that of the other systems considered (such as the least significant bit method and separate color code methods), and a fairly acceptable level of peak signal-to-noise ratio (PSNR) with low mean square error values for watermarking purposes.
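The two quality measures named above, mean square error and PSNR, are related by a simple formula: PSNR = 10 log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below applies it to a single 8-bit channel stored as nested lists. The tiny example images and function names are hypothetical, and the paper's own embedding procedure is not reproduced here.

```python
import math


def mean_square_error(original, watermarked):
    """Average squared pixel difference between two equally sized channels."""
    total = 0
    count = 0
    for row_a, row_b in zip(original, watermarked):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(original, watermarked, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = mean_square_error(original, watermarked)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 channel before and after embedding (two pixels changed by 1).
ORIGINAL = [[100, 100], [100, 100]]
WATERMARKED = [[100, 101], [100, 99]]

if __name__ == "__main__":
    print(mean_square_error(ORIGINAL, WATERMARKED))
    print(psnr(ORIGINAL, WATERMARKED))
```

A higher PSNR means the watermarked image is closer to the original, so low mean square error and high PSNR go together.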

Keywords: steganography, watermarking, time complexity measurements, private keys

Procedia PDF Downloads 122