Search results for: text extraction
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3078

2928 Poetics of the Connecting ha’: A Textual Study in the Poetry of Al-Husari Al-Qayrawani

Authors: Mahmoud al-Ashiriy

Abstract:

This paper begins from the idea that the real history of literature is the history of its style. The rhyme, as is well known, is not merely the last letter, which has received a great deal of analysis and investigation, but a collection of other values in addition to its different markings. This paper explores the work of the connecting ha’ and its effectiveness in shaping the text of poetry, since it establishes vocal rhythms in addition to its role in indicating references through the pronoun, both vertically, through the sequence of the poem’s verses, and horizontally, through the sentences surrounding a single verse. If the scientific formation of prosody stopped at possibilities and prohibitions, literary criticism and poetry studies should explore what lies beyond the rule: the aesthetic horizon of poetic effectiveness, which varies from one text to another, one poet to another, one literary period to another, and one poetic taste to another. The paper then explores this poetic essence in the texts of the famous Andalusian poet Al-Husari Al-Qayrawani through his well-known Daliyya (a poem whose verses end with the letter D), and the role of the connecting ha’ in fulfilling its text and accomplishing its poetics. From there it moves to the diwan (the large collection of poems) as a higher text that surpasses the individual poem, and to the effectiveness of the phenomenon in accomplishing the poetics of Al-Husari Al-Qayrawani, who is one of the pillars of Arabic poetics in Andalusia.

Keywords: Al-Husari Al-Qayrawani, poetics, rhyme, stylistics, science of the text

Procedia PDF Downloads 533
2927 Improvement of Protein Extraction From Shrimp by Product Used for Electrospinning by Applying Emerging Technologies

Authors: Mario Pérez-Won, Vilbett Briones L., Guido Trautmann, María José Bugueño, Gipsy Tabilo-Munizaga, Luis Gonzalez-Cavieres

Abstract:

The fishing industry generates a significant amount of shrimp by-products, which often result in environmental contamination. Protein extraction from these by-products is a potential solution to minimize waste and revalue the by-products. To improve the chemical extraction of proteins from shrimp (Pleuroncodes monodon) by-products, the emerging technologies of ohmic heating (OH), microwaves (MW) and pulsed electric fields (PEF) were used. The results show that microwaves, pulsed electric fields, and ohmic heating improved performance by 28.19%, 19.25%, and 3.65%, respectively. Furthermore, conformational changes were studied by DSC and FTIR. Subsequently, the use of these proteins in electrospinning technology was evaluated. In conclusion, this study demonstrates that the application of emerging technologies can significantly improve the extraction yield of proteins from shrimp by-products.

Keywords: electrospinning, emerging technologies, improving extraction, shrimp by-products

Procedia PDF Downloads 43
2926 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document Image Analysis recognizes text and graphics in documents acquired as images. This paper adopts an approach to degraded document image analysis that does not rely on Optical Character Recognition (OCR). The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document images to obtain an original document with complete information. If the captured degraded document image is skewed, it has to be straightened (deskewed) before further processing. A special image storage format known as YCbCr is used as a tool to convert the grayscale image to RGB image format. The presented algorithm is tested on various types of degraded documents such as printed documents, handwritten documents, old script documents and handwritten image sketches in documents. The purpose of this research is to obtain an original document from a given set of degraded documents of the same source.

Keywords: grayscale image format, image fusing, RGB image format, SURF detection, YCbCr image format

Procedia PDF Downloads 342
2925 Detecting Paraphrases in Arabic Text

Authors: Amal Alshahrani, Allan Ramsay

Abstract:

Paraphrasing, i.e. expressing the same concept in alternative ways by using different words or phrases, is one of the important tasks in natural language processing. Paraphrases can be used in many natural language applications, such as Information Retrieval, Machine Translation, Question Answering, Text Summarization, or Information Extraction. To obtain pairs of sentences that are paraphrases, we create a system that automatically extracts paraphrases from a corpus built from different sources of news articles, since these are likely to contain paraphrases when they report the same event on the same day. There are existing simple standard approaches (e.g. TF-IDF vector space, cosine similarity) and alignment techniques (e.g. Dynamic Time Warping (DTW)) for extracting paraphrases, which have been applied to English. However, the performance of these approaches could be affected when they are applied to another language, for instance Arabic, due to the presence of phenomena which are not present in English, such as free word order, zero copula, and pro-dropping. Thus, if we can analyse how the existing algorithms for English fail for Arabic, then we can find a solution for Arabic. The results are promising.
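The TF-IDF and cosine similarity baseline named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' system: the smoothed IDF formula, the whitespace tokenisation, the function names and the English example sentences are all assumptions made for the sketch.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenised document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant) so terms shared by all documents keep some weight.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical sentences: two reports of the same event and one unrelated sentence.
sentences = [
    "the minister opened the new bridge on monday".split(),
    "the new bridge was opened by the minister on monday".split(),
    "shares fell sharply after the earnings report".split(),
]
vecs = tfidf_vectors(sentences)
sim_paraphrase = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

Because the vectors are bags of words, the score ignores word order entirely, which is worth keeping in mind when reading the abstract's remarks on free word order.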

Keywords: natural language processing, TF-IDF, cosine similarity, dynamic time warping (DTW)

Procedia PDF Downloads 354
2924 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis

Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan

Abstract:

Tourism is a booming industry with huge future potential for global wealth and employment. Countless data are generated on social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of Big Data technology into the tourism industry will allow companies to determine where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centers or hotels, and tourists can get a clear idea of places before visiting. From a technical perspective, natural language is processed by analysing the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction of travel reviews. We have constructed a web review database using a crawler and web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The review sentences were first classified with the VADER and RoBERTa models to get the polarity of the reviews. In this paper, we have studied methods for feature extraction, such as Count Vectorization and TF-IDF Vectorization, and implemented a Convolutional Neural Network (CNN) classifier for the sentiment analysis to decide whether the tourist’s attitude towards the destinations is positive, negative, or simply neutral based on the review text that they posted online. The results demonstrated that, after pre-processing and cleaning the dataset, the CNN algorithm achieved an accuracy of 96.12% for the positive and negative sentiment analysis.
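Count vectorization, one of the feature extraction methods mentioned in this abstract, turns each review into a vector of token counts over a shared vocabulary. The sketch below is a plain-Python illustration under stated assumptions: lower-casing, whitespace tokenisation, the function names and the two sample reviews are hypothetical and are not taken from the paper.

```python
from collections import Counter

def fit_vocabulary(reviews):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for review in reviews for tok in review.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def count_vectorize(review, vocabulary):
    """Return a dense count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(review.lower().split())
    vector = [0] * len(vocabulary)
    for tok, c in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = c
    return vector

# Hypothetical reviews, used only to show the shape of the output.
reviews = ["great beach great food", "food was terrible"]
vocab = fit_vocabulary(reviews)
matrix = [count_vectorize(r, vocab) for r in reviews]
```

Each row of `matrix` is what a downstream classifier would receive as input; a TF-IDF variant would rescale these counts by how rare each token is across reviews.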

Keywords: count vectorization, convolutional neural network, crawler, data technology, long short-term memory, web scraping, sentiment analysis

Procedia PDF Downloads 53
2923 A Clustering Algorithm for Massive Texts

Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen

Abstract:

Internet users face a massive amount of textual data every day. Organizing texts into categories can help users dig useful information out of large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vectors generated from texts. To cluster large-scale text collections effectively and efficiently, this paper proposes a clustering algorithm based on vector reconstruction. Only the features that can represent the cluster are preserved in the cluster’s representative vector. The algorithm alternately repeats two sub-processes until it converges. One is the partial tuning sub-process, where each feature’s weight is fine-tuned iteratively. To accelerate clustering, an intersection-based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other is the overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features that are useless for representing the cluster are removed from the cluster’s representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.

Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process

Procedia PDF Downloads 404
2922 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques

Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart

Abstract:

Automatic text classification mostly applies natural language processing (NLP) and other AI-guided techniques to classify text faster and more accurately. This paper discusses the use of predictive maintenance to manage incident tickets inside the company. It proposes a tool that processes and analyses the comments and notes written by administrators after resolving an incident ticket. The goal is to increase the quality of these comments. The tool is based on NLP and machine learning techniques to perform textual analytics on the extracted data. The approach was tested using real data taken from the French National Railways (SNCF) company and gave high-quality results.

Keywords: machine learning, text classification, NLP techniques, semantic representation

Procedia PDF Downloads 63
2921 Multi-source Question Answering Framework Using Transformers for Attribute Extraction

Authors: Prashanth Pillai, Purnaprajna Mangsuli

Abstract:

Oil exploration and production companies invest considerable time and effort to extract essential well attributes (like well status, surface and target coordinates, wellbore depths, event timelines, etc.) from unstructured data sources like technical reports, which are often non-standardized, multimodal, and highly domain-specific by nature. It is also important to consider the context when extracting attribute values from reports that contain information on multiple wells/wellbores. Moreover, semantically similar information may often be depicted in different data syntax representations across multiple pages and document sources. We propose a hierarchical multi-source fact extraction workflow based on a deep learning framework to extract essential well attributes at scale. An information retrieval module based on the transformer architecture was used to rank relevant pages in a document source utilizing the page image embeddings and semantic text embeddings. A question answering framework utilizing the LayoutLM transformer was used to extract attribute-value pairs incorporating the text semantics and layout information from top relevant pages in a document. To better handle context while dealing with multi-well reports, we incorporate a dynamic query generation module to resolve ambiguities. The attribute information extracted from various pages and documents is standardized to a common representation using a parser module to facilitate information comparison and aggregation. Finally, we use a probabilistic approach to fuse information extracted from multiple sources into a coherent well record. The applicability of the proposed approach and related performance was studied on several real-life well technical reports.

Keywords: natural language processing, deep learning, transformers, information retrieval

Procedia PDF Downloads 166
2920 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations, a cooccurrence graph. Indeed, each cooccurrence highlights some degree of semantic correlation between the words, because related words are more commonly found close to each other than on opposite sides of the text. Some authors have used sliding windows for this problem: they count all the occurrences within a sliding window running over the whole text. In this paper we generalise this technique, arriving at a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is counted with a weight depending on the distance between the items: a closer distance implies stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting of the text of the Bible, split into verses.
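The idea of weighting each cooccurrence by distance can be sketched as follows. The abstract does not state the weighting function, so the inverse-distance weight `1 / d`, the window size, the function name and the sample sentence are all assumptions; the sketch also scores each pair of occurrences once instead of once per window position, which is a simplification.

```python
from collections import defaultdict

def weighted_cooccurrences(tokens, entities, window=5, weight=lambda d: 1.0 / d):
    """Accumulate distance-weighted cooccurrence scores for pairs of entities.

    tokens   : list of words in one text unit (for example one verse)
    entities : set of named items to track
    window   : maximum token distance at which a pair still counts
    weight   : assumed weighting function of the distance d >= 1
    """
    positions = [(i, t) for i, t in enumerate(tokens) if t in entities]
    graph = defaultdict(float)
    for a in range(len(positions)):
        i, first = positions[a]
        for b in range(a + 1, len(positions)):
            j, second = positions[b]
            d = j - i
            if d > window:
                break  # positions are sorted, so later ones are farther still
            if first != second:
                edge = tuple(sorted((first, second)))
                graph[edge] += weight(d)
    return dict(graph)

# Made-up verse-like sentence, used only to exercise the function.
verse = "and abraham said unto isaac his son that sarah was old".split()
edges = weighted_cooccurrences(verse, {"abraham", "isaac", "sarah"}, window=6)
```

With these inputs the pair that sits three tokens apart receives a larger weight than the pair four tokens apart, and the pair separated by seven tokens falls outside the window and gets no edge. Summing `edges` over all verses would give the weighted cooccurrence graph.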

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 115
2919 N-Type GaN Thinning for Enhancing Light Extraction Efficiency in GaN-Based Thin-Film Flip-Chip Ultraviolet (UV) Light Emitting Diodes (LED)

Authors: Anil Kawan, Soon Jae Yu, Jong Min Park

Abstract:

GaN-based 365 nm wavelength ultraviolet (UV) light emitting diodes (LEDs) have various applications, such as curing, molding, purification, deodorization, and disinfection. However, their usage is limited by very low output power because of light absorption in the GaN layers. In this study, we demonstrate a method that removes the 365 nm absorption layer (buffer GaN) and thins the n-type GaN so as to improve the light extraction efficiency of the GaN-based 365 nm UV LED. UV flip-chip LEDs of chip size 1.3 mm x 1.3 mm were fabricated using GaN epilayers on a sapphire substrate. Via-hole n-type contacts and highly reflective Ag metal were used for efficient light extraction. The LED wafer was aligned and bonded to an AlN carrier wafer. To improve the extraction efficiency of the flip-chip LED, the sapphire substrate and the absorption layer (buffer GaN) were removed by using laser lift-off and dry etching, respectively. To further increase the extraction efficiency of the LED, the thickness of the exposed n-type GaN was reduced by using inductively coupled plasma etching.

Keywords: extraction efficiency, light emitting diodes, n-GaN thinning, ultraviolet

Procedia PDF Downloads 392
2918 Symmetric Key Encryption Algorithm Using Indian Traditional Musical Scale for Information Security

Authors: Aishwarya Talapuru, Sri Silpa Padmanabhuni, B. Jyoshna

Abstract:

Cryptography helps in preventing threats to information security by providing various algorithms. This study introduces a new symmetric key encryption algorithm for information security which is linked with "raagas", the traditional Indian scales and patterns of musical notes. The algorithm takes the plain text as input and starts its encryption process. It then randomly selects a raaga from the list of raagas that is assumed to be available to both the sender and the receiver. The plain text is associated with the selected raaga, and an intermediate cipher text is formed as the algorithm converts the plain text characters into other characters, depending upon the rules of the algorithm. This intermediate cipher text is arranged in various patterns in three different rounds of encryption. The total number of rounds in the algorithm is a multiple of 3. To be more specific, the output of the sequence of the first three rounds is passed again as the input to this sequence of rounds, recursively, until the total number of rounds of encryption has been performed. The raaga selected by the algorithm and the number of rounds performed are specified at an arbitrary location in the key, in addition to important information regarding the rounds of encryption, embedded in the key which is known by the sender and interpreted only by the receiver, thereby making the algorithm hack proof. The key can be constructed of any number of bits without any restriction on its size. A software application was also developed to demonstrate this process of encryption, which dynamically takes the plain text as input and readily generates the cipher text as output. Therefore, this algorithm stands as one of the strongest tools for information security.

Keywords: cipher text, cryptography, plaintext, raaga

Procedia PDF Downloads 261
2917 Optimization of Ultrasonic Assisted Extraction of Antioxidants and Phenolic Compounds from Coleus Using Response Surface Methodology

Authors: Reihaneh Ahmadzadeh Ghavidel

Abstract:

Free radicals such as reactive oxygen species (ROS) have detrimental effects on human health through several mechanisms. Antioxidant molecules, on the other hand, reduce free radical generation in biological systems. Synthetic antioxidants, which are used in the food industry, also have a negative impact on human health. Therefore, identifying natural antioxidants such as anthocyanins can solve these problems simultaneously. Coleus (Solenostemon scutellarioides) with red leaves is a rich source of anthocyanin compounds. In this study we evaluated the effect of time (10, 20 and 30 min) and temperature (40, 50 and 60 °C) on the optimization of anthocyanin extraction using response surface methodology. In addition, the study aimed to determine the maximum extraction of anthocyanin from the coleus plant using the ultrasound method. The results indicated that the optimum conditions for extraction were 39.84 min at 69.25 °C. At this point, total compounds reached 3.7451 mg 100 ml⁻¹. Furthermore, under optimum conditions, anthocyanin concentration, extraction efficiency, ferric reducing ability, total phenolic compounds and EC50 were 3.221931, 6.692765, 223.062, 3355.605 and 2.614045, respectively.

Keywords: anthocyanin, antioxidant, coleus, extraction, sonication

Procedia PDF Downloads 290
2916 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is the literature survey. The availability of a large amount of literature across different domains triggers the need for optimized systems which provide relevant literature to researchers. We propose a search system based on keywords for text documents. This experimental approach provides a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups the semantically related documents together. The hierarchical structure, based on keywords, returns only those documents which precisely contain them. This approach opens doors for multi-domain research. The documents across multiple domains which are indexed by similar keywords are grouped together. A hierarchical relationship between keywords is obtained. To demonstrate the effectiveness of the approach, we have carried out the experiment and evaluation on the Semeval-2010 dataset. Results show that the presented method is considerably successful in indexing scientific papers.
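To make the lattice idea concrete: in Formal Concept Analysis a formal concept is a pair (extent, intent), where the extent is a set of documents, the intent is the set of keywords they all share, and each determines the other. The sketch below enumerates the concepts of a toy document-keyword context by brute force. The document names and keywords are hypothetical, and the exponential enumeration is an assumption chosen for brevity, not the method used in the paper.

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context.

    context maps each document to its set of keywords. Enumerating every
    subset of documents is exponential, so this is only meant for toy inputs.
    """
    docs = sorted(context)
    all_keywords = set().union(*context.values())
    concepts = set()
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            # Intent: keywords shared by every document in the subset.
            intent = set(all_keywords)
            for d in subset:
                intent &= context[d]
            # Extent: every document that carries the whole intent.
            extent = frozenset(d for d in docs if intent <= context[d])
            concepts.add((extent, frozenset(intent)))
    return concepts

# Hypothetical keyword labels, standing in for the output of a keyword extractor.
context = {
    "d1": {"clustering", "text"},
    "d2": {"ontology", "text"},
    "d3": {"text"},
}
concepts = formal_concepts(context)
```

Ordering the resulting concepts by inclusion of their extents gives the lattice: here the concept for "text" sits above the two more specific concepts for "clustering" and "ontology".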

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 303
2915 Moderate Electric Field Influence on Carotenoids Extraction Time from Heterochlorella luteoviridis

Authors: Débora P. Jaeschke, Eduardo A. Merlo, Rosane Rech, Giovana D. Mercali, Ligia D. F. Marczak

Abstract:

Carotenoids are high value-added pigments that can alternatively be extracted from some microalgae species. However, the application of carotenoids synthesized by microalgae is still limited due to the use of toxic organic solvents. In this context, studies involving alternative extraction methods have been conducted with more sustainable solvents to replace the toxic ones and to reduce the solvent volume and the extraction time. The aim of the present work was to evaluate the extraction time of carotenoids from the microalga Heterochlorella luteoviridis using a moderate electric field (MEF) as a pre-treatment to the extraction. The extraction methodology consisted of a pre-treatment in the presence of MEF (180 V) and ethanol (25 %, v/v) for 10 min, followed by a diffusive step performed for 50 min using a higher ethanol concentration (75 %, v/v). The extraction experiments were conducted at 30 °C and, to keep the temperature at this value, an extraction cell with a water jacket connected to a water bath was used. Also, to enable the evaluation of the MEF effect on the extraction, control experiments were performed using the same cell and conditions without voltage application. During the extraction experiments, samples were withdrawn at 1, 5 and 10 min of the pre-treatment and at 1, 5, 30, 40 and 50 min of the diffusive step. Samples were then centrifuged and carotenoid analyses were performed on the supernatant. Furthermore, an exhaustive extraction with ethyl acetate and methanol was performed, and the carotenoid content found in this analysis was considered the total carotenoid content of the microalgae. The results showed that the application of MEF as a pre-treatment influenced the extraction yield and the extraction time during the diffusive step; after the MEF pre-treatment and 50 min of the diffusive step, it was possible to extract up to 60 % of the total carotenoid content. Also, the carotenoid concentrations of the extracts withdrawn at 5 and 30 min of the diffusive step did not differ statistically, meaning that carotenoid diffusion occurs mainly at the very beginning of the extraction. On the other hand, the results for the control experiments showed that carotenoid diffusion occurs mostly during the first 30 min of the diffusive step, which evidenced the MEF effect on the extraction time. Moreover, carotenoid concentrations in samples withdrawn during the pre-treatment (1, 5 and 10 min) were below the quantification limit of the analysis, indicating that the extraction occurred in the diffusive step, when ethanol (75 %, v/v) was added to the medium. It is possible that MEF promoted cell membrane permeabilization and that, when ethanol (75 %) was added, carotenoids interacted with the solvent and diffused easily. Based on the results, it is possible to infer that MEF decreased the carotenoid extraction time by increasing the permeability of the cell membrane, which facilitates diffusion from the cell to the medium.

Keywords: moderate electric field (MEF), pigments, microalgae, ethanol

Procedia PDF Downloads 429
2914 Optimizing Microwave Assisted Extraction of Anti-Diabetic Plant Tinospora cordifolia Used in Ayush System for Estimation of Berberine Using Taguchi L-9 Orthogonal Design

Authors: Saurabh Satija, Munish Garg

Abstract:

The present work reports an efficient extraction method using a microwave-based solvent–sample duo-heating mechanism for the extraction of an important anti-diabetic plant, Tinospora cordifolia, from the AYUSH system for estimation of berberine content. The process is based on simultaneous heating of the sample matrix and the extracting solvent under microwave energy. Methanol was used as the extracting solvent, as it has excellent berberine solubilizing power and warms up under microwaves owing to its high dissipation factor. Extraction conditions such as irradiation time, microwave power, solute-solvent ratio and temperature were optimized using a Taguchi design, and berberine was quantified using high performance thin layer chromatography. The ranked optimized parameters were microwave power (rank 1), irradiation time (rank 2) and temperature (rank 3). This kind of extraction mechanism under dual heating provided a choice of extraction parameters for better precision and higher yield, with a significant reduction in extraction time under optimum extraction conditions. The developed extraction protocol will make it possible to extract higher amounts of berberine, a major anti-diabetic moiety in Tinospora cordifolia, which can lead to the development of cheaper formulations of the plant and can help in rapid prevention of diabetes in the world.

Keywords: berberine, microwave, optimization, Taguchi

Procedia PDF Downloads 312
2913 The Effects of Watching Text-Relevant Video Segments with/without Subtitles on Vocabulary Development of Arabic as a Foreign Language Learners

Authors: Amirreza Karami, Hawraa Nafea Hameed Alzouwain, Freddie A. Bowles

Abstract:

This study investigates the effects of watching text-relevant video segments with/without subtitles on vocabulary development of Arabic as a Foreign Language (AFL) learners. The participants of the study were assigned to two groups: one control group and one experimental group. The control group received no video-based instruction while the experimental group watched a text-relevant video segment in three stages: pre, while, and post-instruction. The preliminary results of the pre-test and post-test show that watching text-relevant video segments through following a pre-while-post procedure can help the vocabulary development of AFL learners more than non-video-based instruction.

Keywords: text-relevant video segments, vocabulary development, Arabic as a Foreign Language, AFL, pre-while-post instruction

Procedia PDF Downloads 134
2912 Response Surface Modeling of Lactic Acid Extraction by Emulsion Liquid Membrane: Box-Behnken Experimental Design

Authors: A. Thakur, P. S. Panesar, M. S. Saini

Abstract:

Extraction of lactic acid by emulsion liquid membrane (ELM) technology, using n-trioctyl amine (TOA) in n-heptane as carrier within the organic membrane along with sodium carbonate as acceptor phase, was optimized by using response surface methodology (RSM). A three-level Box-Behnken design was employed for experimental design, analysis of the results and to depict the combined effect of five independent variables, viz. lactic acid concentration in aqueous phase (cl), sodium carbonate concentration in stripping phase (cs), carrier concentration in membrane phase (ψ), treat ratio (φ), and batch extraction time (τ), with equal volume of organic and external aqueous phase, on lactic acid extraction efficiency. The maximum lactic acid extraction efficiency (ηext) of 98.21% from aqueous phase in a batch reactor using ELM was found at the optimized values for the test variables cl, cs, ψ, φ and τ of 0.06 [M], 0.18 [M], 4.72 (%, v/v), 1.98 (v/v) and 13.36 min, respectively.
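For readers unfamiliar with the design, a three-level Box-Behnken plan for five factors can be generated in coded units (-1, 0, +1) by taking every pair of factors through a 2x2 factorial while holding the others at their centre level, then adding centre runs. The sketch below follows that textbook construction; the number of centre points (6) is an assumption, since the abstract does not report it, and the function name is hypothetical.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=1):
    """Coded Box-Behnken runs built from all factor pairs (valid for 3 to 5 factors)."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        # 2x2 factorial on factors i and j; every other factor stays at 0.
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(run)
    # Replicated centre points, used to estimate pure error.
    runs.extend([0] * n_factors for _ in range(n_center))
    return runs

# Five factors in the order cl, cs, psi, phi, tau; six centre runs is an assumed choice.
design = box_behnken(5, n_center=6)
```

With five factors this gives 10 factor pairs times 4 level combinations, i.e. 40 edge runs, plus the centre runs. Each coded level would then be mapped onto the physical range chosen for that variable.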

Keywords: emulsion liquid membrane, extraction, lactic acid, n-trioctylamine, response surface methodology

Procedia PDF Downloads 354
2911 Surfactant-Assisted Aqueous Extraction of Residual Oil from Palm-Pressed Mesocarp Fibre

Authors: Rabitah Zakaria, Chan M. Luan, Nor Hakimah Ramly

Abstract:

The extraction of vegetable oil using an aqueous extraction process assisted by ionic extended surfactants has been investigated as an alternative to hexane extraction. However, ionic extended surfactants have not been commercialised, and their safety with respect to food processing is uncertain. Hence, food-grade non-ionic surfactants (Tween 20, Span 20, and Span 80) were proposed for the extraction of residual oil from palm-pressed mesocarp fibre. Palm-pressed mesocarp fibre contains a significant amount of residual oil (5-10 wt %), and its recovery is beneficial as the oil contains a much higher content of vitamin E, carotenoids, and sterols than crude palm oil. In this study, the formulation of food-grade surfactants using a combination of high hydrophilic-lipophilic balance (HLB) surfactants and low HLB surfactants to produce a micro-emulsion with very low interfacial tension (IFT) was investigated. The suitable surfactant formulation was used in the oil extraction process, and the efficiency of the extraction was correlated with the IFT, droplet size and viscosity. It was found that a ternary surfactant mixture with an HLB value of 15 (82% Tween 20, 12% Span 20 and 6% Span 80) was able to produce a micro-emulsion with very low IFT compared to other HLB combinations. Results suggested that the IFT and droplet size strongly affect the oil recovery efficiency. Finally, optimization of the operating parameters shows that the highest extraction efficiency of 78% was achieved at a 1:31 solid to liquid ratio, 2 wt % surfactant solution, a temperature of 50 °C, and 50 minutes contact time.
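The HLB value of 15 quoted for the ternary mixture can be checked with the usual weight-averaged mixing rule. The individual HLB values below (Tween 20: 16.7, Span 20: 8.6, Span 80: 4.3) are commonly cited figures and are an assumption here, since the abstract does not list them; the function name is hypothetical.

```python
# Commonly cited HLB values; these are assumptions, not taken from the abstract.
HLB = {"Tween 20": 16.7, "Span 20": 8.6, "Span 80": 4.3}

def mixture_hlb(weight_fractions):
    """Weight-averaged HLB of a surfactant blend (fractions must sum to 1)."""
    total = sum(weight_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("weight fractions must sum to 1")
    return sum(frac * HLB[name] for name, frac in weight_fractions.items())

# Blend composition as reported in the abstract.
blend = {"Tween 20": 0.82, "Span 20": 0.12, "Span 80": 0.06}
blend_hlb = mixture_hlb(blend)
```

Under these assumed component values the blend works out to roughly 14.98, consistent with the HLB of 15 stated in the abstract.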

Keywords: food-grade surfactants, aqueous extraction of residual oil, palm-pressed mesocarp fibre, interfacial tension

Procedia PDF Downloads 366
2910 A Study of Various Ontology Learning Systems from Text and a Look into Future

Authors: Fatima Al-Aswadi, Chan Yong

Abstract:

With the large volume of unstructured data growing day by day on the web, the motivation to represent the knowledge in this data in a machine-processable form has increased. Ontology is one of the major cornerstones of representing information in a more meaningful way on the Semantic Web. The goal of ontology learning from text is to elicit and represent domain knowledge in a machine-readable form. This paper gives a follow-up review of ontology learning systems from text and some of their defects. Furthermore, it discusses how far the ontology learning process may be enhanced in the future.

Keywords: concept discovery, deep learning, ontology learning, semantic relation, semantic web

Procedia PDF Downloads 478
2909 Principle Components Updates via Matrix Perturbations

Authors: Aiman Elragig, Hanan Dreiwi, Dung Ly, Idriss Elmabrook

Abstract:

This paper highlights a new approach to online principal component analysis (OPCA). Given a data matrix $X \in \mathbb{R}^{m \times n}$, we characterise the online updates of its covariance as a matrix perturbation problem. Up to the principal components, it turns out that online updates of the batch PCA can be captured by a symmetric matrix perturbation of the batch covariance matrix. We have shown that as $n \to n_0 \gg 1$, the batch covariance and its update become almost similar. Finally, we utilize our new setup of online updates to find a bound on the angle distance between the principal components of X and its update.
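One standard way to see a covariance update as a symmetric perturbation is the rank-one update formula for the biased (1/n) covariance: adding a sample x to n existing samples with mean mu and covariance C gives C_new = C + E, with E = -C/(n+1) + n/(n+1)^2 (x - mu)(x - mu)^T. The sketch below checks this numerically against a batch recomputation. It is an illustration under assumptions, not necessarily the authors' derivation: samples are stored as rows here, the 1/n normalisation is assumed, and the data values are made up.

```python
def mean_and_cov(rows):
    """Batch mean and biased (1/n) covariance of a list of equal-length samples."""
    n, m = len(rows), len(rows[0])
    mu = [sum(r[k] for r in rows) / n for k in range(m)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / n
            for b in range(m)] for a in range(m)]
    return mu, cov

def online_update(mu, cov, n, x):
    """Fold one new sample x into (mu, cov) computed from n samples.

    C_new = C + E with the symmetric perturbation
    E = -C / (n + 1) + n / (n + 1)**2 * d d^T, where d = x - mu.
    """
    m = len(x)
    d = [x[k] - mu[k] for k in range(m)]
    pert = [[-cov[a][b] / (n + 1) + n / (n + 1) ** 2 * d[a] * d[b]
             for b in range(m)] for a in range(m)]
    new_mu = [mu[k] + d[k] / (n + 1) for k in range(m)]
    new_cov = [[cov[a][b] + pert[a][b] for b in range(m)] for a in range(m)]
    return new_mu, new_cov, pert

# Made-up two-dimensional samples, one per row.
samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
mu, cov = mean_and_cov(samples)
new_mu, new_cov, pert = online_update(mu, cov, len(samples), [2.0, 4.0])
batch_mu, batch_cov = mean_and_cov(samples + [[2.0, 4.0]])
```

Both terms of the perturbation shrink like 1/n for bounded data, which matches the abstract's remark that the batch covariance and its update become almost the same once n is large.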

Keywords: online data updates, covariance matrix, online principle component analysis, matrix perturbation

Procedia PDF Downloads 167
2908 Determination of Benzatropine in Hair by GC/MS after Liquid-Liquid Extraction (LLE)

Authors: Abdulsallam A. Bakdash, Aiyshah M. Alshehri, Hind M. Alenzi

Abstract:

Benzatropine (benztropine) is used to treat symptoms of Parkinson's disease or involuntary movements due to the side effects of certain psychiatric drugs. In this study, we report the results of a procedure for the determination of benzatropine in hair using LLE, once with methanol and once with phosphate buffer (pH 6.0), followed by filtration and then re-extraction with dichloromethane. A GC/MS method was developed and validated for this determination using selected ion monitoring (SIM) detection without derivatization. Linearity was established over the concentration range 0.1-20.0 ng/mg hair, and the correlation coefficients were greater than 0.99. Recoveries were 52.2% and 21.1% using methanol and phosphate buffer extraction, respectively. Detection limits of benzatropine in hair were between 0.65 and 3.0 ng/mg hair, while the accuracy was 10.4% and 18.5% (RSD), respectively. We also applied this method to the analysis of soaked hair samples and demonstrated that LLE using methanol meets the requirements for the analysis of benzatropine in hair.

Keywords: hair analysis, benzatropine, liquid-liquid extraction, GC/MS

Procedia PDF Downloads 380
2907 Solvent Free Microwave Extraction of Essential Oils: A Clean Chemical Processing in the Teaching and Research Laboratory

Authors: M. A. Ferhat, M. N. Boukhatem, F. Chemat

Abstract:

Microwave Clevenger or microwave accelerated distillation (MAD) is a combination of microwave heating and distillation, performed at atmospheric pressure without adding any solvent or water. Isolation and concentration of volatile compounds are performed in a single stage. MAD extraction of orange essential oil was studied using fresh orange peel from Valencia late cultivar oranges as the raw material. MAD was compared with a conventional technique, which used a Clevenger apparatus with hydro-distillation (HD). MAD and HD were compared in terms of extraction time, yields, chemical composition and quality of the essential oil, and efficiency and costs of the process. Extraction of essential oils from orange peels with MAD was better in terms of energy saving, extraction time (30 min versus 3 h), oxygenated fraction (11.7% versus 7.9%), product yield (0.42% versus 0.39%) and product quality. Orange peels treated by MAD and HD were observed by scanning electron microscopy (SEM). Micrographs provide evidence of more rapid opening of essential oil glands treated by MAD, in contrast to conventional hydro-distillation.

Keywords: Clevenger, microwave, extraction, hydro-distillation, essential oil, orange peel

Procedia PDF Downloads 319
2906 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in the biofilm field has become very significant, as biofilms show high mechanical resilience and resistance to antibiotic treatment and constitute a significant problem in both healthcare and other industries related to microorganisms. The massive amount of information, both stated and hidden, in the biofilm literature is growing exponentially; therefore, it is not possible for researchers and practitioners to automatically extract and relate information from different written resources. So, the current work proposes and discusses the use of text mining techniques for the extraction of information from biofilm literature corpora containing 34306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature, as the literature is unstructured, i.e. free text. Therefore, we adopted an unsupervised approach, in which no annotated training data are necessary, and used it to develop a system that classifies the text on the basis of growth and development, drug effects, radiation effects, classification and physiology of biofilms. For this, a two-step structure was used: the first step is to extract keywords from the biofilm literature using a metathesaurus and standard natural language processing tools such as Rapid Miner_v5.3, and the second step is to discover relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR_v1.0.11. We applied unsupervised learning, which is the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, to the extracted datasets to develop classifiers using the WinPython-64 bit_v3.5.4.0Qt5 and R studio_v0.99.467 packages, which automatically classify the text using the mentioned sets.
The developed classifiers were tested on a large data set of biofilm literature; the tests showed that the proposed unsupervised approach is promising and suited to semi-automatic labeling of the extracted relations. The entire information was stored in a relational database hosted locally on the server. The generated biofilm vocabulary and gene relations will be significant for researchers dealing with biofilm research, making their search easy and efficient, as the keywords and genes can be directly mapped to the documents used for database development.

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 141
2905 Membranes for Direct Lithium Extraction (DLE)

Authors: Amir Razmjou, Elika Karbassi Yazdi

Abstract:

Several direct lithium extraction (DLE) technologies have been developed for Li extraction from different brines. Although laboratory studies have shown that they can technically recover up to 90% of the Li, challenges still remain in developing a sustainable process that can serve as a foundation for the lithium-dependent low-carbon economy. There is a continuing quest for DLE technologies that do not need extensive pre-treatments, use fewer materials, and have simplified extraction processes with high Li selectivity. Here, an overview of DLE technologies will be provided, with an emphasis on the basic principles of materials design for the development of membranes with nanochannels and nanopores with Li ion selectivity. We have used a variety of building blocks, such as nano-clay, organic frameworks, graphene/oxide, MXene, etc., to fabricate the membranes. Molecular dynamics (MD) simulation and density functional theory (DFT) were used to reveal new mechanisms by which high Li selectivity was obtained.

Keywords: lithium recovery, membrane, lithium selectivity, decarbonization

Procedia PDF Downloads 72
2904 Teaching Pragmatic Coherence in Literary Text: Analysis of Chimamanda Adichie’s Americanah

Authors: Joy Aworo-Okoroh

Abstract:

Literary texts are mirrors of real-life situations. Thus, authors choose the linguistic items that would best encode their intended meanings and messages. However, words mean more than they seem. The meaning of words is not static; rather, it is dynamic, as words constantly enter into relationships within a context. Literary texts can only be meaningful if all pragmatic cues are identified and interpreted. Drawing upon Teun van Dijk's theory of local pragmatic coherence, it is established that words enter into relations in a text and that these relations account for sequential speech acts in the text. Comprehension of the text is dependent on the interpretation of these relations. To show the relevance of pragmatic coherence in literary text analysis, ten conversations were selected from Americanah in order to give a clear idea of the pragmatic relations used. The conversations were analysed, identifying the speech act and epistemic relations inherent in them. A subtle analysis of the structure of the conversations was also carried out. It was discovered that justification is the most commonly used relation and that the meaning of the text is dependent on the interpretation of the pragmatic coherence of these instances. The study concludes that to effectively teach literature in English, pragmatic coherence should be incorporated, as words mean more than they say.

Keywords: pragmatic coherence, epistemic coherence, speech act, Americanah

Procedia PDF Downloads 107
2903 Lexical Semantic Analysis to Support Ontology Modeling of Maintenance Activities– Case Study of Offshore Riser Integrity

Authors: Vahid Ebrahimipour

Abstract:

Word representation and context meaning of text-based documents play an essential role in knowledge modeling. Business procedures written in natural language are meant to store technical and engineering information, management decisions and operational experience during the production system life cycle. Context meaning representation is highly dependent upon word sense, lexical relativity, and semantic features of the argument. This paper proposes a method for lexical semantic analysis and context meaning representation of maintenance activity in a mass production system. Our method takes a straightforward lexical semantic approach to analyzing the semantic and syntactic features of the context structure of maintenance reports, in order to facilitate translation, interpretation, and conversion of human-readable text into a computer-readable and understandable representation with less heterogeneity and ambiguity. The methodology will enable users to obtain a representation format that maximizes shareability and accessibility for multi-purpose usage. It provides a contextualized structure to obtain a generic context model that can be utilized during the system life cycle. At first, it employs a co-occurrence-based clustering framework to recognize a group of highly frequent contextual features that correspond to a maintenance report text. Then the keywords are identified for syntactic and semantic extraction analysis. The analysis exercises causality-driven logic of keywords' senses to divulge the structural and meaning dependency relationships between the words in a context. The output is a contextualized word representation of maintenance activity that accommodates computer-based representation and inference using OWL/RDF.

Keywords: lexical semantic analysis, metadata modeling, contextual meaning extraction, ontology modeling, knowledge representation

Procedia PDF Downloads 78
2902 Optimization of Process Parameters using Response Surface Methodology for the Removal of Zinc(II) by Solvent Extraction

Authors: B. Guezzen, M.A. Didi, B. Medjahed

Abstract:

A factorial design of experiments and a response surface methodology were implemented to investigate the liquid-liquid extraction of zinc (II) from acetate medium using the ionic liquid 1-Butyl-imidazolium di(2-ethylhexyl) phosphate [BIm+][D2EHP-]. The optimization of extraction parameters, namely the initial pH (2.5, 4.5, and 6.6), ionic liquid concentration (1, 5.5, and 10 mM) and salt effect (0.01, 5, and 10 mM), was carried out using a three-level full factorial design (3³). The results of the factorial design demonstrate that all these factors are statistically significant, including the square effects of pH and ionic liquid concentration. The order of significance was: IL concentration > salt effect > initial pH. Analysis of variance (ANOVA) showed a high coefficient of determination (R² = 0.91) and low probability values (P < 0.05), which signifies the validity of the predicted second-order quadratic model for Zn (II) extraction. The optimum conditions for the extraction of zinc (II) at constant temperature (20 °C), initial Zn (II) concentration (1 mM) and an A/O ratio of unity were: initial pH (4.8), extractant concentration (9.9 mM), and NaCl concentration (8.2 mM). At the optimized conditions, the metal ion could be quantitatively extracted.
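
As an illustration of what a three-level full factorial design over three factors looks like, the sketch below enumerates every combination of the factor levels listed in the abstract. The dictionary keys are hypothetical labels chosen for this example, and the run order shown is simply the enumeration order, not the order used in the study.

```python
from itertools import product

# Factor levels as reported in the abstract; the key names are illustrative.
levels = {
    "initial_pH": (2.5, 4.5, 6.6),
    "ionic_liquid_mM": (1, 5.5, 10),
    "salt_mM": (0.01, 5, 10),
}

# A full factorial design runs every combination of levels: 3 * 3 * 3 runs.
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(runs))
for run in runs[:3]:
    print(run)
```

Each of the 27 entries in `runs` corresponds to one experimental condition; in practice the run order would usually be randomized before execution.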

Keywords: ionic liquid, response surface methodology, solvent extraction, zinc acetate

Procedia PDF Downloads 344
2901 The Effect of Supercritical Fluid on the Extraction Efficiency of Heavy Metal from Soil

Authors: Haifa El-Sadi, Maria Elektorowicz, Reed Rushing, Ammar Badawieh, Asif Chaudry

Abstract:

Clay soils have particular properties that affect the assessment and remediation of contaminated sites. In clay soils, electro-kinetic transport of heavy metals has been carried out. The transport of these metals is predicated on maintaining a low pH throughout the cell, which, in turn, keeps the metals in the pore water phase where they are accessible to electro-kinetic transport. Supercritical fluid extraction and acid digestion were used for the analysis of heavy metal concentrations after the completion of electro-kinetic experimentation. Supercritical fluid (carbon dioxide) extraction is a new technique used to extract metals (lead, nickel, calcium and potassium) from clayey soil. Supercritical extraction and acid digestion were compared for the different metals. Supercritical fluid extraction, using ethylenediaminetetraacetic acid (EDTA) as a modifier, proved to be an efficient technique and safer than acid digestion for extracting metals from clayey soil. The mixing time of soil with EDTA before extracting heavy metals from clayey soil was investigated. The optimum and most practical shaking time for the extraction of lead, nickel, calcium and potassium was two hours.

Keywords: clay soil, heavy metals, supercritical fluid extraction, acid digestion

Procedia PDF Downloads 434
2900 Optimization of a Method of Total RNA Extraction from Mentha piperita

Authors: Soheila Afkar

Abstract:

Mentha piperita is a medicinal plant that contains a large amount of secondary metabolites, which have an adverse effect on RNA extraction. Since high-quality RNA is the first step towards real-time PCR, this study evaluated the optimization of total RNA isolation from leaf tissues of Mentha piperita. To this end, we tested different total RNA extraction methods on leaves of Mentha piperita to find the one that yields the highest quality. The methods tested were RNX-plus and modified RNX-plus (variants numbered 1-5). RNA quality was analyzed on a 1.5% agarose gel. RNA integrity was also assessed by visualization of ribosomal RNA bands on 1.5% agarose gels. In the modified RNX-plus method (number 2), the integrity of 28S and 18S rRNA was highly satisfactory when analyzed on a denaturing agarose gel, so this method is suitable for RNA isolation from Mentha piperita.

Keywords: Mentha piperita, polyphenol, polysaccharide, RNA extraction

Procedia PDF Downloads 156
2899 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.
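
The three-case structure described above can be sketched as a per-feature scoring rule. The abstract does not give the actual formula, so everything below is an assumption made for illustration: shared features are rewarded according to how close their weights are, features present in only one document incur a fixed penalty `lam`, and features absent from both are ignored. The function name and the parameters `lam` and `sigma` are hypothetical.

```python
import math


def similarity(doc_a, doc_b, lam=1.0, sigma=1.0):
    """Illustrative three-case similarity between two {feature: weight} dicts.

    NOT the authors' formula; the scoring choices here are assumptions.
    Returns a value in [0, 1].
    """
    score, counted = 0.0, 0
    for feature in set(doc_a) | set(doc_b):
        a, b = doc_a.get(feature, 0.0), doc_b.get(feature, 0.0)
        if a > 0 and b > 0:
            # Case 1: present in both; closer weights score nearer to 1.
            score += 0.5 * (1.0 + math.exp(-((a - b) / sigma) ** 2))
            counted += 1
        elif a > 0 or b > 0:
            # Case 2: present in only one document; fixed penalty.
            score -= lam
            counted += 1
        # Case 3: absent from both; contributes nothing.
    if counted == 0:
        return 0.0  # arbitrary convention for two empty documents
    # Rescale the average from [-lam, 1] to [0, 1].
    return (score / counted + lam) / (1.0 + lam)


statement = {"loan": 2.0, "rate": 1.0}
print(similarity(statement, statement))
print(similarity({"loan": 1.0}, {"scan": 1.0}))
```

With this convention, identical documents score 1 and documents with no shared features score 0; partial overlaps fall in between.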

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 480