Search results for: Statistical language model N-grams
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8878

Search results for: Statistical language model N-grams

8338 Simultaneous Saccharification and Fermentation(SSF) of Sugarcane Bagasse - Kinetics and Modeling

Authors: E.Sasikumar, T.Viruthagiri

Abstract:

Simultaneous Saccharification and Fermentation (SSF) of sugarcane bagasse by cellulase and Pachysolen tannophilus MTCC *1077 were investigated in the present study. Important process variables for ethanol production form pretreated bagasse were optimized using Response Surface Methodology (RSM) based on central composite design (CCD) experiments. A 23 five level CCD experiments with central and axial points was used to develop a statistical model for the optimization of process variables such as incubation temperature (25–45°) X1, pH (5.0–7.0) X2 and fermentation time (24–120 h) X3. Data obtained from RSM on ethanol production were subjected to the analysis of variance (ANOVA) and analyzed using a second order polynomial equation and contour plots were used to study the interactions among three relevant variables of the fermentation process. The fermentation experiments were carried out using an online monitored modular fermenter 2L capacity. The processing parameters setup for reaching a maximum response for ethanol production was obtained when applying the optimum values for temperature (32°C), pH (5.6) and fermentation time (110 h). Maximum ethanol concentration (3.36 g/l) was obtained from 50 g/l pretreated sugarcane bagasse at the optimized process conditions in aerobic batch fermentation. Kinetic models such as Monod, Modified Logistic model, Modified Logistic incorporated Leudeking – Piret model and Modified Logistic incorporated Modified Leudeking – Piret model have been evaluated and the constants were predicted.

Keywords: Sugarcane bagasse, ethanol, optimization, Pachysolen tannophilus.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2302
8337 Evaluating the Validity of Computational Fluid Dynamics Model of Dispersion in a Complex Urban Geometry Using Two Sets of Experimental Measurements

Authors: Mohammad R. Kavian Nezhad, Carlos F. Lange, Brian A. Fleck

Abstract:

This research presents the validation study of a computational fluid dynamics (CFD) model developed to simulate the scalar dispersion emitted from rooftop sources around the buildings at the University of Alberta North Campus. The ANSYS CFX code was used to perform the numerical simulation of the wind regime and pollutant dispersion by solving the 3D steady Reynolds-averaged Navier-Stokes (RANS) equations on a building-scale high-resolution grid. The validation study was performed in two steps. First, the CFD model performance in 24 cases (eight wind directions and three wind speeds) was evaluated by comparing the predicted flow fields with the available data from the previous measurement campaign designed at the North Campus, using the standard deviation method (SDM), while the estimated results of the numerical model showed maximum average percent errors of approximately 53% and 37% for wind incidents from the North and Northwest, respectively. Good agreement with the measurements was observed for the other six directions, with an average error of less than 30%. In the second step, the reliability of the implemented turbulence model, numerical algorithm, modeling techniques, and the grid generation scheme was further evaluated using the Mock Urban Setting Test (MUST) dispersion dataset. Different statistical measures, including the fractional bias (FB), the mean geometric bias (MG), and the normalized mean square error (NMSE), were used to assess the accuracy of the predicted dispersion field. Our CFD results are in very good agreement with the field measurements.

Keywords: CFD, plume dispersion, complex urban geometry, validation study, wind flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 374
8336 Evaluating the Effectiveness of the Use of Scharmer’s Theory-U Model in Action-Learning-Based Leadership Development Program

Authors: Donald C. Lantu, Henndy Ginting, M. Yorga Permana, Dany M. A. Ramdlany

Abstract:

We constructed a training program for top-talents of a Bank with Scharmer Theory-U as the model. In this training program, we implemented the action learning perspective, as it is claimed to be the most effective one currently available. In the process, participants were encouraged to be more involved, especially compared to traditional lecturing. The goal of this study is to assess the effectiveness of this particular training. The program consists of six days non-residential workshop within two months. Between each workshop, the participants were involved in the works of action learning group. They were challenged by dealing with the real problem related to their tasks at work. The participants of the program were 30 best talents who were chosen according to their yearly performance. Using paired difference statistical test in the behavioral assessment, we found that the training was not effective to increase participants’ leadership competencies. For the future development program, we suggested to modify the goals of the program toward the next stage of development.

Keywords: Action learning, behaviour, leadership development, Theory-U.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 941
8335 Adaptive MPC Using a Recursive Learning Technique

Authors: Ahmed Abbas Helmy, M. R. M. Rizk, Mohamed El-Sayed

Abstract:

A model predictive controller based on recursive learning is proposed. In this SISO adaptive controller, a model is automatically updated using simple recursive equations. The identified models are then stored in the memory to be re-used in the future. The decision for model update is taken based on a new control performance index. The new controller allows the use of simple linear model predictive controllers in the control of nonlinear time varying processes.

Keywords: Adaptive control, model predictive control, dynamic matrix control, online model identification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
8334 A Fast, Portable Computational Framework for Aerodynamic Simulations

Authors: Mehdi Ghommem, Daniel Garcia, Nathan Collier, Victor Calo

Abstract:

We develop a fast, user-friendly implementation of a potential flow solver based on the unsteady vortex lattice method (UVLM). The computational framework uses the Python programming language which has easy integration with the scripts requiring computationally-expensive operations written in Fortran. The mixed-language approach enables high performance in terms of solution time and high flexibility in terms of easiness of code adaptation to different system configurations and applications. This computational tool is intended to predict the unsteady aerodynamic behavior of multiple moving bodies (e.g., flapping wings, rotating blades, suspension bridges...) subject to an incoming air. We simulate different aerodynamic problems to validate and illustrate the usefulness and effectiveness of the developed computational tool.

Keywords: Unsteady aerodynamics, numerical simulations, mixed-language approach, potential flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1211
8333 Design of the Mathematical Model of the Respiratory System Using Electro-acoustic Analogy

Authors: M. Rozanek, K. Roubik

Abstract:

The article deals with development, design and implementation of a mathematical model of the human respiratory system. The model is designed in order to simulate distribution of important intrapulmonary parameters along the bronchial tree such as pressure amplitude, tidal volume and effect of regional mechanical lung properties upon the efficiency of various ventilatory techniques. Therefore exact agreement of the model structure with the lung anatomical structure is required. The model is based on the lung morphology and electro-acoustic analogy is used to design the model.

Keywords: Model of the respiratory system, total lung impedance, intrapulmonary parameters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1837
8332 Mining Frequent Patterns with Functional Programming

Authors: Nittaya Kerdprasop, Kittisak Kerdprasop

Abstract:

Frequent patterns are patterns such as sets of features or items that appear in data frequently. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a dataset. Most of the proposed frequent pattern mining algorithms have been implemented with imperative programming languages such as C, Cµ, Java. The imperative paradigm is significantly inefficient when itemset is large and the frequent pattern is long. We suggest a high-level declarative style of programming using a functional language. Our supposition is that the problem of frequent pattern discovery can be efficiently and concisely implemented via a functional paradigm since pattern matching is a fundamental feature supported by most functional languages. Our frequent pattern mining implementation using the Haskell language confirms our hypothesis about conciseness of the program. The performance studies on speed and memory usage support our intuition on efficiency of functional language.

Keywords: Association, frequent pattern mining, functionalprogramming, pattern matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2135
8331 Texture Feature-Based Language Identification Using Wavelet-Domain BDIP and BVLC Features and FFT Feature

Authors: Ick Hoon Jang, Hoon Jae Lee, Dae Hoon Kwon, Ui Young Pak

Abstract:

In this paper, we propose a texture feature-based language identification using wavelet-domain BDIP (block difference of inverse probabilities) and BVLC (block variance of local correlation coefficients) features and FFT (fast Fourier transform) feature. In the proposed method, wavelet subbands are first obtained by wavelet transform from a test image and denoised by Donoho-s soft-thresholding. BDIP and BVLC operators are next applied to the wavelet subbands. FFT blocks are also obtained by 2D (twodimensional) FFT from the blocks into which the test image is partitioned. Some significant FFT coefficients in each block are selected and magnitude operator is applied to them. Moments for each subband of BDIP and BVLC and for each magnitude of significant FFT coefficients are then computed and fused into a feature vector. In classification, a stabilized Bayesian classifier, which adopts variance thresholding, searches the training feature vector most similar to the test feature vector. Experimental results show that the proposed method with the three operations yields excellent language identification even with rather low feature dimension.

Keywords: BDIP, BVLC, FFT, language identification, texture feature, wavelet transform.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2149
8330 EFL Learners- Perceptions of Computer-Mediated Communication (CMC) to Facilitate Communication in a Foreign Language

Authors: Lin, Huifen, Fang, Yueh-chiu

Abstract:

This study explores perceptions of English as a Foreign Language (EFL) learners on using computer mediated communication technology in their learner of English. The data consists of observations of both synchronous and asynchronous communication participants engaged in for over a period of 4 months, which included online, and offline communication protocols, open-ended interviews and reflection papers composed by participants. Content analysis of interview data and the written documents listed above, as well as, member check and triangulation techniques are the major data analysis strategies. The findings suggest that participants generally do not benefit from computer-mediated communication in terms of its effect in learning a foreign language. Participants regarded the nature of CMC as artificial, or pseudo communication that did not aid their authentic communicational skills in English. The results of this study sheds lights on insufficient and inconclusive findings, which most quantitative CMC studies previously generated.

Keywords: computer-mediated communication, EFL, writing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2581
8329 The Effect of an Al Andalus Fused Curriculum Model on the Learning Outcomes of Elementary School Students

Authors: Sobhy Fathy A. Hashesh

Abstract:

The study was carried out in the Elementary Classes of Andalus Private Schools, girls section using control and experimental groups formed by Random Assignment Strategy. The study aimed at investigating the effect of Al-Andalus Fused Curriculum (AFC) model of learning and the effect of separate subjects’ approach on the development of students’ conceptual learning and skills acquiring. The society of the study composed of Al-Andalus Private Schools, elementary school students, Girls Section (N=240), while the sample of the study composed of two randomly assigned groups (N=28) with one experimental group and one control group. The study followed the quantitative and qualitative approaches in collecting and analyzing data to investigate the study hypotheses. Results of the study revealed that there were significant statistical differences between students’ conceptual learning and skills acquiring for the favor of the experimental group. The study recommended applying this model on different educational variables and on other age groups to generate more data leading to more educational results for the favor of students’ learning outcomes.

Keywords: AFC, Lego Education, mechatronics, STEAM, Al-Andalus Fused Curriculum.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 859
8328 Model of MSD Risk Assessment at Workplace

Authors: K. Sekulová, M. Šimon

Abstract:

This article focuses on upper-extremity musculoskeletal disorders risk assessment model at workplace. In this model are used risk factors that are responsible for musculoskeletal system damage. Based on statistic calculations the model is able to define what risk of MSD threatens workers who are under risk factors. The model is also able to say how MSD risk would decrease if these risk factors are eliminated.

 

Keywords: Ergonomics, musculoskeletal disorders, occupational diseases, risk factors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2057
8327 A Hydro-Mechanical Model for Unsaturated Soils

Authors: A. Uchaipichat

Abstract:

The hydro-mechanical model for unsaturated soils has been presented based on the effective stress principle taking into account effects of drying-wetting process. The elasto-plastic constitutive equations for stress-strain relations of the soil skeleton have been established. A plasticity model is modified from modified Cam-Clay model. The hardening rule has been established by considering the isotropic consolidation paths. The effect of dryingwetting process is introduced through the ¤ç parameter. All model coefficients are identified in terms of measurable parameters. The simulations from the proposed model are compared with the experimental results. The model calibration was performed to extract the model parameter from the experimental results. Good agreement between the results predicted using proposed model and the experimental results was obtained.

Keywords: Drying-wetting process, Effective stress, Elastoplasticmodel, Unsaturated soils

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1745
8326 Comparison among Various Question Generations for Decision Tree Based State Tying in Persian Language

Authors: Nasibeh Nasiri, Dawood Talebi Khanmiri

Abstract:

Performance of any continuous speech recognition system is highly dependent on performance of the acoustic models. Generally, development of the robust spoken language technology relies on the availability of large amounts of data. Common way to cope with little data for training each state of Markov models is treebased state tying. This tying method applies contextual questions to tie states. Manual procedure for question generation suffers from human errors and is time consuming. Various automatically generated questions are used to construct decision tree. There are three approaches to generate questions to construct HMMs based on decision tree. One approach is based on misrecognized phonemes, another approach basically uses feature table and the other is based on state distributions corresponding to context-independent subword units. In this paper, all these methods of automatic question generation are applied to the decision tree on FARSDAT corpus in Persian language and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions in Persian language.

Keywords: Decision Tree, Markov Models, Speech Recognition, State Tying.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1722
8325 A Brief Study about Nonparametric Adherence Tests

Authors: Vinicius R. Domingues, Luan C. S. M. Ozelim

Abstract:

The statistical study has become indispensable for various fields of knowledge. Not any different, in Geotechnics the study of probabilistic and statistical methods has gained power considering its use in characterizing the uncertainties inherent in soil properties. One of the situations where engineers are constantly faced is the definition of a probability distribution that represents significantly the sampled data. To be able to discard bad distributions, goodness-of-fit tests are necessary. In this paper, three non-parametric goodness-of-fit tests are applied to a data set computationally generated to test the goodness-of-fit of them to a series of known distributions. It is shown that the use of normal distribution does not always provide satisfactory results regarding physical and behavioral representation of the modeled parameters.

Keywords: Kolmogorov-Smirnov, Anderson-Darling, Cramer-Von-Mises, Nonparametric adherence tests.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1843
8324 Abai Kunanbayev's Role in Enrichment of the Kazakh Language

Authors: Y.M. Paltore, B.N. Zhubatova, A.A. Mustafayeva

Abstract:

Abai Kunanbayev is famous for being enlightener, composer, interpreter, social agent, philosopher, reformer, who wanted to enrich Kazakh literature by emergence with Russian and European culture, and also as a founder of Kazakh written literary language. Abai Kunanbayev was born in 1845 in East Kazakhstan area and passed away in 1904 in his hometown. His oeuvre absorbed and reflected all changes in the life of Kazakh society of the second half of XIX century. Because ХІХ century, especially its second half, was an important transition period for Kazakhstan, which radically changed traditional way of Kazakh society and predetermined further development in consequence of activation of Russian colonial policy and approval of commodity-money relations in Steppe Land.Abai Kunanbayev, besides Arabic and Persian common words and loanwords from Quran in his words of edification, had used a lot of words of Arabic, Persian, Latin, Russian, Nogai, Shaghatai, Polish, Greek, Turkish, which are used in the Kazakh language.

Keywords: Abai Kunanbayev, the Kazakh, Russian languages, literature

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2864
8323 Development of a RAM Simulation Model for Acid Gas Removal System

Authors: Ainul Akmar Mokhtar, Masdi Muhammad, Hilmi Hussin, Mohd Amin Abdul Majid

Abstract:

A reliability, availability and maintainability (RAM) model has been built for acid gas removal plant for system analysis that will play an important role in any process modifications, if required, for achieving its optimum performance. Due to the complexity of the plant, the model was based on a Reliability Block Diagram (RBD) with a Monte Carlo simulation engine. The model has been validated against actual plant data as well as local expert opinions, resulting in an acceptable simulation model. The results from the model showed that the operation and maintenance can be further improved, resulting in reduction of the annual production loss.

Keywords: Acid gas removal plant, RAM model, Reliabilityblock diagram

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2343
8322 EGCL: An Extended G-Code Language with Flow Control, Functions and Mnemonic Variables

Authors: Oscar E. Ruiz, S. Arroyave, J. F. Cardona

Abstract:

In the context of computer numerical control (CNC) and computer aided manufacturing (CAM), the capabilities of programming languages such as symbolic and intuitive programming, program portability and geometrical portfolio have special importance. They allow to save time and to avoid errors during part programming and permit code re-usage. Our updated literature review indicates that the current state of art presents voids in parametric programming, program portability and programming flexibility. In response to this situation, this article presents a compiler implementation for EGCL (Extended G-code Language), a new, enriched CNC programming language which allows the use of descriptive variable names, geometrical functions and flow-control statements (if-then-else, while). Our compiler produces low-level generic, elementary ISO-compliant Gcode, thus allowing for flexibility in the choice of the executing CNC machine and in portability. Our results show that readable variable names and flow control statements allow a simplified and intuitive part programming and permit re-usage of the programs. Future work includes allowing the programmer to define own functions in terms of EGCL, in contrast to the current status of having them as library built-in functions.

Keywords: CNC Programming, Compiler, G-code Language, Numerically Controlled Machine-Tools.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2623
8321 A Bayesian Hierarchical 13COBT to Correct Estimates Associated with a Delayed Gastric Emptying

Authors: Leslie J.C.Bluck, Sarah J.Jackson, Georgios Vlasakakis, Adrian Mander

Abstract:

The use of a Bayesian Hierarchical Model (BHM) to interpret breath measurements obtained during a 13C Octanoic Breath Test (13COBT) is demonstrated. The statistical analysis was implemented using WinBUGS, a commercially available computer package for Bayesian inference. A hierarchical setting was adopted where poorly defined parameters associated with a delayed Gastric Emptying (GE) were able to "borrow" strength from global distributions. This is proved to be a sufficient tool to correct model's failures and data inconsistencies apparent in conventional analyses employing a Non-linear least squares technique (NLS). Direct comparison of two parameters describing gastric emptying ng ( tlag -lag phase, t1/ 2 -half emptying time) revealed a strong correlation between the two methods. Despite our large dataset ( n = 164 ), Bayesian modeling was fast and provided a successful fitting for all subjects. On the contrary, NLS failed to return acceptable estimates in cases where GE was delayed.

Keywords: Bayesian hierarchical analysis, 13COBT, Gastricemptying, WinBUGS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1455
8320 A Formulation of the Latent Class Vector Model for Pairwise Data

Authors: Tomoya Okubo, Kuninori Nakamura, Shin-ichi Mayekawa

Abstract:

In this research, a latent class vector model for pairwise data is formulated. As compared to the basic vector model, this model yields consistent estimates of the parameters since the number of parameters to be estimated does not increase with the number of subjects. The result of the analysis reveals that the model was stable and could classify each subject to the latent classes representing the typical scales used by these subjects.

Keywords: finite mixture models, latent class analysis, Thrustone's paired comparison method, vector model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1216
8319 Hand Gesture Recognition: Sign to Voice System (S2V)

Authors: Oi Mean Foong, Tan Jung Low, Satrio Wibowo

Abstract:

Hand gesture is one of the typical methods used in sign language for non-verbal communication. It is most commonly used by people who have hearing or speech problems to communicate among themselves or with normal people. Various sign language systems have been developed by manufacturers around the globe but they are neither flexible nor cost-effective for the end users. This paper presents a system prototype that is able to automatically recognize sign language to help normal people to communicate more effectively with the hearing or speech impaired people. The Sign to Voice system prototype, S2V, was developed using Feed Forward Neural Network for two-sequence signs detection. Different sets of universal hand gestures were captured from video camera and utilized to train the neural network for classification purpose. The experimental results have shown that neural network has achieved satisfactory result for sign-to-voice translation.

Keywords: Hand gesture detection, neural network, signlanguage, sequence detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1856
8318 A Constitutive Model for Time-Dependent Behavior of Clay

Authors: T. N. Mac, B. Shahbodaghkhan, N. Khalili

Abstract:

A new elastic-viscoplastic (EVP) constitutive model is proposed for the analysis of time-dependent behavior of clay. The proposed model is based on the bounding surface plasticity and the concept of viscoplastic consistency framework to establish continuous transition from plasticity to rate dependent viscoplasticity. Unlike the overstress based models, this model will meet the consistency condition in formulating the constitutive equation for EVP model. The procedure of deriving the constitutive relationship is also presented. Simulation results and comparisons with experimental data are then presented to demonstrate the performance of the model.

Keywords: Bounding surface, consistency theory, constitutive model, viscosity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2744
8317 Neural Network Based Speech to Text in Malay Language

Authors: H. F. A. Abdul Ghani, R. R. Porle

Abstract:

Speech to text in Malay language is a system that converts Malay speech into text. The Malay language recognition system is still limited, thus, this paper aims to investigate the performance of ten Malay words obtained from the online Malay news. The methodology consists of three stages, which are preprocessing, feature extraction, and speech classification. In preprocessing stage, the speech samples are filtered using pre emphasis. After that, feature extraction method is applied to the samples using Mel Frequency Cepstrum Coefficient (MFCC). Lastly, speech classification is performed using Feedforward Neural Network (FFNN). The accuracy of the classification is further investigated based on the hidden layer size. From experimentation, the classifier with 40 hidden neurons shows the highest classification rate which is 94%.  

Keywords: Feed-Forward Neural Network, FFNN, Malay speech recognition, Mel Frequency Cepstrum Coefficient, MFCC, speech-to-text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 746
8316 Effect of Progressive Type-I Right Censoring on Bayesian Statistical Inference of Simple Step–Stress Acceleration Life Testing Plan under Weibull Life Distribution

Authors: Saleem Z. Ramadan

Abstract:

This paper discusses the effects of using progressive Type-I right censoring on the design of the Simple Step Accelerated Life testing using Bayesian approach for Weibull life products under the assumption of cumulative exposure model. The optimization criterion used in this paper is to minimize the expected pre-posterior variance of the Pth percentile time of failures. The model variables are the stress changing time and the stress value for the first step. A comparison between the conventional and the progressive Type-I right censoring is provided. The results have shown that the progressive Type-I right censoring reduces the cost of testing on the expense of the test precision when the sample size is small. Moreover, the results have shown that using strong priors or large sample size reduces the sensitivity of the test precision to the censoring proportion. Hence, the progressive Type-I right censoring is recommended in these cases as progressive Type-I right censoring reduces the cost of the test and doesn't affect the precision of the test a lot. Moreover, the results have shown that using direct or indirect priors affects the precision of the test.

Keywords: Reliability, Accelerated life testing, Cumulative exposure model, Bayesian estimation, Progressive Type-I censoring, Weibull distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2161
8315 Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Authors: Asif Ekbal, Sivaji Bandyopadhyay

Abstract:

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.

Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3404
8314 An Enhanced Tool for Implementing Dialogue Forms in Conversational Applications

Authors: Ilias Spais, George Bafas

Abstract:

Natural Language Understanding Systems (NLU) will not be widely deployed unless they are technically mature and cost effective to develop. Cost effective development hinges on the availability of tools and techniques enabling the rapid production of NLU applications through minimal human resources. Further, these tools and techniques should allow quick development of applications in a user friendly way and should be easy to upgrade in order to continuously follow the evolving technologies and standards. This paper presents a visual tool for the structuring and editing of dialog forms, the key element of driving conversation in NLU applications based on IBM technology. The main focus is given on the basic component used to describe Human – Machine interactions of that kind, the Dialogue Manager. In essence, the description of a tool that enables the visual representation of the Dialogue Manager mainly during the implementation phase is illustrated.

Keywords: Conversational Applications, Forms Dialogue Manager (FDM), Natural Language Processing, Natural Language Understanding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1453
8313 The Folksongs of Jharkhand: An Intangible Cultural Heritage of Tribal India

Authors: Walter Beck

Abstract:

Jharkhand is newly constituted 28th State in the eastern part of India which is known for the oldest settlement of the indigenous people. In the State of Jharkhand in which broadly three language family are found namely, Austric, Dravidian, and Indo-European. Ex-Mundari, kharia, Ho Santali come from the Austric Language family. Kurukh, Malto under Dravidian language family and Nagpuri Khorta etc. under Indo-European language family. There are 32 Indigenous Communities identified as Scheduled Tribe in the State of Jharkhand. Santhal, Munda, Kahria, Ho and Oraons are some of the major Tribe of the Jharkhand state. Jharkhand has a Rich Cultural heritage which includes Folk art, folklore, Folk Dance, Folk Music, Folk Songs for which diversity can been seen from place to place, season to season and all traditional Culture and practices. The languages as well as the songs are vulnerable to dominant culture and hence needed to be protected. The collection and documentation of these songs in their natural setting adds significant contribution to the conservation and propagation of the cultural elements. This paper reflects to bring out the Originality of the Collected Songs from remote areas of the plateau of Sothern Jharkhand as a rich intangible Cultural heritage of the Country. The research was done through participatory observation. In this research project more than 100 songs which were never documented before.

Keywords: Cultural heritage, India, Indigenous people, songs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2194
8312 Improved Computational Efficiency of Machine Learning Algorithms Based on Evaluation Metrics to Control the Spread of Coronavirus in the UK

Authors: Swathi Ganesan, Nalinda Somasiri, Rebecca Jeyavadhanam, Gayathri Karthick

Abstract:

The COVID-19 crisis presents a substantial and critical hazard to worldwide health. Since the occurrence of the disease in late January 2020 in the UK, the number of infected people confirmed to acquire the illness has increased tremendously across the country, and the number of individuals affected is undoubtedly considerably high. The purpose of this research is to figure out a predictive machine learning (ML) archetypal that could forecast the COVID-19 cases within the UK. This study concentrates on the statistical data collected from 31st January 2020 to 31st March 2021 in the United Kingdom. Information on total COVID-19 cases registered, new cases encountered on a daily basis, total death registered, and patients’ death per day due to Coronavirus is collected from World Health Organization (WHO). Data preprocessing is carried out to identify any missing values, outliers, or anomalies in the dataset. The data are split into 8:2 ratio for training and testing purposes to forecast future new COVID-19 cases. Support Vector Machine (SVM), Random Forest (RF), and linear regression (LR) algorithms are chosen to study the model performance in the prediction of new COVID-19 cases. From the evaluation metrics such as r-squared value and mean squared error, the statistical performance of the model in predicting the new COVID-19 cases is evaluated. RF outperformed the other two ML algorithms with a training accuracy of 99.47% and testing accuracy of 98.26% when n = 30. The mean square error obtained for RF is 4.05e11, which is lesser compared to the other predictive models used for this study. From the experimental analysis, RF algorithm can perform more effectively and efficiently in predicting the new COVID-19 cases, which could help the health sector to take relevant control measures for the spread of the virus.

Keywords: COVID-19, machine learning, supervised learning, unsupervised learning, linear regression, support vector machine, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 172
8311 Customer Need Type Classification Model using Data Mining Techniques for Recommender Systems

Authors: Kyoung-jae Kim

Abstract:

Recommender systems are usually regarded as an important marketing tool in the e-commerce. They use important information about users to facilitate accurate recommendation. The information includes user context such as location, time and interest for personalization of mobile users. We can easily collect information about location and time because mobile devices communicate with the base station of the service provider. However, information about user interest can-t be easily collected because user interest can not be captured automatically without user-s approval process. User interest usually represented as a need. In this study, we classify needs into two types according to prior research. This study investigates the usefulness of data mining techniques for classifying user need type for recommendation systems. We employ several data mining techniques including artificial neural networks, decision trees, case-based reasoning, and multivariate discriminant analysis. Experimental results show that CHAID algorithm outperforms other models for classifying user need type. This study performs McNemar test to examine the statistical significance of the differences of classification results. The results of McNemar test also show that CHAID performs better than the other models with statistical significance.

Keywords: Customer need type, Data mining techniques, Recommender system, Personalization, Mobile user.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2146
8310 The Intonation of Romanian Greetings: A Sociolinguistics Approach

Authors: Anca-Diana Bibiri, Mihaela Mocanu, Adrian Turculeț

Abstract:

In a language the inventory of greetings is dynamic with frequent input and output, although this is hardly noticed by the speakers. In this register, there are a number of constant, conservative elements that survive different language models (among them, the classic formulae: bună ziua! (good afternoon!), bună seara! (good evening!), noapte bună! (good night!), la revedere! (goodbye!) and a number of items that fail to pass the test of time, according to language use at a time (ciao!, pa!, bai!). The source of innovation depends both of internal factors (contraction, conversion, combination of classic formulae of greetings), and of external ones (borrowings and calques). Their use imposes their frequencies at once, namely the elimination of the use of others. This paper presents a sociolinguistic approach of contemporary Romanian greetings, based on prosodic surveys in two research projects: AMPRom, and SoRoEs. Romanian language presents a rich inventory of questions (especially partial interrogatives questions/WH-Q) which are used as greetings, alone or, more commonly accompanying a proper greeting. The representative of the typical formulae is Ce mai faci? (How are you?), which, unlike its English counterpart How do you do?, has not become a stereotype, but retains an obvious emotional impact, while serving as a mark of sociolinguistic group. The analyzed corpus consists of structures containing greetings recorded in the main Romanian cultural (urban) centers. From the methodological point of view, the acoustic analysis of the recorded data is performed using software tools (GoldWave, Praat), identifying intonation patterns related to three sociolinguistics variables: age, sex and level of education. The intonation patterns of the analyzed statements are at the interface between partial questions and typical greetings.

Keywords: acoustic analysis, greetings, Romanian language, sociolinguistics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1695
8309 A Comparison of Artificial Neural Networks for Prediction of Suspended Sediment Discharge in River- A Case Study in Malaysia

Authors: M.R. Mustafa, M.H. Isa, R.B. Rezaur

Abstract:

Prediction of highly non linear behavior of suspended sediment flow in rivers has prime importance in the field of water resources engineering. In this study the predictive performance of two Artificial Neural Networks (ANNs) namely, the Radial Basis Function (RBF) Network and the Multi Layer Feed Forward (MLFF) Network have been compared. Time series data of daily suspended sediment discharge and water discharge at Pari River was used for training and testing the networks. A number of statistical parameters i.e. root mean square error (RMSE), mean absolute error (MAE), coefficient of efficiency (CE) and coefficient of determination (R2) were used for performance evaluation of the models. Both the models produced satisfactory results and showed a good agreement between the predicted and observed data. The RBF network model provided slightly better results than the MLFF network model in predicting suspended sediment discharge.

Keywords: ANN, discharge, modeling, prediction, suspendedsediment,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1725