Search results for: data validation

7491 Data Preprocessing for Supervised Leaning

Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas

Abstract:

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set but this is not happened. Thus, we present the most well know algorithms for each step of data pre-processing so that one achieves the best performance for their data set.

Keywords: Data mining, feature selection, data cleaning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5916

7490 Catchment Yield Prediction in an Ungauged Basin Using PyTOPKAPI

Authors: B. S. Fatoyinbo, D. Stretch, O. T. Amoo, D. Allopi

Abstract:

This study extends the use of the Drainage Area Regionalization (DAR) method in generating synthetic data and calibrating PyTOPKAPI stream yield for an ungauged basin at a daily time scale. The generation of runoff in determining a river yield has been subjected to various topographic and spatial meteorological variables, which integers form the Catchment Characteristics Model (CCM). Many of the conventional CCM models adapted in Africa have been challenged with a paucity of adequate, relevance and accurate data to parameterize and validate the potential. The purpose of generating synthetic flow is to test a hydrological model, which will not suffer from the impact of very low flows or very high flows, thus allowing to check whether the model is structurally sound enough or not. The employed physically-based, watershed-scale hydrologic model (PyTOPKAPI) was parameterized with GIS-pre-processing parameters and remote sensing hydro-meteorological variables. The validation with mean annual runoff ratio proposes a decent graphical understanding between observed and the simulated discharge. The Nash-Sutcliffe efficiency and coefficient of determination (R²) values of 0.704 and 0.739 proves strong model efficiency. Given the current climate variability impact, water planner can now assert a tool for flow quantification and sustainable planning purposes.

Keywords: Ungauged Basin, Catchment Characteristics Model, Synthetic data, GIS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1255

7489 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4814

7488 Object-Oriented Programming for Modeling and Simulation of Systems in Physiology

Authors: J. Fernandez de Canete

Abstract:

Object-oriented modeling is spreading in current simulation of physiological systems through the use of the individual components of the model and its interconnections to define the underlying dynamic equations. In this paper we describe the use of both the SIMSCAPE and MODELICA simulation environments in the object-oriented modeling of the closed loop cardiovascular system. The performance of the controlled system was analyzed by simulation in light of the existing hypothesis and validation tests previously performed with physiological data. The described approach represents a valuable tool in the teaching of physiology for graduate medical students.

Keywords: Object-Oriented Modeling, SIMSCAPE Simulation Language, MODELICA Simulation Language, Cardiovascular System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2813

7487 In Search of a Suitable Neural Network Capable of Fast Monitoring of Congestion Level in Electric Power Systems

Authors: Pradyumna Kumar Sahoo, Prasanta Kumar Satpathy

Abstract:

This paper aims at finding a suitable neural network for monitoring congestion level in electrical power systems. In this paper, the input data has been framed properly to meet the target objective through supervised learning mechanism by defining normal and abnormal operating conditions for the system under study. The congestion level, expressed as line congestion index (LCI), is evaluated for each operating condition and is presented to the NN along with the bus voltages to represent the input and target data. Once, the training goes successful, the NN learns how to deal with a set of newly presented data through validation and testing mechanism. The crux of the results presented in this paper rests on performance comparison of a multi-layered feed forward neural network with eleven types of back propagation techniques so as to evolve the best training criteria. The proposed methodology has been tested on the standard IEEE-14 bus test system with the support of MATLAB based NN toolbox. The results presented in this paper signify that the Levenberg-Marquardt backpropagation algorithm gives best training performance of all the eleven cases considered in this paper, thus validating the proposed methodology.

Keywords: Line congestion index, critical bus, contingency, neural network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1741

7486 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2559

7485 Ensembling Adaptively Constructed Polynomial Regression Models

Authors: Gints Jekabsons

Abstract:

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.

Keywords: Basis function construction, heuristic search, modelensembles, polynomial regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1628

7484 Artificial Neural Network-Based Short-Term Load Forecasting for Mymensingh Area of Bangladesh

Authors: S. M. Anowarul Haque, Md. Asiful Islam

Abstract:

Electrical load forecasting is considered to be one of the most indispensable parts of a modern-day electrical power system. To ensure a reliable and efficient supply of electric energy, special emphasis should have been put on the predictive feature of electricity supply. Artificial Neural Network-based approaches have emerged to be a significant area of interest for electric load forecasting research. This paper proposed an Artificial Neural Network model based on the particle swarm optimization algorithm for improved electric load forecasting for Mymensingh, Bangladesh. The forecasting model is developed and simulated on the MATLAB environment with a large number of training datasets. The model is trained based on eight input parameters including historical load and weather data. The predicted load data are then compared with an available dataset for validation. The proposed neural network model is proved to be more reliable in terms of day-wise load forecasting for Mymensingh, Bangladesh.

Keywords: Load forecasting, artificial neural network, particle swarm optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 611

7483 Identifying Factors Contributing to the Spread of Lyme Disease: A Regression Analysis of Virginia’s Data

Authors: Fatemeh Valizadeh Gamchi, Edward L. Boone

Abstract:

This research focuses on Lyme disease, a widespread infectious condition in the United States caused by the bacterium Borrelia burgdorferi sensu stricto. It is critical to identify environmental and economic elements that are contributing to the spread of the disease. This study examined data from Virginia to identify a subset of explanatory variables significant for Lyme disease case numbers. To identify relevant variables and avoid overfitting, linear poisson, and regularization regression methods such as ridge, lasso, and elastic net penalty were employed. Cross-validation was performed to acquire tuning parameters. The methods proposed can automatically identify relevant disease count covariates. The efficacy of the techniques was assessed using four criteria on three simulated datasets. Finally, using the Virginia Department of Health’s Lyme disease dataset, the study successfully identified key factors, and the results were consistent with previous studies.

Keywords: Lyme disease, Poisson generalized linear model, Ridge regression, Lasso Regression, elastic net regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 27

7482 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1514

7481 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2417

7480 Identification, Prediction and Detection of the Process Fault in a Cement Rotary Kiln by Locally Linear Neuro-Fuzzy Technique

Authors: Masoud Sadeghian, Alireza Fatehi

Abstract:

In this paper, we use nonlinear system identification method to predict and detect process fault of a cement rotary kiln. After selecting proper inputs and output, an input-output model is identified for the plant. To identify the various operation points in the kiln, Locally Linear Neuro-Fuzzy (LLNF) model is used. This model is trained by LOLIMOT algorithm which is an incremental treestructure algorithm. Then, by using this method, we obtained 3 distinct models for the normal and faulty situations in the kiln. One of the models is for normal condition of the kiln with 15 minutes prediction horizon. The other two models are for the two faulty situations in the kiln with 7 minutes prediction horizon are presented. At the end, we detect these faults in validation data. The data collected from White Saveh Cement Company is used for in this study.

Keywords: Cement Rotary Kiln, Fault Detection, Delay Estimation Method, Locally Linear Neuro Fuzzy Model, LOLIMOT.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1625

7479 An Expert System Designed to Be Used with MOEAs for Efficient Portfolio Selection

Authors: K. Metaxiotis, K. Liagkouras

Abstract:

This study presents an Expert System specially designed to be used with Multiobjective Evolutionary Algorithms (MOEAs) for the solution of the portfolio selection problem. The validation of the proposed hybrid System is done by using data sets from Hang Seng 31 in Hong Kong, DAX 100 in Germany and FTSE 100 in UK. The performance of the proposed system is assessed in comparison with the Non-dominated Sorting Genetic Algorithm II (NSGAII). The evaluation of the performance is based on different performance metrics that evaluate both the proximity of the solutions to the Pareto front and their dispersion on it. The results show that the proposed hybrid system is efficient for the solution of this kind of problems.

Keywords: Expert Systems, Multiobjective optimization, Evolutionary Algorithms, Portfolio Selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1708

7478 Elasto-Plastic Behavior of Rock during Temperature Drop

Authors: N. Reppas, Y. L. Gui, B. Wetenhall, C. T. Davie, J. Ma

Abstract:

A theoretical constitutive model describing the stress-strain behavior of rock subjected to different confining pressures is presented. A bounding surface plastic model with hardening effects is proposed which includes the effect of temperature drop. The bounding surface is based on a mapping rule and the temperature effect on rock is controlled by Poisson’s ratio. Validation of the results against available experimental data is also presented. The relation of deviatoric stress and axial strain is illustrated at different temperatures to analyze the effect of temperature decrease in terms of stiffness of the material.

Keywords: Bounding surface, cooling of rock, plasticity model, rock deformation, elasto-plastic behavior.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 808

7477 Numerical Analysis and Experimental Validation of a Downhole Stress/Strain Measurement Tool

Authors: Abhay Bodake, Ping Sui, Hafeez Syed, Ratish Kadam

Abstract:

Real-time measurement of applied forces, like tension, compression, torsion, and bending moment, identifies the transferred energies being applied to the bottomhole assembly (BHA). These forces are highly detrimental to measurement/logging-while-drilling tools and downhole equipment. Real-time measurement of the dynamic downhole behavior, including weight, torque, bending on bit, and vibration, establishes a real-time feedback loop between the downhole drilling system and drilling team at the surface. This paper describes the numerical analysis of the strain data acquired by the measurement tool at different locations on the strain pockets. The strain values obtained by FEA for various loading conditions (tension, compression, torque, and bending moment) are compared against experimental results obtained from an identical experimental setup. Numerical analyses results agree with experimental data within 8% and, therefore, substantiate and validate the FEA model. This FEA model can be used to analyze the combined loading conditions that reflect the actual drilling environment.

Keywords: FEA, M/LWD, Oil & Gas, Strain Measurement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2535

7476 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, there have been a lot of efforts and studies are carried out in growing proficient tools for performing various tasks in big data. Recently big data have gotten a lot of publicity for their good reasons. Due to the large and complex collection of datasets it is difficult to process on traditional data processing applications. This concern turns to be further mandatory for producing various tools in big data. Moreover, the main aim of big data analytics is to utilize the advanced analytic techniques besides very huge, different datasets which contain diverse sizes from terabytes to zettabytes and diverse types such as structured or unstructured and batch or streaming. Big data is useful for data sets where their size or type is away from the capability of traditional relational databases for capturing, managing and processing the data with low-latency. Thus the out coming challenges tend to the occurrence of powerful big data tools. In this survey, a various collection of big data tools are illustrated and also compared with the salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3726

7475 Comparative Study of Ad Hoc Routing Protocols in Vehicular Ad-Hoc Networks for Smart City

Authors: Khadija Raissi, Bechir Ben Gouissem

Abstract:

In this paper, we perform the investigation of some routing protocols in Vehicular Ad-Hoc Network (VANET) context. Indeed, we study the efficiency of protocols like Dynamic Source Routing (DSR), Ad hoc On-demand Distance Vector Routing (AODV), Destination Sequenced Distance Vector (DSDV), Optimized Link State Routing convention (OLSR) and Vehicular Multi-hop algorithm for Stable Clustering (VMASC) in terms of packet delivery ratio (PDR) and throughput. The performance evaluation and comparison between the studied protocols shows that the VMASC is the best protocols regarding fast data transmission and link stability in VANETs. The validation of all results is done by the NS3 simulator.

Keywords: VANET, smart city, AODV, OLSR, DSR, OLSR, VMASC, routing protocols, NS3.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 970

7474 Rheological Modeling for Production of High Quality Polymeric

Authors: H.Hosseini, A.A. Azemati

Abstract:

The fundamental defect inherent to the thermoforming technology is wall-thickness variation of the products due to inadequate thermal processing during production of polymer. A nonlinear viscoelastic rheological model is implemented for developing the process model. This model describes deformation process of a sheet in thermoforming process. Because of relaxation pause after plug-assist stage and also implementation of two stage thermoforming process have minor wall-thickness variation and consequently better mechanical properties of polymeric articles. For model validation, a comparative analysis of the theoretical and experimental data is presented.

Keywords: High-quality polymeric article, Thermal Processing, Rheological model, Minor wall-thickness variation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1567

7473 Design and Simulation of Low Speed Axial Flux Permanent Magnet (AFPM) Machine

Authors: Ahmad Darabi, Hassan Moradi, Hossein Azarinfar

Abstract:

In this paper presented initial design of Low Speed Axial Flux Permanent Magnet (AFPM) Machine with Non-Slotted TORUS topology type by use of certain algorithm (Appendix). Validation of design algorithm studied by means of selected data of an initial prototype machine. Analytically design calculation carried out by means of design algorithm and obtained results compared with results of Finite Element Method (FEM).

Keywords: Axial Flux Permanent Magnet (AFPM) Machine, Design Algorithm, Finite Element Method (FEM), TORUS

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3241

7472 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning

Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond

Abstract:

Human activity recognition (HAR) systems have shown positive performance when recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a reasonably new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has amazing coordination and proper form. Beginner swimmers are not perfect, and activity recognition needs to support the individual motions to help beginners. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window input can cause performance issues in the machine learning model. The window’s size can be too small or large, requiring careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation, then separates the data based on the change in the sensor value. Our system uses a multi-phase segmentation method that pulls all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. This results in high recognition performance using leave-one-subject-out validation on our study with 20 beginner swimmers, with our model optimized from our final dataset resulting in an F-Score of 0.95.

Keywords: Time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 114

7471 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is efficient approach to organize data. When data is classified to multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels and shows that there are four types of orders, which are characterized by whether the labels of expressed data includes every label of the given set of labels within the range of the set. Desirable properties of the orders, data is also expressed by the higher set of labels and different sets of labels express different data, are discussed for the orders.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classificaiton, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1254

7470 Investigation of Hydraulic and Thermal Performances of Fin Array at Different Shield Positions without By-Pass

Authors: Ramy H. Mohammed

Abstract:

In heat sinks, the flow within the core exhibits separation and hence does not lend itself to simple analytical boundary layer or duct flow analysis of the wall friction. In this paper, we present some findings from an experimental and numerical study aimed to obtain physical insight into the influence of the presence of the shield and its position on the hydraulic and thermal performance of square pin fin heat sink without top by-pass. The variations of the Nusselt number and friction factor are obtained under varied parameters, such as the Reynolds number and the shield position. The numerical code is validated by comparing the numerical results with the available experimental data. It is shown that, there is a good agreement between the temperature predictions based on the model and the experimental data. Results show that, as the presence of the shield, the heat transfer of fin array is enhanced and the flow resistance increased. The surface temperature distribution of the heat sink base is more uniform when the dimensionless shield position equals to 1/3 or 2/3. The comprehensive performance evaluation approach based on identical pumping power criteria is adopted and shows that the optimum shield position is at x/l=0.43.

Keywords: Shield, Fin array, Performance evaluation, Heat transfer, Validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1792

7469 Analytical Model Prediction: Micro-Cutting Tool Forces with the Effect of Friction on Machining Titanium Alloy (Ti-6Al-4V)

Authors: Mohd Shahrom Ismail, B.T. Hang Tuah Baharudin, K.K.B. Hon

Abstract:

In this paper, a methodology of a model based on predicting the tool forces oblique machining are introduced by adopting the orthogonal technique. The applied analytical calculation is mostly based on Devries model and some parts of the methodology are employed from Amareggo-Brown model. Model validation is performed by comparing experimental data with the prediction results on machining titanium alloy (Ti-6Al-4V) based on micro-cutting tool perspective. Good agreements with the experiments are observed. A detailed friction form that affected the tool forces also been examined with reasonable results obtained.

Keywords: dynamics machining, micro cutting tool, Tool forces

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1623

7468 Predictive Analytics of Student Performance Determinants in Education

Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi

Abstract:

Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.

Keywords: Student performance, supervised machine learning, prediction, classification, cross-validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 464

7467 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis

Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan

Abstract:

Tourism is a booming industry with huge future potential for global wealth and employment. There are countless data generated over social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of big data technology into the tourism industry will allow companies to conclude where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centres or hotels, etc., and the tourist can get a clear idea of places before visiting. The technical perspective of natural language is processed by analysing the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction of travel reviews. We have constructed a web review database using a crawler and web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The text form of sentences was first classified through VADER and RoBERTa model to get the polarity of the reviews. In this paper, we have conducted study methods for feature extraction, such as Count Vectorization and Term Frequency – Inverse Document Frequency (TFIDF) Vectorization and implemented Convolutional Neural Network (CNN) classifier algorithm for the sentiment analysis to decide if the tourist’s attitude towards the destinations is positive, negative, or simply neutral based on the review text that they posted online. The results demonstrated that from the CNN algorithm, after pre-processing and cleaning the dataset, we received an accuracy of 96.12% for the positive and negative sentiment analysis.

Keywords: Counter vectorization, Convolutional Neural Network, Crawler, data technology, Long Short-Term Memory, LSTM, Web Scraping, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 95

7466 Development and Validation of a HPLC Method for 6-Gingerol and 6-Shogaol in Joint Pain Relief Gel Containing Ginger (Zingiber officinale)

Authors: Tanwarat Kajsongkram, Saowalux Rotamporn, Sirinat Limbunruang, Sirinan Thubthimthed

Abstract:

High Performance Liquid Chromatography (HPLC) method was developed and validated for simultaneous estimation of 6-Gingerol(6G) and 6-Shogaol(6S) in joint pain relief gel containing ginger extract. The chromatographic separation was achieved by using C18 column, 150 x 4.6mm i.d., 5μ Luna, mobile phase containing acetonitrile and water (gradient elution). The flow rate was 1.0 ml/min and the absorbance was monitored at 282 nm. The proposed method was validated in terms of the analytical parameters such as specificity, accuracy, precision, linearity, range, limit of detection (LOD), limit of quantification (LOQ), and determined based on the International Conference on Harmonization (ICH) guidelines. The linearity ranges of 6G and 6S were obtained over 20- 60 and 6-18 μg/ml respectively. Good linearity was observed over the above-mentioned range with linear regression equation Y= 11016x- 23778 for 6G and Y = 19276x-19604 for 6S (x is concentration of analytes in μg/ml and Y is peak area). The value of correlation coefficient was found to be 0.9994 for both markers. The limit of detection (LOD) and limit of quantification (LOQ) for 6G were 0.8567 and 2.8555 μg/ml and for 6S were 0.3672 and 1.2238 μg/ml respectively. The recovery range for 6G and 6S were found to be 91.57 to 102.36 % and 84.73 to 92.85 % for all three spiked levels. The RSD values from repeated extractions for 6G and 6S were 3.43 and 3.09% respectively. The validation of developed method on precision, accuracy, specificity, linearity, and range were also performed with well-accepted results.

Keywords: Ginger, 6-gingerol, HPLC, 6-shogaol.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3373

7465 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of ever-increasing use of distributed data in computer networks is obvious for all. One technique that is performed on the distributed data for increasing of efficiency and reliablity is data rplication. In this paper, after introducing this technique and its advantages, we will examine some dynamic data replication. We will examine their characteristies for some overus scenario and the we will propose some suggestion for their improvement.

Keywords: data replication, data hiding, consistency, dynamicdata replication strategy

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1588

7464 Simulation Method for Determining the Thermally Induced Displacement of Machine Tools – Experimental Validation and Utilization in the Design Process

Authors: G. Kehl, P. Wagner

Abstract:

A novel simulation method to determine the displacements of machine tools due to thermal factors is presented. The specific characteristic of this method is the employment of original CAD data from the design process chain, which is interpreted by an algorithm in terms of geometry-based allocation of convection and radiation parameters. Furthermore analogous models relating to the thermal behaviour of machine elements are automatically implemented, which were gained by extensive experimental testing with thermography imaging. With this a transient simulation of the thermal field and in series of the displacement of the machine tool is possible simultaneously during the design phase. This method was implemented and is already used industrially in the design of machining centres in order to improve the quality of herewith manufactured workpieces.

Keywords: Accuracy, design process, finite element analysis, machine tools, thermal simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2038

7463 Predictive Semi-Empirical NOx Model for Diesel Engine

Authors: Saurabh Sharma, Yong Sun, Bruce Vernham

Abstract:

Accurate prediction of NOx emission is a continuous challenge in the field of diesel engine-out emission modeling. Performing experiments for each conditions and scenario cost significant amount of money and man hours, therefore model-based development strategy has been implemented in order to solve that issue. NOx formation is highly dependent on the burn gas temperature and the O₂ concentration inside the cylinder. The current empirical models are developed by calibrating the parameters representing the engine operating conditions with respect to the measured NOx. This makes the prediction of purely empirical models limited to the region where it has been calibrated. An alternative solution to that is presented in this paper, which focus on the utilization of in-cylinder combustion parameters to form a predictive semi-empirical NOx model. The result of this work is shown by developing a fast and predictive NOx model by using the physical parameters and empirical correlation. The model is developed based on the steady state data collected at entire operating region of the engine and the predictive combustion model, which is developed in Gamma Technology (GT)-Power by using Direct Injected (DI)-Pulse combustion object. In this approach, temperature in both burned and unburnt zone is considered during the combustion period i.e. from Intake Valve Closing (IVC) to Exhaust Valve Opening (EVO). Also, the oxygen concentration consumed in burnt zone and trapped fuel mass is also considered while developing the reported model. Several statistical methods are used to construct the model, including individual machine learning methods and ensemble machine learning methods. A detailed validation of the model on multiple diesel engines is reported in this work. Substantial numbers of cases are tested for different engine configurations over a large span of speed and load points. Different sweeps of operating conditions such as Exhaust Gas Recirculation (EGR), injection timing and Variable Valve Timing (VVT) are also considered for the validation. Model shows a very good predictability and robustness at both sea level and altitude condition with different ambient conditions. The various advantages such as high accuracy and robustness at different operating conditions, low computational time and lower number of data points requires for the calibration establishes the platform where the model-based approach can be used for the engine calibration and development process. Moreover, the focus of this work is towards establishing a framework for the future model development for other various targets such as soot, Combustion Noise Level (CNL), NO₂/NOx ratio etc.

Keywords: Diesel engine, machine learning, NOx emission, semi-empirical.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 793

7462 Design and Construction Validation of Pile Performance through High Strain Pile Dynamic Tests for both Contiguous Flight Auger and Drilled Displacement Piles

Authors: S. Pirrello

Abstract:

Sydney’s booming real estate market has pushed property developers to invest in historically “no-go” areas, which were previously too expensive to develop. These areas are usually near rivers where the sites are underlain by deep alluvial and estuarine sediments. In these ground conditions, conventional bored pile techniques are often not competitive. Contiguous Flight Auger (CFA) and Drilled Displacement (DD) Piles techniques are on the other hand suitable for these ground conditions. This paper deals with the design and construction challenges encountered with these piling techniques for a series of high-rise towers in Sydney’s West. The advantages of DD over CFA piles such as reduced overall spoil with substantial cost savings and achievable rock sockets in medium strength bedrock are discussed. Design performances were assessed with PIGLET. Pile performances are validated in two stages, during constructions with the interpretation of real-time data from the piling rigs’ on-board computer data, and after construction with analyses of results from high strain pile dynamic testing (PDA). Results are then presented and discussed. High Strain testing data are presented as Case Pile Wave Analysis Program (CAPWAP) analyses.

Keywords: Contiguous flight auger, case pile wave analysis, high strain pile, drilled displacement, pile performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 923