Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 368

Search results for: MNIST dataset

38 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.

Keywords: Machine learning, Imbalanced data, Data mining, Big data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1137

37 Automated Natural Hazard Zonation System with Internet-SMS Warning: Distributed GIS for Sustainable Societies Creating Schema & Interface for Mapping & Communication

Authors: Devanjan Bhattacharya, Jitka Komarkova

Abstract:

The research describes the implementation of a novel and stand-alone system for dynamic hazard warning. The system uses all existing infrastructure already in place like mobile networks, a laptop/PC and the small installation software. The geospatial dataset are the maps of a region which are again frugal. Hence there is no need to invest and it reaches everyone with a mobile. A novel architecture of hazard assessment and warning introduced where major technologies in ICT interfaced to give a unique WebGIS based dynamic real time geohazard warning communication system. A never before architecture introduced for integrating WebGIS with telecommunication technology. Existing technologies interfaced in a novel architectural design to address a neglected domain in a way never done before – through dynamically updatable WebGIS based warning communication. The work publishes new architecture and novelty in addressing hazard warning techniques in sustainable way and user friendly manner. Coupling of hazard zonation and hazard warning procedures into a single system has been shown. Generalized architecture for deciphering a range of geo-hazards has been developed. Hence the developmental work presented here can be summarized as the development of internet-SMS based automated geo-hazard warning communication system; integrating a warning communication system with a hazard evaluation system; interfacing different open-source technologies towards design and development of a warning system; modularization of different technologies towards development of a warning communication system; automated data creation, transformation and dissemination over different interfaces. The architecture of the developed warning system has been functionally automated as well as generalized enough that can be used for any hazard and setup requirement has been kept to a minimum.

Keywords: Geospatial, web-based GIS, geohazard, warning system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1796

36 Identification of Spam Keywords Using Hierarchical Category in C2C E-commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like ebay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C E-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C E-commerce.

Keywords: Spam Keyword, E-commerce, keyword features, spam filtering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2508

35 Reducing CO2 Emission Using EDA and Weighted Sum Model in Smart Parking System

Authors: Rahman Ali, Muhammad Sajjad, Farkhund Iqbal, Muhammad Sadiq Hassan Zada, Mohammed Hussain

Abstract:

Emission of Carbon Dioxide (CO₂) has adversely affected the environment. One of the major sources of CO₂emission is transportation. In the last few decades, the increase in mobility of people using vehicles has enormously increased the emission of CO₂ in the environment. To reduce CO₂ emission, sustainable transportation system is required in which smart parking is one of the important measures that need to be established. To contribute to the issue of reducing the amount of CO₂emission, this research proposes a smart parking system. A cloud-based solution is provided to the drivers which automatically searches and recommends the most preferred parking slots. To determine preferences of the parking areas, this methodology exploits a number of unique parking features which ultimately results in the selection of a parking that leads to minimum level of CO₂emission from the current position of the vehicle. To realize the methodology, a scenario-based implementation is considered. During the implementation, a mobile application with GPS signals, vehicles with a number of vehicle features and a list of parking areas with parking features are used by sorting, multi-level filtering, exploratory data analysis (EDA, Analytical Hierarchy Process (AHP)) and weighted sum model (WSM) to rank the parking areas and recommend the drivers with top-k most preferred parking areas. In the EDA process, “2020testcar-2020-03-03”, a freely available dataset is used to estimate CO₂emission of a particular vehicle. To evaluate the system, results of the proposed system are compared with the conventional approach, which reveal that the proposed methodology supersedes the conventional one in reducing the emission of CO₂into the atmosphere.

Keywords: CO2 emission, IoT, EDA, Weighted Sum Model, WSM, regression, smart parking system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 742

34 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences

Authors: C. Xavier Mendieta, J. J McArthur

Abstract:

Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.

Keywords: Building archetypes, data analysis, energy benchmarks, GHG emissions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1024

33 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this strong market. For those who are looking to turn over their service providers, understanding their needs is especially important. Predictive churn is now a mandatory requirement for retaining customers in the telecommunications industry. Machine learning can be used to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 408

32 Review of the Road Crash Data Availability in Iraq

Authors: Abeer K. Jameel, Harry Evdorides

Abstract:

Iraq is a middle income country where the road safety issue is considered one of the leading causes of deaths. To control the road risk issue, the Iraqi Ministry of Planning, General Statistical Organization started to organise a collection system of traffic accidents data with details related to their causes and severity. These data are published as an annual report. In this paper, a review of the available crash data in Iraq will be presented. The available data represent the rate of accidents in aggregated level and classified according to their types, road users’ details, and crash severity, type of vehicles, causes and number of causalities. The review is according to the types of models used in road safety studies and research, and according to the required road safety data in the road constructions tasks. The available data are also compared with the road safety dataset published in the United Kingdom as an example of developed country. It is concluded that the data in Iraq are suitable for descriptive and exploratory models, aggregated level comparison analysis, and evaluation and monitoring the progress of the overall traffic safety performance. However, important traffic safety studies require disaggregated level of data and details related to the factors of the likelihood of traffic crashes. Some studies require spatial geographic details such as the location of the accidents which is essential in ranking the roads according to their level of safety, and name the most dangerous roads in Iraq which requires tactic plan to control this issue. Global Road safety agencies interested in solve this problem in low and middle-income countries have designed road safety assessment methodologies which are basing on the road attributes data only. Therefore, in this research it is recommended to use one of these methodologies.

Keywords: Data availability, Iraq, road safety.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 931

31 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection

Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi

Abstract:

It is obvious in this present time, internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life so easy for human beings, through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication hugely accepted among individuals and organizations globally. But over decades the security integrity of this platform has been challenged with malicious activities like Phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed substantially but one outperformed the other. The dataset used for these models is obtained from Kaggle online data repository, while three classifiers: decision tree, Naïve Bayes, and Logistic regression are ensemble (Bagging) respectively. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, hybrid, filter-wrapper, phishing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 178

30 Grassland Phenology in Different Eco-Geographic Regions over the Tibetan Plateau

Authors: Jiahua Zhang, Qing Chang, Fengmei Yao

Abstract:

Studying on the response of vegetation phenology to climate change at different temporal and spatial scales is important for understanding and predicting future terrestrial ecosystem dynamics and the adaptation of ecosystems to global change. In this study, the Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) dataset and climate data were used to analyze the dynamics of grassland phenology as well as their correlation with climatic factors in different eco-geographic regions and elevation units across the Tibetan Plateau. The results showed that during 2003–2012, the start of the grassland greening season (SOS) appeared later while the end of the growing season (EOS) appeared earlier following the plateau’s precipitation and heat gradients from southeast to northwest. The multi-year mean value of SOS showed differences between various eco-geographic regions and was significantly impacted by average elevation and regional average precipitation during spring. Regional mean differences for EOS were mainly regulated by mean temperature during autumn. Changes in trends of SOS in the central and eastern eco-geographic regions were coupled to the mean temperature during spring, advancing by about 7d/°C. However, in the two southwestern eco-geographic regions, SOS was delayed significantly due to the impact of spring precipitation. The results also showed that the SOS occurred later with increasing elevation, as expected, with a delay rate of 0.66 d/100m. For 2003–2012, SOS showed an advancing trend in low-elevation areas, but a delayed trend in high-elevation areas, while EOS was delayed in low-elevation areas, but advanced in high-elevation areas. Grassland SOS and EOS changes may be influenced by a variety of other environmental factors in each eco-geographic region.

Keywords: Grassland, phenology, MODIS, eco-geographic regions, elevation, climatic factors, Tibetan Plateau.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2828

29 Faster Pedestrian Recognition Using Deformable Part Models

Authors: Alessandro Preziosi, Antonio Prioletti, Luca Castangia

Abstract:

Deformable part models achieve high precision in pedestrian recognition, but all publicly available implementations are too slow for real-time applications. We implemented a deformable part model algorithm fast enough for real-time use by exploiting information about the camera position and orientation. This implementation is both faster and more precise than alternative DPM implementations. These results are obtained by computing convolutions in the frequency domain and using lookup tables to speed up feature computation. This approach is almost an order of magnitude faster than the reference DPM implementation, with no loss in precision. Knowing the position of the camera with respect to horizon it is also possible prune many hypotheses based on their size and location. The range of acceptable sizes and positions is set by looking at the statistical distribution of bounding boxes in labelled images. With this approach it is not needed to compute the entire feature pyramid: for example higher resolution features are only needed near the horizon. This results in an increase in mean average precision of 5% and an increase in speed by a factor of two. Furthermore, to reduce misdetections involving small pedestrians near the horizon, input images are supersampled near the horizon. Supersampling the image at 1.5 times the original scale, results in an increase in precision of about 4%. The implementation was tested against the public KITTI dataset, obtaining an 8% improvement in mean average precision over the best performing DPM-based method. By allowing for a small loss in precision computational time can be easily brought down to our target of 100ms per image, reaching a solution that is faster and still more precise than all publicly available DPM implementations.

Keywords: Autonomous vehicles, deformable part model, dpm, pedestrian recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1397

28 Copper Price Prediction Model for Various Economic Situations

Authors: Haidy S. Ghali, Engy Serag, A. Samer Ezeldin

Abstract:

Copper is an essential raw material used in the construction industry. During 2021 and the first half of 2022, the global market suffered from a significant fluctuation in copper raw material prices due to the aftermath of both the COVID-19 pandemic and the Russia-Ukraine war which exposed its consumers to an unexpected financial risk. Thereto, this paper aims to develop two hybrid price prediction models using artificial neural network and long short-term memory (ANN-LSTM), by Python, that can forecast the average monthly copper prices, traded in the London Metal Exchange; the first model is a multivariate model that forecasts the copper price of the next 1-month and the second is a univariate model that predicts the copper prices of the upcoming three months. Historical data of average monthly London Metal Exchange copper prices are collected from January 2009 till July 2022 and potential external factors are identified and employed in the multivariate model. These factors lie under three main categories: energy prices, and economic indicators of the three major exporting countries of copper depending on the data availability. Before developing the LSTM models, the collected external parameters are analyzed with respect to the copper prices using correlation, and multicollinearity tests in R software; then, the parameters are further screened to select the parameters that influence the copper prices. Then, the two LSTM models are developed, and the dataset is divided into training, validation, and testing sets. The results show that the performance of the 3-month prediction model is better than the 1-month prediction model; but still, both models can act as predicting tools for diverse economic situations.

Keywords: Copper prices, prediction model, neural network, time series forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 187

27 Evaluating the Validity of Computational Fluid Dynamics Model of Dispersion in a Complex Urban Geometry Using Two Sets of Experimental Measurements

Authors: Mohammad R. Kavian Nezhad, Carlos F. Lange, Brian A. Fleck

Abstract:

This research presents the validation study of a computational fluid dynamics (CFD) model developed to simulate the scalar dispersion emitted from rooftop sources around the buildings at the University of Alberta North Campus. The ANSYS CFX code was used to perform the numerical simulation of the wind regime and pollutant dispersion by solving the 3D steady Reynolds-averaged Navier-Stokes (RANS) equations on a building-scale high-resolution grid. The validation study was performed in two steps. First, the CFD model performance in 24 cases (eight wind directions and three wind speeds) was evaluated by comparing the predicted flow fields with the available data from the previous measurement campaign designed at the North Campus, using the standard deviation method (SDM), while the estimated results of the numerical model showed maximum average percent errors of approximately 53% and 37% for wind incidents from the North and Northwest, respectively. Good agreement with the measurements was observed for the other six directions, with an average error of less than 30%. In the second step, the reliability of the implemented turbulence model, numerical algorithm, modeling techniques, and the grid generation scheme was further evaluated using the Mock Urban Setting Test (MUST) dispersion dataset. Different statistical measures, including the fractional bias (FB), the mean geometric bias (MG), and the normalized mean square error (NMSE), were used to assess the accuracy of the predicted dispersion field. Our CFD results are in very good agreement with the field measurements.

Keywords: CFD, plume dispersion, complex urban geometry, validation study, wind flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 373

26 Remaining Useful Life Estimation of Bearings Based on Nonlinear Dimensional Reduction Combined with Timing Signals

Authors: Zhongmin Wang, Wudong Fan, Hengshan Zhang, Yimin Zhou

Abstract:

In data-driven prognostic methods, the prediction accuracy of the estimation for remaining useful life of bearings mainly depends on the performance of health indicators, which are usually fused some statistical features extracted from vibrating signals. However, the existing health indicators have the following two drawbacks: (1) The differnet ranges of the statistical features have the different contributions to construct the health indicators, the expert knowledge is required to extract the features. (2) When convolutional neural networks are utilized to tackle time-frequency features of signals, the time-series of signals are not considered. To overcome these drawbacks, in this study, the method combining convolutional neural network with gated recurrent unit is proposed to extract the time-frequency image features. The extracted features are utilized to construct health indicator and predict remaining useful life of bearings. First, original signals are converted into time-frequency images by using continuous wavelet transform so as to form the original feature sets. Second, with convolutional and pooling layers of convolutional neural networks, the most sensitive features of time-frequency images are selected from the original feature sets. Finally, these selected features are fed into the gated recurrent unit to construct the health indicator. The results state that the proposed method shows the enhance performance than the related studies which have used the same bearing dataset provided by PRONOSTIA.

Keywords: Continuous wavelet transform, convolution neural network, gated recurrent unit, health indicators, remaining useful life.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 767

25 Sea Level Characteristics Referenced to Specific Geodetic Datum in Alexandria, Egypt

Authors: Ahmed M. Khedr, Saad M. Abdelrahman, Kareem M. Tonbol

Abstract:

Two geo-referenced sea level datasets (September 2008 – November 2010) and (April 2012 – January 2014) were recorded at Alexandria Western Harbour (AWH). Accurate re-definition of tidal datum, referred to the latest International Terrestrial Reference Frame (ITRF-2014), was discussed and updated to improve our understanding of the old predefined tidal datum at Alexandria. Tidal and non-tidal components of sea level were separated with the use of Delft-3D hydrodynamic model-tide suit (Delft-3D, 2015). Tidal characteristics at AWH were investigated and harmonic analysis showed the most significant 34 constituents with their amplitudes and phases. Tide was identified as semi-diurnal pattern as indicated by a “Form Factor” of 0.24 and 0.25, respectively. Principle tidal datums related to major tidal phenomena were recalculated referred to a meaningful geodetic height datum. The portion of residual energy (surge) out of the total sea level energy was computed for each dataset and found 77% and 72%, respectively. Power spectral density (PSD) showed accurate resolvability in high band (1–6) cycle/days for the nominated independent constituents, except some neighbouring constituents, which are too close in frequency. Wind and atmospheric pressure data, during the recorded sea level time, were analysed and cross-correlated with the surge signals. Moderate association between surge and wind and atmospheric pressure data were obtained. In addition, long-term sea level rise trend at AWH was computed and showed good agreement with earlier estimated rates.

Keywords: Alexandria, Delft-3D, Egypt, geodetic reference, harmonic analysis, sea level.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1360

24 Hands-off Parking: Deep Learning Gesture-Based System for Individuals with Mobility Needs

Authors: Javier Romera, Alberto Justo, Ignacio Fidalgo, Javier Araluce, Joshué Pérez

Abstract:

Nowadays, individuals with mobility needs face a significant challenge when docking vehicles. In many cases, after parking, they encounter insufficient space to exit, leading to two undesired outcomes: either avoiding parking in that spot or settling for improperly placed vehicles. To address this issue, this paper presents a parking control system employing gestural teleoperation. The system comprises three main phases: capturing body markers, interpreting gestures, and transmitting orders to the vehicle. The initial phase is centered around the MediaPipe framework, a versatile tool optimized for real-time gesture recognition. MediaPipe excels at detecting and tracing body markers, with a special emphasis on hand gestures. Hands detection is done by generating 21 reference points for each hand. Subsequently, after data capture, the project employs the MultiPerceptron Layer (MPL) for in-depth gesture classification. This tandem of MediaPipe’s extraction prowess and MPL’s analytical capability ensures that human gestures are translated into actionable commands with high precision. Furthermore, the system has been trained and validated within a built-in dataset. To prove the domain adaptation, a framework based on the Robot Operating System 2 (ROS2), as a communication backbone, alongside CARLA Simulator, is used. Following successful simulations, the system is transitioned to a real-world platform, marking a significant milestone in the project. This real-vehicle implementation verifies the practicality and efficiency of the system beyond theoretical constructs.

Keywords: Gesture detection, MediaPipe, MultiLayer Perceptron Layer, Robot Operating System.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 136

23 Web Data Scraping Technology Using Term Frequency Inverse Document Frequency to Enhance the Big Data Quality on Sentiment Analysis

Authors: Sangita Pokhrel, Nalinda Somasiri, Rebecca Jeyavadhanam, Swathi Ganesan

Abstract:

Tourism is a booming industry with huge future potential for global wealth and employment. There are countless data generated over social media sites every day, creating numerous opportunities to bring more insights to decision-makers. The integration of big data technology into the tourism industry will allow companies to conclude where their customers have been and what they like. This information can then be used by businesses, such as those in charge of managing visitor centres or hotels, etc., and the tourist can get a clear idea of places before visiting. The technical perspective of natural language is processed by analysing the sentiment features of online reviews from tourists, and we then supply an enhanced long short-term memory (LSTM) framework for sentiment feature extraction of travel reviews. We have constructed a web review database using a crawler and web scraping technique for experimental validation to evaluate the effectiveness of our methodology. The text form of sentences was first classified through VADER and RoBERTa model to get the polarity of the reviews. In this paper, we have conducted study methods for feature extraction, such as Count Vectorization and Term Frequency – Inverse Document Frequency (TFIDF) Vectorization and implemented Convolutional Neural Network (CNN) classifier algorithm for the sentiment analysis to decide if the tourist’s attitude towards the destinations is positive, negative, or simply neutral based on the review text that they posted online. The results demonstrated that from the CNN algorithm, after pre-processing and cleaning the dataset, we received an accuracy of 96.12% for the positive and negative sentiment analysis.

Keywords: Counter vectorization, Convolutional Neural Network, Crawler, data technology, Long Short-Term Memory, LSTM, Web Scraping, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 175

22 Spatial Clustering Model of Vessel Trajectory to Extract Sailing Routes Based on AIS Data

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

The automatic extraction of shipping routes is advantageous for intelligent traffic management systems to identify events and support decision-making in maritime surveillance. At present, there is a high demand for the extraction of maritime traffic networks that resemble the real traffic of vessels accurately, which is valuable for further analytical processing tasks for vessels trajectories (e.g., naval routing and voyage planning, anomaly detection, destination prediction, time of arrival estimation). With the help of big data and processing huge amounts of vessels’ trajectory data, it is possible to learn these shipping routes from the navigation history of past behaviour of other, similar ships that were travelling in a given area. In this paper, we propose a spatial clustering model of vessels’ trajectories (SPTCLUST) to extract spatial representations of sailing routes from historical Automatic Identification System (AIS) data. The whole model consists of three main parts: data preprocessing, path finding, and route extraction, which consists of clustering and representative trajectory extraction. The proposed clustering method provides techniques to overcome the problems of: (i) optimal input parameters selection; (ii) the high complexity of processing a huge volume of multidimensional data; (iii) and the spatial representation of complete representative trajectory detection in the context of trajectory clustering algorithms. The experimental evaluation showed the effectiveness of the proposed model by using a real-world AIS dataset from the Port of Halifax. The results contribute to further understanding of shipping route patterns. This could aid surveillance authorities in stable and sustainable vessel traffic management.

Keywords: Vessel trajectory clustering, trajectory mining, Spatial Clustering, marine intelligent navigation, maritime traffic network extraction, sdailing routes extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 455

21 Toward Indoor and Outdoor Surveillance Using an Improved Fast Background Subtraction Algorithm

Authors: A. El Harraj, N. Raissouni

Abstract:

The detection of moving objects from a video image sequences is very important for object tracking, activity recognition, and behavior understanding in video surveillance. The most used approach for moving objects detection / tracking is background subtraction algorithms. Many approaches have been suggested for background subtraction. But, these are illumination change sensitive and the solutions proposed to bypass this problem are time consuming. In this paper, we propose a robust yet computationally efficient background subtraction approach and, mainly, focus on the ability to detect moving objects on dynamic scenes, for possible applications in complex and restricted access areas monitoring, where moving and motionless persons must be reliably detected. It consists of three main phases, establishing illumination changes invariance, background/foreground modeling and morphological analysis for noise removing. We handle illumination changes using Contrast Limited Histogram Equalization (CLAHE), which limits the intensity of each pixel to user determined maximum. Thus, it mitigates the degradation due to scene illumination changes and improves the visibility of the video signal. Initially, the background and foreground images are extracted from the video sequence. Then, the background and foreground images are separately enhanced by applying CLAHE. In order to form multi-modal backgrounds we model each channel of a pixel as a mixture of K Gaussians (K=5) using Gaussian Mixture Model (GMM). Finally, we post process the resulting binary foreground mask using morphological erosion and dilation transformations to remove possible noise. For experimental test, we used a standard dataset to challenge the efficiency and accuracy of the proposed method on a diverse set of dynamic scenes.

Keywords: Video surveillance, background subtraction, Contrast Limited Histogram Equalization, illumination invariance, object tracking, object detection, behavior understanding, dynamic scenes.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2087

20 Improving Subjective Bias Detection Using Bidirectional Encoder Representations from Transformers and Bidirectional Long Short-Term Memory

Authors: Ebipatei Victoria Tunyan, T. A. Cao, Cheol Young Ock

Abstract:

Detecting subjectively biased statements is a vital task. This is because this kind of bias, when present in the text or other forms of information dissemination media such as news, social media, scientific texts, and encyclopedias, can weaken trust in the information and stir conflicts amongst consumers. Subjective bias detection is also critical for many Natural Language Processing (NLP) tasks like sentiment analysis, opinion identification, and bias neutralization. Having a system that can adequately detect subjectivity in text will boost research in the above-mentioned areas significantly. It can also come in handy for platforms like Wikipedia, where the use of neutral language is of importance. The goal of this work is to identify the subjectively biased language in text on a sentence level. With machine learning, we can solve complex AI problems, making it a good fit for the problem of subjective bias detection. A key step in this approach is to train a classifier based on BERT (Bidirectional Encoder Representations from Transformers) as upstream model. BERT by itself can be used as a classifier; however, in this study, we use BERT as data preprocessor as well as an embedding generator for a Bi-LSTM (Bidirectional Long Short-Term Memory) network incorporated with attention mechanism. This approach produces a deeper and better classifier. We evaluate the effectiveness of our model using the Wiki Neutrality Corpus (WNC), which was compiled from Wikipedia edits that removed various biased instances from sentences as a benchmark dataset, with which we also compare our model to existing approaches. Experimental analysis indicates an improved performance, as our model achieved state-of-the-art accuracy in detecting subjective bias. This study focuses on the English language, but the model can be fine-tuned to accommodate other languages.

Keywords: Subjective bias detection, machine learning, BERT–BiLSTM–Attention, text classification, natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 830

19 Contextual SenSe Model: Word Sense Disambiguation Using Sense and Sense Value of Context Surrounding the Target

Authors: Vishal Raj, Noorhan Abbas

Abstract:

Ambiguity in NLP (Natural Language Processing) refers to the ability of a word, phrase, sentence, or text to have multiple meanings. This results in various kinds of ambiguities such as lexical, syntactic, semantic, anaphoric and referential. This study is focused mainly on solving the issue of Lexical ambiguity. Word Sense Disambiguation (WSD) is an NLP technique that aims to resolve lexical ambiguity by determining the correct meaning of a word within a given context. Most WSD solutions rely on words for training and testing, but we have used lemma and Part of Speech (POS) tokens of words for training and testing. Lemma adds generality and POS adds properties of word into token. We have designed a method to create an affinity matrix to calculate the affinity between any pair of lemma_POS (a token where lemma and POS of word are joined by underscore) of given training set. Additionally, we have devised an algorithm to create the sense clusters of tokens using affinity matrix under hierarchy of POS of lemma. Furthermore, three different mechanisms to predict the sense of target word using the affinity/similarity value are devised. Each contextual token contributes to the sense of target word with some value and whichever sense gets higher value becomes the sense of target word. So, contextual tokens play a key role in creating sense clusters and predicting the sense of target word, hence, the model is named Contextual SenSe Model (CSM). CSM exhibits a noteworthy simplicity and explication lucidity in contrast to contemporary deep learning models characterized by intricacy, time-intensive processes, and challenging explication. CSM is trained on SemCor training data and evaluated on SemEval test dataset. The results indicate that despite the naivety of the method, it achieves promising results when compared to the Most Frequent Sense (MFS) model.

Keywords: Word Sense Disambiguation, WSD, Contextual SenSe Model, Most Frequent Sense, part of speech, POS, Natural Language Processing, NLP, OOV, out of vocabulary, ELMo, Embeddings from Language Model, BERT, Bidirectional Encoder Representations from Transformers, Word2Vec, lemma_POS, Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 383

18 Methods for Material and Process Monitoring by Characterization of (Second and Third Order) Elastic Properties with Lamb Waves

Authors: R. Meier, M. Pander

Abstract:

In accordance with the industry 4.0 concept, manufacturing process steps as well as the materials themselves are going to be more and more digitalized within the next years. The “digital twin” representing the simulated and measured dataset of the (semi-finished) product can be used to control and optimize the individual processing steps and help to reduce costs and expenditure of time in product development, manufacturing, and recycling. In the present work, two material characterization methods based on Lamb waves were evaluated and compared. For demonstration purpose, both methods were shown at a standard industrial product - copper ribbons, often used in photovoltaic modules as well as in high-current microelectronic devices. By numerical approximation of the Rayleigh-Lamb dispersion model on measured phase velocities second order elastic constants (Young’s modulus, Poisson’s ratio) were determined. Furthermore, the effective third order elastic constants were evaluated by applying elastic, “non-destructive”, mechanical stress on the samples. In this way, small microstructural variations due to mechanical preconditioning could be detected for the first time. Both methods were compared with respect to precision and inline application capabilities. Microstructure of the samples was systematically varied by mechanical loading and annealing. Changes in the elastic ultrasound transport properties were correlated with results from microstructural analysis and mechanical testing. In summary, monitoring the elastic material properties of plate-like structures using Lamb waves is valuable for inline and non-destructive material characterization and manufacturing process control. Second order elastic constants analysis is robust over wide environmental and sample conditions, whereas the effective third order elastic constants highly increase the sensitivity with respect to small microstructural changes. Both Lamb wave based characterization methods are fitting perfectly into the industry 4.0 concept.

Keywords: Lamb waves, industry 4.0, process control, elasticity, acoustoelasticity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1098

17 Improved Computational Efficiency of Machine Learning Algorithms Based on Evaluation Metrics to Control the Spread of Coronavirus in the UK

Authors: Swathi Ganesan, Nalinda Somasiri, Rebecca Jeyavadhanam, Gayathri Karthick

Abstract:

The COVID-19 crisis presents a substantial and critical hazard to worldwide health. Since the occurrence of the disease in late January 2020 in the UK, the number of infected people confirmed to acquire the illness has increased tremendously across the country, and the number of individuals affected is undoubtedly considerably high. The purpose of this research is to figure out a predictive machine learning (ML) archetypal that could forecast the COVID-19 cases within the UK. This study concentrates on the statistical data collected from 31st January 2020 to 31st March 2021 in the United Kingdom. Information on total COVID-19 cases registered, new cases encountered on a daily basis, total death registered, and patients’ death per day due to Coronavirus is collected from World Health Organization (WHO). Data preprocessing is carried out to identify any missing values, outliers, or anomalies in the dataset. The data are split into 8:2 ratio for training and testing purposes to forecast future new COVID-19 cases. Support Vector Machine (SVM), Random Forest (RF), and linear regression (LR) algorithms are chosen to study the model performance in the prediction of new COVID-19 cases. From the evaluation metrics such as r-squared value and mean squared error, the statistical performance of the model in predicting the new COVID-19 cases is evaluated. RF outperformed the other two ML algorithms with a training accuracy of 99.47% and testing accuracy of 98.26% when n = 30. The mean square error obtained for RF is 4.05e11, which is lesser compared to the other predictive models used for this study. From the experimental analysis, RF algorithm can perform more effectively and efficiently in predicting the new COVID-19 cases, which could help the health sector to take relevant control measures for the spread of the virus.

Keywords: COVID-19, machine learning, supervised learning, unsupervised learning, linear regression, support vector machine, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 172

16 Client Satisfaction: Does Private or Public Health Sector Make a Difference? Results from Secondary Data Analysis in Sindh, Pakistan

Authors: Wajiha Javed, Arsalan Jabbar, Nelofer Mehboob, Muhammad Tafseer, Zahid Memon

Abstract:

Introduction: Researchers globally have strived to explore diverse factors that augment the continuation and uptake of family planning methods. Clients’ satisfaction is one of the core determinants facilitating continuation of family planning methods. There is a major debate yet scanty evidence to contrast public and private sectors with respect to client satisfaction. The objective of this study is to compare quality-of-care provided by public and private sectors of Pakistan through a client satisfaction lens. Methods: We used Pakistan Demographic Heath Survey 2012-13 dataset on 3133 women. Ten different multivariate models were made. to explore the relationship between client satisfaction and dependent outcome after adjusting for all known confounding factors and results are presented as OR and AOR (95% CI). Results: Multivariate analyses showed that clients were less satisfied in contraceptive provision from private sector as compared to public sector (AOR 0.92, 95% CI 0.63-1.68) even though the result was not statistically significant. Clients were more satisfied from private sector as compared to the public sector with respect to other determinants of quality-of-care follow-up care (AOR 3.29, 95% CI 1.95-5.55), infection prevention (AOR 2.41, 95% CI 1.60-3.62), counseling services (AOR 2.01, 95% CI 1.27-3.18, timely treatment (AOR 3.37, 95% CI 2.20-5.15), attitude of staff (AOR 2.23, 95% CI 1.50-3.33), punctuality of staff (AOR 2.28, 95% CI 1.92-4.13), timely referring (AOR 2.34, 95% CI 1.63-3.35), staff cooperation (AOR 1.75, 95% CI 1.22-2.51) and complications handling (AOR 2.27, 95% CI 1.56-3.29). Discussion: Public sector has successfully attained substantial satisfaction levels with respect to provision of contraceptives, but it contrasts previous literature from a multi country studies. Our study though in is concordance with a study from Tanzania where public sector was more likely to offer family planning services to clients as compared to private facilities. Conclusion: In majority of the developing countries, public sector is more involved in FP service provision; however, in Pakistan clients’ satisfaction in private sector is more, which opens doors for public-private partnerships and collaboration in the near future.

Keywords: Client satisfaction, Family Planning, Public private partnership, Quality of care

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2026

15 Effect of Biostimulants to Control the Phelipanche ramosa L. Pomel in Processing Tomato Crop

Authors: G. Disciglio, G. Gatta, F. Lops, A. Libutti, A. Tarantino, E. Tarantino

Abstract:

The experimental trial was carried out in open field at Foggia district (Apulia Region, Southern Italy), during the spring-summer season 2014, in order to evaluate the effect of four biostimulant products (RadiconÒ, Viormon plusÒ, LysodinÒ and SiaptonÒ 10L), compared with a control (no biostimulant), on the infestation of processing tomato crop (cv Dres) by the chlorophyll-lacking root parasite Phelipanche ramosa. Biostimulants consist in different categories of products (microbial inoculants, humic and fulvic acids, hydrolyzed proteins and aminoacids, seaweed extracts) which play various roles in plant growing, including the improvement of crop resistance and quali-quantitative characteristics of yield. The experimental trial was arranged according to a complete randomized block design with five treatments, each of one replicated three times. The processing tomato seedlings were transplanted on 5 May 2014. Throughout the crop cycle, P. ramosa infestation was assessed according to the number of emerged shoots (branched plants) counted in each plot, at 66, 78 and 92 day after transplanting. The tomato fruits were harvested at full-stage of maturity on 8 August 2014. From each plot, the marketable yield was measured and the quali-quantitative yield parameters (mean weight, dry matter content, colour coordinate, colour index and soluble solids content of the fruits) were determined. The whole dataset was tested according to the basic assumptions for the analysis of variance (ANOVA) and the differences between the means were determined using Tukey’s tests at the 5% probability level. The results of the study showed that none of the applied biostimulants provided a whole control of Phelipanche, although some positive effects were obtained from their application. To this respect, the RadiconÒ appeared to be the most effective in reducing the infestation of this root-parasite in tomato crop. This treatment also gave the higher tomato yield.

Keywords: Biostimulants, control methods, Phelipanche ramosa, processing tomato crop.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1904

14 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets by themselves and now they can be accepted as a primary intellectual output of a research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student’s outcomes collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time consuming process. In addition, it requires experts in the area, which are mostly not available. It has been shown the operational settings under which the collection has been produced. The collection has been cleansed, preprocessed, some features have been selected and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. At the end, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to other experimented multi-label classification methods. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.

Keywords: Benchmark collection, program educational objectives, student outcomes, ABET, Accreditation, machine learning, supervised multiclass classification, text mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 837

13 A Risk Assessment Tool for the Contamination of Aflatoxins on Dried Figs based on Machine Learning Algorithms

Authors: Kottaridi Klimentia, Demopoulos Vasilis, Sidiropoulos Anastasios, Ihara Diego, Nikolaidis Vasileios, Antonopoulos Dimitrios

Abstract:

Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus spp. that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as population, pathogenicity and aflatoxinogenic capacity of the strains, topography, soil and climate parameters of the fig orchards are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high-performance liquid chromatography (HPLC), and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive and expensive. Predicting aflatoxin levels prior to crop harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels based on topography and soil analysis data of fig orchards. This paper describes the development of a risk assessment tool for the contamination of aflatoxin on dried figs, based on the location and altitude of the fig orchards, the population of the fungus Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size analysis (sand, silt, clay), concentration of the exchangeable cations (Ca, Mg, K, Na), extractable P and trace of elements (B, Fe, Mn, Zn and Cu), by employing machine learning methods. In particular, our proposed method integrates three machine learning techniques i.e., dimensionality reduction on the original dataset (Principal Component Analysis), metric learning (Mahalanobis Metric for Clustering) and K-nearest Neighbors learning algorithm (KNN), into an enhanced model, with mean performance equal to 85% by terms of the Pearson Correlation Coefficient (PCC) between observed and predicted values.

Keywords: aflatoxins, Aspergillus spp., dried figs, k-nearest neighbors, machine learning, prediction

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 648

12 Understanding Help Seeking among Black Women with Clinically Significant Posttraumatic Stress Symptoms

Authors: Glenda Wrenn, Juliet Muzere, Meldra Hall, Allyson Belton, Kisha Holden, Chanita Hughes-Halbert, Martha Kent, Bekh Bradley

Abstract:

Understanding the help seeking decision making process and experiences of health disparity populations with posttraumatic stress disorder (PTSD) is central to development of trauma-informed, culturally centered, and patient focused services. Yet, little is known about the decision making process among adult Black women who are non-treatment seekers as they are, by definition, not engaged in services. Methods: Audiotaped interviews were conducted with 30 African American adult women with clinically significant PTSD symptoms who were engaged in primary care, but not in treatment for PTSD despite symptom burden. A qualitative interview guide was used to elucidate key themes. Independent coding of themes mapped to theory and identification of emergent themes were conducted using qualitative methods. An existing quantitative dataset was analyzed to contextualize responses and provide a descriptive summary of the sample. Results: Emergent themes revealed that active mental avoidance, the intermittent nature of distress, ambivalence, and self-identified resilience as undermining to help seeking decisions. Participants were stuck within the help-seeking phase of ‘recognition’ of illness and retained a sense of “it is my decision” despite endorsing significant social and environmental negative influencers. Participants distinguished ‘help acceptance’ from ‘help seeking’ with greater willingness to accept help and importance placed on being of help to others. Conclusions: Elucidation of the decision-making process from the perspective of non-treatment seekers has implications for outreach and treatment within models of integrated and specialty systems care. The salience of responses to trauma symptoms and stagnation in the help seeking recognition phase are findings relevant to integrated care service design and community engagement.

Keywords: Culture, help-seeking, integrated care, PTSD.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1121

11 Parametric Approach for Reserve Liability Estimate in Mortgage Insurance

Authors: Rajinder Singh, Ram Valluru

Abstract:

Chain Ladder (CL) method, Expected Loss Ratio (ELR) method and Bornhuetter-Ferguson (BF) method, in addition to more complex transition-rate modeling, are commonly used actuarial reserving methods in general insurance. There is limited published research about their relative performance in the context of Mortgage Insurance (MI). In our experience, these traditional techniques pose unique challenges and do not provide stable claim estimates for medium to longer term liabilities. The relative strengths and weaknesses among various alternative approaches revolve around: stability in the recent loss development pattern, sufficiency and reliability of loss development data, and agreement/disagreement between reported losses to date and ultimate loss estimate. CL method results in volatile reserve estimates, especially for accident periods with little development experience. The ELR method breaks down especially when ultimate loss ratios are not stable and predictable. While the BF method provides a good tradeoff between the loss development approach (CL) and ELR, the approach generates claim development and ultimate reserves that are disconnected from the ever-to-date (ETD) development experience for some accident years that have more development experience. Further, BF is based on subjective a priori assumption. The fundamental shortcoming of these methods is their inability to model exogenous factors, like the economy, which impact various cohorts at the same chronological time but at staggered points along their life-time development. This paper proposes an alternative approach of parametrizing the loss development curve and using logistic regression to generate the ultimate loss estimate for each homogeneous group (accident year or delinquency period). The methodology was tested on an actual MI claim development dataset where various cohorts followed a sigmoidal trend, but levels varied substantially depending upon the economic and operational conditions during the development period spanning over many years. The proposed approach provides the ability to indirectly incorporate such exogenous factors and produce more stable loss forecasts for reserving purposes as compared to the traditional CL and BF methods.

Keywords: Actuarial loss reserving techniques, logistic regression, parametric function, volatility.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 416

10 TheAnalyzer: Clustering-Based System for Improving Business Productivity by Analyzing User Profiles to Enhance Human-Computer Interaction

Authors: D. S. A. Nanayakkara, K. J. P. G. Perera

Abstract:

E-commerce platforms have revolutionized the shopping experience, offering convenient ways for consumers to make purchases. To improve interactions with customers and optimize marketing strategies, it is essential for businesses to understand user behavior, preferences, and needs on these platforms. This paper focuses on recommending businesses to customize interactions with users based on their behavioral patterns, leveraging data-driven analysis and machine learning techniques. Businesses can improve engagement and boost the adoption of e-commerce platforms by aligning behavioral patterns with user goals of usability and satisfaction. We propose TheAnalyzer, a clustering-based system designed to enhance business productivity by analyzing user-profiles and improving human-computer interaction. TheAnalyzer seamlessly integrates with business applications, collecting relevant data points based on users' natural interactions without additional burdens such as questionnaires or surveys. It defines five key user analytics as features for its dataset, which are easily captured through users' interactions with e-commerce platforms. This research presents a study demonstrating the successful distinction of users into specific groups based on the five key analytics considered by TheAnalyzer. With the assistance of domain experts, customized business rules can be attached to each group, enabling TheAnalyzer to influence business applications and provide an enhanced personalized user experience. The outcomes are evaluated quantitatively and qualitatively, demonstrating that utilizing TheAnalyzer’s capabilities can optimize business outcomes, enhance customer satisfaction, and drive sustainable growth. The findings of this research contribute to the advancement of personalized interactions in e-commerce platforms. By leveraging user behavioral patterns and analyzing both new and existing users, businesses can effectively tailor their interactions to improve customer satisfaction, loyalty and ultimately drive sales.

Keywords: Data clustering, data standardization, dimensionality reduction, human-computer interaction, user profiling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 228

9 Deep Learning for Renewable Power Forecasting: An Approach Using LSTM Neural Networks

Authors: Fazıl Gökgöz, Fahrettin Filiz

Abstract:

Load forecasting has become crucial in recent years and become popular in forecasting area. Many different power forecasting models have been tried out for this purpose. Electricity load forecasting is necessary for energy policies, healthy and reliable grid systems. Effective power forecasting of renewable energy load leads the decision makers to minimize the costs of electric utilities and power plants. Forecasting tools are required that can be used to predict how much renewable energy can be utilized. The purpose of this study is to explore the effectiveness of LSTM-based neural networks for estimating renewable energy loads. In this study, we present models for predicting renewable energy loads based on deep neural networks, especially the Long Term Memory (LSTM) algorithms. Deep learning allows multiple layers of models to learn representation of data. LSTM algorithms are able to store information for long periods of time. Deep learning models have recently been used to forecast the renewable energy sources such as predicting wind and solar energy power. Historical load and weather information represent the most important variables for the inputs within the power forecasting models. The dataset contained power consumption measurements are gathered between January 2016 and December 2017 with one-hour resolution. Models use publicly available data from the Turkish Renewable Energy Resources Support Mechanism. Forecasting studies have been carried out with these data via deep neural networks approach including LSTM technique for Turkish electricity markets. 432 different models are created by changing layers cell count and dropout. The adaptive moment estimation (ADAM) algorithm is used for training as a gradient-based optimizer instead of SGD (stochastic gradient). ADAM performed better than SGD in terms of faster convergence and lower error rates. Models performance is compared according to MAE (Mean Absolute Error) and MSE (Mean Squared Error). Best five MAE results out of 432 tested models are 0.66, 0.74, 0.85 and 1.09. The forecasting performance of the proposed LSTM models gives successful results compared to literature searches.

Keywords: Deep learning, long-short-term memory, energy, renewable energy load forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1596