Search results for: time series clustering
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 19722

Search results for: time series clustering

19422 Forecasting Container Throughput: Using Aggregate or Terminal-Specific Data?

Authors: Gu Pang, Bartosz Gebka

Abstract:

We forecast the demand of total container throughput at the Indonesia’s largest seaport, Tanjung Priok Port. We propose four univariate forecasting models, including SARIMA, the additive Seasonal Holt-Winters, the multiplicative Seasonal Holt-Winters and the Vector Error Correction Model. Our aim is to provide insights into whether forecasting the total container throughput obtained by historical aggregated port throughput time series is superior to the forecasts of the total throughput obtained by summing up the best individual terminal forecasts. We test the monthly port/individual terminal container throughput time series between 2003 and 2013. The performance of forecasting models is evaluated based on Mean Absolute Error and Root Mean Squared Error. Our results show that the multiplicative Seasonal Holt-Winters model produces the most accurate forecasts of total container throughput, whereas SARIMA generates the worst in-sample model fit. The Vector Error Correction Model provides the best model fits and forecasts for individual terminals. Our results report that the total container throughput forecasts based on modelling the total throughput time series are consistently better than those obtained by combining those forecasts generated by terminal-specific models. The forecasts of total throughput until the end of 2018 provide an essential insight into the strategic decision-making on the expansion of port's capacity and construction of new container terminals at Tanjung Priok Port.

Keywords: SARIMA, Seasonal Holt-Winters, Vector Error Correction Model, container throughput

Procedia PDF Downloads 479
19421 The Relationships between Carbon Dioxide (CO2) Emissions, Energy Consumption and GDP per capita for Oman: Time Series Analysis, 1980–2010

Authors: Jinhoa Lee

Abstract:

The relationships between environmental quality, energy use and economic output have created growing attention over the past decades among researchers and policy makers. Focusing on the empirical aspects of the role of CO2 emissions and energy use in affecting the economic output, this paper is an effort to fulfil the gap in a comprehensive case study at a country level using modern econometric techniques. To achieve the goal, this country-specific study examines the short-run and long-run relationships among energy consumption, carbon dioxide (CO2) emissions and gross domestic product (GDP) for Oman using time series analysis from the year 1980-2010. To investigate the relationships between the variables, this paper employs the Augmented Dickey Fuller (ADF) test for stationary, Johansen maximum likelihood method for co-integration and a Vector Error Correction Model (VECM) for both short- and long-run causality among the research variables for the sample. All the variables in this study show very strong significant effects on GDP in the country for the long term. The long-run equilibrium in the VECM suggests positive long-run causalities from CO2 emissions to GDP. Conversely, negative impacts of energy consumption on GDP are found to be significant in Oman during the period. In the short run, there exist negative unidirectional causalities among GDP, CO2 emissions and energy consumption running from GDP to CO2 emissions and from energy consumption to CO2 emissions. Overall, the results support arguments that there are relationships among environmental quality, energy use and economic output in Oman over of period 1980-2010.

Keywords: CO2 emissions, energy consumption, GDP, Oman, time series analysis

Procedia PDF Downloads 438
19420 A Concept of Data Mining with XML Document

Authors: Akshay Agrawal, Anand K. Srivastava

Abstract:

The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.

Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering

Procedia PDF Downloads 351
19419 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 174
19418 Forecasting Stock Indexes Using Bayesian Additive Regression Tree

Authors: Darren Zou

Abstract:

Forecasting the stock market is a very challenging task. Various economic indicators such as GDP, exchange rates, interest rates, and unemployment have a substantial impact on the stock market. Time series models are the traditional methods used to predict stock market changes. In this paper, a machine learning method, Bayesian Additive Regression Tree (BART) is used in predicting stock market indexes based on multiple economic indicators. BART can be used to model heterogeneous treatment effects, and thereby works well when models are misspecified. It also has the capability to handle non-linear main effects and multi-way interactions without much input from financial analysts. In this research, BART is proposed to provide a reliable prediction on day-to-day stock market activities. By comparing the analysis results from BART and with time series method, BART can perform well and has better prediction capability than the traditional methods.

Keywords: BART, Bayesian, predict, stock

Procedia PDF Downloads 100
19417 The Relationships between Energy Consumption, Carbon Dioxide (CO2) Emissions, and GDP for Turkey: Time Series Analysis, 1980-2010

Authors: Jinhoa Lee

Abstract:

The relationships between environmental quality, energy use and economic output have created growing attention over the past decades among researchers and policy makers. Focusing on the empirical aspects of the role of carbon dioxide (CO2) emissions and energy use in affecting the economic output, this paper is an effort to fulfill the gap in a comprehensive case study at a country level using modern econometric techniques. To achieve the goal, this country-specific study examines the short-run and long-run relationships among energy consumption (using disaggregated energy sources: crude oil, coal, natural gas, and electricity), CO2 emissions and gross domestic product (GDP) for Turkey using time series analysis from the year 1980-2010. To investigate the relationships between the variables, this paper employs the Augmented Dickey-Fuller (ADF) test for stationarity, Johansen’s maximum likelihood method for cointegration and a Vector Error Correction Model (VECM) for both short- and long-run causality among the research variables for the sample. The long-run equilibrium in the VECM suggests no effects of the CO2 emissions and energy use on the GDP in Turkey. There exists a short-run bidirectional relationship between the electricity and natural gas consumption, and also there is a negative unidirectional causality running from the GDP to electricity use. Overall, the results partly support arguments that there are relationships between energy use and economic output; however, the effects may differ due to the source of energy such as in the case of Turkey for the period of 1980-2010. However, there is no significant relationship between the CO2 emissions and the GDP and between the CO2 emissions and the energy use both in the short term and long term.

Keywords: CO2 emissions, energy consumption, GDP, Turkey, time series analysis

Procedia PDF Downloads 487
19416 Modeling of Diurnal Pattern of Air Temperature in a Tropical Environment: Ile-Ife and Ibadan, Nigeria

Authors: Rufus Temidayo Akinnubi, M. O. Adeniyi

Abstract:

Existing diurnal air temperature models simulate night time air temperature over Nigeria with high biases. An improved parameterization is presented for modeling the diurnal pattern of air temperature (Ta) which is applicable in the calculation of turbulent heat fluxes in Global climate models, based on Nigeria Micrometeorological Experimental site (NIMEX) surface layer observations. Five diurnal Ta models for estimating hourly Ta from daily maximum, daily minimum, and daily mean air temperature were validated using root-mean-square error (RMSE), Mean Error Bias (MBE) and scatter graphs. The original Fourier series model showed better performance for unstable air temperature parameterizations while the stable Ta was strongly overestimated with a large error. The model was improved with the inclusion of the atmospheric cooling rate that accounts for the temperature inversion that occurs during the nocturnal boundary layer condition. The MBE and RMSE estimated by the modified Fourier series model reduced by 4.45 oC and 3.12 oC during the transitional period from dry to wet stable atmospheric conditions. The modified Fourier series model gave good estimation of the diurnal weather patterns of Ta when compared with other existing models for a tropical environment.

Keywords: air temperature, mean bias error, Fourier series analysis, surface energy balance,

Procedia PDF Downloads 205
19415 The Effect of PM10 Dispersion from Industrial, Residential and Commercial Areas in Arid Environment

Authors: Meshari Al-Harbi

Abstract:

A comparative area-season-elemental-wise time series analysis by Dust Track monitor (2012-2013) revealed high PM10 dispersion in the outdoor environment in the sequence of industrial> express highways>residential>open areas. Time series analysis from 7AM-6AM (until next day), 30d (monthly), 3600sec. (for any given period of a month), and 12 months (yearly) showed peak PM10 dispersion during 1AM-7AM, 1d-4d and 25d-31d of every month, 1500-3600 with the exception in PM10 dispersion in residential areas, and in the months-March to June, respectively. This time-bound PM10 dispersion suggests the primary influence of human activities (peak mobility and productivity period for a given time frame) besides the secondary influence of meteorological parameters (high temperature and wind action) and, occasional dust storms. Whereas, gravimetric analysis reveals the influence of precipitation, low temperature and low volatility resulting high trace metals in PM10 during winter than in summer and primarily attributes to the influence of nature besides, the secondary attributes of smoke stack emission from various industries and automobiles. Furthermore, our study recommends residents to limit outdoor air pollution exposures and take precautionary measures to inhale PM10 pollutants from the atmosphere.

Keywords: aerosol, pollution, respirable particulates, trace-metals

Procedia PDF Downloads 287
19414 Implementation of Algorithm K-Means for Grouping District/City in Central Java Based on Macro Economic Indicators

Authors: Nur Aziza Luxfiati

Abstract:

Clustering is partitioning data sets into sub-sets or groups in such a way that elements certain properties have shared property settings with a high level of similarity within one group and a low level of similarity between groups. . The K-Means algorithm is one of thealgorithmsclustering as a grouping tool that is most widely used in scientific and industrial applications because the basic idea of the kalgorithm is-means very simple. In this research, applying the technique of clustering using the k-means algorithm as a method of solving the problem of national development imbalances between regions in Central Java Province based on macroeconomic indicators. The data sample used is secondary data obtained from the Central Java Provincial Statistics Agency regarding macroeconomic indicator data which is part of the publication of the 2019 National Socio-Economic Survey (Susenas) data. score and determine the number of clusters (k) using the elbow method. After the clustering process is carried out, the validation is tested using themethodsBetween-Class Variation (BCV) and Within-Class Variation (WCV). The results showed that detection outlier using z-score normalization showed no outliers. In addition, the results of the clustering test obtained a ratio value that was not high, namely 0.011%. There are two district/city clusters in Central Java Province which have economic similarities based on the variables used, namely the first cluster with a high economic level consisting of 13 districts/cities and theclustersecondwith a low economic level consisting of 22 districts/cities. And in the cluster second, namely, between low economies, the authors grouped districts/cities based on similarities to macroeconomic indicators such as 20 districts of Gross Regional Domestic Product, with a Poverty Depth Index of 19 districts, with 5 districts in Human Development, and as many as Open Unemployment Rate. 10 districts.

Keywords: clustering, K-Means algorithm, macroeconomic indicators, inequality, national development

Procedia PDF Downloads 133
19413 Automatic Detection and Update of Region of Interest in Vehicular Traffic Surveillance Videos

Authors: Naydelis Brito Suárez, Deni Librado Torres Román, Fernando Hermosillo Reynoso

Abstract:

Automatic detection and generation of a dynamic ROI (Region of Interest) in vehicle traffic surveillance videos based on a static camera in Intelligent Transportation Systems is challenging for computer vision-based systems. The dynamic ROI, being a changing ROI, should capture any other moving object located outside of a static ROI. In this work, the video is represented by a Tensor model composed of a Background and a Foreground Tensor, which contains all moving vehicles or objects. The values of each pixel over a time interval are represented by time series, and some pixel rows were selected. This paper proposes a pixel entropy-based algorithm for automatic detection and generation of a dynamic ROI in traffic videos under the assumption of two types of theoretical pixel entropy behaviors: (1) a pixel located at the road shows a high entropy value due to disturbances in this zone by vehicle traffic, (2) a pixel located outside the road shows a relatively low entropy value. To study the statistical behavior of the selected pixels, detecting the entropy changes and consequently moving objects, Shannon, Tsallis, and Approximate entropies were employed. Although Tsallis entropy achieved very high results in real-time, Approximate entropy showed results slightly better but in greater time.

Keywords: convex hull, dynamic ROI detection, pixel entropy, time series, moving objects

Procedia PDF Downloads 44
19412 Co-Integration and Error Correction Mechanism of Supply Response of Sugarcane in Pakistan (1980-2012)

Authors: Himayatullah Khan

Abstract:

This study estimates supply response function of sugarcane in Pakistan from 1980-81 to 2012-13. The study uses co-integration approach and error correction mechanism. Sugarcane production, area and price series were tested for unit root using Augmented Dickey Fuller (ADF). The study found that these series were stationary at their first differenced level. Using the Augmented Engle-Granger test and Cointegrating Regression Durbin-Watson (CRDW) test, the study found that “production and price” and “area and price” were co-integrated suggesting that the two sets of time series had long-run or equilibrium relationship. The results of the error correction models for the two sets of series showed that there was disequilibrium in the short run there may be disequilibrium. The Engle-Granger residual may be thought of as the equilibrium error which can be used to tie the short-run behavior of the dependent variable to its long-run value. The Granger-Causality test results showed that log of price granger caused both the long of production and log of area whereas, the log of production and log of area Granger caused each other.

Keywords: co-integration, error correction mechanism, Granger-causality, sugarcane, supply response

Procedia PDF Downloads 413
19411 Forecasting Model to Predict Dengue Incidence in Malaysia

Authors: W. H. Wan Zakiyatussariroh, A. A. Nasuhar, W. Y. Wan Fairos, Z. A. Nazatul Shahreen

Abstract:

Forecasting dengue incidence in a population can provide useful information to facilitate the planning of the public health intervention. Many studies on dengue cases in Malaysia were conducted but are limited in modeling the outbreak and forecasting incidence. This article attempts to propose the most appropriate time series model to explain the behavior of dengue incidence in Malaysia for the purpose of forecasting future dengue outbreaks. Several seasonal auto-regressive integrated moving average (SARIMA) models were developed to model Malaysia’s number of dengue incidence on weekly data collected from January 2001 to December 2011. SARIMA (2,1,1)(1,1,1)52 model was found to be the most suitable model for Malaysia’s dengue incidence with the least value of Akaike information criteria (AIC) and Bayesian information criteria (BIC) for in-sample fitting. The models further evaluate out-sample forecast accuracy using four different accuracy measures. The results indicate that SARIMA (2,1,1)(1,1,1)52 performed well for both in-sample fitting and out-sample evaluation.

Keywords: time series modeling, Box-Jenkins, SARIMA, forecasting

Procedia PDF Downloads 452
19410 A Stepwise Approach to Automate the Search for Optimal Parameters in Seasonal ARIMA Models

Authors: Manisha Mukherjee, Diptarka Saha

Abstract:

Reliable forecasts of univariate time series data are often necessary for several contexts. ARIMA models are quite popular among practitioners in this regard. Hence, choosing correct parameter values for ARIMA is a challenging yet imperative task. Thus, a stepwise algorithm is introduced to provide automatic and robust estimates for parameters (p; d; q)(P; D; Q) used in seasonal ARIMA models. This process is focused on improvising the overall quality of the estimates, and it alleviates the problems induced due to the unidimensional nature of the methods that are currently used such as auto.arima. The fast and automated search of parameter space also ensures reliable estimates of the parameters that possess several desirable qualities, consequently, resulting in higher test accuracy especially in the cases of noisy data. After vigorous testing on real as well as simulated data, the algorithm doesn’t only perform better than current state-of-the-art methods, it also completely obviates the need for human intervention due to its automated nature.

Keywords: time series, ARIMA, auto.arima, ARIMA parameters, forecast, R function

Procedia PDF Downloads 136
19409 An Empirical Study to Predict Myocardial Infarction Using K-Means and Hierarchical Clustering

Authors: Md. Minhazul Islam, Shah Ashisul Abed Nipun, Majharul Islam, Md. Abdur Rakib Rahat, Jonayet Miah, Salsavil Kayyum, Anwar Shadaab, Faiz Al Faisal

Abstract:

The target of this research is to predict Myocardial Infarction using unsupervised Machine Learning algorithms. Myocardial Infarction Prediction related to heart disease is a challenging factor faced by doctors & hospitals. In this prediction, accuracy of the heart disease plays a vital role. From this concern, the authors have analyzed on a myocardial dataset to predict myocardial infarction using some popular Machine Learning algorithms K-Means and Hierarchical Clustering. This research includes a collection of data and the classification of data using Machine Learning Algorithms. The authors collected 345 instances along with 26 attributes from different hospitals in Bangladesh. This data have been collected from patients suffering from myocardial infarction along with other symptoms. This model would be able to find and mine hidden facts from historical Myocardial Infarction cases. The aim of this study is to analyze the accuracy level to predict Myocardial Infarction by using Machine Learning techniques.

Keywords: Machine Learning, K-means, Hierarchical Clustering, Myocardial Infarction, Heart Disease

Procedia PDF Downloads 181
19408 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering

Authors: Emiel Caron

Abstract:

Patent databases contain patent related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study to the relation between science, technology, and innovation in various domains. Problematic in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules in combination with string similarity measures are used to detect pairs of records that are potential duplicates. Due to the scoring, different rules can be combined, to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that are above a certain threshold, are clustered by means of single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed, when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases as the Web of Science or Scopus.

Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics

Procedia PDF Downloads 171
19407 Forecasting Residential Water Consumption in Hamilton, New Zealand

Authors: Farnaz Farhangi

Abstract:

Many people in New Zealand believe that the access to water is inexhaustible, and it comes from a history of virtually unrestricted access to it. For the region like Hamilton which is one of New Zealand’s fastest growing cities, it is crucial for policy makers to know about the future water consumption and implementation of rules and regulation such as universal water metering. Hamilton residents use water freely and they do not have any idea about how much water they use. Hence, one of proposed objectives of this research is focusing on forecasting water consumption using different methods. Residential water consumption time series exhibits seasonal and trend variations. Seasonality is the pattern caused by repeating events such as weather conditions in summer and winter, public holidays, etc. The problem with this seasonal fluctuation is that, it dominates other time series components and makes difficulties in determining other variations (such as educational campaign’s effect, regulation, etc.) in time series. Apart from seasonality, a stochastic trend is also combined with seasonality and makes different effects on results of forecasting. According to the forecasting literature, preprocessing (de-trending and de-seasonalization) is essential to have more performed forecasting results, while some other researchers mention that seasonally non-adjusted data should be used. Hence, I answer the question that is pre-processing essential? A wide range of forecasting methods exists with different pros and cons. In this research, I apply double seasonal ARIMA and Artificial Neural Network (ANN), considering diverse elements such as seasonality and calendar effects (public and school holidays) and combine their results to find the best predicted values. My hypothesis is the examination the results of combined method (hybrid model) and individual methods and comparing the accuracy and robustness. In order to use ARIMA, the data should be stationary. Also, ANN has successful forecasting applications in terms of forecasting seasonal and trend time series. Using a hybrid model is a way to improve the accuracy of the methods. Due to the fact that water demand is dominated by different seasonality, in order to find their sensitivity to weather conditions or calendar effects or other seasonal patterns, I combine different methods. The advantage of this combination is reduction of errors by averaging of each individual model. It is also useful when we are not sure about the accuracy of each forecasting model and it can ease the problem of model selection. Using daily residential water consumption data from January 2000 to July 2015 in Hamilton, I indicate how prediction by different methods varies. ANN has more accurate forecasting results than other method and preprocessing is essential when we use seasonal time series. Using hybrid model reduces forecasting average errors and increases the performance.

Keywords: artificial neural network (ANN), double seasonal ARIMA, forecasting, hybrid model

Procedia PDF Downloads 303
19406 Towards Real-Time Classification of Finger Movement Direction Using Encephalography Independent Components

Authors: Mohamed Mounir Tellache, Hiroyuki Kambara, Yasuharu Koike, Makoto Miyakoshi, Natsue Yoshimura

Abstract:

This study explores the practicality of using electroencephalographic (EEG) independent components to predict eight-direction finger movements in pseudo-real-time. Six healthy participants with individual-head MRI images performed finger movements in eight directions with two different arm configurations. The analysis was performed in two stages. The first stage consisted of using independent component analysis (ICA) to separate the signals representing brain activity from non-brain activity signals and to obtain the unmixing matrix. The resulting independent components (ICs) were checked, and those reflecting brain-activity were selected. Finally, the time series of the selected ICs were used to predict eight finger-movement directions using Sparse Logistic Regression (SLR). The second stage consisted of using the previously obtained unmixing matrix, the selected ICs, and the model obtained by applying SLR to classify a different EEG dataset. This method was applied to two different settings, namely the single-participant level and the group-level. For the single-participant level, the EEG dataset used in the first stage and the EEG dataset used in the second stage originated from the same participant. For the group-level, the EEG datasets used in the first stage were constructed by temporally concatenating each combination without repetition of the EEG datasets of five participants out of six, whereas the EEG dataset used in the second stage originated from the remaining participants. The average test classification results across datasets (mean ± S.D.) were 38.62 ± 8.36% for the single-participant, which was significantly higher than the chance level (12.50 ± 0.01%), and 27.26 ± 4.39% for the group-level which was also significantly higher than the chance level (12.49% ± 0.01%). The classification accuracy within [–45°, 45°] of the true direction is 70.03 ± 8.14% for single-participant and 62.63 ± 6.07% for group-level which may be promising for some real-life applications. Clustering and contribution analyses further revealed the brain regions involved in finger movement and the temporal aspect of their contribution to the classification. These results showed the possibility of using the ICA-based method in combination with other methods to build a real-time system to control prostheses.

Keywords: brain-computer interface, electroencephalography, finger motion decoding, independent component analysis, pseudo real-time motion decoding

Procedia PDF Downloads 119
19405 Automatic Detection of Traffic Stop Locations Using GPS Data

Authors: Areej Salaymeh, Loren Schwiebert, Stephen Remias, Jonathan Waddell

Abstract:

Extracting information from new data sources has emerged as a crucial task in many traffic planning processes, such as identifying traffic patterns, route planning, traffic forecasting, and locating infrastructure improvements. Given the advanced technologies used to collect Global Positioning System (GPS) data from dedicated GPS devices, GPS equipped phones, and navigation tools, intelligent data analysis methodologies are necessary to mine this raw data. In this research, an automatic detection framework is proposed to help identify and classify the locations of stopped GPS waypoints into two main categories: signalized intersections or highway congestion. The Delaunay triangulation is used to perform this assessment in the clustering phase. While most of the existing clustering algorithms need assumptions about the data distribution, the effectiveness of the Delaunay triangulation relies on triangulating geographical data points without such assumptions. Our proposed method starts by cleaning noise from the data and normalizing it. Next, the framework will identify stoppage points by calculating the traveled distance. The last step is to use clustering to form groups of waypoints for signalized traffic and highway congestion. Next, a binary classifier was applied to find distinguish highway congestion from signalized stop points. The binary classifier uses the length of the cluster to find congestion. The proposed framework shows high accuracy for identifying the stop positions and congestion points in around 99.2% of trials. We show that it is possible, using limited GPS data, to distinguish with high accuracy.

Keywords: Delaunay triangulation, clustering, intelligent transportation systems, GPS data

Procedia PDF Downloads 251
19404 Dynamic Modeling of the Exchange Rate in Tunisia: Theoretical and Empirical Study

Authors: Chokri Slim

Abstract:

The relative failure of simultaneous equation models in the seventies has led researchers to turn to other approaches that take into account the dynamics of economic and financial systems. In this paper, we use an approach based on vector autoregressive model that is widely used in recent years. Their popularity is due to their flexible nature and ease of use to produce models with useful descriptive characteristics. It is also easy to use them to test economic hypotheses. The standard econometric techniques assume that the series studied are stable over time (stationary hypothesis). Most economic series do not verify this hypothesis, which assumes, when one wishes to study the relationships that bind them to implement specific techniques. This is cointegration which characterizes non-stationary series (integrated) with a linear combination is stationary, will also be presented in this paper. Since the work of Johansen, this approach is generally presented as part of a multivariate analysis and to specify long-term stable relationships while at the same time analyzing the short-term dynamics of the variables considered. In the empirical part, we have applied these concepts to study the dynamics of of the exchange rate in Tunisia, which is one of the most important economic policy of a country open to the outside. According to the results of the empirical study by the cointegration method, there is a cointegration relationship between the exchange rate and its determinants. This relationship shows that the variables have a significant influence in determining the exchange rate in Tunisia.

Keywords: stationarity, cointegration, dynamic models, causality, VECM models

Procedia PDF Downloads 337
19403 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score

Authors: Jianfeng Hu

Abstract:

Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.

Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes

Procedia PDF Downloads 260
19402 Proposing an Algorithm to Cluster Ad Hoc Networks, Modulating Two Levels of Learning Automaton and Nodes Additive Weighting

Authors: Mohammad Rostami, Mohammad Reza Forghani, Elahe Neshat, Fatemeh Yaghoobi

Abstract:

An Ad Hoc network consists of wireless mobile equipment which connects to each other without any infrastructure, using connection equipment. The best way to form a hierarchical structure is clustering. Various methods of clustering can form more stable clusters according to nodes' mobility. In this research we propose an algorithm, which allocates some weight to nodes based on factors, i.e. link stability and power reduction rate. According to the allocated weight in the previous phase, the cellular learning automaton picks out in the second phase nodes which are candidates for being cluster head. In the third phase, learning automaton selects cluster head nodes, member nodes and forms the cluster. Thus, this automaton does the learning from the setting and can form optimized clusters in terms of power consumption and link stability. To simulate the proposed algorithm we have used omnet++4.2.2. Simulation results indicate that newly formed clusters have a longer lifetime than previous algorithms and decrease strongly network overload by reducing update rate.

Keywords: mobile Ad Hoc networks, clustering, learning automaton, cellular automaton, battery power

Procedia PDF Downloads 380
19401 Electroencephalography (EEG) Analysis of Alcoholic and Control Subjects Using Multiscale Permutation Entropy

Authors: Lal Hussain, Wajid Aziz, Sajjad Ahmed Nadeem, Saeed Arif Shah, Abdul Majid

Abstract:

Brain electrical activity as reflected in Electroencephalography (EEG) have been analyzed and diagnosed using various techniques. Among them, complexity measure, nonlinearity, disorder, and unpredictability play vital role due to the nonlinear interconnection between functional and anatomical subsystem emerged in brain in healthy state and during various diseases. There are many social and economical issues of alcoholic abuse as memory weakness, decision making, impairments, and concentrations etc. Alcoholism not only defect the brains but also associated with emotional, behavior, and cognitive impairments damaging the white and gray brain matters. A recently developed signal analysis method i.e. Multiscale Permutation Entropy (MPE) is proposed to estimate the complexity of long-range temporal correlation time series EEG of Alcoholic and Control subjects acquired from University of California Machine Learning repository and results are compared with MSE. Using MPE, coarsed grained series is first generated and the PE is computed for each coarsed grained time series against the electrodes O1, O2, C3, C4, F2, F3, F4, F7, F8, Fp1, Fp2, P3, P4, T7, and T8. The results computed against each electrode using MPE gives higher significant values as compared to MSE as well as mean rank differences accordingly. Likewise, ROC and Area under the ROC also gives higher separation against each electrode using MPE in comparison to MSE.

Keywords: electroencephalogram (EEG), multiscale permutation entropy (MPE), multiscale sample entropy (MSE), permutation entropy (PE), mann whitney test (MMT), receiver operator curve (ROC), complexity measure

Procedia PDF Downloads 464
19400 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks

Authors: Paul Shize Li, Frank Alber

Abstract:

A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has made efforts to discover and elucidate the molecular mechanisms of individual genes in the past decades, still about 40% of human genes have unknown functions, not to mention the diseases they may be related to. For those biologists who are interested in a particular gene with unknown functions, a powerful computational method tailored for inferring the functions and disease relevance of uncharacterized genes is strongly needed. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that the densely connected subgraphs in multiple biological networks are useful in the functional and phenotypic annotation of uncharacterized genes. Therefore, in this work, we have developed an integrative network approach to identify the frequent local clusters, which are defined as those densely connected subgraphs that frequently occur in multiple biological networks and consist of the query gene that has few or no disease or function annotations. This is a local clustering algorithm that models multiple biological networks sharing the same gene set as a three-dimensional matrix, the so-called tensor, and employs the tensor-based optimization method to efficiently find the frequent local clusters. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological, and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these gene co-expression networks, for a given uncharacterized gene that is of biologist’s interest, the proposed method can be applied to identify the frequent local clusters that consist of this uncharacterized gene. Finally, those frequent local clusters are used for function and disease annotation of this uncharacterized gene. This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the use of the proposed method on real data of hundreds of gene co-expression data and showed that it can comprehensively characterize the query gene. Therefore, this study provides a new tool for annotating the uncharacterized genes and has great potential to assist clinical genomic diagnostics.

Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation

Procedia PDF Downloads 109
19399 Consumer Load Profile Determination with Entropy-Based K-Means Algorithm

Authors: Ioannis P. Panapakidis, Marios N. Moschakis

Abstract:

With the continuous increment of smart meter installations across the globe, the need for processing of the load data is evident. Clustering-based load profiling is built upon the utilization of unsupervised machine learning tools for the purpose of formulating the typical load curves or load profiles. The most commonly used algorithm in the load profiling literature is the K-means. While the algorithm has been successfully tested in a variety of applications, its drawback is the strong dependence in the initialization phase. This paper proposes a novel modified form of the K-means that addresses the aforementioned problem. Simulation results indicate the superiority of the proposed algorithm compared to the K-means.

Keywords: clustering, load profiling, load modeling, machine learning, energy efficiency and quality

Procedia PDF Downloads 139
19398 Empirical Investigation into Climate Change and Climate-Smart Agriculture for Food Security in Nigeria

Authors: J. Julius Adebayo

Abstract:

The objective of this paper is to assess the agro-climatic condition of Ibadan in the rain forest ecological zone of Nigeria, using rainfall pattern and temperature between 1978-2018. Data on rainfall and temperature in Ibadan, Oyo State for a period of 40 years were obtained from Meteorological Section of Forestry Research Institute of Nigeria, Ibadan and Oyo State Meteorology Centre. Time series analysis was employed to analyze the data. The trend revealed that rainfall is decreasing slowly and temperature is averagely increasing year after year. The model for rainfall and temperature are Yₜ = 1454.11-8*t and Yₜ = 31.5995 + 2.54 E-02*t respectively, where t is the time. On this basis, a forecast of 20 years (2019-2038) was generated, and the results showed a further downward trend on rainfall and upward trend in temperature, this indicates persistence rainfall shortage and very hot weather for agricultural practices in the southwest rain forest ecological zone. Suggestions on possible solutions to avert climate change crisis and also promote climate-smart agriculture for sustainable food and nutrition security were also discussed.

Keywords: climate change, rainfall pattern, temperature, time series analysis, food and nutrition security

Procedia PDF Downloads 114
19397 A Spatial Information Network Traffic Prediction Method Based on Hybrid Model

Authors: Jingling Li, Yi Zhang, Wei Liang, Tao Cui, Jun Li

Abstract:

Compared with terrestrial network, the traffic of spatial information network has both self-similarity and short correlation characteristics. By studying its traffic prediction method, the resource utilization of spatial information network can be improved, and the method can provide an important basis for traffic planning of a spatial information network. In this paper, considering the accuracy and complexity of the algorithm, the spatial information network traffic is decomposed into approximate component with long correlation and detail component with short correlation, and a time series hybrid prediction model based on wavelet decomposition is proposed to predict the spatial network traffic. Firstly, the original traffic data are decomposed to approximate components and detail components by using wavelet decomposition algorithm. According to the autocorrelation and partial correlation smearing and truncation characteristics of each component, the corresponding model (AR/MA/ARMA) of each detail component can be directly established, while the type of approximate component modeling can be established by ARIMA model after smoothing. Finally, the prediction results of the multiple models are fitted to obtain the prediction results of the original data. The method not only considers the self-similarity of a spatial information network, but also takes into account the short correlation caused by network burst information, which is verified by using the measured data of a certain back bone network released by the MAWI working group in 2018. Compared with the typical time series model, the predicted data of hybrid model is closer to the real traffic data and has a smaller relative root means square error, which is more suitable for a spatial information network.

Keywords: spatial information network, traffic prediction, wavelet decomposition, time series model

Procedia PDF Downloads 115
19396 Time Series Modelling for Forecasting Wheat Production and Consumption of South Africa in Time of War

Authors: Yiseyon Hosu, Joseph Akande

Abstract:

Wheat is one of the most important staple food grains of human for centuries and is largely consumed in South Africa. It has a special place in the South African economy because of its significance in food security, trade, and industry. This paper modelled and forecast the production and consumption of wheat in South Africa in the time covid-19 and the ongoing Russia-Ukraine war by using annual time series data from 1940–2021 based on the ARIMA models. Both the averaging forecast and selected models forecast indicate that there is the possibility of an increase with respect to production. The minimum and maximum growth in production is projected to be between 3million and 10 million tons, respectively. However, the model also forecast a possibility of depression with respect to consumption in South Africa. Although Covid-19 and the war between Ukraine and Russia, two major producers and exporters of global wheat, are having an effect on the volatility of the prices currently, the wheat production in South African is expected to increase and meat the consumption demand and provided an opportunity for increase export with respect to domestic consumption. The forecasting of production and consumption behaviours of major crops play an important role towards food and nutrition security, these findings can assist policymakers and will provide them with insights into the production and pricing policy of wheat in South Africa.

Keywords: ARIMA, food security, price volatility, staple food, South Africa

Procedia PDF Downloads 73
19395 A Comparative Study of Substituted Li Ferrites Sintered by the Conventional and Microwave Sintering Technique

Authors: Ibetombi Soibam

Abstract:

Li-Zn-Ni ferrite having the compositional formula Li0.4-0.5xZn0.2NixFe2.4-0.5xO4 where x = 0.02 ≤ x ≤0.1 in steps of 0.02 was fabricated by the citrate precursor method. In this method, metal nitrates and citric acid was used to prepare the gel which exhibit self-propagating combustion behavior giving the required ferrite sample. The ferrite sample was given a pre-firing at 650°C in a programmable conventional furnace for 3 hours with a heating rate of 5°C/min. A series of the sample was finally given conventional sintering (CS) at 1040°C after the pre-firing process. Another series was given microwave sintering (MS) at 1040°C in a programmable microwave furnace which uses a single magnetron operating at 2.45 GHz frequency. X- ray diffraction pattern confirmed the spinel phase structure for both the series. The theoretical and experimental density was calculated. It was observed that densification increases with the increase in Ni concentration in both the series. However, samples sintered by microwave technique was found to be denser. The microstructure of the two series of the sample was examined using scanning electron microscopy (SEM). Dielectric properties have been investigated as a function of frequency and composition for both series of samples sintered by CS and MS technique. The variation of dielectric constant with frequency show dispersion for both the series. It was explained in terms of Koop’s two layer model. From the analysis of dielectric measurement, it was observed that the value of room temperature dielectric constant decreases with the increase in Ni concentration for both the series. The microwave sintered samples show a lower dielectric constant making microwave sintering suitable for high-frequency applications. The possible mechanisms contributing to all the above behavior is being discussed.

Keywords: citrate precursor, dielectric constant, ferrites, microwave sintering

Procedia PDF Downloads 382
19394 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.

Procedia PDF Downloads 159
19393 A Study on Changing of Energy-Saving Performance of GHP Air Conditioning System with Time-Series Variation

Authors: Ying Xin, Shigeki Kametani

Abstract:

This paper deals the energy saving performance of GHP (Gas engine heat pump) air conditioning system has improved with time-series variation. There are two types of air conditioning systems, VRF (Variable refrigerant flow) and central cooling and heating system. VRF is classified as EHP (Electric driven heat pump) and GHP. EHP drives the compressor with electric motor. GHP drives the compressor with the gas engine. The electric consumption of GHP is less than one tenth of EHP does. In this study, the energy consumption data of GHP installed the junior high schools was collected. An annual and monthly energy consumption per rated thermal output power of each apparatus was calculated, and then their energy efficiency was analyzed. From these data, we investigated improvement of the energy saving of the GHP air conditioning system by the change in the generation.

Keywords: energy-saving, variable refrigerant flow, gas engine heat pump, electric driven heat pump, air conditioning system

Procedia PDF Downloads 270