Search results for: pseudo-panel data method
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 36982

36892 Human-Centred Data Analysis Method for Future Design of Residential Spaces: Coliving Case Study

Authors: Alicia Regodon Puyalto, Alfonso Garcia-Santos

Abstract:

This article presents a method to analyze the use of indoor spaces based on data obtained from inbuilt digital devices. The study uses the data generated by in-place devices, such as smart locks, Wi-Fi routers, and electrical sensors, to gain additional insights into space occupancy, user behaviour, and comfort. Those devices, originally installed to facilitate remote operations, report data through the internet, which the research uses to analyze real-time human use of spaces. Using an in-place Internet of Things (IoT) network enables a faster, more affordable, seamless, and scalable way to analyze building interior spaces without incorporating external data collection systems such as sensors. The methodology is applied to a real coliving case study: a residential building of 3000 m², 7 floors, and 80 users in the centre of Madrid. The case study applies the method to classify IoT devices and to assess, clean, and analyze the collected data based on the analysis framework. The information is collected remotely through the devices' different platforms. The first step is to curate the data and understand what insights each device can provide according to the objectives of the study; this generates an analysis framework that can be scaled to future building assessments, even beyond the residential sector. The method adjusts the parameters to be analyzed to the dataset available in the IoT network of each building. The research demonstrates how human-centred data analytics can improve the future spatial design of indoor spaces.

Keywords: in-place devices, IoT, human-centred data-analytics, spatial design

Procedia PDF Downloads 173
36891 Applying Different Stenography Techniques in Cloud Computing Technology to Improve Cloud Data Privacy and Security Issues

Authors: Muhammad Muhammad Suleiman

Abstract:

Cloud computing is a versatile concept that refers to a service that allows users to outsource their data without having to worry about local storage issues. However, the most pressing issue to be addressed is maintaining a secure and reliable data repository rather than relying on untrustworthy service providers. In this study, we look at how steganography approaches, in combination with digital watermarking, can greatly improve the system's effectiveness and data security when used for cloud computing. The main requirement of such frameworks, where data is transferred or exchanged between servers and users, is safe data management in cloud environments. Steganography in the cloud is among the most effective methods for safe communication. Steganography is a method of writing coded messages in such a way that only the sender and recipient can safely interpret and display the information hidden in the communication channel. This study presents a new text steganography method for hiding a loaded hidden English text file in a cover English text file to ensure data protection in cloud computing. Data protection, data hiding capability, and time were all improved using the proposed technique.

Keywords: cloud computing, steganography, information hiding, cloud storage, security

Procedia PDF Downloads 165
36890 Determination of the Risks of Heart Attack at the First Stage as Well as Their Control and Resource Planning with the Method of Data Mining

Authors: İbrahi̇m Kara, Seher Arslankaya

Abstract:

Frequently preferred in the field of engineering in particular, data mining has now begun to be used in the field of health as well since the data in the health sector have reached great dimensions. With data mining, it is aimed to reveal models from the great amounts of raw data in agreement with the purpose and to search for the rules and relationships which will enable one to make predictions about the future from the large amount of data set. It helps the decision-maker to find the relationships among the data which form at the stage of decision-making. In this study, it is aimed to determine the risk of heart attack at the first stage, to control it, and to make its resource planning with the method of data mining. Through the early and correct diagnosis of heart attacks, it is aimed to reveal the factors which affect the diseases, to protect health and choose the right treatment methods, to reduce the costs in health expenditures, and to shorten the durations of patients’ stay at hospitals. In this way, the diagnosis and treatment costs of a heart attack will be scrutinized, which will be useful to determine the risk of the disease at the first stage, to control it, and to make its resource planning.

Keywords: data mining, decision support systems, heart attack, health sector

Procedia PDF Downloads 332
36889 Study of Evapotranspiration for Pune District

Authors: Ranjeet Sable, Mahotsavi Patil, Aadesh Nimbalkar, Prajakta Palaskar, Ritu Sagar

Abstract:

Knowing the exact amount of water used by various crops in different climatic conditions is a necessary step in the design, planning, and management of irrigation schemes and water resources, and in the scheduling of irrigation systems. Evaporation and transpiration are together called evapotranspiration. Water loss from trees during photosynthesis is called transpiration, and the conversion of water into the gaseous state is called evaporation. To calculate evapotranspiration correctly, the chosen method should be suitable, require minimal climatic data, and be applicable to a wide range of climatic conditions. In hydrology, multiple correlation and regression are generally used to develop relationships between three or more hydrological variables by knowing the dependence between them. This research work studies various methods for calculating evapotranspiration and selects a reasonable and suitable one for the Pune region (Maharashtra state). Field methods are very costly and time-consuming, and do not give appropriate results if suitable climatic conditions are not maintained. Observations recorded at Pune meteorological stations are used to calculate evapotranspiration with the Radiation Method (RAD), Modified Penman Method (MPM), Thornthwaite Method (THW), Blaney-Criddle (BCL), Christiansen Equation (CNM), and Hargreaves Method (HGM), of which Hargreaves and Thornthwaite are temperature-based methods. The performance of all these methods is compared with the Modified Penman Method (MPM), taken as the standard, and the method showing the least variation from it is selected as the suitable one. Evapotranspiration values are estimated on a monthly basis. The comparative analysis in this research is used to select raw-data-dependent methods in case of missing data.
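As an illustration of a temperature-based method such as the one named in the abstract above, the sketch below uses the widely cited Hargreaves-Samani form of the Hargreaves equation. The abstract does not state which variant or coefficients the authors used, so the constants 0.0023 and 17.8, the function names, and the input values are assumptions taken from the common textbook form rather than from the paper. Extraterrestrial radiation is assumed to be supplied already converted to mm/day of equivalent evaporation.

```python
import math


def hargreaves_et0(t_max_c, t_min_c, ra_mm_per_day):
    """Reference evapotranspiration in mm/day (Hargreaves-Samani form).

    t_max_c, t_min_c : mean daily maximum and minimum air temperature (deg C)
    ra_mm_per_day    : extraterrestrial radiation as equivalent evaporation (mm/day)
    """
    if t_max_c < t_min_c:
        raise ValueError("t_max_c must not be below t_min_c")
    t_mean = (t_max_c + t_min_c) / 2.0
    return 0.0023 * ra_mm_per_day * (t_mean + 17.8) * math.sqrt(t_max_c - t_min_c)


def monthly_et0(daily_et0_mm, days_in_month):
    """Scale a representative daily value to a monthly total (mm/month)."""
    return daily_et0_mm * days_in_month


# Hypothetical inputs, not station data from the study.
daily = hargreaves_et0(t_max_c=30.0, t_min_c=20.0, ra_mm_per_day=15.0)
print(round(daily, 2), round(monthly_et0(daily, 30), 1))
```

Only air temperature and a radiation term computed from latitude and date are needed, which is why such methods are attractive when other climatic records are missing.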

Keywords: Blaney-Criddle, Christiansen equation evapotranspiration, Hargreaves method, precipitations, Penman method, water use efficiency

Procedia PDF Downloads 247
36888 Prediction of Anticancer Potential of Curcumin Nanoparticles by Means of Quasi-Qsar Analysis Using Monte Carlo Method

Authors: Ruchika Goyal, Ashwani Kumar, Sandeep Jain

Abstract:

The experimental data for anticancer potential of curcumin nanoparticles was calculated by means of eclectic data. The optimal descriptors were examined using Monte Carlo method based CORAL SEA software. The statistical quality of the model is following: n = 14, R² = 0.6809, Q² = 0.5943, s = 0.175, MAE = 0.114, F = 26 (sub-training set), n =5, R²= 0.9529, Q² = 0.7982, s = 0.086, MAE = 0.068, F = 61, Av Rm² = 0.7601, ∆R²m = 0.0840, k = 0.9856 and kk = 1.0146 (test set) and n = 5, R² = 0.6075 (validation set). This data can be used to build predictive QSAR models for anticancer activity.

Keywords: anticancer potential, curcumin, model, nanoparticles, optimal descriptors, QSAR

Procedia PDF Downloads 289
36887 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation in providing accurate similarity with deep learning is the requirement of a huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset from a different domain, which limits accuracy in measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly smaller amount of data: clustered data in which each cluster contains a set of visually similar images. To measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores, which are defined as zero for images in the same cluster. The proposed method outperforms state-of-the-art object similarity scoring techniques in an evaluation on finding exact items: it achieves an accuracy of 86.5%, compared to 59.9% for the state-of-the-art technique. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the remaining retrieved images may still be similar products. Therefore, the proposed method can reduce the amount of training data by an order of magnitude while providing a reliable similarity metric.

Keywords: visual search, deep learning, convolutional neural network, machine learning

Procedia PDF Downloads 190
36886 Concept for Knowledge out of Sri Lankan Non-State Sector: Performances of Higher Educational Institutes and Successes of Its Sector

Authors: S. Jeyarajan

Abstract:

Concept of knowledge is discovered from conducted study for successive Competition in Sri Lankan Non-State Higher Educational Institutes. The Concept discovered out of collected Knowledge Management Practices from Emerald inside likewise reputed literatures and of Non-State Higher Educational sector. A test is conducted to reveal existences and its reason behind of these collected practices in Sri Lankan Non-State Higher Education Institutes. Further, unavailability of such study and uncertain on number of participants for data collection in the Sri Lankan context contributed selection of research method as qualitative method, which used attributes of Delphi Method to manage those likewise uncertainty. Data are collected under Dramaturgical Method, which contributes efficient usage of the Delphi method. Grounded theory is selected as data analysis techniques, which is conducted in intermixed discourse to manage different perspectives of data that are collected systematically through perspective and modified snowball sampling techniques. Data are then analysed using Grounded Theory Development Techniques in Intermix discourses to manage differences in Data. Consequently, Agreement in the results of Grounded theories and of finding in the Foreign Study is discovered in the analysis whereas present study conducted as Qualitative Research and The Foreign Study conducted as Quantitative Research. As such, the Present study widens the discovery in the Foreign Study. Further, having discovered reason behind of the existences, the Present result shows Concept for Knowledge from Sri Lankan Non-State sector to manage higher educational Institutes in successful manner.

Keywords: adherence of snowball sampling into perspective sampling, Delphi method in qualitative method, grounded theory development in intermix discourses of analysis, knowledge management for success of higher educational institutes

Procedia PDF Downloads 151
36885 Towards a Balancing Medical Database by Using the Least Mean Square Algorithm

Authors: Kamel Belammi, Houria Fatrim

Abstract:

Imbalanced data sets, a problem often found in real-world applications, can have a seriously negative effect on the classification performance of machine learning algorithms. There have been many attempts at dealing with the classification of imbalanced data sets. In medical diagnosis classification, we often face an imbalanced number of data samples between classes, with not enough samples in rare classes. In this paper, we propose a learning method based on a cost-sensitive extension of the Least Mean Square (LMS) algorithm that penalizes errors of different samples with different weights, together with some rules of thumb to determine those weights. After the balancing phase, we apply different classifiers (support vector machine (SVM), k-nearest neighbor (KNN), and multilayer neural networks (MNN)) to the balanced data set. We also compare the results obtained before and after applying the balancing method.
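The idea of penalizing errors with per-sample weights can be sketched as a weighted LMS update. The abstract does not give the authors' weighting rules, so the inverse-class-frequency weights below are an assumed rule of thumb, and the function names, learning rate, and toy data are all hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Assumed rule of thumb: weight each class by n / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def train_cost_sensitive_lms(X, y, lr=0.01, epochs=500):
    """LMS where each sample's error is scaled by the weight of its class."""
    cost = class_weights(y)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            err = y_i - sum(wj * xj for wj, xj in zip(w, x_i))
            step = lr * cost[y_i] * err  # rare-class errors move w further
            w = [wj + step * xj for wj, xj in zip(w, x_i)]
    return w


def predict(w, x):
    """Sign of the linear output, with labels in {-1, +1}."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy imbalanced data: first feature is a constant bias term.
X = [(1.0, 2.0), (1.0, -1.0), (1.0, -2.0), (1.0, -3.0)]
y = [1, -1, -1, -1]
w = train_cost_sensitive_lms(X, y)
print([predict(w, x) for x in X])
```

With these weights the single positive sample contributes as much to the total cost as the three negative samples combined, which is the balancing effect the abstract describes in general terms.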

Keywords: multilayer neural networks, k- nearest neighbor, support vector machine, imbalanced medical data, least mean square algorithm, diabetes

Procedia PDF Downloads 501
36884 Exploring the Capabilities of Sentinel-1A and Sentinel-2A Data for Landslide Mapping

Authors: Ismayanti Magfirah, Sartohadi Junun, Samodra Guruh

Abstract:

Landslides are one of the most frequent and devastating natural disasters in Indonesia. Many studies have been conducted regarding this phenomenon. However, little attention has been paid to landslide inventory mapping. The natural conditions (dense forest area) and the limited human and economic resources are some of the major problems in building a landslide inventory in Indonesia. Considering the importance of landslide inventory data in susceptibility, hazard, and risk analysis, it is essential to generate a landslide inventory based on available resources. To achieve this, the first step is to identify the landslides' locations. The availability of Sentinel-1A and Sentinel-2A data gives new insights into land monitoring investigation. Free access, high spatial resolution, and short revisit time make these data among the most popular open-source data used in landslide mapping. Sentinel-1A and Sentinel-2A data have been used broadly for landslide detection and land use/land cover mapping. This study aims to generate a landslide map by integrating Sentinel-1A and Sentinel-2A data using a change detection method. The result will be validated by field investigation to produce a preliminary landslide inventory of the study area.

Keywords: change detection method, landslide inventory mapping, Sentinel-1A, Sentinel-2A

Procedia PDF Downloads 141
36883 Privacy Preserving Data Publishing Based on Sensitivity in Context of Big Data Using Hive

Authors: P. Srinivasa Rao, K. Venkatesh Sharma, G. Sadhya Devi, V. Nagesh

Abstract:

Privacy Preserving Data Publication is a main concern at present because the amount of data published through the internet is increasing day by day. This huge amount of data is called Big Data because of its size. This project deals with privacy preservation in the context of Big Data using a data warehousing solution called Hive. We implemented Nearest Similarity Based Clustering (NSB) with bottom-up generalization to achieve (v,l)-anonymity. (v,l)-anonymity deals with sensitivity vulnerabilities and ensures individual privacy. We also calculate the sensitivity levels by a simple comparison method using the index values, classifying the different levels of sensitivity. The experiments were carried out in the Hive environment to verify the efficiency of the algorithms with Big Data. This framework also supports the execution of existing algorithms without any changes. The model in the paper outperforms existing models.

Keywords: sensitivity, sensitive level, clustering, Privacy Preserving Data Publication (PPDP), bottom-up generalization, Big Data

Procedia PDF Downloads 264
36882 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In intelligent transportation systems (ITS), information on link characteristics is an essential factor for planning and operation. In practice, however, not every link has sensors installed on it. A link that has no data is called a “Missing Link”. The purpose of this study is to impute data for these missing links using a machine learning method. With machine learning, and deep learning in particular, missing link data can be estimated from the data of links that do have it. For the deep learning process, this study uses a “Recurrent Neural Network” to handle time-series road data. As input data, Dedicated Short-range Communications (DSRC) data from Dalgubul-daero in the Daegu Metropolitan Area were fed into the learning process. The neural network structure takes 17 links with available data as input, has 2 hidden layers, and outputs data for 1 missing link. As a result, the forecasted data for the target link show about 94% accuracy compared with actual data.

Keywords: data estimation, link data, machine learning, road network

Procedia PDF Downloads 487
36881 Research on Straightening Process Model Based on Iteration and Self-Learning

Authors: Hong Lu, Xiong Xiao

Abstract:

Shaft parts are widely used in the machinery industry; however, bending deformation often occurs when this kind of part is heat treated. These parts need to be straightened to meet straightness requirements. In the pressure straightening process, a good straightening stroke algorithm determines the precision and efficiency of the process. In this paper, the relationship between straightening load and deflection during the straightening process is analyzed, and a mathematical model of the straightening process is established. Based on the mathematical model, an iterative method is used to solve for the straightening stroke. Compared to the traditional straightening stroke algorithm, the straightening stroke calculated by this method is much more precise because it can adapt to changes in material performance parameters. Considering that the straightening method is widely used in the mass production of shaft parts, a knowledge base is used to store the data of the straightening process, and a straightening stroke algorithm based on empirical data is set up. A straightening process control model is then built that combines the iteration-based straightening stroke method with the straightening stroke algorithm based on empirical data. Finally, an experiment is designed to verify the straightening process control model.

Keywords: straightness, straightening stroke, deflection, shaft parts

Procedia PDF Downloads 303
36880 Modification Encryption Time and Permutation in Advanced Encryption Standard Algorithm

Authors: Dalal N. Hammod, Ekhlas K. Gbashi

Abstract:

Today, cryptography is used in many applications to achieve high security in data transmission and in real-time communications. AES has long gained global acceptance and is used for securing sensitive data in various industries, but it suffers from slow processing and takes a long time to transfer data. This paper suggests a method to enhance the Advanced Encryption Standard (AES) algorithm based on time and permutation. The suggested method (MAES) is based on modifying SubBytes and ShiftRows in the encryption part and modifying InvSubBytes and InvShiftRows in the decryption part. After implementing the proposal and testing the results, the modified AES achieved good results in terms of randomness, encryption time, storage space, and avalanche effect. The proposed method gives good randomness in the ciphertext, as it passed the NIST statistical tests; MAES also reduced the encryption time by 10% compared with the original AES, so the modified AES is faster than the original AES. The proposed method also showed good results in memory utilization, where the value is 54.36 for MAES and 66.23 for the original AES. The avalanche effect, used to measure the diffusion property, is 52.08% for the modified AES and 51.82% for the original AES.
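The avalanche effect quoted above is usually reported as the percentage of ciphertext bits that change when a single input bit is flipped. The sketch below shows that measurement only; it does not implement AES or the proposed MAES, and the function name and the two byte strings are hypothetical stand-ins for real ciphertexts.

```python
def avalanche_percent(c1: bytes, c2: bytes) -> float:
    """Percentage of differing bits between two equal-length ciphertexts."""
    if len(c1) != len(c2) or not c1:
        raise ValueError("ciphertexts must be non-empty and of equal length")
    # XOR each byte pair and count the set bits in the result.
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
    return 100.0 * flipped / (8 * len(c1))


# Stand-in values; in practice these would be the ciphertexts of two
# plaintexts (or keys) that differ in exactly one bit.
print(avalanche_percent(b"\x0f\x00", b"\x00\x00"))
```

A value close to 50% means that, on average, half of the output bits change, which is the behaviour expected from a cipher with good diffusion.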

Keywords: modified AES, randomness test, encryption time, avalanche effects

Procedia PDF Downloads 219
36879 Improved K-Means Clustering Algorithm Using RHadoop with Combiner

Authors: Ji Eun Shin, Dong Hoon Lim

Abstract:

Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.
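The role of the combiner can be sketched without Hadoop. The plain-Python simulation below is not the authors' RHadoop code; the function names and the toy splits are assumptions. Each mapper emits one pair per point, the combiner collapses a mapper's output to one partial sum and count per cluster, and the reducer merges those partial sums into new centroids.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def map_phase(split, centroids):
    """Emit (cluster_id, (point, 1)) for every point in one input split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def combine(pairs):
    """Sum vectors and counts per cluster id; runs locally on one mapper's output."""
    acc = {}
    for cid, (vec, n) in pairs:
        if cid in acc:
            s, m = acc[cid]
            acc[cid] = ([a + b for a, b in zip(s, vec)], m + n)
        else:
            acc[cid] = (list(vec), n)
    return list(acc.items())


def reduce_phase(pairs):
    """Merge partial sums from all mappers and divide to get new centroids."""
    return {cid: [s / n for s in vec] for cid, (vec, n) in combine(pairs)}


def kmeans_iteration(splits, centroids):
    shuffled = []
    for split in splits:
        shuffled.extend(combine(map_phase(split, centroids)))
    return reduce_phase(shuffled)


splits = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0), (4.0, 0.0)]]
print(kmeans_iteration(splits, [(0.0, 0.0), (10.0, 0.0)]))
```

In this toy run the five mapped pairs shrink to three combined pairs before the reduce step. Because sums and counts are associative, the centroids are the same with or without the combiner; only the volume of data passed to the reducers changes.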

Keywords: big data, combiner, K-means clustering, RHadoop

Procedia PDF Downloads 403
36878 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —

Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno

Abstract:

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical size of dataset derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to realworld data.

Keywords: rule induction, decision table, missing data, noise

Procedia PDF Downloads 368
36877 Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency

Authors: Semeh Ben Salem, Sami Naouali, Moetez Sallami

Abstract:

Clustering is a well-known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can contain either categorical or numeric data, and each type of data has its own specific clustering algorithm. In this context, two algorithms have been proposed: k-means for clustering numeric datasets and k-modes for categorical datasets. A main problem encountered in data mining applications is clustering the categorical data that are so prevalent in datasets. One way to cluster categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead of k-modes. In this paper, we experiment with an approach of this kind, transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previous method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are tested experimentally. The results show that the proposed method outperforms the binary method in all cases.
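The relative-frequency transformation can be illustrated in a few lines. This is a minimal sketch of the encoding step as described in the abstract, not the authors' implementation; the function name and the toy rows are hypothetical, and the encoded rows would then be passed to any standard k-means routine.

```python
from collections import Counter


def relative_frequency_encode(rows):
    """Replace each categorical value by its relative frequency within its attribute."""
    n = len(rows)
    columns = list(zip(*rows))
    freq = [{value: count / n for value, count in Counter(col).items()}
            for col in columns]
    return [[freq[j][value] for j, value in enumerate(row)] for row in rows]


rows = [("red", "S"), ("red", "M"), ("blue", "S"), ("red", "S")]
for encoded in relative_frequency_encode(rows):
    print(encoded)
```

Each attribute stays a single numeric column, whereas a binary encoding adds one column per modality. Note that two different modalities with the same frequency receive the same code under this scheme.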

Keywords: clustering, unsupervised learning, pattern recognition, categorical datasets, knowledge discovery, k-means

Procedia PDF Downloads 236
36876 Generation of Numerical Data for the Facilitation of the Personalized Hyperthermic Treatment of Cancer with An Interstital Antenna Array Using the Method of Symmetrical Components

Authors: Prodromos E. Atlamazoglou

Abstract:

The method of moments combined with the method of symmetrical components is used for the analysis of interstitial hyperthermia applicators. The basis and testing functions are both piecewise sinusoids, qualifying our technique as a Galerkin one. The dielectric coatings are modeled by equivalent volume polarization currents, which are simply related to the conduction current distribution, thereby avoiding the introduction of additional unknowns or numerical integrations. The results of our method for a four-dipole circular array are in agreement with those already published in the literature for the same hyperthermia configuration. Apart from being accurate, our approach is more general and more computationally efficient, and it takes into account the coupling between the antennas.

Keywords: hyperthermia, integral equations, insulated antennas, method of symmetrical components

Procedia PDF Downloads 238
36875 Improvement of Parallel Compressor Model in Dealing Outlet Unequal Pressure Distribution

Authors: Kewei Xu, Jens Friedrich, Kevin Dwinger, Wei Fan, Xijin Zhang

Abstract:

The Parallel Compressor Model (PCM) is a simplified approach to predicting compressor performance with inlet distortions. The PCM calculation assumes that the sub-compressors’ outlet static pressure is uniform, which simplifies the calculation procedure. However, if the compressor’s outlet duct is not long and straight, this assumption frequently induces errors ranging from 10% to 15%. This paper provides a revised PCM calculation method that can correct the error. The revised method employs the energy, momentum, and continuity equations to obtain the needed parameters and replace the equal static pressure assumption. Based on the revised method, PCM is applied to two compression systems with different blade types. Predictions of their performance in non-uniform inlet conditions are produced with the revised calculation method and are used to evaluate the method’s effectiveness. Validation against experimental data shows that, although small deviations occur, the calculated results agree well with the experimental data, with errors ranging from 0.1% to 3%. This shows that the revised PCM calculation method has great advantages in predicting the performance of a distorted compressor with a limited exhaust duct.

Keywords: parallel compressor model (pcm), revised calculation method, inlet distortion, outlet unequal pressure distribution

Procedia PDF Downloads 308
36874 Socratic Style of Teaching: An Analysis of Dialectical Method

Authors: Muhammad Jawwad, Riffat Iqbal

Abstract:

The Socratic method, also known as the dialectical method and elenctic method, has significant relevance in the contemporary educational system. It can be incorporated into modern-day educational systems theoretically as well as practically. Being interactive and dialogue-based in nature, this teaching approach is followed by critical thinking and innovation. The pragmatic value of the Dialectical Method has been discussed in this article, and the limitations of the Socratic method have also been highlighted. The interactive Method of Socrates can be used in many subjects for students of different grades. The Limitations and delimitations of the Method have also been discussed for its proper implementation. This article has attempted to elaborate and analyze the teaching method of Socrates with all its pre-suppositions and Epistemological character.

Keywords: Socratic method, dialectical method, knowledge, teaching, virtue

Procedia PDF Downloads 108
36873 Spatially Random Sampling for Retail Food Risk Factors Study

Authors: Guilan Huang

Abstract:

In 2013 and 2014, the U.S. Food and Drug Administration (FDA) collected data from selected fast food restaurants and full service restaurants to track changes in the occurrence of foodborne illness risk factors. This paper discusses how we customized a spatial random sampling method by considering the financial position and availability of FDA resources, and how we enriched restaurant data with location. Location information of restaurants makes it possible to quantitatively define random sampling within non-government units (e.g., 240 kilometers around each data collector). Spatial analysis can also optimize data collectors’ work plans and resource allocation. A spatial analytics and processing platform helped us handle the spatial random sampling challenges. Our method fits FDA’s ability to pinpoint features of foodservice establishments, and it reduced both the time and expense of data collection.
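Sampling within a fixed distance of each data collector can be sketched as a great-circle distance filter followed by a simple random draw. The abstract does not describe the platform or the exact procedure used, so the haversine formula, the spherical Earth radius, the function names, and the coordinates below are all assumptions made for illustration.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sample_within_radius(collector, sites, radius_km, k, seed=None):
    """Draw up to k sites at random from those within radius_km of the collector."""
    eligible = [s for s in sites
                if haversine_km(collector[0], collector[1], s[0], s[1]) <= radius_km]
    rng = random.Random(seed)  # seeded so a draw can be reproduced
    return rng.sample(eligible, min(k, len(eligible)))


# Hypothetical (lat, lon) pairs, not FDA data.
sites = [(0.0, 1.0), (0.0, 2.0), (0.0, 5.0)]
print(sample_within_radius((0.0, 0.0), sites, radius_km=240, k=2, seed=1))
```

The third site lies roughly 556 km away and is excluded before the draw, so every sampled restaurant is within reach of the collector.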

Keywords: geospatial technology, restaurant, retail food risk factor study, spatially random sampling

Procedia PDF Downloads 324
36872 Hierarchical Filtering Method of Threat Alerts Based on Correlation Analysis

Authors: Xudong He, Jian Wang, Jiqiang Liu, Lei Han, Yang Yu, Shaohua Lv

Abstract:

Nowadays, the threats on the internet are enormous and increasing; however, the classification of the huge number of alert messages generated in this environment is relatively monotonous. This affects the accuracy of network situation assessment and makes it inconvenient for security managers to deal with emergencies. To deal with potential network threats effectively and to provide more effective data for improving network situation awareness, it is essential to build a hierarchical filtering method to prevent threats. This paper establishes a model for data monitoring that filters systematically from the original data to obtain the grade of threats, which is stored for later use. First, the model filters on the vulnerable resources, open ports, and services of host devices. It then uses entropy theory to calculate the performance changes of the host devices at the time the threat occurs and filters again. Finally, it sorts the changes in performance values at the time the threat occurs. Alerts and performance data collected in a real network environment are used for evaluation and analysis. The comparative experimental analysis shows that the threat filtering method can filter threat alerts effectively.

Keywords: correlation analysis, hierarchical filtering, multisource data, network security

Procedia PDF Downloads 178
36871 Monthly River Flow Prediction Using a Nonlinear Prediction Method

Authors: N. H. Adenan, M. S. M. Noorani

Abstract:

River flow prediction is essential to ensure that proper management of water resources can optimally distribute water to consumers. This study presents an analysis and prediction using a nonlinear prediction method on monthly river flow data in Tanjung Tualang from 1976 to 2006. The nonlinear prediction method involves the reconstruction of phase space and a local linear approximation approach. The phase space reconstruction embeds the one-dimensional series (the observed 287 months of data) in a multidimensional phase space to reveal the dynamics of the system. The result of the phase space reconstruction is used to predict the next 72 months. The correlation coefficient (CC) and root mean square error (RMSE) are used to compare the prediction performance of the nonlinear prediction method, ARIMA, and SVM. The comparison shows that the nonlinear prediction method gives better prediction results than ARIMA and SVM. Therefore, the result of this study could be used to develop an efficient water management system to optimize the allocation of water resources.
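Phase space reconstruction is commonly done by time-delay embedding, where each state vector collects `dim` values of the series spaced `tau` steps apart. The sketch below shows that embedding and, as a deliberately simpler stand-in for the local linear approximation named in the abstract, a local-constant predictor that averages the successors of the nearest neighbours. The function names, parameter values, and toy series are assumptions, not the study's settings.

```python
def delay_embed(series, dim, tau=1):
    """Return delay vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    span = (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim))
            for i in range(len(series) - span)]


def predict_next(series, dim, tau=1, k=2):
    """Local-constant forecast: average what followed the k nearest past states."""
    vectors = delay_embed(series, dim, tau)
    span = (dim - 1) * tau
    if len(vectors) < 2:
        raise ValueError("series too short for this embedding")
    query = vectors[-1]
    candidates = vectors[:-1]  # every earlier state has an observed successor
    ranked = sorted(range(len(candidates)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(candidates[i], query)))
    nearest = ranked[:k]
    return sum(series[i + span + 1] for i in nearest) / len(nearest)


flow = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # toy periodic series, not river data
print(delay_embed(flow, dim=2)[:3])
print(predict_next(flow, dim=2))
```

A local linear version would replace the plain average with a least-squares fit over the same neighbours. Forecasting several months ahead can be done by appending each prediction to the series and repeating the step.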

Keywords: river flow, nonlinear prediction method, phase space, local linear approximation
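
As a rough illustration of phase space reconstruction, the sketch below builds delay vectors from a scalar series and predicts the next value from the successors of the nearest neighbours in the reconstructed space. Note the simplification: it uses local averaging (a zeroth-order predictor) instead of the local linear fit named in the abstract, and the embedding dimension, delay, neighbour count, and toy series are assumptions chosen for the example.

```python
import math


def delay_embed(series, dim, tau):
    """Return (t, vector) pairs with vector = [x(t), x(t-tau), ..., x(t-(dim-1)*tau)]."""
    start = (dim - 1) * tau
    return [
        (t, [series[t - j * tau] for j in range(dim)])
        for t in range(start, len(series))
    ]


def predict_next(series, dim=3, tau=1, k=2):
    """Predict the value after the end of the series by local averaging."""
    vectors = delay_embed(series, dim, tau)
    _, query = vectors[-1]
    # Only earlier vectors have a known successor in the series.
    candidates = vectors[:-1]
    nearest = sorted(candidates, key=lambda tv: math.dist(tv[1], query))[:k]
    return sum(series[t + 1] for t, _ in nearest) / len(nearest)


# Toy periodic series standing in for monthly flow data (not real measurements).
flow = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(predict_next(flow))
```

Replacing the average with a least-squares linear map fitted on the same neighbours would give the local linear variant.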

Procedia PDF Downloads 387
36870 A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm

Authors: J. Yang, Y. Ma, X. Zhang, S. Li, Y. Zhang

Abstract:

The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima because it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of a minimum spanning tree (MST) is presented. Vertices in the MST with the same degree are treated as a group, which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points that takes both degree and Euclidean distance into account is presented. Finally, an MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.

Keywords: degree, initial cluster center, k-means, minimum spanning tree
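
The abstract does not spell out how skeleton points are chosen, so the sketch below covers only the first ingredients it mentions: building a Euclidean minimum spanning tree (here with Prim's algorithm, an assumption) and computing each vertex's degree in that tree. The function names and sample points are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (parent, child) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax connection costs of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def vertex_degrees(n, edges):
    """Degree of each vertex in the tree described by edges."""
    degrees = [0] * n
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


points = [(0, 0), (1, 0), (2, 0), (1, 1)]  # illustrative data only
edges = mst_edges(points)
print(edges, vertex_degrees(len(points), edges))
```

High-degree vertices such as the hub at (1, 0) sit in dense regions, which is presumably why degree is useful when looking for candidate initial centers.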

Procedia PDF Downloads 380
36869 Canopy Temperature Acquired from Daytime and Nighttime Aerial Data as an Indicator of Trees’ Health Status

Authors: Agata Zakrzewska, Dominik Kopeć, Adrian Ochtyra

Abstract:

The growing number of new cameras, sensors, and research methods allows for a broader application of thermal data in remote sensing vegetation studies. The aim of this research was to check whether thermal infrared data in the 3.6-4.9 μm spectral range, obtained during the day and at night, can be used to assess the health condition of selected species of deciduous trees in an urban environment. For this purpose, research was carried out in the city center of Warsaw (Poland) in 2020. During the airborne data acquisition, thermal data, laser scanning data, and orthophoto map images were collected. Synchronously with the airborne data, ground reference data were obtained for 617 studied trees of the species Acer platanoides, Acer pseudoplatanus, Aesculus hippocastanum, Tilia cordata, and Tilia × euchlora in different health condition states. The results were as follows: (i) healthy trees are cooler than trees in poor condition and dying trees in both the daytime and nighttime data; (ii) the mean difference in canopy temperature between healthy and dying trees was 1.06 °C in the nighttime data and 3.28 °C in the daytime data; (iii) condition classes differ significantly in both the daytime and nighttime thermal data, but only in the daytime data did all condition classes differ statistically significantly from each other. In conclusion, aerial thermal data can be considered an alternative to hyperspectral data as a means of assessing the health condition of trees in an urban environment. This applies especially to data obtained during the day, which differentiate condition classes better than data obtained at night. The method based on the fusion of thermal infrared and laser scanning data could be a quick and efficient solution for identifying trees in poor health that should be visually checked in the field.

Keywords: middle wave infrared, thermal imagery, tree discoloration, urban trees

Procedia PDF Downloads 91
36868 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was used to obtain normally distributed subgroups of time series data from wastewater treatment plant inflow data, which is composed of several groups differing by mean value. Two simple algorithms, k-means and EM, were chosen as clustering methods. The Rand index was used to measure similarity. After simple meta-clustering, a regression model was fitted for each subgroup. The final model was the sum of the subgroup models. The quality of the obtained model was compared with that of a regression model built using the same explanatory variables but with no clustering of the data. Results were compared using the coefficient of determination (R2), the mean absolute percentage error (MAPE) as a measure of prediction accuracy, and a comparison on a line chart. Preliminary results indicate the potential of the presented technique.

Keywords: clustering, data analysis, data mining, predictive models
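
For reference, the two numeric quality measures named in the abstract can be computed as in the sketch below. These are the standard textbook definitions of the coefficient of determination and MAPE; the inflow values are made up for illustration and are not taken from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and modelled inflow values.
observed = [100, 200, 300, 400]
modelled = [110, 190, 310, 390]

print(f"R2 = {r_squared(observed, modelled):.3f}, MAPE = {mape(observed, modelled):.2f}%")
```

A clustered model and an unclustered baseline can be compared by evaluating both functions on the same held-out observations.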

Procedia PDF Downloads 439
36867 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data

Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin

Abstract:

Cities are complex systems of diverse and intertwined activities. These activities and their complex interrelationships create diverse urban phenomena, which have considerable influence on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable a better understanding of urban activities and, on that basis, better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, a weighted data network for each urban dataset was provided to users. It is expected that the weights of the urban data thereby obtained will provide insights into cities and show how diverse urban activities influence each other and induce feedback.

Keywords: big data, machine learning, ontology model, urban data model
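
One plausible reading of the two steps above is sketched here: compute pairwise Pearson correlations between datasets, then keep the strong ones as weighted edges of a network. The choice of Pearson correlation, the threshold, and the dataset names and values are assumptions for illustration; the abstract does not state which correlation measure or cut-off was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(datasets, threshold=0.5):
    """Map each dataset pair with |r| >= threshold to its correlation (the edge weight)."""
    edges = {}
    for a, b in combinations(sorted(datasets), 2):
        r = pearson(datasets[a], datasets[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical urban indicators; names and numbers are invented.
urban = {
    "traffic": [1, 2, 3, 4],
    "noise": [2, 4, 6, 8],
    "rain": [1, -1, -1, 1],
}
network = correlation_network(urban)
print(network)
```

Correlation alone does not establish cause and effect, so edge weights of this kind are best read as candidate relationships for a recommendation step.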

Procedia PDF Downloads 388
36866 Detection of Autistic Children's Voice Based on Artificial Neural Network

Authors: Royan Dawud Aldian, Endah Purwanti, Soegianto Soelistiono

Abstract:

In this research, we developed an automatic method to classify children's voices as normal or autistic using computation based on an artificial neural network. The strength of this computation technology is its capability for processing and storing data. Digital voice features are obtained from the coefficients of linear predictive coding with the autocorrelation method and transformed to the frequency domain using the fast Fourier transform. These features are used as the input to an artificial neural network trained with back-propagation, which automatically distinguishes between normal and autistic children. The back-propagation results show a successful classification rate of 100% for the normal children's voice experiment data and 100% for the autistic children's voice experiment data. The success rate of the back-propagation classification system for the entire test data is 100%.

Keywords: autism, artificial neural network, backpropagation, linear predictive coding, fast Fourier transform

Procedia PDF Downloads 428
36865 Identifying Critical Success Factors for Data Quality Management through a Delphi Study

Authors: Maria Paula Santos, Ana Lucas

Abstract:

Organizations base their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important. Data Quality (DQ) is currently a relevant issue, and the literature is unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM). These were presented to a panel of experts, who ordered them according to their degree of importance using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, an organizational culture focused on data quality, and obtaining top management commitment and support.

Keywords: critical success factors, data quality, data quality management, Delphi, Q-Sort

Procedia PDF Downloads 192
36864 An AK-Chart for the Non-Normal Data

Authors: Chia-Hau Liu, Tai-Yue Wang

Abstract:

Traditional multivariate control charts assume that measurements from manufacturing processes follow a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify, because in practice not all measurements from manufacturing processes are normally distributed. This study develops a new multivariate control chart for monitoring processes with non-normal data. We propose a mechanism that integrates the one-class classification method with an adaptive technique. The adaptive technique is used to improve the sensitivity of one-class classification to small shifts in statistical process control. In addition, this design provides an easy way to allocate the value of the type I error, which makes it easier to implement. Finally, a simulation study and real data from industry are used to demonstrate the effectiveness of the proposed control chart.

Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data

Procedia PDF Downloads 396
36863 Ontological Modeling Approach for Statistical Databases Publication in Linked Open Data

Authors: Bourama Mane, Ibrahima Fall, Mamadou Samba Camara, Alassane Bah

Abstract:

National Statistical Institutes hold a large volume of data, generally in a format that dictates how the information they contain is published. Each household or business data collection project includes its own dissemination platform. As a result, the dissemination methods used so far do not promote rapid access to information and, in particular, do not offer the option of linking data for in-depth processing. In this paper, we present an approach to modeling these data in order to publish them in a format intended for the Semantic Web. Our objective is to publish all these data on a single platform and to offer the option of linking them with other external data sources. The approach will be applied to data from major national surveys, such as those on employment, poverty, and child labor, and the general census of the population of Senegal.

Keywords: Semantic Web, linked open data, database, statistic

Procedia PDF Downloads 152