Search results for: genomic data analysis
13380 Urban Big Data: An Experimental Approach to Building-Value Estimation Using Web-Based Data
Authors: Sun-Young Jang, Sung-Ah Kim, Dongyoun Shin
Abstract:
Real-estate value estimation is difficult for laypeople and is usually performed by specialists. This paper presents an automated estimation process, based on big data and machine-learning technology, that calculates the influence of building conditions on real-estate prices. The study analyzed actual building sales sample data for Nonhyeon-dong, Gangnam-gu, Seoul, Korea, measuring the major influencing factors among the various building conditions. Building on that analysis, a prediction model was established and applied using RapidMiner Studio, a graphical user interface (GUI)-based tool for deriving machine-learning prototypes. The prediction model is formulated from previous examples; when new examples are supplied, it analyzes and predicts accordingly. The analysis process identifies the crucial factors affecting price increases by calculating weighted values. The model was verified, and its accuracy determined, by comparing its predicted values with actual price increases.
Keywords: Big data, building-value analysis, machine learning, price prediction.
Downloads: 1164
13379 A New Predictor of Coding Regions in Genomic Sequences using a Combination of Different Approaches
Authors: Aníbal Rodríguez Fuentes, Juan V. Lorenzo Ginori, Ricardo Grau Ábalo
Abstract:
Identifying protein coding regions in DNA sequences is a basic step in locating genes. Several approaches based on signal processing tools have been applied to this problem in pursuit of more accurate predictions. This paper presents a new predictor that improves the efficacy of three techniques that use the Fourier Transform to predict coding regions, and that can be computed with an algorithm that reduces the computational load. Some ideas about combining the predictor with other methods are discussed. ROC curves computed on 25 DNA sequences from three different organisms are used to demonstrate the efficacy of the proposed predictor.
Keywords: Bioinformatics, Coding region prediction, Computational load reduction, Digital Signal Processing, Fourier Transform.
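As a rough illustration of the Fourier-based measures this abstract builds on, the sketch below computes the classic period-3 spectral content of a DNA window: the DFT of the four binary base-indicator sequences evaluated at frequency N/3, summed over A, C, G and T. It is a sketch of the standard baseline measure under stated assumptions, not the authors' new predictor or their load-reducing algorithm; the default window length of 351 bases and the function names are hypothetical choices.

```python
import cmath


def period3_power(seq):
    """Period-3 spectral content S = sum over b in ACGT of |X_b(N/3)|^2.

    X_b is the DFT of the binary indicator sequence u_b[n] (1 where seq[n] == b),
    evaluated at frequency 1/3 cycles per base, i.e. k = N/3.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        coeff = 0j
        for n, ch in enumerate(seq):
            if ch == base:
                # exp(-2*pi*i*k*n/N) with k = N/3 reduces to exp(-2*pi*i*n/3)
                coeff += cmath.exp(-2j * cmath.pi * n / 3)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=1):
    """Evaluate period3_power on a sliding window; peaks hint at coding regions."""
    return [period3_power(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# A perfectly periodic codon-like pattern has strong period-3 content,
# while a homopolymer has none (its indicator DFT at N/3 cancels out).
print(period3_power("ATG" * 10))   # approximately 300
print(period3_power("A" * 30))     # approximately 0
```

In practice the sliding-window profile is thresholded, and the ROC curves mentioned above are obtained by sweeping that threshold against annotated coding regions.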
Downloads: 1668
13378 Welding Process Selection for Storage Tank by Integrated Data Envelopment Analysis and Fuzzy Credibility Constrained Programming Approach
Authors: Rahmad Wisnu Wardana, Eakachai Warinsiriruk, Sutep Joy-A-Ka
Abstract:
Selecting the most suitable welding process usually depends on experience or on common practice in similar companies. However, this approach generally ignores many criteria that can affect the selection of a suitable welding process. Therefore, knowledge automation through knowledge-based systems will significantly improve the decision-making process. This research proposes an integrated data envelopment analysis (DEA) and fuzzy credibility constrained programming approach for identifying the best welding process for stainless steel storage tanks in the food and beverage industry. The proposed approach uses fuzzy concepts and a credibility measure to deal with uncertain data from experts' judgment. Furthermore, 12 parameters are used to determine the most appropriate welding process among six competing welding processes.
Keywords: Welding process selection, data envelopment analysis, fuzzy credibility constrained programming, storage tank.
Downloads: 799
13377 Fault Detection of Drinking Water Treatment Process Using PCA and Hotelling's T2 Chart
Authors: Joval P George, Dr. Zheng Chen, Philip Shaw
Abstract:
This paper deals with the application of Principal Component Analysis (PCA) and the Hotelling's T2 chart to data collected from a drinking water treatment process. PCA is applied primarily for dimensionality reduction of the collected data, and the Hotelling's T2 control chart is used for fault detection in the process. The data were taken from a United Utilities multistage water treatment works and downloaded from an Integrated Program Management (IPM) dashboard system. The results show that Multivariate Statistical Process Control (MSPC) techniques such as PCA, and control charts such as Hotelling's T2, can be effectively applied for the early fault detection of continuous multivariable processes such as drinking water treatment. The software package SIMCA-P was used to develop the MSPC models and the Hotelling's T2 chart from the collected data.
Keywords: Principal component analysis, Hotelling's T2 chart, multivariate statistical process control, drinking water treatment.
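A minimal sketch of the Hotelling's T2 statistic for individual observations, written for two process variables so that the covariance inverse has a closed form. It omits the PCA step and SIMCA-P entirely. The toy readings, the function names, and the large-sample chi-square control limit (for two degrees of freedom the 1 - alpha quantile is exactly -2 ln alpha) are assumptions for illustration; an F-distribution-based limit is common when the reference sample is small.

```python
import math


def fit_t2(data):
    """Estimate the mean vector and inverse covariance from in-control (x, y) pairs."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det
    return (mx, my), (syy / det, -sxy / det, sxx / det)


def t2(point, model):
    """Hotelling's T2 = (x - mean)^T S^-1 (x - mean) for one observation."""
    (mx, my), (a, b, d) = model
    dx, dy = point[0] - mx, point[1] - my
    return a * dx * dx + 2 * b * dx * dy + d * dy * dy


def chi2_limit_2dof(alpha=0.01):
    """Upper control limit from the chi-square(2) quantile: -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


# Hypothetical in-control readings of two correlated water-quality variables
in_control = [(1.0, 2.0), (2.0, 2.9), (3.0, 4.1), (4.0, 5.0),
              (5.0, 6.1), (2.5, 3.4), (3.5, 4.6)]
model = fit_t2(in_control)
limit = chi2_limit_2dof(0.01)
for reading in [(3.0, 4.0), (3.0, 8.0)]:
    print(reading, "fault" if t2(reading, model) > limit else "ok")
```

The second reading breaks the correlation between the two variables, which is exactly the kind of fault a multivariate chart flags even when each variable alone looks plausible.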
Downloads: 2785
13376 Stakeholder Analysis of Agricultural Drone Policy: A Case Study of the Agricultural Drone Ecosystem of Thailand
Authors: Thanomsin Chakreeves, Atichat Preittigun, Ajchara Phu-ang
Abstract:
This paper presents a stakeholder analysis of agricultural drone policies that meet the government's goal of building an agricultural drone ecosystem in Thailand. First, case studies from other countries are reviewed. The stakeholder analysis method and qualitative data from interviews are then presented, including data from the Institute of Innovation and Management, the Office of National Higher Education Science Research and Innovation Policy Council, agricultural entrepreneurs and farmers. The study and interview data are then used to describe the current ecosystem and to guide the implementation of agricultural drone policies suited to Thailand's ecosystem. Finally, policy recommendations are made for the Thai government to adopt in the future.
Keywords: Drone public policy, drone ecosystem, policy development, agricultural drone.
Downloads: 806
13375 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis and image analytics. Most of the data in the field are well structured and available in numerical or categorical formats that can be used directly in experiments. At the opposite end of the spectrum, however, lies a wide expanse of data that is intractable for direct analysis owing to its unstructured nature: discharge summaries, clinical notes and procedural notes written in human narrative form, with neither a relational model nor any standard grammatical structure. An important step in using these texts for such studies is to transform and process them, retrieving structured information from the haystack of irrelevant data with information retrieval and data mining techniques. To address this problem, the authors present Q-Map, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm indexed on curated knowledge sources, which is both fast and configurable. The authors also briefly examine its performance relative to MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages Q-Map displays over MetaMap.
Keywords: Information retrieval (IR), unified medical language system (UMLS), Syntax Based Analysis, natural language processing (NLP), medical informatics.
Downloads: 779
13374 Analysis of Users’ Behavior on Book Loan Log Based On Association Rule Mining
Authors: Kanyarat Bussaban, Kunyanuth Kularbphettong
Abstract:
This research aims to create a model, based on a data mining technique, for analyzing student behavior in using library resources at Suan Sunandha Rajabhat University. The model was built from association rules mined with the Apriori algorithm. Fourteen rules were found and tested on a testing data set, which showed a classification accuracy of 79.24% and an MSE of 22.91. The results show that a user behavior model built with the association rule technique can be used to manage library resources.
Keywords: Behavior, data mining technique, Apriori algorithm.
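The Apriori step can be sketched in a few lines: frequent itemsets are grown level by level, candidates whose subsets are not all frequent are pruned, and rules are kept when their confidence (support of the whole itemset divided by support of the left-hand side) clears a threshold. The book-category transactions, thresholds and function names below are hypothetical; the abstract gives neither the university's data and thresholds nor how the 14 rules were scored as a classifier.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    items = {i for b in baskets for i in b}
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) with confidence = sup(lhs | rhs) / sup(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


# Hypothetical loan baskets: book categories borrowed by one student
loans = [{"computing", "statistics"}, {"computing", "statistics", "fiction"},
         {"statistics", "fiction"}, {"computing", "fiction"},
         {"computing", "statistics"}]
for lhs, rhs, sup, conf in association_rules(apriori(loans, 0.5), 0.7):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Every subset of a frequent itemset is itself frequent, so the lookup `frequent[lhs]` in the rule step always succeeds.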
Downloads: 2306
13373 Parallelization of Ensemble Kalman Filter (EnKF) for Oil Reservoirs with Time-lapse Seismic Data
Authors: Md Khairullah, Hai-Xiang Lin, Remus G. Hanea, Arnold W. Heemink
Abstract:
In this paper we describe the design and implementation of a parallel algorithm for data assimilation with the ensemble Kalman filter (EnKF) for the oil reservoir history matching problem. Using a large number of observations from time-lapse seismic data leads to a long turnaround time for the analysis step, in addition to the time-consuming simulations of the realizations. For efficient parallelization, it is therefore important to parallelize the analysis step as well. Our experiments show that parallelizing the analysis step in addition to the forecast step scales well, exploiting the same set of resources with some additional effort.
Keywords: EnKF, Data assimilation, Parallel computing, Parallel efficiency.
Downloads: 2281
13372 Predicting DHF Incidence in Northern Thailand using Time Series Analysis Technique
Authors: S. Wongkoon, M. Pollar, M. Jaroensutasinee, K. Jaroensutasinee
Abstract:
This study aimed to develop a forecasting model for the number of Dengue Haemorrhagic Fever (DHF) cases in Northern Thailand using time series analysis. We developed Seasonal Autoregressive Integrated Moving Average (SARIMA) models on data collected between 2003 and 2006 and then validated the models using data collected between January and September 2007. The results showed that the regressive forecast curves were consistent with the pattern of actual values. The most suitable model was SARIMA(2,0,1)(0,2,0)12, with an Akaike Information Criterion (AIC) of 12.2931 and a Mean Absolute Percent Error (MAPE) of 8.91713. The SARIMA(2,0,1)(0,2,0)12 model fit the data adequately, with a Portmanteau statistic Q20 = 8.98644 (χ²(20, 0.95) = 27.5871, P > 0.05). This indicates that there was no significant autocorrelation between residuals at different lag times in the SARIMA(2,0,1)(0,2,0)12 model.
Keywords: Dengue, SARIMA, Time Series Analysis, Northern Thailand.
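The notation SARIMA(2,0,1)(0,2,0)12 means a seasonal period of 12 months with seasonal differencing applied twice, i.e. the operator (1 - B^12)^2. The helpers below sketch that differencing, the MAPE used for validation, and a portmanteau statistic. They assume the Ljung-Box form of the portmanteau test (the abstract does not say whether Ljung-Box or Box-Pierce was used) and do not fit the AR and MA terms, which would normally be left to a statistics package. Q is then compared with a chi-square critical value, commonly with degrees of freedom equal to the number of lags minus the number of estimated ARMA parameters.

```python
def seasonal_difference(series, period=12, order=1):
    """Apply (1 - B^period)^order, the seasonal differencing of a SARIMA model."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - period] for t in range(period, len(out))]
    return out


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def ljung_box_q(residuals, lags=20):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0  # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# D = 2, s = 12 as in SARIMA(2,0,1)(0,2,0)12: a linear trend plus a fixed
# monthly pattern is removed completely by two seasonal differences.
monthly = [2.0 * t + (t % 12) for t in range(48)]
print(seasonal_difference(monthly, period=12, order=2)[:3])  # [0.0, 0.0, 0.0]
print(mape([100, 200], [110, 180]))
```

Each seasonal difference shortens the series by 12 points, so two differences on four years of monthly data leave 24 values for model fitting.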
Downloads: 1990
13371 Geospatial Network Analysis Using Particle Swarm Optimization
Authors: Varun Singh, Mainak Bandyopadhyay, Maharana Pratap Singh
Abstract:
The shortest path (SP) problem is concerned with finding the shortest path from a specific origin to a specified destination in a given network while minimizing the total cost associated with the path. This problem has widespread applications, notably vehicle routing in transportation systems, particularly in-vehicle Route Guidance Systems (RGS), and the traffic assignment problem in transportation planning. Evolutionary methods such as Genetic Algorithms (GA), Ant Colony Optimization and Particle Swarm Optimization (PSO) have been applied to complex optimization problems to overcome the shortcomings of existing shortest path analysis methods. Various researchers have reported that PSO performs better than other evolutionary optimization algorithms in terms of success rate and solution quality. Furthermore, Geographic Information Systems (GIS) have emerged as key information systems for geospatial data analysis and visualization. This paper focuses on applying PSO to the shortest path problem between multiple points of interest (POI), based on spatial data of Allahabad City and traffic speed data collected using GPS. The results of the analysis are geovisualized in GIS.
Keywords: GIS, Outliers, PSO, Traffic Data.
Downloads: 2892
13370 The Trend of Injuries in Building Fire in Tehran from 2002 to 2012
Authors: Mohammadreza Ashouri, Majid Bayatian
Abstract:
Analysis of fire data underpins the implementation of any plan to improve safety in cities. Such an analysis can reveal signs of change over a given period and can be used as a measure of safety. Data on about 66,341 fires (from 2002 to 2012) released by the Tehran Safety Services and Fire-Fighting Organization, together with population and household data provided by Tehran Municipality and the Statistical Yearbook of Iran, were extracted. Using these data, changes in fires, the injury rate and the mortality rate were determined and analyzed. The injury rate and mortality rate of fires per one million population of Tehran were 59.58% and 86.12%, respectively. During the study period, the number of fires and the number of fire stations increased by 104.38% and 102.63%, respectively. The largest share of fires (9.21%) occurred in the 4th District of Tehran. The results showed that the recorded fire data have not been systematically planned for fire prevention. This matters because one way to reduce fire injuries is to develop a systematic plan for necessary actions in emergency situations. To establish a reliable basis for fire prevention, the stages, definitions of working processes and cause-and-effect chains should be considered. Therefore, a comprehensive statistical system should be developed for reported and recorded fire data.
Keywords: Fire statistics, fire analysis, accident prevention, Tehran.
Downloads: 769
13369 Coverage Probability Analysis of WiMAX Network under Additive White Gaussian Noise and Predicted Empirical Path Loss Model
Authors: Chaudhuri Manoj Kumar Swain, Susmita Das
Abstract:
This paper presents a detailed procedure for predicting a path loss (PL) model and applying it to estimate the coverage probability in a WiMAX network. To this end, a hybrid approach is followed to predict an empirical PL model of a 2.65 GHz WiMAX network deployed in a suburban environment. The approach comprises three phases (data collection, statistical analysis and regression analysis), and the importance of each phase is discussed. The procedure for collecting data such as the received signal strength indicator (RSSI) through an experimental setup is demonstrated. From the collected data set, empirical PL and RSSI models are predicted with a regression technique. With the aid of the predicted PL model, essential parameters such as the PL exponent and the coverage probability of the network are then evaluated. This work may significantly assist the deployment and optimisation of cellular networks.
Keywords: WiMAX, RSSI, path loss, coverage probability, regression analysis.
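The abstract does not give the form of the empirical PL model. A common choice, assumed here, is the log-distance model PL(d) = PL(d0) + 10 n log10(d/d0) + X, with log-normal shadowing X of standard deviation sigma. Ordinary least squares on 10 log10(d/d0) then yields the path loss exponent n, and the probability that the received power at distance d exceeds a threshold follows from the Gaussian tail function Q(z) = 0.5 erfc(z / sqrt(2)). This is a sketch of that textbook approach, not the authors' hybrid procedure; it computes the probability at a single distance rather than over an area, and the drive-test numbers and transmit power are hypothetical.

```python
import math


def fit_log_distance(distances_m, path_loss_db, d0=1.0):
    """Least-squares fit of PL(d) = PL(d0) + 10 n log10(d/d0); returns (PL(d0), n, sigma)."""
    xs = [10.0 * math.log10(d / d0) for d in distances_m]
    k = len(xs)
    mx = sum(xs) / k
    my = sum(path_loss_db) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, path_loss_db))
    exponent = sxy / sxx                 # slope = path loss exponent n
    pl0 = my - exponent * mx             # intercept = PL at reference distance d0
    residuals = [y - (pl0 + exponent * x) for x, y in zip(xs, path_loss_db)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (k - 2)) if k > 2 else 0.0
    return pl0, exponent, sigma


def coverage_probability(tx_power_dbm, pl0, exponent, sigma, d, threshold_dbm, d0=1.0):
    """P(received power at distance d > threshold) under log-normal shadowing."""
    mean_rx = tx_power_dbm - (pl0 + 10.0 * exponent * math.log10(d / d0))
    if sigma == 0:
        return 1.0 if mean_rx > threshold_dbm else 0.0
    z = (threshold_dbm - mean_rx) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # Gaussian tail Q(z)


# Hypothetical drive-test data: distances in metres, measured path loss in dB
dist = [10, 20, 50, 100, 200, 500]
loss = [72.1, 79.5, 92.6, 99.4, 109.8, 121.3]
pl0, n_exp, sigma = fit_log_distance(dist, loss)
print(f"PL(d0)={pl0:.1f} dB, exponent n={n_exp:.2f}, sigma={sigma:.1f} dB")
print(coverage_probability(43.0, pl0, n_exp, sigma, 300, -90.0))
```

The residual standard deviation doubles as the shadowing estimate, which is why the same regression feeds both the PL exponent and the coverage calculation.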
Downloads: 706
13368 Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance
Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat
Abstract:
Obtaining labeled data for supervised learning is often difficult and expensive, so the trained learning algorithm tends to overfit because of the small amount of training data. As a result, some researchers have focused on using unlabeled data, which need not follow the same generative distribution as the labeled data, to construct high-level features that improve performance on supervised learning tasks. In this paper, we investigate how the relationship between unlabeled and labeled data affects classification performance. Specifically, we apply unlabeled data with different degrees of relation to the labeled data to a handwritten digit classification task based on the MNIST dataset. Our experimental results show that the higher the degree of relation between unlabeled and labeled data, the better the classification performance. Although unlabeled data drawn from a generative distribution completely different from that of the labeled data gives the lowest classification performance, we still achieve high classification performance. This expands the applicability of supervised learning algorithms that use unsupervised learning.
Keywords: Autoencoder, high-level feature, MNIST dataset, self-taught learning, supervised learning.
Downloads: 1832
13367 Corporate Governance and Share Prices: Firm Level Review in Turkey
Authors: Raif Parlakkaya, Ahmet Diken, Erkan Kara
Abstract:
This paper examines the relationship between corporate governance ratings and the stock prices of 26 Turkish firms listed on the Turkish stock exchange (Borsa Istanbul), using panel data analysis over a five-year period. The paper also investigates the stock performance of firms with governance ratings relative to the market portfolio (i.e. the BIST 100 Index) both before and after governance scoring began. The empirical results show no relation between corporate governance rating and stock prices when panel data are used for annual variation in both rating score and stock prices. Further analysis yields a surprising result: while the selected firms significantly outperform the market prior to rating, this performance does not continue afterwards.
Keywords: Corporate governance, stock price, performance, panel data analysis.
Downloads: 2526
13366 A CFD Analysis of Hydraulic Characteristics of the Rod Bundles in the BREST-OD-300 Wire-Spaced Fuel Assemblies
Authors: Dmitry V. Fomichev, Vladimir I. Solonin
Abstract:
This paper presents the findings from a numerical simulation of the flow in 37-rod fuel assembly models spaced by a double-wire trapezoidal wrapping, as applied to the BREST-OD-300 experimental nuclear reactor. Data on the high static pressure distribution within the models, and equations for determining the fuel bundle flow friction factors, have been obtained. Recommendations are provided on using the turbulence closure models available in ANSYS Fluent. A comparative analysis has been performed against the existing empirical equations for determining the flow friction factors, and the fit between the calculated and experimental data is shown.
An analysis of the experimental data and of the numerical simulation results for the hydrodynamic performance of the BREST-OD-300 fuel rod assembly is presented.
Keywords: BREST-OD-300, wire-spaced, fuel assembly, computational fluid dynamics.
Downloads: 2227
13365 A New Precautionary Method for Measurement and Improvement the Data Quality
Authors: Seyed Mohammad Hossein Moossavizadeh, Mehran Mohsenzadeh, Nasrin Arshadi
Abstract:
Data quality is a complex and unstructured concept that concerns information systems managers. This concern stems from the high cost of maintaining and cleaning poor-quality data. Beyond these costs, low-quality data cause wrong statistics, analyses and decisions in organizations. Managers therefore seek to improve the quality of their information systems' data. A basic element of quality improvement is measuring the current level of quality. In this paper, we present a precautionary method whose application gives information system data better quality. The method covers different dimensions of data quality and therefore has the necessary integrity. It was tested on three dimensions (accuracy, value-added and believability), and the results confirm the improvement and integrity of the method.
Keywords: Data quality, precaution, information system, measurement, improvement.
Downloads: 1468
13364 The Leaves of a Tree
Authors: Zhu Jiaming, Yu Mengna
Abstract:
In this article, models based on quantitative analysis, physical geometry and regression analysis are established using analytic hierarchy process analysis, fuzzy cluster analysis, fuzzy photographic and data fitting. The reasons for the variety of leaf shapes among different species, and for the differences between leaf shapes on the same tree, are addressed using software such as Eviews, VB and Matlab. We also successfully estimate the leaf mass of a tree and its correlation with the tree profile.
Keywords: Leaf shape; Mass; Fuzzy cluster; Regression analysis; Eviews; Matlab
Downloads: 1597
13363 Classifying Bio-Chip Data using an Ant Colony System Algorithm
Authors: Minsoo Lee, Yearn Jeong Kim, Yun-mi Kim, Sujeung Cheong, Sookyung Song
Abstract:
Bio-chips are used for experiments on genes and contain various information, such as genes and samples. Two-dimensional bio-chips, in which one axis represents genes and the other represents samples, are widely used today. Because experimenting with real genes costs a lot of money and takes a long time to yield results, bio-chips are used for biological experiments instead. Extracting data from bio-chips with high accuracy and finding patterns or useful information in such data is therefore very important. Bio-chip analysis systems extract data from various kinds of bio-chips and mine the data to obtain useful information. One commonly used mining method is classification, and the appropriate classification algorithm varies with factors such as the data types and the number of characteristics. Because bio-chip data are extremely large, an algorithm that imitates an ecosystem, such as the ant algorithm, is suitable for classification. This paper focuses on finding classification rules in bio-chip data using the Ant Colony algorithm. When applying the discovered rules to bio-chip data to predict classes, the developed system takes the accuracy of those rules into consideration.
Keywords: Ant Colony System, DNA chip data, Classification.
Downloads: 1468
13362 Automatic Detection and Spatio-temporal Analysis of Commercial Accumulations Using Digital Yellow Page Data
Authors: Yuki Akiyama, Hiroaki Sengoku, Ryosuke Shibasaki
Abstract:
In this study, the locations and areas of commercial accumulations were detected using digital yellow page data. An original buffering method that can accurately create polygons of commercial accumulations is proposed; with this method, the distribution of commercial accumulations can be easily created and monitored over a wide area. The locations, areas and time-series changes of commercial accumulations in the South Kanto region can be monitored by integrating these polygons with time-series digital yellow page data. The circumstances of commercial accumulations were shown to vary by area type: highly urbanized regions such as the city center of Tokyo and prefectural capitals, suburban areas near large cities, and suburban and rural areas.
Keywords: Commercial accumulations, Spatio-temporal analysis, Urban monitoring, Yellow page data
Downloads: 1263
13361 Clustering Multivariate Empiric Characteristic Functions for Multi-Class SVM Classification
Authors: María-Dolores Cubiles-de-la-Vega, Rafael Pino-Mejías, Esther-Lydia Silva-Ramírez
Abstract:
A dissimilarity measure between the empiric characteristic functions of the subsamples associated with the different classes in a multivariate data set is proposed. This measure can be computed efficiently, and it depends on all the cases of each class. It may be used to find groups of similar classes, which could be joined for further analysis, or to perform an agglomerative hierarchical cluster analysis of the set of classes. The resulting tree can serve to build a family of binary classification models, offering an alternative approach to the multi-class SVM problem. We have tested this dendrogram-based SVM approach against the one-against-one SVM approach on four publicly available data sets, three of them microarray data. The two approaches performed equivalently, but the dendrogram-based approach requires fewer binary SVM models.
Keywords: Cluster Analysis, Empiric Characteristic Function, Multi-class SVM, R.
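If the dissimilarity is taken as the integral of |phi_a(t) - phi_b(t)|^2 against a Gaussian weight N(0, s^2 I) on t (an assumption; the abstract does not state the paper's weighting), it has a closed form that uses every case of both classes, because the integral of exp(i t·u) under that weight is exp(-s^2 ||u||^2 / 2). The result coincides with the squared maximum mean discrepancy under a Gaussian kernel. The pairwise values produced below could feed an agglomerative clustering routine to build the class dendrogram; the function names, the bandwidth s and the sample points are hypothetical.

```python
import math


def _mean_gauss_kernel(a, b, s):
    """Mean over all pairs (x in a, y in b) of exp(-s^2 * ||x - y||^2 / 2)."""
    total = 0.0
    for x in a:
        for y in b:
            sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
            total += math.exp(-0.5 * s * s * sq)
    return total / (len(a) * len(b))


def ecf_dissimilarity(a, b, s=1.0):
    """Integral of |phi_a(t) - phi_b(t)|^2 w(t) dt with w = N(0, s^2 I), in closed form."""
    return (_mean_gauss_kernel(a, a, s) + _mean_gauss_kernel(b, b, s)
            - 2.0 * _mean_gauss_kernel(a, b, s))


def class_dissimilarities(classes, s=1.0):
    """Pairwise dissimilarities between the subsamples of each class label."""
    labels = sorted(classes)
    return {(p, q): ecf_dissimilarity(classes[p], classes[q], s)
            for i, p in enumerate(labels) for q in labels[i + 1:]}


# Hypothetical 2-D subsamples for three classes
classes = {
    "a": [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.1)],
    "b": [(0.2, 0.1), (0.0, 0.3), (0.1, -0.1)],
    "c": [(3.0, 3.0), (3.1, 2.9), (2.9, 3.2)],
}
for pair, dist in class_dissimilarities(classes).items():
    print(pair, round(dist, 4))
```

Classes "a" and "b" overlap, so their dissimilarity is small and they would be merged first; "c" joins the tree last.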
Downloads: 1877
13360 Wind Speed Data Analysis using Wavelet Transform
Authors: S. Avdakovic, A. Lukac, A. Nuhanovic, M. Music
Abstract:
Renewable energy systems are becoming a topic of great interest and investment worldwide. In recent years, wind power generation has developed very rapidly across the world. Planning and successfully implementing good wind power plant projects requires wind potential measurements. In such projects, several things are of great importance: the effective choice of the micro-location for wind potential measurements, the installation of a measurement station with appropriate measuring equipment, its maintenance, and the analysis of the data gathered on wind potential characteristics. In this paper, a wavelet transform is applied to analyze wind speed data, to gain insight into the characteristics of the wind and to select suitable locations for wind farm construction. The results show that this approach can be a useful tool for investigating wind potential.
Keywords: Wind potential, Wind speed data, Wavelet transform.
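The abstract does not name the wavelet used. As a minimal sketch, the orthonormal Haar wavelet splits a wind speed series into a coarse approximation (slow trends) and detail coefficients at successively coarser scales (gusts and short-term fluctuations), and the inverse transform reconstructs the series exactly. The series length must be divisible by 2 to the power of the number of levels; the sample readings are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the orthonormal Haar DWT; len(signal) must be even."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return the final approximation and the detail lists, finest scale first."""
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose: x0 = (a + d) / sqrt2, x1 = (a - d) / sqrt2 at each level."""
    signal = list(approx)
    for detail in reversed(details):
        merged = []
        for a, d in zip(signal, detail):
            merged.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        signal = merged
    return signal


# Hypothetical 10-minute mean wind speeds (m/s)
wind = [5.1, 5.3, 6.0, 7.2, 6.8, 6.1, 4.9, 5.0]
approx, details = haar_decompose(wind, levels=3)
for level, d in enumerate(details, start=1):
    print(f"level {level} detail energy: {sum(v * v for v in d):.3f}")
```

Because the transform is orthonormal, the energy of the approximation plus all details equals the energy of the original series, so the per-level detail energy shows which time scales dominate the wind variability.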
Downloads: 2632
13359 Student Satisfaction Data for Work Based Learners
Authors: Rosie Borup, Hanifa Shah
Abstract:
This paper describes how student satisfaction is measured for work-based learners. These are non-traditional learners who conduct academic learning in the workplace, whose curricula typically involve a high degree of negotiation, and whose motivations relate directly to their employers' needs as well as to their own career ambitions. We argue that while increasing work-based learning (WBL) participation and the use of student satisfaction data (SSD) are both accepted as strategically important to the higher education (HE) agenda, WBL SSD is rarely examined. Lessons can be learned from comparing SSD across a range of WBL programmes, and greater visibility of this type of data will provide insight into ways to improve and develop this mode of delivery. The key themes that emerged from the analysis of the interview data were: learners' profiles and needs, employers' drivers, academic staff drivers, organizational approach, tools for collecting data, and visibility of findings. The paper concludes with observations on best practice in the collection, analysis and use of WBL SSD, offering recommendations for both academic managers and practitioners.
Keywords: Student satisfaction data, work based learning, employer engagement, NSS.
Downloads: 1493
13358 Data Transformation Services (DTS): Creating Data Mart by Consolidating Multi-Source Enterprise Operational Data
Authors: J. D. D. Daniel, K. N. Goh, S. M. Yusop
Abstract:
Trends in business intelligence, e-commerce and remote access make it necessary and practical to store data in different ways on multiple systems with different operating systems. As businesses evolve and grow, they require efficient computerized solutions to update data and to access data from diverse enterprise business applications. The objective of this paper is to demonstrate the capability of DTS [1] as a database solution for automatic data transfer and update in solving a business problem. The DTS package was developed for a business selling a variety of plants that eventually expanded into commercial supply and landscaping. Dimensional data modeling is used in the DTS package to extract, transform and load data from heterogeneous database systems such as MySQL, Microsoft Access and Oracle, consolidating it into a data mart residing in SQL Server. The data transfer from the various databases is scheduled to run automatically every quarter to support efficient sales analysis. DTS is therefore an attractive solution for automatic data transfer and update that meets today's business needs.
Keywords: Data Transformation Services (DTS), Object Linking and Embedding Database (OLEDB), Data Mart, Online Analytical Processing (OLAP), Online Transactional Processing (OLTP).
Downloads: 2038
13357 Granularity Analysis for Spatio-Temporal Web Sensors
Authors: Shun Hattori
Abstract:
In recent years, much research has actively mined the exploding Web, especially User Generated Content (UGC) such as weblogs, for knowledge about various phenomena and events in the physical world, and Web services built on this Web-mined knowledge have begun to be offered to the public. However, there are few detailed investigations of how accurately Web-mined data reflect physical-world data. Using Web-mined data uncritically in public Web services without sufficiently ensuring their accuracy is problematic. This paper therefore introduces the simplest Web Sensor and a spatiotemporally normalized Web Sensor, which extract spatiotemporal data about a target phenomenon from weblogs retrieved by keyword(s) representing that phenomenon. It then assesses the potential and reliability of the Web-sensed spatiotemporal data through four kinds of granularity analysis of their correlation coefficients with the Japan Meteorological Agency's daily, per-region temperature, rainfall, snowfall and earthquake statistics as physical-world data: spatial granularity (each region's population density), temporal granularity (time period, e.g., per day vs. per week), representation granularity (e.g., “rain” vs. “heavy rain”), and media granularity (weblogs vs. microblogs such as Tweets).
Keywords: Granularity analysis, knowledge extraction, spatiotemporal data mining, Web credibility, Web mining, Web sensor.
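The correlation analysis can be sketched as follows: keyword hit counts are normalized by the total number of posts for the same region and period (one plausible reading of “spatiotemporally normalized”, assumed here), optionally aggregated from days to weeks to change the temporal granularity, and then compared with the physical-world series using the Pearson correlation coefficient. All names, the normalization rule and the sample counts are hypothetical.

```python
import math


def normalize_counts(hits, totals):
    """Hypothetical normalization: keyword hits divided by all posts in the same cell."""
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


def aggregate(values, width):
    """Sum consecutive blocks of `width` values (width=7: daily -> weekly); drop a partial tail."""
    return [sum(values[i:i + width]) for i in range(0, len(values) - width + 1, width)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 14 days for one region: posts mentioning "rain", all posts, rainfall (mm)
rain_posts = [12, 30, 5, 2, 40, 38, 3, 4, 25, 26, 1, 0, 33, 8]
all_posts = [900, 950, 880, 870, 990, 1000, 860, 850, 940, 930, 845, 830, 980, 890]
rainfall = [3.0, 15.5, 0.5, 0.0, 22.0, 18.0, 0.0, 0.5, 11.0, 12.5, 0.0, 0.0, 16.0, 2.0]
sensor = normalize_counts(rain_posts, all_posts)
print("daily r:", round(pearson(sensor, rainfall), 3))
```

With a longer series, comparing `pearson(aggregate(sensor, 7), aggregate(rainfall, 7))` against the daily value gives the per-day versus per-week comparison; two weekly points alone, as here, would make the coefficient degenerate.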
Downloads: 1882
13356 Multivariate Analysis of Spectroscopic Data for Agriculture Applications
Authors: Asmaa M. Hussein, Amr Wassal, Ahmed Farouk Al-Sadek, A. F. Abd El-Rahman
Abstract:
In this study, a multivariate analysis of potato spectroscopic data is presented for detecting the presence or absence of brown rot disease. Near-Infrared (NIR) spectroscopy (1,350-2,500 nm) combined with multivariate analysis was used as a rapid, non-destructive technique for detecting brown rot disease in potatoes. Spectral measurements were performed on 565 samples, chosen randomly at the infection site in the potato slice. In total, 254 infected and 311 uninfected (brown rot-free) samples were analyzed using different advanced statistical analysis techniques. The discrimination performance of different multivariate analysis techniques, including classification, pre-processing and dimension reduction, was compared. A random forest classifier applied to the raw spectra with different pre-processing techniques performed best, achieving a total classification accuracy of 98.7% in discriminating infected potatoes from controls.
Keywords: Brown rot disease, NIR spectroscopy, potato, random forest.
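The abstract does not say which pre-processing techniques were applied before the random forest. As an illustration only, this sketch shows standard normal variate (SNV) scaling, a widely used NIR pre-processing step that removes per-spectrum offset and scale; it is an assumption, not the authors' confirmed pipeline, and the absorbance values are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: center a spectrum on its own mean, scale by its own sample standard deviation."""
    if len(spectrum) < 2:
        raise ValueError("need at least two wavelengths")
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(v - m) / s for v in spectrum]


def snv_matrix(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply SNV row by row; each row is one sample's absorbance values across wavelengths."""
    return [snv(row) for row in spectra]


# Hypothetical absorbance readings for two samples at five wavelengths.
raw = [
    [0.42, 0.45, 0.51, 0.48, 0.40],
    [0.61, 0.66, 0.75, 0.70, 0.58],
]
scaled = snv_matrix(raw)
```

Because SNV operates on each spectrum independently, it needs no fitted parameters; the scaled rows could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`.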
PDF Downloads 885
13355 A Modified AES Based Algorithm for Image Encryption
Authors: M. Zeghid, M. Machhout, L. Khriji, A. Baganne, R. Tourki
Abstract:
With the rapid growth of digital data exchange, information security has become increasingly important in data storage and transmission. Because images are increasingly used in industrial processes, confidential image data must be protected from unauthorized access. In this paper, we analyze the Advanced Encryption Standard (AES) and add a key stream generator (A5/1, W7) to AES to improve encryption performance, particularly for images with low entropy. Both techniques were implemented for experimental purposes, and detailed results of the security analysis and implementation are given. A comparative study with traditional encryption algorithms shows the superiority of the modified algorithm.
Keywords: Cryptography, Encryption, Advanced Encryption Standard (AES), ECB mode, statistical analysis, key stream generator.
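The abstract names A5/1 as one of the key stream generators added to AES. This is a minimal Python sketch of an A5/1-style generator (three LFSRs of 19, 22 and 23 bits with majority clocking, using the commonly published taps and clocking bits) plus XOR masking of byte data. It has not been checked against published test vectors, bit-ordering conventions differ between descriptions, W7 is not shown, and how the paper couples the keystream with AES is not described in the abstract, so no AES step is attempted. The helper names and sample values are hypothetical.

```python
from typing import List, Tuple

# (length, feedback tap positions, clocking bit) for the three registers,
# as commonly published for A5/1; bit 0 is the least significant bit.
REGS: Tuple[Tuple[int, Tuple[int, ...], int], ...] = (
    (19, (13, 16, 17, 18), 8),
    (22, (20, 21), 10),
    (23, (7, 20, 21, 22), 10),
)


def _step(reg: int, length: int, taps: Tuple[int, ...]) -> int:
    """Shift one LFSR left by one bit, feeding the XOR of its taps into bit 0."""
    fb = 0
    for t in taps:
        fb ^= (reg >> t) & 1
    return ((reg << 1) & ((1 << length) - 1)) | fb


def a51_keystream(key: int, frame: int, nbits: int) -> List[int]:
    """Generate `nbits` keystream bits from a 64-bit key and a 22-bit frame number."""
    state = [0, 0, 0]
    # Key and frame loading: regular clocking, each input bit XORed into bit 0.
    for value, width in ((key, 64), (frame, 22)):
        for i in range(width):
            bit = (value >> i) & 1
            state = [_step(s, n, t) ^ bit for s, (n, t, _) in zip(state, REGS)]
    out: List[int] = []
    # 100 majority-clocked warm-up steps whose output is discarded, then the keystream.
    for step in range(100 + nbits):
        clk = [(s >> c) & 1 for s, (_, _, c) in zip(state, REGS)]
        maj = 1 if sum(clk) >= 2 else 0
        # Only registers whose clocking bit agrees with the majority are stepped.
        state = [
            _step(s, n, t) if b == maj else s
            for s, b, (n, t, _) in zip(state, clk, REGS)
        ]
        if step >= 100:
            # Output bit: XOR of the most significant bit of each register.
            out.append(((state[0] >> 18) ^ (state[1] >> 21) ^ (state[2] >> 22)) & 1)
    return out


def xor_with_keystream(data: bytes, key: int, frame: int) -> bytes:
    """Mask data with the keystream (MSB-first per byte); applying it twice restores the data."""
    bits = a51_keystream(key, frame, 8 * len(data))
    out = bytearray()
    for i, byte in enumerate(data):
        k = 0
        for bit in bits[8 * i:8 * i + 8]:
            k = (k << 1) | bit
        out.append(byte ^ k)
    return bytes(out)


# Hypothetical low-entropy pixel row (nearly constant grey levels).
img = bytes([12, 12, 12, 13, 12, 12])
enc = xor_with_keystream(img, key=0x1223456789ABCDEF, frame=0x134)
assert xor_with_keystream(enc, 0x1223456789ABCDEF, 0x134) == img
```

XOR masking with a keystream breaks up the long runs of identical bytes that make low-entropy images leak structure under ECB-mode block encryption. A5/1 on its own is known to be cryptanalytically broken, so this is only a teaching sketch.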
PDF Downloads 5058
13354 Multidimensional Performance Management
Authors: David Wiese
Abstract:
To maximize the efficiency of an information management platform and to support decision making, the collection, storage, and analysis of performance-relevant data have become fundamentally important. This paper addresses the merits and drawbacks of the OLAP paradigm for efficiently navigating large volumes of performance measurement data hierarchically. System managers or database administrators navigate through suitably (re)structured measurement data to detect performance bottlenecks, identify the causes of performance problems, or assess the impact of configuration changes on the system and its representative metrics. Of particular importance is finding the root cause of an imminent problem that threatens the availability and performance of an information system. Using OLAP techniques, in contrast to traditional static reporting, this should be achievable within a moderate amount of time and with little processing complexity. It is shown how OLAP techniques can improve the understandability and manageability of measurement data and, hence, the whole performance analysis process.
Keywords: Data Warehousing, OLAP, Multidimensional Navigation, Performance Diagnosis, Performance Management, Performance Tuning.
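The hierarchical navigation described above rests on OLAP roll-up and drill-down. A minimal Python sketch, assuming a hypothetical host > database > query hierarchy with a single measure (response time in ms); the fact rows, dimension names and the greedy "follow the costliest member" rule are illustrative assumptions, not the paper's model.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical measurement facts along a host > db > query hierarchy.
FACTS: List[dict] = [
    {"host": "h1", "db": "sales", "query": "q1", "ms": 120.0},
    {"host": "h1", "db": "sales", "query": "q2", "ms": 880.0},
    {"host": "h1", "db": "hr", "query": "q3", "ms": 40.0},
    {"host": "h2", "db": "sales", "query": "q1", "ms": 100.0},
]
HIERARCHY = ("host", "db", "query")


def rollup(facts: List[dict], level: str,
           filters: Optional[Dict[str, str]] = None) -> Dict[Tuple[str, ...], float]:
    """Total 'ms' grouped by the hierarchy prefix down to `level`, restricted to the given members."""
    depth = HIERARCHY.index(level) + 1
    filters = filters or {}
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for row in facts:
        if all(row[k] == v for k, v in filters.items()):
            totals[tuple(row[d] for d in HIERARCHY[:depth])] += row["ms"]
    return dict(totals)


def drill_to_hotspot(facts: List[dict]) -> Dict[str, str]:
    """Drill down one level at a time, always following the costliest member."""
    path: Dict[str, str] = {}
    for level in HIERARCHY:
        totals = rollup(facts, level, path)
        worst = max(totals, key=totals.get)
        path[level] = worst[-1]
    return path


print(rollup(FACTS, "host"))
print(drill_to_hotspot(FACTS))
```

The greedy drill-down is cheap but can miss a costly leaf that sits under a parent with a smaller total, which is one reason interactive navigation still leaves the analyst in control.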
PDF Downloads 2135
13353 A Study of the Adaptive Reuse for School Land Use Strategy: An Application of the Analytic Network Process and Big Data
Authors: Wann-Ming Wey
Abstract:
With the spread and progress of information technology, large data sets and their analysis are no longer a major obstacle. Relevant big data can now be used not only to analyze and simulate the likely state of urban development in the near future, but also, through those analysis and simulation results, to give government units and decision-makers a more comprehensive and reasonable basis for policy implementation. In this research, Taipei City is the study area, and relevant big data variables (e.g., population, facility utilization, and related social policy ratings) are combined with the Analytic Network Process (ANP) to study in depth the possible reduction of land used by primary and secondary schools in Taipei City. Besides encouraging urban activity through better use of public facilities, the results of this research could help improve the efficiency of urban land use in the future. Furthermore, the assessment model and research framework established here provide a useful reference for future land-use and adaptive reuse strategies for schools and other public facilities.
Keywords: Adaptive reuse, analytic network process, big data, land use strategy.
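In ANP, pairwise comparisons are assembled into a weighted (column-stochastic) supermatrix, and global priorities are read from its limit, obtained by raising it to successive powers. The abstract does not show the study's network, so this Python sketch uses a hypothetical 2 × 2 supermatrix; the function names, tolerance and iteration cap are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def limit_supermatrix(w: Matrix, tol: float = 1e-10, max_iter: int = 1000) -> Matrix:
    """Raise a column-stochastic weighted supermatrix to successive powers until it stops changing."""
    n = len(w)
    if any(len(row) != n for row in w):
        raise ValueError("supermatrix must be square")
    for col in range(n):
        if abs(sum(row[col] for row in w) - 1.0) > 1e-9:
            raise ValueError("weighted supermatrix must be column-stochastic")
    current = w
    for _ in range(max_iter):
        nxt = matmul(current, w)
        diff = max(abs(x - y) for r1, r2 in zip(current, nxt) for x, y in zip(r1, r2))
        if diff < tol:
            return nxt
        current = nxt
    # Cyclic supermatrices oscillate; ANP then averages successive powers instead.
    raise RuntimeError("powers did not converge; the supermatrix may be cyclic")


# Hypothetical weighted supermatrix for two interdependent elements.
W = [[0.5, 0.2],
     [0.5, 0.8]]
L = limit_supermatrix(W)
print([round(v, 3) for v in (L[0][0], L[1][0])])
```

For this matrix every column of the limit converges to (2/7, 5/7), about (0.286, 0.714), which would be read as the elements' global priorities.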
PDF Downloads 921
13352 A Review and Comparative Analysis on Cluster Ensemble Methods
Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha
Abstract:
Clustering is an unsupervised learning technique in data mining that groups data objects into meaningful classes so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. However, no single clustering algorithm produces the best result on every problem. The cluster ensemble approach has therefore emerged as a promising technique for addressing this problem, and it has proven successful in cluster analysis. The main goal of a cluster ensemble is to combine multiple clustering solutions in a way that improves precision as well as the quality of the individual clusterings. Because new approaches in data mining are being created rapidly and in large numbers, the ongoing interest in novel algorithms calls for a thorough examination of current techniques and of directions for future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working processes, and standard accuracy and error rates. Clustering practitioners should benefit from this exploratory review when determining the most appropriate solution for the problem at hand.
Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.
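One common consensus function for cluster ensembles is the co-association (evidence accumulation) matrix: count how often each pair of objects lands in the same cluster across the base clusterings, then link pairs that co-occur often enough. This Python sketch is one illustrative instance, not necessarily a method covered in the review; the 0.5 threshold and the three base labelings are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of base clusterings that put objects i and j in the same cluster."""
    k, n = len(labelings), len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / k for c in row] for row in counts]


def consensus(labelings: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Join pairs whose co-association exceeds `threshold`; connected components become consensus clusters."""
    m = co_association(labelings)
    parent = list(range(len(m)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(m)), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)
    # Relabel components 0, 1, 2, ... in order of first appearance.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(m))]


# Three hypothetical base clusterings of five objects (label names need not agree).
base = [[0, 0, 1, 1, 2],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 2, 2]]
print(consensus(base))
```

Because only "same cluster or not" is recorded, the method is insensitive to how each base clustering names its labels; joining components by threshold, however, can chain loosely related objects together.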
PDF Downloads 820
13351 Grocery Customer Behavior Analysis using RFID-based Shopping Paths Data
Authors: In-Chul Jung, Young S. Kwon
Abstract:
Understanding customer behavior in grocery stores has long been an issue in the retail industry. The advent of RFID has made it easier to collect movement data on individual shoppers. Most previous studies used traditional statistical clustering techniques to find the major characteristics of customer behavior, especially shopping paths. However, because of the various spatial constraints in a store, standard clustering methods are not practical: movement data such as shopping paths must be adjusted before the analysis, which is time-consuming and distorts the data. To alleviate this problem, we propose a new approach to spatial pattern clustering based on the longest common subsequence (LCS). Experimental results using real data obtained from a grocery store confirm the good performance of the proposed method in finding hot spots, dead spots, and major path patterns of customer movement.
Keywords: customer path, shopping behavior, exploratory analysis, LCS, RFID
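The proposed clustering compares shopping paths through their longest common subsequence, which tolerates detours and skipped zones without pre-adjusting the paths. This Python sketch computes LCS length by dynamic programming and a normalized similarity; normalizing by the longer path and the zone names are assumptions, since the abstract does not give the exact similarity measure or clustering step.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, using a two-row dynamic-programming table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise keep the better of skipping x or y.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def path_similarity(p: Sequence[Hashable], q: Sequence[Hashable]) -> float:
    """LCS length divided by the longer path: 1.0 for identical paths, 0.0 with no zone in common."""
    if not p and not q:
        return 1.0
    return lcs_length(p, q) / max(len(p), len(q))


# Hypothetical RFID zone sequences for two shoppers.
a = ["entrance", "produce", "dairy", "checkout"]
b = ["entrance", "bakery", "dairy", "checkout"]
print(path_similarity(a, b))  # 0.75: three of four zones appear in the same order
```

A pairwise similarity matrix built this way could feed any clustering method that accepts precomputed similarities, which avoids forcing variable-length paths into fixed-length vectors.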
PDF Downloads 3148