Search results for: Spatial data
7457 Soil/Phytofisionomy Relationship in Southeast of Chapada Diamantina, Bahia, Brazil
Authors: Marcelo Araujo da Nóbrega, Ariel Moura Vilas Boas
Abstract:
This study aims to characterize the physicochemical aspects of the soils of southeastern Chapada Diamantina - Bahia related to the phytophysiognomies of this area, rupestrian field, small savanna (savanna fields), small dense savanna (savanna fields), savanna (Cerrado), dry thorny forest (Caatinga), dry thorny forest/savanna, scrub (Carrasco - ecotone), forest island (seasonal semi-deciduous forest - Capão) and seasonal semi-deciduous forest. To achieve the research objective, soil samples were collected in each plant formation and analyzed in the soil laboratory of ESALQ - USP in order to identify soil fertility through the determination of pH, organic matter, phosphorus, potassium, calcium, magnesium, potential acidity, sum of bases, cation exchange capacity and base saturation. The composition of soil particles was also checked; that is, the texture, step made in the terrestrial ecosystems laboratory of the Department of Ecology of USP and in the soil laboratory of ESALQ. Another important factor also studied was to show the variations in the vegetation cover in the region as a function of soil moisture in the different existing physiographic environments. Another study carried out was a comparison between the average soil moisture data with precipitation data from three locations with very different phytophysiognomies. The soils found in this part of Bahia can be classified into 5 classes, with a predominance of oxisols. All of these classes have a great diversity of physical and chemical properties, as can be seen in photographs and in particle size and fertility analyzes. The deepest soils are located in the Central Pediplano of Chapada Diamantina where the dirty field, the clean field, the executioner and the semideciduous seasonal forest (Capão) are located, and the shallower soils were found in the rupestrian field, dry thorny forest, and savanna fields, the latter located on a hillside. As for the variations in water in the region's soil, the data indicate that there were large spatial variations in humidity in both the rainy and dry periods.
Keywords: Bahia, Chapada diamantina, phytophysiognomies, soils.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5817456 A Network Traffic Prediction Algorithm Based On Data Mining Technique
Authors: D. Prangchumpol
Abstract:
This paper is a description approach to predict incoming and outgoing data rate in network system by using association rule discover, which is one of the data mining techniques. Information of incoming and outgoing data in each times and network bandwidth are network performance parameters, which needed to solve in the traffic problem. Since congestion and data loss are important network problems. The result of this technique can predicted future network traffic. In addition, this research is useful for network routing selection and network performance improvement.
Keywords: Traffic prediction, association rule, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 36697455 Fuzzy Processing of Uncertain Data
Authors: Petr Morávek, Miloš Šeda
Abstract:
In practice, we often come across situations where it is necessary to make decisions based on incomplete or uncertain data. In control systems it may be due to the unknown exact mathematical model, or its excessive complexity (e.g. nonlinearity) when it is necessary to simplify it, respectively, to solve it using a rule base. In the case of databases, searching data we compare a similarity measure with of the requirements of the selection with stored data, where both the select query and the data itself may contain vague terms, for example in the form of linguistic qualifiers. In this paper, we focus on the processing of uncertain data in databases and demonstrate it on the example multi-criteria decision making in the selection of variants, specified by higher number of technical parameters.Keywords: fuzzy logic, linguistic variable, multicriteria decision
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14187454 Automated Stereophotogrammetry Data Cleansing
Authors: Stuart Henry, Philip Morrow, John Winder, Bryan Scotney
Abstract:
The stereophotogrammetry modality is gaining more widespread use in the clinical setting. Registration and visualization of this data, in conjunction with conventional 3D volumetric image modalities, provides virtual human data with textured soft tissue and internal anatomical and structural information. In this investigation computed tomography (CT) and stereophotogrammetry data is acquired from 4 anatomical phantoms and registered using the trimmed iterative closest point (TrICP) algorithm. This paper fully addresses the issue of imaging artifacts around the stereophotogrammetry surface edge using the registered CT data as a reference. Several iterative algorithms are implemented to automatically identify and remove stereophotogrammetry surface edge outliers, improving the overall visualization of the combined stereophotogrammetry and CT data. This paper shows that outliers at the surface edge of stereophotogrammetry data can be successfully removed automatically.
Keywords: Data cleansing, stereophotogrammetry.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18427453 Optimal Document Archiving and Fast Information Retrieval
Authors: Hazem M. El-Bakry, Ahmed A. Mohammed
Abstract:
In this paper, an intelligent algorithm for optimal document archiving is presented. It is kown that electronic archives are very important for information system management. Minimizing the size of the stored data in electronic archive is a main issue to reduce the physical storage area. Here, the effect of different types of Arabic fonts on electronic archives size is discussed. Simulation results show that PDF is the best file format for storage of the Arabic documents in electronic archive. Furthermore, fast information detection in a given PDF file is introduced. Such approach uses fast neural networks (FNNs) implemented in the frequency domain. The operation of these networks relies on performing cross correlation in the frequency domain rather than spatial one. It is proved mathematically and practically that the number of computation steps required for the presented FNNs is less than that needed by conventional neural networks (CNNs). Simulation results using MATLAB confirm the theoretical computations.Keywords: Information Storage and Retrieval, Electronic Archiving, Fast Information Detection, Cross Correlation, Frequency Domain.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15867452 An Improved Data Mining Method Applied to the Search of Relationship between Metabolic Syndrome and Lifestyles
Authors: Yi Chao Huang, Yu Ling Liao, Chiu Shuang Lin
Abstract:
A data cutting and sorting method (DCSM) is proposed to optimize the performance of data mining. DCSM reduces the calculation time by getting rid of redundant data during the data mining process. In addition, DCSM minimizes the computational units by splitting the database and by sorting data with support counts. In the process of searching for the relationship between metabolic syndrome and lifestyles with the health examination database of an electronics manufacturing company, DCSM demonstrates higher search efficiency than the traditional Apriori algorithm in tests with different support counts.Keywords: Data mining, Data cutting and sorting method, Apriori algorithm, Metabolic syndrome
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15887451 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems
Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan
Abstract:
Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in terms of data reading speed performance. As we ascend in the hierarchy, data reading speed becomes faster. Thus, migrating the application’ important data that will be accessed in the near future to the uppermost level will reduce the application I/O waiting time; hence, reducing its execution elapsed time. In this research, we implement trace-driven two-levels parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify application’ data in order to determine its near future data accesses in parallel with the its on-demand request. The important data (i.e. the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach integrated with data mining techniques reduces the application execution elapsed time when using variety of traces in at least to 22%.Keywords: Data mining, hybrid storage system, recurrent neural network, support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17367450 Long Term Variability of Temperature in Armenia in the Context of Climate Change
Authors: Hrachuhi Galstyan, Lucian Sfîcă, Pavel Ichim
Abstract:
The purpose of this study is to analyze the temporal and spatial variability of thermal conditions in the Republic of Armenia. The paper describes annual fluctuations in air temperature. Research has been focused on case study region of Armenia and surrounding areas, where long–term measurements and observations of weather conditions have been performed within the National Meteorological Service of Armenia and its surrounding areas. The study contains yearly air temperature data recorded between 1961- 2012. Mann-Kendal test and the autocorrelation function were applied to detect the change trend of annual mean temperature, as well as other parametric and non-parametric tests searching to find the presence of some breaks in the long term evolution of temperature. The analysis of all records reveals a tendency mostly towards warmer years, with increased temperatures especially in valleys and inner basins. The maximum temperature increase is up to 1,5°C. Negative results have not been observed in Armenia. The patterns of temperature change have been observed since the 1990’s over much of the Armenian territory. The climate in Armenia was influenced by global change in the last 2 decades, as results from the methods employed within the study.Keywords: Air temperature, long-term variability, trend, climate change.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22157449 Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.
Keywords: Apriori, Association rules mining, Big Data, data mining, Hadoop, Map Reduce, MongoDB, NoSQL.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6947448 Hydrologic Balance and Surface Water Resources of the Cheliff-Zahrez Basin
Authors: Mehaiguene Madjid, Touhari Fadhila, Meddi Mohamed
Abstract:
The Cheliff basin offers a good hydrological example for the possibility of studying the problem which elucidated in the future, because of the unclearity in several aspects and hydraulic installation. Thus, our study of the Cheliff basin is divided into two principal parts: The spatial evaluation of the precipitation: also, the understanding of the modes of the reconstitution of the resource in water supposes a good knowledge of the structuring of the precipitation fields in the studied space. In the goal of a good knowledge of revitalizes them in water and their management integrated one judged necessary to establish a precipitation card of the Cheliff basin for a good understanding of the evolution of the resource in water in the basin and that goes will serve as basis for all study of hydraulic planning in the Cheliff basin. Then, the establishment of the precipitation card of the Cheliff basin answered a direct need of setting to the disposition of the researchers for the region and a document of reference that will be completed therefore and actualized. The hydrological study, based on the statistical hydrometric data processing will lead us to specify the hydrological terms of the assessment hydrological and to clarify the fundamental aspects of the annual flow, seasonal, extreme and thus of their variability and resources surface water.
Keywords: Hydrological assessment, surface water resources, Cheliff, Algeria.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10487447 Computing Continuous Skyline Queries without Discriminating between Static and Dynamic Attributes
Authors: Ibrahim Gomaa, Hoda M. O. Mokhtar
Abstract:
Although most of the existing skyline queries algorithms focused basically on querying static points through static databases; with the expanding number of sensors, wireless communications and mobile applications, the demand for continuous skyline queries has increased. Unlike traditional skyline queries which only consider static attributes, continuous skyline queries include dynamic attributes, as well as the static ones. However, as skyline queries computation is based on checking the domination of skyline points over all dimensions, considering both the static and dynamic attributes without separation is required. In this paper, we present an efficient algorithm for computing continuous skyline queries without discriminating between static and dynamic attributes. Our algorithm in brief proceeds as follows: First, it excludes the points which will not be in the initial skyline result; this pruning phase reduces the required number of comparisons. Second, the association between the spatial positions of data points is examined; this phase gives an idea of where changes in the result might occur and consequently enables us to efficiently update the skyline result (continuous update) rather than computing the skyline from scratch. Finally, experimental evaluation is provided which demonstrates the accuracy, performance and efficiency of our algorithm over other existing approaches.
Keywords: Continuous query processing, dynamic database, moving object, skyline queries.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12437446 Identifying Critical Success Factors for Data Quality Management through a Delphi Study
Authors: Maria Paula Santos, Ana Lucas
Abstract:
Organizations support their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important and Data Quality (DQ) is currently a relevant issue, the literature being unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM) that were presented to a panel of experts, who ordered them according to their degree of importance, using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, organizational culture focused on quality of the data and obtaining top management commitment and support.
Keywords: Critical success factors, data quality, data quality management, Delphi, Q-Sort.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11097445 Secure Data Aggregation Using Clusters in Sensor Networks
Authors: Prakash G L, Thejaswini M, S H Manjula, K R Venugopal, L M Patnaik
Abstract:
Wireless sensor network can be applied to both abominable and military environments. A primary goal in the design of wireless sensor networks is lifetime maximization, constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efcient data aggregation while preserving data privacy is a challenging problem in wireless sensor networks research. In this paper, we present privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA)leverages clustering protocol and algebraic properties of polynomials. It has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results of our schemes and compare their performance to a typical data aggregation scheme TAG, where no data privacy protection is provided. Results show the efficacy and efficiency of our schemes.Keywords: Aggregation, Clustering, Query Processing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17347444 A New Protocol for Concealed Data Aggregation in Wireless Sensor Networks
Authors: M. Abbasi Dezfouli, S. Mazraeh, M. H. Yektaie
Abstract:
Wireless sensor networks (WSN) consists of many sensor nodes that are placed on unattended environments such as military sites in order to collect important information. Implementing a secure protocol that can prevent forwarding forged data and modifying content of aggregated data and has low delay and overhead of communication, computing and storage is very important. This paper presents a new protocol for concealed data aggregation (CDA). In this protocol, the network is divided to virtual cells, nodes within each cell produce a shared key to send and receive of concealed data with each other. Considering to data aggregation in each cell is locally and implementing a secure authentication mechanism, data aggregation delay is very low and producing false data in the network by malicious nodes is not possible. To evaluate the performance of our proposed protocol, we have presented computational models that show the performance and low overhead in our protocol.Keywords: Wireless Sensor Networks, Security, Concealed Data Aggregation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17357443 IMDC: An Image-Mapped Data Clustering Technique for Large Datasets
Authors: Faruq A. Al-Omari, Nabeel I. Al-Fayoumi
Abstract:
In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First the dataset is mapped into a binary image plane. The synthesized image is then processed utilizing efficient image processing techniques to cluster the data in the dataset. Henceforth, the algorithm avoids exhaustive search to identify clusters. The algorithm considers only a small set of the data that contains critical boundary information sufficient to identify contained clusters. Compared to available data clustering techniques, the proposed algorithm produces similar quality results and outperforms them in execution time and storage requirements.
Keywords: Data clustering, Data mining, Image-mapping, Pattern discovery, Predictive analysis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15007442 Evaluation of the IMERG Product Performance at Estimating the Rainfall Properties in a Semi-Arid Region of Mexico
Authors: Eric Muñoz de la Torre, Julián González Trinidad, Efrén González Ramírez
Abstract:
Rain varies greatly in its duration, intensity, and spatial coverage, it is important to have sub-daily rainfall data for various applications, including risk prevention, however, the ground measurements are limited by the low and irregular density of rain gauges. An alternative to this problem is the Satellite Precipitation Products (SPPs) that use passive microwave and infrared sensors to estimate rainfall, as IMERG, however, these SPPs have to be validated before their application. The aim of this study is to evaluate the performance of the IMERG: Integrated Multi-satellitE Retrievals for Global Precipitation Measurement final run V06B SPP in a semi-arid region of Mexico, using four rain gauges sub-daily data of October 2019 and June to September 2021, using the Minimum inter-event Time (MIT) criterion to separate unique rain events with a dry period of 10 hrs for the purpose of evaluating the rainfall properties (depth, duration and intensity). Point to pixel analysis, continuous, categorical, and volumetric statistical metrics were used. Results show that IMERG is capable to estimate the rainfall depth with a slight overestimation but is unable to identify the real duration and intensity of the rain events, showing moderate overestimations and underestimations, respectively. The study zone presented 80 to 85% of convective rain events, the rest were stratiform rain events, classified by the depth magnitude variation of IMERG pixels and rain gauges. IMERG showed poorer performance at detecting the first ones but had a good performance at estimating stratiform rain events that are originated by Cold Fronts.
Keywords: IMERG, rainfall, rain gauge, remote sensing, statistical evaluation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 797441 The New Method of Concealed Data Aggregation in Wireless Sensor: A Case Study
Authors: M. Abbasi Dezfouli, S. Mazraeh, M. H. Yektaie
Abstract:
Wireless sensor networks (WSN) consists of many sensor nodes that are placed on unattended environments such as military sites in order to collect important information. Implementing a secure protocol that can prevent forwarding forged data and modifying content of aggregated data and has low delay and overhead of communication, computing and storage is very important. This paper presents a new protocol for concealed data aggregation (CDA). In this protocol, the network is divided to virtual cells, nodes within each cell produce a shared key to send and receive of concealed data with each other. Considering to data aggregation in each cell is locally and implementing a secure authentication mechanism, data aggregation delay is very low and producing false data in the network by malicious nodes is not possible. To evaluate the performance of our proposed protocol, we have presented computational models that show the performance and low overhead in our protocol.
Keywords: Wireless Sensor Networks, Security, Concealed Data Aggregation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17687440 Peakwise Smoothing of Data Models using Wavelets
Authors: D Sudheer Reddy, N Gopal Reddy, P V Radhadevi, J Saibaba, Geeta Varadan
Abstract:
Smoothing or filtering of data is first preprocessing step for noise suppression in many applications involving data analysis. Moving average is the most popular method of smoothing the data, generalization of this led to the development of Savitzky-Golay filter. Many window smoothing methods were developed by convolving the data with different window functions for different applications; most widely used window functions are Gaussian or Kaiser. Function approximation of the data by polynomial regression or Fourier expansion or wavelet expansion also gives a smoothed data. Wavelets also smooth the data to great extent by thresholding the wavelet coefficients. Almost all smoothing methods destroys the peaks and flatten them when the support of the window is increased. In certain applications it is desirable to retain peaks while smoothing the data as much as possible. In this paper we present a methodology called as peak-wise smoothing that will smooth the data to any desired level without losing the major peak features.Keywords: smoothing, moving average, peakwise smoothing, spatialdensity models, planar shape models, wavelets.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17507439 A New Precautionary Method for Measurement and Improvement the Data Quality
Authors: Seyed Mohammad Hossein Moossavizadeh, Mehran Mohsenzadeh, Nasrin Arshadi
Abstract:
the data quality is a kind of complex and unstructured concept, which is concerned by information systems managers. The reason of this attention is the high amount of Expenses for maintenance and cleaning of the inefficient data. Such a data more than its expenses of lack of quality, cause wrong statistics, analysis and decisions in organizations. Therefor the managers intend to improve the quality of their information systems' data. One of the basic subjects of quality improvement is the evaluation of the amount of it. In this paper, we present a precautionary method, which with its application the data of information systems would have a better quality. Our method would cover different dimensions of data quality; therefor it has necessary integrity. The presented method has tested on three dimensions of accuracy, value-added and believability and the results confirm the improvement and integrity of this method.
Keywords: Data quality, precaution, information system, measurement, improvement.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14687438 An Efficient Data Mining Approach on Compressed Transactions
Authors: Jia-Yu Dai, Don-Lin Yang, Jungpin Wu, Ming-Chuan Hung
Abstract:
In an era of knowledge explosion, the growth of data increases rapidly day by day. Since data storage is a limited resource, how to reduce the data space in the process becomes a challenge issue. Data compression provides a good solution which can lower the required space. Data mining has many useful applications in recent years because it can help users discover interesting knowledge in large databases. However, existing compression algorithms are not appropriate for data mining. In [1, 2], two different approaches were proposed to compress databases and then perform the data mining process. However, they all lack the ability to decompress the data to their original state and improve the data mining performance. In this research a new approach called Mining Merged Transactions with the Quantification Table (M2TQT) was proposed to solve these problems. M2TQT uses the relationship of transactions to merge related transactions and builds a quantification table to prune the candidate itemsets which are impossible to become frequent in order to improve the performance of mining association rules. The experiments show that M2TQT performs better than existing approaches.Keywords: Association rule, data mining, merged transaction, quantification table.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19607437 Weigh-in-Motion Data Analysis Software for Developing Traffic Data for Mechanistic Empirical Pavement Design
Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder
Abstract:
Currently, there are few user friendly Weigh-in- Motion (WIM) data analysis softwares available which can produce traffic input data for the recently developed AASHTOWare pavement Mechanistic-Empirical (ME) design software. However, these softwares have only rudimentary Quality Control (QC) processes. Therefore, they cannot properly deal with erroneous WIM data. As the pavement performance is highly sensible to the quality of WIM data, it is highly recommended to use more refined QC process on raw WIM data to get a good result. This study develops a userfriendly software, which can produce traffic input for the ME design software. This software takes the raw data (Class and Weight data) collected from the WIM station and processes it with a sophisticated QC procedure. Traffic data such as traffic volume, traffic distribution, axle load spectra, etc. can be obtained from this software; which can directly be used in the ME design software.Keywords: Weigh-in-motion, software, axle load spectra, traffic distribution, AASHTOWare.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18967436 Human Growth Curve Estimation through a Combination of Longitudinal and Cross-sectional Data
Authors: Sedigheh Mirzaei S., Debasis Sengupta
Abstract:
Parametric models have been quite popular for studying human growth, particularly in relation to biological parameters such as peak size velocity and age at peak size velocity. Longitudinal data are generally considered to be vital for fittinga parametric model to individual-specific data, and for studying the distribution of these biological parameters in a human population. However, cross-sectional data are easier to obtain than longitudinal data. In this paper, we present a method of combining longitudinal and cross-sectional data for the purpose of estimating the distribution of the biological parameters. We demonstrate, through simulations in the special case ofthePreece Baines model, how estimates based on longitudinal data can be improved upon by harnessing the information contained in cross-sectional data.We study the extent of improvement for different mixes of the two types of data, and finally illustrate the use of the method through data collected by the Indian Statistical Institute.Keywords: Preece-Baines growth model, MCMC method, Mixed effect model
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21397435 Semantic Support for Hypothesis-Based Research from Smart Environment Monitoring and Analysis Technologies
Authors: T. S. Myers, J. Trevathan
Abstract:
Improvements in the data fusion and data analysis phase of research are imperative due to the exponential growth of sensed data. Currently, there are developments in the Semantic Sensor Web community to explore efficient methods for reuse, correlation and integration of web-based data sets and live data streams. This paper describes the integration of remotely sensed data with web-available static data for use in observational hypothesis testing and the analysis phase of research. The Semantic Reef system combines semantic technologies (e.g., well-defined ontologies and logic systems) with scientific workflows to enable hypothesis-based research. A framework is presented for how the data fusion concepts from the Semantic Reef architecture map to the Smart Environment Monitoring and Analysis Technologies (SEMAT) intelligent sensor network initiative. The data collected via SEMAT and the inferred knowledge from the Semantic Reef system are ingested to the Tropical Data Hub for data discovery, reuse, curation and publication.
Keywords: Information architecture, Semantic technologies Sensor networks, Ontologies.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17157434 Data Migration between Document-Oriented and Relational Databases
Authors: Bogdan Walek, Cyril Klimes
Abstract:
Current tools for data migration between documentoriented and relational databases have several disadvantages. We propose a new approach for data migration between documentoriented and relational databases. During data migration the relational schema of the target (relational database) is automatically created from collection of XML documents. Proposed approach is verified on data migration between document-oriented database IBM Lotus/ Notes Domino and relational database implemented in relational database management system (RDBMS) MySQL.Keywords: data migration, database, document-oriented database, XML, relational schema
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 35257433 Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
Abstract:
DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
Keywords: Biometrics, identity verification, genetic data, k-nearest neighbor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11207432 Power Saving System in Green Data Center
Authors: Joon-young Jung, Dong-oh Kang, Chang-seok Bae
Abstract:
Power consumption is rapidly increased in data centers because the number of data center is increased and more the scale of data center become larger. Therefore, it is one of key research items to reduce power consumption in data center. The peak power of a typical server is around 250 watts. When a server is idle, it continues to use around 60% of the power consumed when in use, though vendors are putting effort into reducing this “idle" power load. Servers tend to work at only around a 5% to 20% utilization rate, partly because of response time concerns. An average of 10% of servers in their data centers was unused. In those reason, we propose dynamic power management system to reduce power consumption in green data center. Experiment result shows that about 55% power consumption is reduced at idle time.Keywords: Data Center, Green IT, Management Server, Power Saving.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16287431 MATLAB-Based Graphical User Interface (GUI) for Data Mining as a Tool for Environment Management
Authors: M. Awawdeh, A. Fedi
Abstract:
The application of data mining to environmental monitoring has become crucial for a number of tasks related to emergency management. Over recent years, many tools have been developed for decision support system (DSS) for emergency management. In this article a graphical user interface (GUI) for environmental monitoring system is presented. This interface allows accomplishing (i) data collection and observation and (ii) extraction for data mining. This tool may be the basis for future development along the line of the open source software paradigm.
Keywords: Data Mining, Environmental data, Mathematical Models, Matlab Graphical User Interface.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 47417430 Principal Component Analysis using Singular Value Decomposition of Microarray Data
Authors: Dong Hoon Lim
Abstract:
A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. Principal component analysis(PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data in order to simplify subsequent analysis and allow for summarization of the data in a parsimonious manner. PCA, which can be implemented via a singular value decomposition(SVD), is useful for analysis of microarray data. For application of PCA using SVD we use the DNA microarray data for the small round blue cell tumors(SRBCT) of childhood by Khan et al.(2001). To decide the number of components which account for sufficient amount of information we draw scree plot. Biplot, a graphic display associated with PCA, reveals important features that exhibit relationship between variables and also the relationship of variables with observations.
Keywords: Principal component analysis, singular value decomposition, microarray data, SRBCT
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 32507429 Hyperspectral Imaging and Nonlinear Fukunaga-Koontz Transform Based Food Inspection
Authors: Hamidullah Binol, Abdullah Bal
Abstract:
Nowadays, food safety is a great public concern; therefore, robust and effective techniques are required for detecting the safety situation of goods. Hyperspectral Imaging (HSI) is an attractive material for researchers to inspect food quality and safety estimation such as meat quality assessment, automated poultry carcass inspection, quality evaluation of fish, bruise detection of apples, quality analysis and grading of citrus fruits, bruise detection of strawberry, visualization of sugar distribution of melons, measuring ripening of tomatoes, defect detection of pickling cucumber, and classification of wheat kernels. HSI can be used to concurrently collect large amounts of spatial and spectral data on the objects being observed. This technique yields with exceptional detection skills, which otherwise cannot be achieved with either imaging or spectroscopy alone. This paper presents a nonlinear technique based on kernel Fukunaga-Koontz transform (KFKT) for detection of fat content in ground meat using HSI. The KFKT which is the nonlinear version of FKT is one of the most effective techniques for solving problems involving two-pattern nature. The conventional FKT method has been improved with kernel machines for increasing the nonlinear discrimination ability and capturing higher order of statistics of data. The proposed approach in this paper aims to segment the fat content of the ground meat by regarding the fat as target class which is tried to be separated from the remaining classes (as clutter). We have applied the KFKT on visible and nearinfrared (VNIR) hyperspectral images of ground meat to determine fat percentage. The experimental studies indicate that the proposed technique produces high detection performance for fat ratio in ground meat.Keywords: Food (Ground meat) inspection, Fukunaga-Koontz transform, hyperspectral imaging, kernel methods.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15007428 Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring
Authors: Youngji Yoo, Cheong-Sool Park, Jun Seok Kim, Young-Hak Lee, Sung-Shick Kim, Jun-Geol Baek
Abstract:
In the semiconductor manufacturing process, large amounts of data are collected from various sensors of multiple facilities. The collected data from sensors have several different characteristics due to variables such as types of products, former processes and recipes. In general, Statistical Quality Control (SQC) methods assume the normality of the data to detect out-of-control states of processes. Although the collected data have different characteristics, using the data as inputs of SQC will increase variations of data, require wide control limits, and decrease performance to detect outof- control. Therefore, it is necessary to separate similar data groups from mixed data for more accurate process control. In the paper, we propose a regression tree using split algorithm based on Pearson distribution to handle non-normal distribution in parametric method. The regression tree finds similar properties of data from different variables. The experiments using real semiconductor manufacturing process data show improved performance in fault detecting ability.Keywords: Semiconductor, non-normal mixed process data, clustering, Statistical Quality Control (SQC), regression tree, Pearson distribution system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1780