Search results for: training data condensation.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7973

Search results for: training data condensation.

6923 Low Energy Technology for Leachate Valorisation

Authors: Jesús M. Martín, Francisco Corona, Dolores Hidalgo

Abstract:

Landfills present long-term threats to soil, air, groundwater and surface water due to the formation of greenhouse gases (methane gas and carbon dioxide) and leachate from decomposing garbage. The composition of leachate differs from site to site and also within the landfill. The leachates alter with time (from weeks to years) since the landfilled waste is biologically highly active and their composition varies. Mainly, the composition of the leachate depends on factors such as characteristics of the waste, the moisture content, climatic conditions, degree of compaction and the age of the landfill. Therefore, the leachate composition cannot be generalized and the traditional treatment models should be adapted in each case. Although leachate composition is highly variable, what different leachates have in common is hazardous constituents and their potential eco-toxicological effects on human health and on terrestrial ecosystems. Since leachate has distinct compositions, each landfill or dumping site would represent a different type of risk on its environment. Nevertheless, leachates consist always of high organic concentration, conductivity, heavy metals and ammonia nitrogen. Leachate could affect the current and future quality of water bodies due to uncontrolled infiltrations. Therefore, control and treatment of leachate is one of the biggest issues in urban solid waste treatment plants and landfills design and management. This work presents a treatment model that will be carried out "in-situ" using a cost-effective novel technology that combines solar evaporation/condensation plus forward osmosis. The plant is powered by renewable energies (solar energy, biomass and residual heat), which will minimize the carbon footprint of the process. The final effluent quality is very high, allowing reuse (preferred) or discharge into watercourses. In the particular case of this work, the final effluents will be reused for cleaning and gardening purposes. A minority semi-solid residual stream is also generated in the process. Due to its special composition (rich in metals and inorganic elements), this stream will be valorized in ceramic industries to improve the final products characteristics.

Keywords: Forward osmosis, landfills, leachate valorization, solar evaporation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 920
6922 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the number of document data has been increasing since the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract topics independent of each other from large document data such as newspaper data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis. The topic represented by ITA is represented by a set of words. However, the set of words is quite different from the topics the user imagines. For example, the top five words with high independence of a topic are as follows. Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. This Topic 1 is considered to represent the topic of "SPORTS". This topic name "SPORTS" has to be attached by the user. ITA cannot name topics. Therefore, in this research, we propose a method to obtain topics easy for people to understand by using the web search engine, topics given by the set of words given by independent topic analysis. In particular, we search a set of topical words, and the title of the homepage of the search result is taken as the topic name. And we also use the proposed method for some data and verify its effectiveness.

Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 468
6921 Joint Training Offer Selection and Course Timetabling Problems: Models and Algorithms

Authors: Gianpaolo Ghiani, Emanuela Guerriero, Emanuele Manni, Alessandro Romano

Abstract:

In this article, we deal with a variant of the classical course timetabling problem that has a practical application in many areas of education. In particular, in this paper we are interested in high schools remedial courses. The purpose of such courses is to provide under-prepared students with the skills necessary to succeed in their studies. In particular, a student might be under prepared in an entire course, or only in a part of it. The limited availability of funds, as well as the limited amount of time and teachers at disposal, often requires schools to choose which courses and/or which teaching units to activate. Thus, schools need to model the training offer and the related timetabling, with the goal of ensuring the highest possible teaching quality, by meeting the above-mentioned financial, time and resources constraints. Moreover, there are some prerequisites between the teaching units that must be satisfied. We first present a Mixed-Integer Programming (MIP) model to solve this problem to optimality. However, the presence of many peculiar constraints contributes inevitably in increasing the complexity of the mathematical model. Thus, solving it through a general-purpose solver may be performed for small instances only, while solving real-life-sized instances of such model requires specific techniques or heuristic approaches. For this purpose, we also propose a heuristic approach, in which we make use of a fast constructive procedure to obtain a feasible solution. To assess our exact and heuristic approaches we perform extensive computational results on both real-life instances (obtained from a high school in Lecce, Italy) and randomly generated instances. Our tests show that the MIP model is never solved to optimality, with an average optimality gap of 57%. On the other hand, the heuristic algorithm is much faster (in about the 50% of the considered instances it converges in approximately half of the time limit) and in many cases allows achieving an improvement on the objective function value obtained by the MIP model. Such an improvement ranges between 18% and 66%.

Keywords: Heuristic, MIP model, Remedial course, School, Timetabling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1611
6920 Decision Tree for Competing Risks Survival Probability in Breast Cancer Study

Authors: N. A. Ibrahim, A. Kudus, I. Daud, M. R. Abu Bakar

Abstract:

Competing risks survival data that comprises of more than one type of event has been used in many applications, and one of these is in clinical study (e.g. in breast cancer study). The decision tree method can be extended to competing risks survival data by modifying the split function so as to accommodate two or more risks which might be dependent on each other. Recently, researchers have constructed some decision trees for recurrent survival time data using frailty and marginal modelling. We further extended the method for the case of competing risks. In this paper, we developed the decision tree method for competing risks survival time data based on proportional hazards for subdistribution of competing risks. In particular, we grow a tree by using deviance statistic. The application of breast cancer data is presented. Finally, to investigate the performance of the proposed method, simulation studies on identification of true group of observations were executed.

Keywords: Competing risks, Decision tree, Simulation, Subdistribution Proportional Hazard.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2351
6919 Spatially Random Sampling for Retail Food Risk Factors Study

Authors: Guilan Huang

Abstract:

In 2013 and 2014, the U.S. Food and Drug Administration (FDA) collected data from selected fast food restaurants and full service restaurants for tracking changes in the occurrence of foodborne illness risk factors. This paper discussed how we customized spatial random sampling method by considering financial position and availability of FDA resources, and how we enriched restaurants data with location. Location information of restaurants provides opportunity for quantitatively determining random sampling within non-government units (e.g.: 240 kilometers around each data-collector). Spatial analysis also could optimize data-collectors’ work plans and resource allocation. Spatial analytic and processing platform helped us handling the spatial random sampling challenges. Our method fits in FDA’s ability to pinpoint features of foodservice establishments, and reduced both time and expense on data collection.

Keywords: Geospatial technology, restaurant, retail food risk factors study, spatial random sampling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1431
6918 Anisotropic Total Fractional Order Variation Model in Seismic Data Denoising

Authors: Jianwei Ma, Diriba Gemechu

Abstract:

In seismic data processing, attenuation of random noise is the basic step to improve quality of data for further application of seismic data in exploration and development in different gas and oil industries. The signal-to-noise ratio of the data also highly determines quality of seismic data. This factor affects the reliability as well as the accuracy of seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ration while attenuating random noise effectively. To improve the signal-to-noise ration and attenuating seismic random noise by preserving important features and information about seismic signals, we introduce the concept of anisotropic total fractional order denoising algorithm. The anisotropic total fractional order variation model defined in fractional order bounded variation is proposed as a regularization in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional order variation model and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of theproposed method for synthetic and real seismic data sets and the denoised result is compared with F-X deconvolution and non-local means denoising algorithm.

Keywords: Anisotropic total fractional order variation, fractional order bounded variation, seismic random noise attenuation, Split Bregman Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 979
6917 Improving Academic Performance Prediction using Voting Technique in Data Mining

Authors: Ikmal Hisyam Mohamad Paris, Lilly Suriani Affendey, Norwati Mustapha

Abstract:

In this paper we compare the accuracy of data mining methods to classifying students in order to predicting student-s class grade. These predictions are more useful for identifying weak students and assisting management to take remedial measures at early stages to produce excellent graduate that will graduate at least with second class upper. Firstly we examine single classifiers accuracy on our data set and choose the best one and then ensembles it with a weak classifier to produce simple voting method. We present results show that combining different classifiers outperformed other single classifiers for predicting student performance.

Keywords: Classification, Data Mining, Prediction, Combination of Multiple Classifiers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2733
6916 Extracting Terrain Points from Airborne Laser Scanning Data in Densely Forested Areas

Authors: Ziad Abdeldayem, Jakub Markiewicz, Kunal Kansara, Laura Edwards

Abstract:

Airborne Laser Scanning (ALS) is one of the main technologies for generating high-resolution digital terrain models (DTMs). DTMs are crucial to several applications, such as topographic mapping, flood zone delineation, geographic information systems (GIS), hydrological modelling, spatial analysis, etc. Laser scanning system generates irregularly spaced three-dimensional cloud of points. Raw ALS data are mainly ground points (that represent the bare earth) and non-ground points (that represent buildings, trees, cars, etc.). Removing all the non-ground points from the raw data is referred to as filtering. Filtering heavily forested areas is considered a difficult and challenging task as the canopy stops laser pulses from reaching the terrain surface. This research presents an approach for removing non-ground points from raw ALS data in densely forested areas. Smoothing splines are exploited to interpolate and fit the noisy ALS data. The presented filter utilizes a weight function to allocate weights for each point of the data. Furthermore, unlike most of the methods, the presented filtering algorithm is designed to be automatic. Three different forested areas in the United Kingdom are used to assess the performance of the algorithm. The results show that the generated DTMs from the filtered data are accurate (when compared against reference terrain data) and the performance of the method is stable for all the heavily forested data samples. The average root mean square error (RMSE) value is 0.35 m.

Keywords: Airborne laser scanning, digital terrain models, filtering, forested areas.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 693
6915 Designing an Integrated Platform for Real-Time Recommendations Sharing among the Aged and People Living with Cancer

Authors: Adekunle O. Afolabi, Pekka Toivanen

Abstract:

The world is expected to experience growth in the number of ageing population, and this will bring about high cost of providing care for these valuable citizens. In addition, many of these live with chronic diseases that come with old age. Providing adequate care in the face of rising costs and dwindling personnel can be challenging. However, advances in technologies and emergence of the Internet of Things are providing a way to address these challenges while improving care giving. This study proposes the integration of recommendation systems into homecare to provide real-time recommendations for effective management of people receiving care at home and those living with chronic diseases. Using the simplified Training Logic Concept, stakeholders and requirements were identified. Specific requirements were gathered from people living with cancer. The solution designed has two components namely home and community, to enhance recommendations sharing for effective care giving. The community component of the design was implemented with the development of a mobile app called Recommendations Sharing Community for Aged and Chronically Ill People (ReSCAP). This component has illustrated the possibility of real-time recommendations, improved recommendations sharing among care receivers and between a physician and care receivers. Full implementation will increase access to health data for better care decision making.

Keywords: Recommendation systems, healthcare, internet of things, real-time, homecare.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 911
6914 Blockchain for IoT Security and Privacy in Healthcare Sector

Authors: Umair Shafique, Hafiz Usman Zia, Fiaz Majeed, Samina Naz, Javeria Ahmed, Maleeha Zainab

Abstract:

The Internet of Things (IoT) has become a hot topic for the last couple of years. This innovative technology has shown promising progress in various areas and the world has witnessed exponential growth in multiple application domains. Researchers are working to investigate its aptitudes to get the best from it by harnessing its true potential. But at the same time, IoT networks open up a new aspect of vulnerability and physical threats to data integrity, privacy, and confidentiality. It is due to centralized control, data silos approach for handling information, and a lack of standardization in the IoT networks. As we know, blockchain is a new technology that involves creating secure distributed ledgers to store and communicate data. Some of the benefits include resiliency, integrity, anonymity, decentralization, and autonomous control. The potential for blockchain technology to provide the key to managing and controlling IoT has created a new wave of excitement around the idea of putting that data back into the hands of the end-users. In this manuscript, we have proposed a model that combines blockchain and IoT networks to address potential security and privacy issues in the healthcare domain and how various stakeholders will interact with the system.

Keywords: Internet of Things, IoT, blockchain, data integrity, authentication, data privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 359
6913 Multidimensional Performance Management

Authors: David Wiese

Abstract:

In order to maximize efficiency of an information management platform and to assist in decision making, the collection, storage and analysis of performance-relevant data has become of fundamental importance. This paper addresses the merits and drawbacks provided by the OLAP paradigm for efficiently navigating large volumes of performance measurement data hierarchically. The system managers or database administrators navigate through adequately (re)structured measurement data aiming to detect performance bottlenecks, identify causes for performance problems or assessing the impact of configuration changes on the system and its representative metrics. Of particular importance is finding the root cause of an imminent problem, threatening availability and performance of an information system. Leveraging OLAP techniques, in contrast to traditional static reporting, this is supposed to be accomplished within moderate amount of time and little processing complexity. It is shown how OLAP techniques can help improve understandability and manageability of measurement data and, hence, improve the whole Performance Analysis process.

Keywords: Data Warehousing, OLAP, Multidimensional Navigation, Performance Diagnosis, Performance Management, Performance Tuning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2110
6912 A New Algorithm for Enhanced Robustness of Copyright Mark

Authors: Harsh Vikram Singh, S. P. Singh, Anand Mohan

Abstract:

This paper discusses a new heavy tailed distribution based data hiding into discrete cosine transform (DCT) coefficients of image, which provides statistical security as well as robustness against steganalysis attacks. Unlike other data hiding algorithms, the proposed technique does not introduce much effect in the stegoimage-s DCT coefficient probability plots, thus making the presence of hidden data statistically undetectable. In addition the proposed method does not compromise on hiding capacity. When compared to the generic block DCT based data-hiding scheme, our method found more robust against a variety of image manipulating attacks such as filtering, blurring, JPEG compression etc.

Keywords: Information Security, Robust Steganography, Steganalysis, Pareto Probability Distribution function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1774
6911 Liveability of Kuala Lumpur City Centre: An Evaluation of the Happiness Level of the Streets- Activities

Authors: Shuhana Shamsuddin, Nur Rasyiqah Abu Hassan, Ahmad Bashri Sulaiman

Abstract:

Liveable city is referred to as the quality of life in an area that contributes towards a safe, healthy and enjoyable place. This paper discusses the role of the streets- activities in making Kuala Lumpur a liveable city and the happiness level of the residents towards the city-s street activities. The study was conducted using the residents of Kuala Lumpur. A mixed method technique is used with the quantitative data as a main data and supported by the qualitative data. Data were collected using questionnaires, observation and also an interview session with a sample of residents of Kuala Lumpur. The sampling technique is based on multistage cluster data sampling. The findings revealed that, there is still no significant relationship between the length of stay of the resident in Kuala Lumpur with the happiness level towards the street activities that occurred in the city.

Keywords: Liveable city, activities, urban design quality, quality of life, happiness level.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2866
6910 Cloud Computing Support for Diagnosing Researches

Authors: A. Amirov, O. Gerget, V. Kochegurov

Abstract:

One of the main biomedical problem lies in detecting dependencies in semi structured data. Solution includes biomedical portal and algorithms (integral rating health criteria, multidimensional data visualization methods). Biomedical portal allows to process diagnostic and research data in parallel mode using Microsoft System Center 2012, Windows HPC Server cloud technologies. Service does not allow user to see internal calculations instead it provides practical interface. When data is sent for processing user may track status of task and will achieve results as soon as computation is completed. Service includes own algorithms and allows diagnosing and predicating medical cases. Approved methods are based on complex system entropy methods, algorithms for determining the energy patterns of development and trajectory models of biological systems and logical–probabilistic approach with the blurring of images.

Keywords: Biomedical portal, cloud computing, diagnostic and prognostic research, mathematical data analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1596
6909 Non-Invasive Data Extraction from Machine Display Units Using Video Analytics

Authors: Ravneet Kaur, Joydeep Acharya, Sudhanshu Gaur

Abstract:

Artificial Intelligence (AI) has the potential to transform manufacturing by improving shop floor processes such as production, maintenance and quality. However, industrial datasets are notoriously difficult to extract in a real-time, streaming fashion thus, negating potential AI benefits. The main example is some specialized industrial controllers that are operated by custom software which complicates the process of connecting them to an Information Technology (IT) based data acquisition network. Security concerns may also limit direct physical access to these controllers for data acquisition. To connect the Operational Technology (OT) data stored in these controllers to an AI application in a secure, reliable and available way, we propose a novel Industrial IoT (IIoT) solution in this paper. In this solution, we demonstrate how video cameras can be installed in a factory shop floor to continuously obtain images of the controller HMIs. We propose image pre-processing to segment the HMI into regions of streaming data and regions of fixed meta-data. We then evaluate the performance of multiple Optical Character Recognition (OCR) technologies such as Tesseract and Google vision to recognize the streaming data and test it for typical factory HMIs and realistic lighting conditions. Finally, we use the meta-data to match the OCR output with the temporal, domain-dependent context of the data to improve the accuracy of the output. Our IIoT solution enables reliable and efficient data extraction which will improve the performance of subsequent AI applications.

Keywords: Human machine interface, industrial internet of things, internet of things, optical character recognition, video analytic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 707
6908 Estimating the Flow Velocity Using Flow Generated Sound

Authors: Saeed Hosseini, Ali Reza Tahavvor

Abstract:

Sound processing is one the subjects that newly attracts a lot of researchers. It is efficient and usually less expensive than other methods. In this paper the flow generated sound is used to estimate the flow speed of free flows. Many sound samples are gathered. After analyzing the data, a parameter named wave power is chosen. For all samples the wave power is calculated and averaged for each flow speed. A curve is fitted to the averaged data and a correlation between the wave power and flow speed is found. Test data are used to validate the method and errors for all test data were under 10 percent. The speed of the flow can be estimated by calculating the wave power of the flow generated sound and using the proposed correlation.

Keywords: Flow generated sound, sound processing, speed, wave power.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2351
6907 Exponential Particle Swarm Optimization Approach for Improving Data Clustering

Authors: Neveen I. Ghali, Nahed El-Dessouki, Mervat A. N., Lamiaa Bakrawi

Abstract:

In this paper we use exponential particle swarm optimization (EPSO) to cluster data. Then we compare between (EPSO) clustering algorithm which depends on exponential variation for the inertia weight and particle swarm optimization (PSO) clustering algorithm which depends on linear inertia weight. This comparison is evaluated on five data sets. The experimental results show that EPSO clustering algorithm increases the possibility to find the optimal positions as it decrease the number of failure. Also show that (EPSO) clustering algorithm has a smaller quantization error than (PSO) clustering algorithm, i.e. (EPSO) clustering algorithm more accurate than (PSO) clustering algorithm.

Keywords: Particle swarm optimization, data clustering, exponential PSO.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1663
6906 Biosignal Measurement using Personal Area Network based on Human Body Communication

Authors: Yong-Gyu Lee, Jin-Hee Park, Gilwon Yoon

Abstract:

In this study, we introduced a communication system where human body was used as medium through which data were transferred. Multiple biosignal sensing units were attached to a subject and wireless personal area network was formed. Data of the sensing units were shared among them. We used wideband pulse communication that was simple, low-power consuming and high data rated. Each unit functioned as independent communication device or node. A method of channel search and communication among the modes was developed. A protocol of carrier sense multiple access/collision detect was implemented in order to avoid data collision or interferences. Biosignal sensing units should be located at different locations due to the nature of biosignal origin. Our research provided a flexibility of collecting data without using electrical wires. More non-constrained measurement was accomplished which was more suitable for u-Health monitoring.

Keywords: Human body communication, wideband pulse communication, personal area network, biosignal.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2157
6905 Development and Evaluation of a Portable Ammonia Gas Detector

Authors: Jaheon Gu, Wooyong Chung, Mijung Koo, Seonbok Lee, Gyoutae Park, Sangguk Ahn, Hiesik Kim, Jungil Park

Abstract:

In this paper, we present a portable ammonia gas detector for performing the gas safety management efficiently. The display of the detector is separated from its body. The display module is received the data measured from the detector using ZigBee. The detector has a rechargeable li-ion battery which can be use for 11~12 hours, and a Bluetooth module for sending the data to the PC or the smart devices. The data are sent to the server and can access using the web browser or mobile application. The range of the detection concentration is 0~100ppm.

Keywords: Ammonia, detector, gas safety, portable.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1513
6904 LAYMOD; A Layered and Modular Platform for CAx Collaboration Management and Supporting Product data Integration based on STEP Standard

Authors: Omid F. Valilai, Mahmoud Houshmand

Abstract:

Nowadays companies strive to survive in a competitive global environment. To speed up product development/modifications, it is suggested to adopt a collaborative product development approach. However, despite the advantages of new IT improvements still many CAx systems work separately and locally. Collaborative design and manufacture requires a product information model that supports related CAx product data models. To solve this problem many solutions are proposed, which the most successful one is adopting the STEP standard as a product data model to develop a collaborative CAx platform. However, the improvement of the STEP-s Application Protocols (APs) over the time, huge number of STEP AP-s and cc-s, the high costs of implementation, costly process for conversion of older CAx software files to the STEP neutral file format; and lack of STEP knowledge, that usually slows down the implementation of the STEP standard in collaborative data exchange, management and integration should be considered. In this paper the requirements for a successful collaborative CAx system is discussed. The STEP standard capability for product data integration and its shortcomings as well as the dominant platforms for supporting CAx collaboration management and product data integration are reviewed. Finally a platform named LAYMOD to fulfil the requirements of CAx collaborative environment and integrating the product data is proposed. The platform is a layered platform to enable global collaboration among different CAx software packages/developers. It also adopts the STEP modular architecture and the XML data structures to enable collaboration between CAx software packages as well as overcoming the STEP standard limitations. The architecture and procedures of LAYMOD platform to manage collaboration and avoid contradicts in product data integration are introduced.

Keywords: CAx, Collaboration management, STEP applicationmodules, STEP standard, XML data structures

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2200
6903 Grocery Customer Behavior Analysis using RFID-based Shopping Paths Data

Authors: In-Chul Jung, Young S. Kwon

Abstract:

Knowing about the customer behavior in a grocery has been a long-standing issue in the retailing industry. The advent of RFID has made it easier to collect moving data for an individual shopper's behavior. Most of the previous studies used the traditional statistical clustering technique to find the major characteristics of customer behavior, especially shopping path. However, in using the clustering technique, due to various spatial constraints in the store, standard clustering methods are not feasible because moving data such as the shopping path should be adjusted in advance of the analysis, which is time-consuming and causes data distortion. To alleviate this problem, we propose a new approach to spatial pattern clustering based on the longest common subsequence. Experimental results using real data obtained from a grocery confirm the good performance of the proposed method in finding the hot spot, dead spot and major path patterns of customer movements.

Keywords: customer path, shopping behavior, exploratoryanalysis, LCS, RFID

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3119
6902 Tool for Metadata Extraction and Content Packaging as Endorsed in OAIS Framework

Authors: Payal Abichandani, Rishi Prakash, Paras Nath Barwal, B. K. Murthy

Abstract:

Information generated from various computerization processes is a potential rich source of knowledge for its designated community. To pass this information from generation to generation without modifying the meaning is a challenging activity. To preserve and archive the data for future generations it’s very essential to prove the authenticity of the data. It can be achieved by extracting the metadata from the data which can prove the authenticity and create trust on the archived data. Subsequent challenge is the technology obsolescence. Metadata extraction and standardization can be effectively used to resolve and tackle this problem. Metadata can be categorized at two levels i.e. Technical and Domain level broadly. Technical metadata will provide the information that can be used to understand and interpret the data record, but only this level of metadata isn’t sufficient to create trustworthiness. We have developed a tool which will extract and standardize the technical as well as domain level metadata. This paper is about the different features of the tool and how we have developed this.  

Keywords: Digital Preservation, Metadata, OAIS, PDI, XML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1799
6901 Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Authors: T. Niknam, M. Nayeripour, B.Bahmani Firouzi

Abstract:

Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.

Keywords: Ant Colony Optimization (ACO), Data clustering, Hybrid evolutionary optimization algorithm, K-means clustering, Particle Swarm Optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2180
6900 Analysis of DNA Microarray Data using Association Rules: A Selective Study

Authors: M. Anandhavalli Gauthaman

Abstract:

DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. It is very much important to extract biologically meaningful information from this huge amount of expression data to know the current state of the cell because most cellular processes are regulated by changes in gene expression. Association rule mining techniques are helpful to find association relationship between genes. Numerous association rule mining algorithms have been developed to analyze and associate this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.

Keywords: DNA microarray, gene expression, association rule mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2117
6899 Prospects, Problems of Marketing Research and Data Mining in Turkey

Authors: Sema Kurtuluş, Kemal Kurtuluş

Abstract:

The objective of this paper is to review and assess the methodological issues and problems in marketing research, data and knowledge mining in Turkey. As a summary, academic marketing research publications in Turkey have significant problems. The most vital problem seems to be related with modeling. Most of the publications had major weaknesses in modeling. There were also, serious problems regarding measurement and scaling, sampling and analyses. Analyses myopia seems to be the most important problem for young academia in Turkey. Another very important finding is the lack of publications on data and knowledge mining in the academic world.

Keywords: Marketing research, data mining, knowledge mining, research modeling, analyses.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1947
6898 Analysis and Comparison of Image Encryption Algorithms

Authors: İsmet Öztürk, İbrahim Soğukpınar

Abstract:

With the fast progression of data exchange in electronic way, information security is becoming more important in data storage and transmission. Because of widely using images in industrial process, it is important to protect the confidential image data from unauthorized access. In this paper, we analyzed current image encryption algorithms and compression is added for two of them (Mirror-like image encryption and Visual Cryptography). Implementations of these two algorithms have been realized for experimental purposes. The results of analysis are given in this paper.

Keywords: image encryption, image cryptosystem, security, transmission

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4926
6897 Risk Classification of SMEs by Early Warning Model Based on Data Mining

Authors: Nermin Ozgulbas, Ali Serhan Koyuncugil

Abstract:

One of the biggest problems of SMEs is their tendencies to financial distress because of insufficient finance background. In this study, an Early Warning System (EWS) model based on data mining for financial risk detection is presented. CHAID algorithm has been used for development of the EWS. Developed EWS can be served like a tailor made financial advisor in decision making process of the firms with its automated nature to the ones who have inadequate financial background. Besides, an application of the model implemented which covered 7,853 SMEs based on Turkish Central Bank (TCB) 2007 data. By using EWS model, 31 risk profiles, 15 risk indicators, 2 early warning signals, and 4 financial road maps has been determined for financial risk mitigation.

Keywords: Early Warning Systems, Data Mining, Financial Risk, SMEs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3360
6896 Using Data from Foursquare Web Service to Represent the Commercial Activity of a City

Authors: Taras Agryzkov, Almudena Nolasco-Cirugeda, Jos´e L. Oliver, Leticia Serrano-Estrada, Leandro Tortosa, Jos´e F. Vicent

Abstract:

This paper aims to represent the commercial activity of a city taking as source data the social network Foursquare. The city of Murcia is selected as case study, and the location-based social network Foursquare is the main source of information. After carrying out a reorganisation of the user-generated data extracted from Foursquare, it is possible to graphically display on a map the various city spaces and venues especially those related to commercial, food and entertainment sector businesses. The obtained visualisation provides information about activity patterns in the city of Murcia according to the people‘s interests and preferences and, moreover, interesting facts about certain characteristics of the town itself.

Keywords: Social networks, Foursquare, spatial analysis, data visualization, geocomputation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2653
6895 Long-Range Dependence of Financial Time Series Data

Authors: Chatchai Pesee

Abstract:

This paper examines long-range dependence or longmemory of financial time series on the exchange rate data by the fractional Brownian motion (fBm). The principle of spectral density function in Section 2 is used to find the range of Hurst parameter (H) of the fBm. If 0< H <1/2, then it has a short-range dependence (SRD). It simulates long-memory or long-range dependence (LRD) if 1/2< H <1. The curve of exchange rate data is fBm because of the specific appearance of the Hurst parameter (H). Furthermore, some of the definitions of the fBm, long-range dependence and selfsimilarity are reviewed in Section II as well. Our results indicate that there exists a long-memory or a long-range dependence (LRD) for the exchange rate data in section III. Long-range dependence of the exchange rate data and estimation of the Hurst parameter (H) are discussed in Section IV, while a conclusion is discussed in Section V.

Keywords: Fractional Brownian motion, long-rangedependence, memory, short-range dependence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1862
6894 Meta Random Forests

Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti

Abstract:

Leo Breimans Random Forests (RF) is a recent development in tree based classifiers and quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved results of classifications on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques to the random forests. We experiment the working of the ensembles of random forests on the standard data sets available in UCI data sets. We compare the original random forest algorithm with their ensemble counterparts and discuss the results.

Keywords: Random Forests [RF], ensembles, UCI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2671