Search results for: data frames
7320 Comprehensive Analysis of Data Mining Tools
Authors: S. Sarumathi, N. Shanthi
Abstract:
Due to the fast and flawless technological innovation there is a tremendous amount of data dumping all over the world in every domain such as Pattern Recognition, Machine Learning, Spatial Data Mining, Image Analysis, Fraudulent Analysis, World Wide Web etc., This issue turns to be more essential for developing several tools for data mining functionalities. The major aim of this paper is to analyze various tools which are used to build a resourceful analytical or descriptive model for handling large amount of information more efficiently and user friendly. In this survey the diverse tools are illustrated with their extensive technical paradigm, outstanding graphical interface and inbuilt multipath algorithms in which it is very useful for handling significant amount of data more indeed.
Keywords: Classification, Clustering, Data Mining, Machine learning, Visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24397319 A Prediction of Attractive Evaluation Objects Based On Complex Sequential Data
Authors: Shigeaki Sakurai, Makino Kyoko, Shigeru Matsumoto
Abstract:
This paper proposes a method that predicts attractive evaluation objects. In the learning phase, the method inductively acquires trend rules from complex sequential data. The data is composed of two types of data. One is numerical sequential data. Each evaluation object has respective numerical sequential data. The other is text sequential data. Each evaluation object is described in texts. The trend rules represent changes of numerical values related to evaluation objects. In the prediction phase, the method applies new text sequential data to the trend rules and evaluates which evaluation objects are attractive. This paper verifies the effect of the proposed method by using stock price sequences and news headline sequences. In these sequences, each stock brand corresponds to an evaluation object. This paper discusses validity of predicted attractive evaluation objects, the process time of each phase, and the possibility of application tasks.
Keywords: Trend rule, frequent pattern, numerical sequential data, text sequential data, evaluation object.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12357318 Methods for Distinction of Cattle Using Supervised Learning
Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl
Abstract:
Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.
Keywords: Genetic data, Pinzgau cattle, supervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23187317 A Comparative Study of Fine Grained Security Techniques Based on Data Accessibility and Inference
Authors: Azhar Rauf, Sareer Badshah, Shah Khusro
Abstract:
This paper analyzes different techniques of the fine grained security of relational databases for the two variables-data accessibility and inference. Data accessibility measures the amount of data available to the users after applying a security technique on a table. Inference is the proportion of information leakage after suppressing a cell containing secret data. A row containing a secret cell which is suppressed can become a security threat if an intruder generates useful information from the related visible information of the same row. This paper measures data accessibility and inference associated with row, cell, and column level security techniques. Cell level security offers greatest data accessibility as it suppresses secret data only. But on the other hand, there is a high probability of inference in cell level security. Row and column level security techniques have least data accessibility and inference. This paper introduces cell plus innocent security technique that utilizes the cell level security method but suppresses some innocent data to dodge an intruder that a suppressed cell may not necessarily contain secret data. Four variations of the technique namely cell plus innocent 1/4, cell plus innocent 2/4, cell plus innocent 3/4, and cell plus innocent 4/4 respectively have been introduced to suppress innocent data equal to 1/4, 2/4, 3/4, and 4/4 percent of the true secret data inside the database. Results show that the new technique offers better control over data accessibility and inference as compared to the state-of-theart security techniques. This paper further discusses the combination of techniques together to be used. The paper shows that cell plus innocent 1/4, 2/4, and 3/4 techniques can be used as a replacement for the cell level security.
Keywords: Fine Grained Security, Data Accessibility, Inference, Row, Cell, Column Level Security.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14717316 Weka Based Desktop Data Mining as Web Service
Authors: Sujala.D.Shetty, S.Vadivel, Sakshi Vaghella
Abstract:
Data mining is the process of sifting through large volumes of data, analyzing data from different perspectives and summarizing it into useful information. One of the widely used desktop applications for data mining is the Weka tool which is nothing but a collection of machine learning algorithms implemented in Java and open sourced under the General Public License (GPL). A web service is a software system designed to support interoperable machine to machine interaction over a network using SOAP messages. Unlike a desktop application, a web service is easy to upgrade, deliver and access and does not occupy any memory on the system. Keeping in mind the advantages of a web service over a desktop application, in this paper we are demonstrating how this Java based desktop data mining application can be implemented as a web service to support data mining across the internet.Keywords: desktop application, Weka mining, web service
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 40817315 Influence of Parameters of Modeling and Data Distribution for Optimal Condition on Locally Weighted Projection Regression Method
Authors: Farhad Asadi, Mohammad Javad Mollakazemi, Aref Ghafouri
Abstract:
Recent research in neural networks science and neuroscience for modeling complex time series data and statistical learning has focused mostly on learning from high input space and signals. Local linear models are a strong choice for modeling local nonlinearity in data series. Locally weighted projection regression is a flexible and powerful algorithm for nonlinear approximation in high dimensional signal spaces. In this paper, different learning scenario of one and two dimensional data series with different distributions are investigated for simulation and further noise is inputted to data distribution for making different disordered distribution in time series data and for evaluation of algorithm in locality prediction of nonlinearity. Then, the performance of this algorithm is simulated and also when the distribution of data is high or when the number of data is less the sensitivity of this approach to data distribution and influence of important parameter of local validity in this algorithm with different data distribution is explained.
Keywords: Local nonlinear estimation, LWPR algorithm, Online training method.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16017314 Noise Reduction in Web Data: A Learning Approach Based on Dynamic User Interests
Authors: Julius Onyancha, Valentina Plekhanova
Abstract:
One of the significant issues facing web users is the amount of noise in web data which hinders the process of finding useful information in relation to their dynamic interests. Current research works consider noise as any data that does not form part of the main web page and propose noise web data reduction tools which mainly focus on eliminating noise in relation to the content and layout of web data. This paper argues that not all data that form part of the main web page is of a user interest and not all noise data is actually noise to a given user. Therefore, learning of noise web data allocated to the user requests ensures not only reduction of noisiness level in a web user profile, but also a decrease in the loss of useful information hence improves the quality of a web user profile. Noise Web Data Learning (NWDL) tool/algorithm capable of learning noise web data in web user profile is proposed. The proposed work considers elimination of noise data in relation to dynamic user interest. In order to validate the performance of the proposed work, an experimental design setup is presented. The results obtained are compared with the current algorithms applied in noise web data reduction process. The experimental results show that the proposed work considers the dynamic change of user interest prior to elimination of noise data. The proposed work contributes towards improving the quality of a web user profile by reducing the amount of useful information eliminated as noise.Keywords: Web log data, web user profile, user interest, noise web data learning, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17347313 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.
Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15297312 Moving Data Mining Tools toward a Business Intelligence System
Authors: Nittaya Kerdprasop, Kittisak Kerdprasop
Abstract:
Data mining (DM) is the process of finding and extracting frequent patterns that can describe the data, or predict unknown or future values. These goals are achieved by using various learning algorithms. Each algorithm may produce a mining result completely different from the others. Some algorithms may find millions of patterns. It is thus the difficult job for data analysts to select appropriate models and interpret the discovered knowledge. In this paper, we describe a framework of an intelligent and complete data mining system called SUT-Miner. Our system is comprised of a full complement of major DM algorithms, pre-DM and post-DM functionalities. It is the post-DM packages that ease the DM deployment for business intelligence applications.Keywords: Business intelligence, data mining, functionalprogramming, intelligent system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17437311 Analysis of Diverse Clustering Tools in Data Mining
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
Abstract:
Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.
Keywords: Cluster Analysis, Clustering Algorithms, Clustering Techniques, Association, Visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 22027310 A Monte Carlo Method to Data Stream Analysis
Authors: Kittisak Kerdprasop, Nittaya Kerdprasop, Pairote Sattayatham
Abstract:
Data stream analysis is the process of computing various summaries and derived values from large amounts of data which are continuously generated at a rapid rate. The nature of a stream does not allow a revisit on each data element. Furthermore, data processing must be fast to produce timely analysis results. These requirements impose constraints on the design of the algorithms to balance correctness against timely responses. Several techniques have been proposed over the past few years to address these challenges. These techniques can be categorized as either dataoriented or task-oriented. The data-oriented approach analyzes a subset of data or a smaller transformed representation, whereas taskoriented scheme solves the problem directly via approximation techniques. We propose a hybrid approach to tackle the data stream analysis problem. The data stream has been both statistically transformed to a smaller size and computationally approximated its characteristics. We adopt a Monte Carlo method in the approximation step. The data reduction has been performed horizontally and vertically through our EMR sampling method. The proposed method is analyzed by a series of experiments. We apply our algorithm on clustering and classification tasks to evaluate the utility of our approach.Keywords: Data Stream, Monte Carlo, Sampling, DensityEstimation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14177309 Centralized Resource Management for Network Infrastructure Including Ip Telephony by Integrating a Mediator Between the Heterogeneous Data Sources
Authors: Mohammed Fethi Khalfi, Malika Kandouci
Abstract:
Over the past decade, mobile has experienced a revolution that will ultimately change the way we communicate.All these technologies have a common denominator exploitation of computer information systems, but their operation can be tedious because of problems with heterogeneous data sources.To overcome the problems of heterogeneous data sources, we propose to use a technique of adding an extra layer interfacing applications of management or supervision at the different data sources.This layer will be materialized by the implementation of a mediator between different host applications and information systems frequently used hierarchical and relational manner such that the heterogeneity is completely transparent to the VoIP platform.Keywords: TOIP, Data Integration, Mediation, informationcomputer system, heterogeneous data sources
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13327308 Secure Multiparty Computations for Privacy Preserving Classifiers
Authors: M. Sumana, K. S. Hareesha
Abstract:
Secure computations are essential while performing privacy preserving data mining. Distributed privacy preserving data mining involve two to more sites that cannot pool in their data to a third party due to the violation of law regarding the individual. Hence in order to model the private data without compromising privacy and information loss, secure multiparty computations are used. Secure computations of product, mean, variance, dot product, sigmoid function using the additive and multiplicative homomorphic property is discussed. The computations are performed on vertically partitioned data with a single site holding the class value.Keywords: Homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9207307 Design of Buffer Management for Industry to Avoid Sensor Data- Conflicts
Authors: Dae-ho Won, Jong-wook Hong, Yeon-Mo Yang, Jinung An
Abstract:
To reduce accidents in the industry, WSNs(Wireless Sensor networks)- sensor data is used. WSNs- sensor data has the persistence and continuity. therefore, we design and exploit the buffer management system that has the persistence and continuity to avoid and delivery data conflicts. To develop modules, we use the multi buffers and design the buffer management modules that transfer sensor data through the context-aware methods.Keywords: safe management system, buffer management, context-aware, input data stream
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15547306 Security in Resource Constraints Network Light Weight Encryption for Z-MAC
Authors: Mona Almansoori, Ahmed Mustafa, Ahmad Elshamy
Abstract:
Wireless sensor network was formed by a combination of nodes, systematically it transmitting the data to their base stations, this transmission data can be easily compromised if the limited processing power and the data consistency from these nodes are kept in mind; there is always a discussion to address the secure data transfer or transmission in actual time. This will present a mechanism to securely transmit the data over a chain of sensor nodes without compromising the throughput of the network by utilizing available battery resources available in the sensor node. Our methodology takes many different advantages of Z-MAC protocol for its efficiency, and it provides a unique key by sharing the mechanism using neighbor node MAC address. We present a light weighted data integrity layer which is embedded in the Z-MAC protocol to prove that our protocol performs well than Z-MAC when we introduce the different attack scenarios.
Keywords: Hybrid MAC protocol, data integrity, lightweight encryption, Neighbor based key sharing, Sensor node data processing, Z-MAC.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5687305 Data Recording for Remote Monitoring of Autonomous Vehicles
Authors: Rong-Terng Juang
Abstract:
Autonomous vehicles offer the possibility of significant benefits to social welfare. However, fully automated cars might not be going to happen in the near further. To speed the adoption of the self-driving technologies, many governments worldwide are passing laws requiring data recorders for the testing of autonomous vehicles. Currently, the self-driving vehicle, (e.g., shuttle bus) has to be monitored from a remote control center. When an autonomous vehicle encounters an unexpected driving environment, such as road construction or an obstruction, it should request assistance from a remote operator. Nevertheless, large amounts of data, including images, radar and lidar data, etc., have to be transmitted from the vehicle to the remote center. Therefore, this paper proposes a data compression method of in-vehicle networks for remote monitoring of autonomous vehicles. Firstly, the time-series data are rearranged into a multi-dimensional signal space. Upon the arrival, for controller area networks (CAN), the new data are mapped onto a time-data two-dimensional space associated with the specific CAN identity. Secondly, the data are sampled based on differential sampling. Finally, the whole set of data are encoded using existing algorithms such as Huffman, arithmetic and codebook encoding methods. To evaluate system performance, the proposed method was deployed on an in-house built autonomous vehicle. The testing results show that the amount of data can be reduced as much as 1/7 compared to the raw data.
Keywords: Autonomous vehicle, data recording, remote monitoring, controller area network.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13537304 Cloud Computing Databases: Latest Trends and Architectural Concepts
Authors: Tarandeep Singh, Parvinder S. Sandhu
Abstract:
The Economic factors are leading to the rise of infrastructures provides software and computing facilities as a service, known as cloud services or cloud computing. Cloud services can provide efficiencies for application providers, both by limiting up-front capital expenses, and by reducing the cost of ownership over time. Such services are made available in a data center, using shared commodity hardware for computation and storage. There is a varied set of cloud services available today, including application services (salesforce.com), storage services (Amazon S3), compute services (Google App Engine, Amazon EC2) and data services (Amazon SimpleDB, Microsoft SQL Server Data Services, Google-s Data store). These services represent a variety of reformations of data management architectures, and more are on the horizon.Keywords: Data Management in Cloud, AWS, EC2, S3, SQS, TQG.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19877303 Data Annotation Models and Annotation Query Language
Authors: Neerja Bhatnagar, Benjoe A. Juliano, Renee S. Renner
Abstract:
This paper presents data annotation models at five levels of granularity (database, relation, column, tuple, and cell) of relational data to address the problem of unsuitability of most relational databases to express annotations. These models do not require any structural and schematic changes to the underlying database. These models are also flexible, extensible, customizable, database-neutral, and platform-independent. This paper also presents an SQL-like query language, named Annotation Query Language (AnQL), to query annotation documents. AnQL is simple to understand and exploits the already-existent wide knowledge and skill set of SQL.Keywords: annotation query language, data annotations, data annotation models, semantic data annotations
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23557302 Use XML Format like a Model of Data Backup
Authors: Souleymane Oumtanaga, Kadjo Tanon Lambert, Koné Tiémoman, Tety Pierre, Dowa N’sreke Florent
Abstract:
Nowadays data backup format doesn-t cease to appear raising so the anxiety on their accessibility and their perpetuity. XML is one of the most promising formats to guarantee the integrity of data. This article suggests while showing one thing man can do with XML. Indeed XML will help to create a data backup model. The main task will consist in defining an application in JAVA able to convert information of a database in XML format and restore them later.
Keywords: Backup, Proprietary format, parser, syntactic tree.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17307301 Fragility Assessment for Vertically Irregular Buildings with Soft Storey
Authors: N. Akhavan, Sh. Tavousi Tafreshi, A. Ghasemi
Abstract:
Seismic behavior of irregular structures through the past decades indicate that the stated buildings do not have appropriate performance. Among these subjects, the current paper has investigated the behavior of special steel moment frame with different configuration of soft storey vertically. The analyzing procedure has been evaluated with respect to incremental dynamic analysis (IDA), and numeric process was carried out by OpenSees finite element analysis package. To this end, nine 2D steel frames, with different numbers of stories and irregularity positions, which were subjected to seven pairs of ground motion records orthogonally with respect to Ibarra-Krawinkler deterioration model, have been investigated. This paper aims at evaluating the response of two-dimensional buildings incorporating soft storey which subjected to bi-directional seismic excitation. The IDAs were implemented for different stages of PGA with various ground motion records, in order to determine maximum inter-storey drift ratio. According to statistical elements and fracture range (standard deviation), the vulnerability or exceedance from above-mentioned cases has been examined. For this reason, fragility curves for different placement of soft storey in the first, middle and the last floor for 4, 8, and 16 storey buildings have been generated and compared properly.
Keywords: Special steel moment frame, soft storey, incremental dynamic analysis, fragility curve.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14817300 REDUCER – An Architectural Design Pattern for Reducing Large and Noisy Data Sets
Authors: Apkar Salatian
Abstract:
To relieve the burden of reasoning on a point to point basis, in many domains there is a need to reduce large and noisy data sets into trends for qualitative reasoning. In this paper we propose and describe a new architectural design pattern called REDUCER for reducing large and noisy data sets that can be tailored for particular situations. REDUCER consists of 2 consecutive processes: Filter which takes the original data and removes outliers, inconsistencies or noise; and Compression which takes the filtered data and derives trends in the data. In this seminal article we also show how REDUCER has successfully been applied to 3 different case studies.
Keywords: Design Pattern, filtering, compression.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14917299 Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features
Authors: Vesna Kirandziska, Nevena Ackovska, Ana Madevska Bogdanova
Abstract:
The problem of emotion recognition is a challenging problem. It is still an open problem from the aspect of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. A Support Vector Machine classifiers are built by using raw data from video recordings. In this paper, the results obtained for the emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison between the classifiers build from facial data only, voice data only and from the combination of both data is made here. The need for a better combination of the information from facial expression and voice data is argued.
Keywords: Emotion recognition, facial recognition, signal processing, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20187298 Creative Art Practice in Response to Climate Change: How Art Transforms and Frames New Approaches to Speculative Ecological and Sustainable Futures
Authors: Wenwen Liu, Robert Burton, Simon McKeown
Abstract:
Climate change is seriously threatening human security and development, leading to global warming and economic, political, and social chaos. Many artists have created visual responses that challenge perceptions on climate change, actively guiding people to think about the climate issues and potential crises after urban industrialization and explore positive solutions. This project is an interdisciplinary and intertextual study where art practice is informed by culture, philosophy, psychology, ecology, and science. By correlating theory and artistic practice, it studies how art practice creates a visual way of understanding climate issues and uses art as a way of exploring speculative futures. In the context of practical-based research, arts-based practice as research and creative practice as interdisciplinary research are applied alternately to seek the original solution and new knowledge. Through creative art practice, this project has established visual ways of looking at climate change and has developed it into a model to generate more possibilities, an alternative social imagination. It not only encourages people to think and find a sustainable speculative future conducive to all species but also proves that people have the ability to realize positive futures.
Keywords: Climate change, creative practice as interdisciplinary research, arts-based practice as research, creative art practice, speculative future.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 6377297 Analysis of Data Gathering Schemes for Layered Sensor Networks with Multihop Polling
Authors: Bhed Bahadur Bista, Danda B. Rawat
Abstract:
In this paper, we investigate multihop polling and data gathering schemes in layered sensor networks in order to extend the life time of the networks. A network consists of three layers. The lowest layer contains sensors. The middle layer contains so called super nodes with higher computational power, energy supply and longer transmission range than sensor nodes. The top layer contains a sink node. A node in each layer controls a number of nodes in lower layer by polling mechanism to gather data. We will present four types of data gathering schemes: intermediate nodes do not queue data packet, queue single packet, queue multiple packets and aggregate data, to see which data gathering scheme is more energy efficient for multihop polling in layered sensor networks.
Keywords: layered sensor network, polling, data gatheringschemes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15697296 Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure
Authors: S.Aranganayagi, K.Thangavel
Abstract:
Clustering categorical data is more complicated than the numerical clustering because of its special properties. Scalability and memory constraint is the challenging problem in clustering large data set. This paper presents an incremental algorithm to cluster the categorical data. Frequencies of attribute values contribute much in clustering similar categorical objects. In this paper we propose new similarity measures based on the frequencies of attribute values and its cardinalities. The proposed measures and the algorithm are experimented with the data sets from UCI data repository. Results prove that the proposed method generates better clusters than the existing one.Keywords: Clustering, Categorical, Incremental, Frequency, Domain
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18217295 Multimedia Data Fusion for Event Detection in Twitter by Using Dempster-Shafer Evidence Theory
Authors: Samar M. Alqhtani, Suhuai Luo, Brian Regan
Abstract:
Data fusion technology can be the best way to extract useful information from multiple sources of data. It has been widely applied in various applications. This paper presents a data fusion approach in multimedia data for event detection in twitter by using Dempster-Shafer evidence theory. The methodology applies a mining algorithm to detect the event. There are two types of data in the fusion. The first is features extracted from text by using the bag-ofwords method which is calculated using the term frequency-inverse document frequency (TF-IDF). The second is the visual features extracted by applying scale-invariant feature transform (SIFT). The Dempster - Shafer theory of evidence is applied in order to fuse the information from these two sources. Our experiments have indicated that comparing to the approaches using individual data source, the proposed data fusion approach can increase the prediction accuracy for event detection. The experimental result showed that the proposed method achieved a high accuracy of 0.97, comparing with 0.93 with texts only, and 0.86 with images only.Keywords: Data fusion, Dempster-Shafer theory, data mining, event detection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17997294 Bioclimatic Design, Evaluation of Energy Behavior and Energy-Saving Interventions at the Theagenio Cancer Hospital
Authors: Emmanouel Koumoulas, Aikaterini Rokkou, Marios Moschakis
Abstract:
Theagenio" in Thessaloniki exists and works for three centuries now as a hospital. Since 1975, it has been operating as an Integrated Special Cancer Hospital and since 1985 it has been integrated into the National Health System. "Theagenio" Cancer Hospital is located at the central web of Thessaloniki residential complex and consists of two buildings, the "Symeonidio Research Center", which was completed in 1962 and the Nursing Ward, a project that was later completed in 1975. This paper examines the design of the Hospital Unit according to the requirements of the energy design of buildings. Initially, the energy characteristics of the Hospital are recorded, followed by a detailed presentation of the electromechanical installations. After the existing situation has been captured and with the help of the software TEE-KENAK, different scenarios for the energy upgrading of the buildings have been studied. Proposals for upgrading concern both the shell, e.g. installation of external thermal insulation, replacement of frames, addition of shading systems, etc. as well as electromechanical installations, e.g. use of ceiling fans, improvements in heating and cooling systems, interventions in lighting, etc. The simulation calculates the future energy status of the buildings and presents the economic benefits of the proposed interventions with reference to the environmental profits that arise.Keywords: Energy consumption in hospitals, energy saving interventions, energy upgrading, hospital facilities.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8447293 Model Order Reduction for Frequency Response and Effect of Order of Method for Matching Condition
Authors: Aref Ghafouri, Mohammad Javad Mollakazemi, Farhad Asadi
Abstract:
In this paper, model order reduction method is used for approximation in linear and nonlinearity aspects in some experimental data. This method can be used for obtaining offline reduced model for approximation of experimental data and can produce and follow the data and order of system and also it can match to experimental data in some frequency ratios. In this study, the method is compared in different experimental data and influence of choosing of order of the model reduction for obtaining the best and sufficient matching condition for following the data is investigated in format of imaginary and reality part of the frequency response curve and finally the effect and important parameter of number of order reduction in nonlinear experimental data is explained further.
Keywords: Frequency response, Order of model reduction, frequency matching condition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 20587292 Building a Scalable Telemetry Based Multiclass Predictive Maintenance Model in R
Authors: Jaya Mathew
Abstract:
Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around as well as a cloud based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing or R Services on premise or in the cloud, users can leverage the power of R at scale without having to move their data around.
Keywords: Predictive maintenance, machine learning, big data, cloud, on premise SQL, R.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19217291 Evaluation of Exerting Force on the Heating Surface Due to Bubble Ebullition in Subcooled Flow Boiling
Authors: M. R. Nematollahi
Abstract:
Vibration characteristics of subcooled flow boiling on thin and long structures such as a heating rod were recently investigated by the author. The results show that the intensity of the subcooled boiling-induced vibration (SBIV) was influenced strongly by the conditions of the subcooling temperature, linear power density and flow velocity. Implosive bubble formation and collapse are the main nature of subcooled boiling, and their behaviors are the only sources to originate from SBIV. Therefore, in order to explain the phenomenon of SBIV, it is essential to obtain reliable information about bubble behavior in subcooled boiling conditions. This was investigated at different conditions of coolant subcooling temperatures of 25 to 75°C, coolant flow velocities of 0.16 to 0.53m/s, and linear power densities of 100 to 600 W/cm. High speed photography at 13,500 frames per second was performed at these conditions. The results show that even at the highest subcooling condition, the absolute majority of bubbles collapse very close to the surface after detaching from the heating surface. Based on these observations, a simple model of surface tension and momentum change is introduced to offer a rough quantitative estimate of the force exerted on the heating surface during the bubble ebullition. The formation of a typical bubble in subcooled boiling is predicted to exert an excitation force in the order of 10-4 N.Keywords: Subcooled boiling, vibration mechanism, bubble behavior.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1542