Search results for: Historical data
7383 A Genetic Algorithm for Clustering on Image Data
Authors: Qin Ding, Jim Gasvoda
Abstract:
Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP-hard. Genetic algorithms have been used in a wide variety of fields to perform clustering; however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time-efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are large. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.
Keywords: Clustering, data mining, genetic algorithm, image data.
Downloads 2053
7382 A Comparative Study of Regional Climate Models and Global Coupled Models over Uttarakhand
Authors: Sudip Kumar Kundu, Charu Singh
Abstract:
As a great physiographic divide, the Himalayas affect a large system of water and air circulation, which helps to determine the climatic conditions of the Indian subcontinent to the south and the mid-Asian highlands to the north. The range blocks chill continental air from the north from entering India in winter and forces the rain-bearing southwesterly monsoon to give up most of its precipitation in that area during the monsoon season. Nowadays, extreme weather events such as heavy precipitation, cloudbursts, flash floods, landslides and extreme avalanches occur regularly in the North Western Himalayan (NWH) region. The present study investigates which model(s) are suitable for determining the rainfall pattern over that region. For this investigation, selected models from the Coordinated Regional Climate Downscaling Experiment (CORDEX) and the Coupled Model Intercomparison Project Phase 5 (CMIP5) have been used in a consistent framework for the period 1976 to 2000 (historical). The ability of these driving models from the CORDEX domain and CMIP5 has been examined in terms of the spatial distribution and time series of rainfall over NWH in the rainy season, compared with the ground-based India Meteorological Department (IMD) gridded rainfall data set. The analysis shows that models such as MIROC5 and MPI-ESM-LR from both CORDEX and CMIP5 provide the best spatial distribution of rainfall over the NWH region. However, the driving models from CORDEX underestimate the daily rainfall amount compared with the CMIP5 driving models, as they fail to capture daily rainfall properly when time series (TS) are plotted individually for the states of Uttarakhand (UK) and Himachal Pradesh (HP). It can therefore be concluded that the driving models from CMIP5 are better than the CORDEX domain models for investigating the rainfall pattern over the NWH region.
Keywords: Global warming, rainfall, CMIP5, CORDEX, North Western Himalayan region.
Downloads 1093
7381 A Holistic Framework for Unifying Data Security and Management in Modern Enterprises
Authors: Ashly Joseph
Abstract:
Modern businesses struggle significantly to secure and manage their data properly as the volume and complexity of their data both expand exponentially. Through the use of a multi-layered defense strategy, a centralized management platform, and cutting-edge technologies like AI, this research paper presents a comprehensive framework to integrate data security and management. The constraints of current data protection and management strategies, technological advancements, and the evolving threat landscape are all examined in this article. It suggests best practices for putting into practice integrated data security and governance models, placing an emphasis on ongoing adaptation. The advantages mentioned include a strengthened security posture, simpler procedures, lower costs, and reduced complexity. Additionally, issues including skill shortages, antiquated systems, and cultural obstacles are examined. Security executives and Chief Information Security Officers are given practical advice on how to evaluate, plan, and put into place strong data-centric security and management capabilities. The goal of the paper is to provide a thorough study of the data security and management landscape and to arm contemporary businesses with the knowledge they need to be proactive in protecting their data assets.
Keywords: Data security, security management, cloud computing, cybersecurity, data governance, security architecture, data management.
Downloads 269
7380 Post Mining- Discovering Valid Rules from Different Sized Data Sources
Authors: R. Nedunchezhian, K. Anbumani
Abstract:
A big organization may have multiple branches spread across different locations. Processing data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but are ready to pass on their association rules. Local mining may also generate a large number of rules. Further, it is not practically possible for all local data sources to be of the same size. A model is proposed for discovering valid rules from different sized data sources, where the valid rules are high-weighted rules. These rules can be obtained from the high-frequency rules generated from each of the data sources. A data source selection procedure is considered in order to synthesize rules efficiently. Support Equalization is another proposed method, which focuses on eliminating low-frequency rules at the local sites themselves, thus reducing the number of rules by a significant amount.
Keywords: Association rules, multiple data stores, synthesizing, valid rules.
Downloads 1404
7379 RFID-ready Master Data Management for Reverse Logistics
Authors: Jincheol Han, Hyunsun Ju, Jonghoon Chun
Abstract:
Sharing consistent and correct master data among disparate applications in a reverse-logistics chain has long been recognized as an intricate problem. Although a master data management (MDM) system can surely assume that responsibility, applications that need to co-operate with it must comply with proprietary query interfaces provided by the specific MDM system. In this paper, we present an RFID-ready MDM system which makes master data readily available to any participating application in a reverse-logistics chain. We propose an RFID-wrapper as a part of our MDM. It acts as a gateway between any data retrieval request and the query interfaces that process it. With the RFID-wrapper, any participating application in a reverse-logistics chain can easily retrieve master data in a way that is analogous to retrieval of any other RFID-based logistics transactional data.
Keywords: Reverse Logistics, Master Data Management, RFID.
Downloads 1974
7378 Dynamic Models versus Frailty Models for Recurrent Event Data
Authors: Entisar A. Elgmati
Abstract:
Recurrent event data is a special type of multivariate survival data. Dynamic and frailty models are two of the approaches that deal with this kind of data. The two models are compared using the empirical standard deviation of the standardized martingale residual processes as a way of assessing their fit, based on the Aalen additive regression model. We found that both approaches take heterogeneity into account and produce residual standard deviations close to each other, both in the simulation study and in the real data set.
Keywords: Dynamic, frailty, misspecification, recurrent events.
Downloads 2350
7377 Issues and Architecture for Supporting Data Warehouse Queries in Web Portals
Authors: Minsoo Lee, Yoon-kyung Lee, Hyejung Yoon, Soo-kyung Song, Sujeong Cheong
Abstract:
Data Warehousing tools have become very popular and currently many of them have moved to Web-based user interfaces to make it easier to access and use the tools. The next step is to enable these tools to be used within a portal framework. The portal framework consists of pages having several small windows that contain individual data warehouse query results. There are several issues that need to be considered when designing the architecture for a portal enabled data warehouse query tool. Some issues need special techniques that can overcome the limitations that are imposed by the nature of data warehouse queries. Issues such as single sign-on, query result caching and sharing, customization, scheduling and authorization need to be considered. This paper discusses such issues and suggests an architecture to support data warehouse queries within Web portal frameworks.
Keywords: Data Warehousing tools, data warehousing queries, web portal frameworks.
Downloads 2121
7376 Cultural Integration as a Factor of Genesis of the Kazakh Nation in the Conditions of Multicultural Society
Authors: Kadyraliyeva Altynay Mustafayevna, Zholdubayeva Azhar Kuanyshbekovna, Alimzhanova Aliya Sharabekovna, Zhiyenbekova Ainur Abdurakhmanovna, Asanov Seylbek Sadikovich
Abstract:
The article analyses historical aspects of the formation of the Kazakh nation in the conditions of a multicultural society. The authors underline cultural integration as a significant stage in the cultural advancement of the Kazakh nation. The transition to modern-style houses and the adoption and development of secular education gave rise to the development of society and culture as a whole.
Keywords: Assimilation, culture genesis, cultural integration, multiculturalism
Downloads 1895
7375 Data Mining Using Learning Automata
Authors: M. R. Aghaebrahimi, S. H. Zahiri, M. Amiri
Abstract:
In this paper, a data miner based on learning automata, called LA-miner, is proposed. The LA-miner extracts classification rules from data sets automatically. The proposed algorithm is based on function optimization using learning automata. The experimental results on three benchmarks indicate that the performance of the proposed LA-miner is comparable with (and sometimes better than) the Ant-miner (a data mining algorithm based on the Ant Colony optimization algorithm) and CN2 (a well-known data mining algorithm for classification).
Keywords: Data mining, Learning automata, Classification rules, Knowledge discovery.
Downloads 1935
7374 Secure and Efficient Transmission of Aggregated Data for Mobile Wireless Sensor Networks
Authors: A. Krishna Veni, R.Geetha
Abstract:
Wireless Sensor Networks (WSNs) are suitable for many scenarios in the real world. Data retrieval is made efficient by data aggregation techniques. Many data aggregation techniques have been proposed, but most of the existing schemes are neither energy efficient nor secure. Moreover, the existing techniques use the traditional clustering approach, in which packet transmission is delayed because there is no proper scheduling. The presented system uses the Velocity Energy-efficient and Link-aware Cluster-Tree (VELCT) scheme, in which a Data Collection Tree (DCT) improves the lifetime of the network. The VELCT scheme and the construction of the DCT reduce delay and traffic. The network lifetime can be increased by avoiding frequent changes in cluster topology. Secure and Efficient Transmission of Aggregated data (SETA) improves the security of data transmission by checking the trust value of the nodes prior to the aggregation of data. Since SETA considers data only from trustworthy nodes for aggregation, it is more secure in transmitting the data, thereby improving the accuracy of the aggregated data.
Keywords: Aggregation, lifetime, network security, wireless sensor network.
Downloads 1217
7373 Development of Greenhouse Analysis Tools for Home Agriculture Project
Authors: M. Amir Abas, M. Dahlui
Abstract:
This paper presents the development of analysis tools for the Home Agriculture project. The tools are required for monitoring the condition of a greenhouse and involve two components: measurement hardware and a data analysis engine. The measurement hardware measures environment parameters such as temperature, humidity, air quality and dust, while the analysis tool is used to analyse and interpret the integrated data against the condition of the weather, quality of health, irradiance, quality of soil, etc. The current development of the tools is complete for the off-line data recording technique. The data is saved on an MMC and transferred via ZigBee to the Environment Data Manager (EDM) for data analysis. EDM converts the raw data and plots three combination graphs. It has been applied to monitoring three months of measurement data for irradiance, temperature and humidity of the greenhouse.
Keywords: Monitoring, Environment, Greenhouse, Analysis tools
Downloads 2018
7372 A Robust Data Hiding Technique based on LSB Matching
Authors: Emad T. Khalaf, Norrozila Sulaiman
Abstract:
Many researchers are working on information hiding techniques, using different ideas and areas to hide their secret data. This paper introduces a robust technique for hiding secret data in an image based on LSB insertion and the RSA encryption technique. The key step of the proposed technique is to encrypt the secret data. The encrypted data is then converted into a bit stream and divided into a number of segments. The cover image is also divided into the same number of segments. Each segment of data is compared with each segment of the image to find the best matching segment, in order to create a new random sequence of segments that is then inserted into the cover image. Experimental results show that the proposed technique has a high security level and produces better stego-image quality.
Keywords: steganography; LSB Matching; RSA Encryption; data segments
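To make the LSB insertion step mentioned in the abstract above concrete, here is a minimal Python sketch. It covers only plain LSB insertion into a flat sequence of 8-bit pixel values; it does not implement the paper's RSA encryption or its segment matching. The function names `embed_lsb` and `extract_lsb` are hypothetical and chosen for illustration only.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload bits in the least significant bit of each pixel value.

    Illustrative assumption: `pixels` is a flat sequence of 8-bit samples.
    """
    # Unpack the payload into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover is too small for the payload")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Read back n_bytes of payload from the least significant bits."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)
```

Because only the lowest bit of each sample changes, every pixel value differs from the cover by at most 1, which is why LSB methods tend to preserve visual quality.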
Downloads 2220
7371 Minaret of Medieval City Aktobe
Authors: Yeraly Akymbek, Beibit Baibugunov
Abstract:
The article describes the remains of the base of a minaret found in 2009 at the medieval fortress shakhristan of Aktobe, which is located along the courses of the rivers Balta and Aksu. The minaret, which consists of two parts, a stylobate set in a pit and a base part, dates to the XI-XII centuries. The preserved height of the building is 3.6 meters. The quadrangular stylobate of the minaret, whose corners are oriented to the four cardinal directions, measures 8.65 x 8.5 m, with a height of 2.6 m. The octagonal upper cap has a diameter of 7.85 m and a preserved height of 1 m. This minaret is of particular importance among the historical and architectural monuments of Kazakhstan, as it is so far the only minaret belonging to the Karakhanid epoch, in which Islam was the state religion.
Keywords: Aktobe, medieval, minaret, stylobate.
Downloads 1747
7370 Comprehensive Analysis of Data Mining Tools
Authors: S. Sarumathi, N. Shanthi
Abstract:
Due to fast and flawless technological innovation, a tremendous amount of data is being accumulated all over the world in every domain, such as Pattern Recognition, Machine Learning, Spatial Data Mining, Image Analysis, Fraudulent Analysis and the World Wide Web. This makes it all the more essential to develop tools for data mining functionalities. The major aim of this paper is to analyze various tools that are used to build a resourceful analytical or descriptive model for handling large amounts of information more efficiently and in a more user-friendly way. In this survey, the diverse tools are illustrated with their extensive technical paradigms, outstanding graphical interfaces and inbuilt multipath algorithms, which are very useful for handling significant amounts of data.
Keywords: Classification, Clustering, Data Mining, Machine learning, Visualization.
Downloads 2439
7369 Surface Elevation Dynamics Assessment Using Digital Elevation Models, Light Detection and Ranging, GPS and Geospatial Information Science Analysis: Ecosystem Modelling Approach
Authors: Ali K. M. Al-Nasrawi, Uday A. Al-Hamdany, Sarah M. Hamylton, Brian G. Jones, Yasir M. Alyazichi
Abstract:
Surface elevation dynamics have always responded to disturbance regimes. Creating Digital Elevation Models (DEMs) to detect surface dynamics has led to the development of several methods, devices and data clouds. DEMs can provide accurate and quick results with cost efficiency, in comparison to the inherited geomatics survey techniques. Nowadays, remote sensing datasets have become a primary source to create DEMs, including LiDAR point clouds with GIS analytic tools. However, these data need to be tested for error detection and correction. This paper evaluates various DEMs from different data sources over time for Apple Orchard Island, a coastal site in southeastern Australia, in order to detect surface dynamics. Subsequently, 30 chosen locations were examined in the field to test the error of the DEMs surface detection using high resolution global positioning systems (GPSs). Results show significant surface elevation changes on Apple Orchard Island. Accretion occurred on most of the island while surface elevation loss due to erosion is limited to the northern and southern parts. Concurrently, the projected differential correction and validation method aimed to identify errors in the dataset. The resultant DEMs demonstrated a small error ratio (≤ 3%) from the gathered datasets when compared with the fieldwork survey using RTK-GPS. As modern modelling approaches need to become more effective and accurate, applying several tools to create different DEMs on a multi-temporal scale would allow easy predictions in time-cost-frames with more comprehensive coverage and greater accuracy. With a DEM technique for the eco-geomorphic context, such insights about the ecosystem dynamic detection, at such a coastal intertidal system, would be valuable to assess the accuracy of the predicted eco-geomorphic risk for the conservation management sustainability. Demonstrating this framework to evaluate the historical and current anthropogenic and environmental stressors on coastal surface elevation dynamism could be profitably applied worldwide.
Keywords: DEMs, eco-geomorphic-dynamic processes, geospatial information science, remote sensing, surface elevation changes.
Downloads 1158
7368 A Prediction of Attractive Evaluation Objects Based On Complex Sequential Data
Authors: Shigeaki Sakurai, Makino Kyoko, Shigeru Matsumoto
Abstract:
This paper proposes a method that predicts attractive evaluation objects. In the learning phase, the method inductively acquires trend rules from complex sequential data. The data is composed of two types of data. One is numerical sequential data. Each evaluation object has respective numerical sequential data. The other is text sequential data. Each evaluation object is described in texts. The trend rules represent changes of numerical values related to evaluation objects. In the prediction phase, the method applies new text sequential data to the trend rules and evaluates which evaluation objects are attractive. This paper verifies the effect of the proposed method by using stock price sequences and news headline sequences. In these sequences, each stock brand corresponds to an evaluation object. This paper discusses validity of predicted attractive evaluation objects, the process time of each phase, and the possibility of application tasks.
Keywords: Trend rule, frequent pattern, numerical sequential data, text sequential data, evaluation object.
Downloads 1235
7367 Methods for Distinction of Cattle Using Supervised Learning
Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl
Abstract:
Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify records into groups. The result of the analysis is a pattern which can be used to identify a data set without the need to obtain the input data used to create this pattern. An important requirement in this process is careful data preparation, validation of the model used, and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of view of genetic diversity. Where pedigree information is missing, other methods can be used to trace an animal's origin. The genetic diversity recorded in genetic data holds relatively useful information for identifying animals originating from individual countries. We can conclude that the application of data mining to molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.
Keywords: Genetic data, Pinzgau cattle, supervised learning.
Downloads 2318
7366 A Comparative Study of Fine Grained Security Techniques Based on Data Accessibility and Inference
Authors: Azhar Rauf, Sareer Badshah, Shah Khusro
Abstract:
This paper analyzes different techniques for the fine grained security of relational databases with respect to two variables: data accessibility and inference. Data accessibility measures the amount of data available to the users after applying a security technique to a table. Inference is the proportion of information leakage after suppressing a cell containing secret data. A row containing a suppressed secret cell can become a security threat if an intruder generates useful information from the related visible information in the same row. This paper measures the data accessibility and inference associated with row, cell, and column level security techniques. Cell level security offers the greatest data accessibility, as it suppresses secret data only. On the other hand, there is a high probability of inference in cell level security. Row and column level security techniques have the least data accessibility and inference. This paper introduces the cell plus innocent security technique, which uses the cell level security method but also suppresses some innocent data to mislead an intruder, since a suppressed cell then does not necessarily contain secret data. Four variations of the technique, namely cell plus innocent 1/4, cell plus innocent 2/4, cell plus innocent 3/4, and cell plus innocent 4/4, have been introduced to suppress innocent data equal to 1/4, 2/4, 3/4, and 4/4 of the true secret data inside the database, respectively. Results show that the new technique offers better control over data accessibility and inference than the state-of-the-art security techniques. This paper further discusses how the techniques can be combined. The paper shows that the cell plus innocent 1/4, 2/4, and 3/4 techniques can be used as a replacement for cell level security.
Keywords: Fine Grained Security, Data Accessibility, Inference, Row, Cell, Column Level Security.
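The contrast between row, cell, and column level suppression described in the abstract above can be illustrated with a small Python sketch. It assumes, purely for illustration, that data accessibility is the fraction of table cells left visible after suppression; the abstract does not give the paper's exact formula, and the inference measure is not modelled here. The function name `accessibility` and the table layout are hypothetical.

```python
def accessibility(n_rows: int, n_cols: int, secret: set, level: str) -> float:
    """Fraction of cells still visible after suppressing the secret cells.

    `secret` holds (row, column) positions of secret cells. The ratio used
    here is an illustrative assumption, not the paper's definition.
    """
    if level == "cell":
        # Hide only the secret cells themselves.
        hidden = set(secret)
    elif level == "row":
        # Hide every cell in any row that contains a secret cell.
        rows = {r for r, _ in secret}
        hidden = {(r, c) for r in rows for c in range(n_cols)}
    elif level == "column":
        # Hide every cell in any column that contains a secret cell.
        cols = {c for _, c in secret}
        hidden = {(r, c) for r in range(n_rows) for c in cols}
    else:
        raise ValueError(f"unknown level: {level}")
    total = n_rows * n_cols
    return (total - len(hidden)) / total


# Hypothetical 4 x 3 table with two secret cells in the last column.
secret_cells = {(0, 2), (2, 2)}
for level in ("cell", "column", "row"):
    print(level, accessibility(4, 3, secret_cells, level))
```

In this toy table, cell level suppression leaves the most data visible and row level the least, which matches the ordering the abstract describes for cell level versus row and column level techniques.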
Downloads 1471
7365 Weka Based Desktop Data Mining as Web Service
Authors: Sujala.D.Shetty, S.Vadivel, Sakshi Vaghella
Abstract:
Data mining is the process of sifting through large volumes of data, analyzing data from different perspectives and summarizing it into useful information. One of the widely used desktop applications for data mining is the Weka tool, a collection of machine learning algorithms implemented in Java and open sourced under the General Public License (GPL). A web service is a software system designed to support interoperable machine-to-machine interaction over a network using SOAP messages. Unlike a desktop application, a web service is easy to upgrade, deliver and access, and does not occupy any memory on the system. Keeping in mind the advantages of a web service over a desktop application, in this paper we demonstrate how this Java-based desktop data mining application can be implemented as a web service to support data mining across the internet.
Keywords: desktop application, Weka mining, web service
Downloads 4081
7364 The Culture of Interethnic Concord in Kazakhstan: Peculiarities of Formation and Development
Authors: Zh. Tolen, A. Kadyralyeva, A. Alimzhanova, G. Aldambergenova, K. Arymbayeva, A. Zhiyenbekova
Abstract:
This paper describes the historical development of interethnic concord in the Republic of Kazakhstan, and emphasizes the role of the tolerant mentality of the Kazakh people in the ethno-political policy of the country. Moreover, pointing out interethnic concord as a powerful stabilizing factor, it analyses the specifics of interethnic policy in multinational Kazakh society. It concludes that the culture of interethnic concord can be a model for the ethno-political policy of Kazakhstan.
Keywords: Interethnic relations, the culture of interethnic concord, multiculturalism, tolerance, stability in society.
Downloads 2426
7363 Influence of Parameters of Modeling and Data Distribution for Optimal Condition on Locally Weighted Projection Regression Method
Authors: Farhad Asadi, Mohammad Javad Mollakazemi, Aref Ghafouri
Abstract:
Recent research in neural network science and neuroscience on modeling complex time series data and statistical learning has focused mostly on learning from high-dimensional input spaces and signals. Local linear models are a strong choice for modeling local nonlinearity in data series. Locally weighted projection regression is a flexible and powerful algorithm for nonlinear approximation in high-dimensional signal spaces. In this paper, different learning scenarios for one- and two-dimensional data series with different distributions are investigated in simulation, and noise is added to the data to produce different disordered distributions in the time series and to evaluate how well the algorithm predicts local nonlinearity. The performance of the algorithm is then simulated, and its sensitivity to the data distribution, both when the data density is high and when the number of data points is small, is explained together with the influence of the important local validity parameter under different data distributions.
Keywords: Local nonlinear estimation, LWPR algorithm, Online training method.
Downloads 1601
7362 Noise Reduction in Web Data: A Learning Approach Based on Dynamic User Interests
Authors: Julius Onyancha, Valentina Plekhanova
Abstract:
One of the significant issues facing web users is the amount of noise in web data, which hinders the process of finding useful information in relation to their dynamic interests. Current research works consider noise to be any data that does not form part of the main web page and propose noise web data reduction tools which mainly focus on eliminating noise in relation to the content and layout of web data. This paper argues that not all data that form part of the main web page is of interest to a user and not all noise data is actually noise to a given user. Therefore, learning the noise web data allocated to user requests ensures not only a reduction of the noisiness level in a web user profile, but also a decrease in the loss of useful information, and hence improves the quality of a web user profile. A Noise Web Data Learning (NWDL) tool/algorithm capable of learning noise web data in a web user profile is proposed. The proposed work considers the elimination of noise data in relation to dynamic user interest. In order to validate the performance of the proposed work, an experimental design setup is presented. The results obtained are compared with the current algorithms applied in the noise web data reduction process. The experimental results show that the proposed work considers the dynamic change of user interest prior to the elimination of noise data. The proposed work contributes towards improving the quality of a web user profile by reducing the amount of useful information eliminated as noise.
Keywords: Web log data, web user profile, user interest, noise web data learning, machine learning.
Downloads 1734
7361 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has, over recent years, seen big advances because of the spread of the internet, which generates a tremendous volume of data every day, and because of immense advances in the technologies that facilitate the analysis of these data. In particular, classification techniques are a subdomain of data mining that determine the group to which each data instance belongs within a given dataset. They are used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or based on machine learning. Each type of technique has its own limits. Data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach that differs from existing algorithms in its reliability computations. The proposed approach outperformed the most common classification techniques, with an f-measure exceeding 97% on the IRIS dataset.
Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.
Downloads 1527
7360 Moving Data Mining Tools toward a Business Intelligence System
Authors: Nittaya Kerdprasop, Kittisak Kerdprasop
Abstract:
Data mining (DM) is the process of finding and extracting frequent patterns that can describe the data, or predict unknown or future values. These goals are achieved by using various learning algorithms. Each algorithm may produce a mining result completely different from the others. Some algorithms may find millions of patterns. It is thus a difficult job for data analysts to select appropriate models and interpret the discovered knowledge. In this paper, we describe a framework of an intelligent and complete data mining system called SUT-Miner. Our system comprises a full complement of major DM algorithms and pre-DM and post-DM functionalities. It is the post-DM packages that ease DM deployment for business intelligence applications.
Keywords: Business intelligence, data mining, functional programming, intelligent system.
Downloads 1742
7359 Analysis of Diverse Clustering Tools in Data Mining
Authors: S. Sarumathi, N. Shanthi, M. Sharmila
Abstract:
Clustering in data mining is an unsupervised learning technique that aggregates data objects into meaningful groups such that the intra-cluster similarity of objects is maximized and the inter-cluster similarity of objects is minimized. Over the past decades, several clustering tools have emerged in which clustering algorithms are built in, making it easier to use them and extract the expected results. Data mining mainly deals with huge databases, which impose additional rigorous computational constraints on cluster analysis. These challenges pave the way for the emergence of powerful, expansive data mining clustering software. In this survey, a variety of clustering tools used in data mining are described along with the pros and cons of each.
Keywords: Cluster Analysis, Clustering Algorithms, Clustering Techniques, Association, Visualization.
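The clustering objective stated in the abstract (high similarity within clusters, low similarity between them) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses Euclidean distance as an inverse proxy for similarity, and the function name is hypothetical rather than taken from any of the surveyed tools.

```python
from itertools import combinations
from math import dist

def mean_intra_inter_distance(points, labels):
    """Return (mean distance within clusters, mean distance between clusters)."""
    intra, inter = [], []
    # Compare every unordered pair of points exactly once.
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        bucket = intra if label_p == label_q else inter
        bucket.append(dist(p, q))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)
```

A good partition gives a small first value and a large second value; comparing the two is a quick sanity check on the output of any clustering tool.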
Downloads 2201
7358 A Monte Carlo Method to Data Stream Analysis
Authors: Kittisak Kerdprasop, Nittaya Kerdprasop, Pairote Sattayatham
Abstract:
Data stream analysis is the process of computing various summaries and derived values from large amounts of data that are continuously generated at a rapid rate. The nature of a stream does not allow each data element to be revisited. Furthermore, data processing must be fast to produce timely analysis results. These requirements impose constraints on the design of the algorithms, which must balance correctness against timely responses. Several techniques have been proposed over the past few years to address these challenges. These techniques can be categorized as either data-oriented or task-oriented. The data-oriented approach analyzes a subset of the data or a smaller transformed representation, whereas the task-oriented scheme solves the problem directly via approximation techniques. We propose a hybrid approach to tackle the data stream analysis problem: the data stream is statistically transformed to a smaller size, and its characteristics are computationally approximated. We adopt a Monte Carlo method in the approximation step. The data reduction is performed horizontally and vertically through our EMR sampling method. The proposed method is analyzed through a series of experiments. We apply our algorithm to clustering and classification tasks to evaluate the utility of our approach.
Keywords: Data Stream, Monte Carlo, Sampling, Density Estimation.
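The one-pass constraint described in the abstract can be illustrated with reservoir sampling (Algorithm R). This is a standard technique swapped in for illustration; it is not the authors' EMR sampling method, whose details the abstract does not give, and the function name and parameters are hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream seen only once."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

A summary such as the mean can then be approximated from the sample, for instance `sum(sample) / len(sample)`, without storing or revisiting the full stream.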
Downloads 1417
7357 STLF Based on Optimized Neural Network Using PSO
Authors: H. Shayeghi, H. A. Shayanfar, G. Azimi
Abstract:
The quality of short-term load forecasting (STLF) can improve the efficiency of planning and operation of electric utilities. Artificial Neural Networks (ANNs) are employed for nonlinear short-term load forecasting owing to their powerful nonlinear mapping capabilities. At present, there is no systematic methodology for the optimal design and training of an artificial neural network; one often has to resort to trial and error. This paper describes the process of developing large three-layer feed-forward neural networks for short-term load forecasting and then presents a heuristic search algorithm for an important task in this process, namely optimal network structure design. Particle Swarm Optimization (PSO) is used to develop the optimum large neural network structure and connecting weights for the one-day-ahead electric load forecasting problem. PSO is a novel stochastic optimization method based on swarm intelligence, with a strong capability for global optimization. Employing PSO algorithms in the design and training of ANNs allows the ANN architecture and parameters to be easily optimized. The proposed method is applied to STLF for the local utility. Data are clustered according to differences in their characteristics. Special days are extracted from the normal training sets and handled separately. In this way, a solution is provided for all load types, including working days, weekends, and special days. The experimental results show that the proposed PSO-optimized method speeds up the learning of the network and improves forecasting precision compared with the conventional Back Propagation (BP) method. Moreover, it is not only simple to calculate but also practical and effective. It also provides greater accuracy in many cases and consistently gives lower percentage errors for the STLF problem than the BP method. Thus, it can be applied to automatically design an optimal load forecaster based on historical data.
Keywords: Large Neural Network, Short-Term Load Forecasting, Particle Swarm Optimization.
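The PSO update rule mentioned in the abstract can be sketched on a toy objective. This is a minimal global-best PSO under assumed settings, not the authors' network-design procedure: the function name, the inertia and acceleration coefficients, and the box bounds are all hypothetical choices.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal best positions
    best_val = [objective(p) for p in pos]       # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

In a forecasting setting, the position vector would encode network weights (and possibly structure) and the objective would be a forecasting error on historical load data; here a call such as `pso_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` simply drives a sphere function toward zero.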
Downloads 2224
7356 The Protection and Enhancement of the Roman Roads in Algeria
Abstract:
The Roman paths and roads offer very interesting archaeological material because they allow us to understand the history of human settlement and also strengthen territorial identity. Roman roads are one of the hallmarks of the Roman Empire, which extended to North Africa. The objective of this investigation is to draw researchers' attention to the importance of the Roman roads and paths found in Algeria, in terms of the quality of the materials and techniques used in this period of our history, and to encourage decision makers to protect and enhance these routes, because current urbanization, intensive agricultural practices, or simple neglect threaten the sustainability of this important historical heritage.
Keywords: Roman paths, Materials, Property, Valuation.
Downloads 1680
7355 Improved Data Warehousing: Lessons Learnt from the Systems Approach
Authors: Roelien Goede
Abstract:
Data warehousing success rates are not high enough. User dissatisfaction and failure to adhere to time frames and budgets are too common. Most traditional information systems practices are rooted in hard systems thinking. Today, the great systems thinkers are forgotten by information systems developers. A data warehouse is still a system, and it is worth investigating whether systems thinkers such as Churchman can enhance our practices today. This paper investigates data warehouse development practices from a systems thinking perspective. An empirical investigation is carried out in order to understand the everyday practices of data warehousing professionals from a systems perspective. The paper presents a model for the application of Churchman's systems approach in data warehouse development.
Keywords: Data warehouse development, Information systems development, Interpretive case study, Systems thinking
Downloads 1596
7354 Centralized Resource Management for Network Infrastructure Including IP Telephony by Integrating a Mediator Between the Heterogeneous Data Sources
Authors: Mohammed Fethi Khalfi, Malika Kandouci
Abstract:
Over the past decade, mobile technology has experienced a revolution that will ultimately change the way we communicate. All these technologies have a common denominator, the exploitation of computer information systems, but their operation can be tedious because of problems with heterogeneous data sources. To overcome these problems, we propose adding an extra layer that interfaces the management or supervision applications with the different data sources. This layer is materialized by implementing a mediator between the different host applications and the information systems, which are frequently organized in a hierarchical or relational manner, so that the heterogeneity is completely transparent to the VoIP platform.
Keywords: TOIP, Data Integration, Mediation, information computer system, heterogeneous data sources
Downloads 1332