Search results for: data migration

7374 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2561

7373 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1515

7372 Socio-Economic Insight of the Secondary Housing Market in Colombo Suburbs: Seller’s Point of Views

Authors: R. G. Ariyawansa, M. A. N. R. M. Perera

Abstract:

“House” is a powerful symbol of socio-economic background of individuals and families. In fact, housing provides all types of needs/wants from basic needs to self-actualization needs. This phenomenon can be realized only having analyzed hidden motives of buyers and sellers of the housing market. Hence, the aim of this study is to examine the socio-economic insight of the secondary housing market in Colombo suburbs. This broader aim was achieved via analyzing the general pattern of the secondary housing market, identifying socio-economic motives of sellers of the secondary housing market, and reviewing sellers’ experience of buyer behavior. A purposive sample of 50 sellers from popular residential areas in Colombo such as Maharagama, Kottawa, Piliyandala, Punnipitiya, and Nugegoda was used to collect primary data instead of relevant secondary data from published and unpublished reports. The sample was limited to selling price ranging from Rs15 million to Rs25 million, which apparently falls into middle and upper-middle income houses in the context. Participatory observation and semi-structured interviews were adopted as key data collection tools. Data were descriptively analyzed. This study found that the market is mainly handled by informal agents who are unqualified and unorganized. People such as taxi/tree-wheel drivers, boutique venders, security personals etc. are engaged in housing brokerage as a part time career. Few fulltime and formally organized agents were found but they were also not professionally qualified. As far as housing quality is concerned, it was observed that 90% of houses was poorly maintained and illegally modified. They are situated in poorly maintained neighborhoods as well. Among the observed houses, 2% was moderately maintained and 8% was well maintained and modified. Major socio-economic motives of sellers were “migrating foreign countries for education and employment” (80% and 10% respectively), “family problems” (4%), and “social status” (3%). Other motives were “health” and “environmental/neighborhood problems” (3%). This study further noted that the secondary middle income housing market in the area directly related with the migrants who motivated for education in foreign countries, mainly Australia, UK and USA. As per the literature, families motivated for education tend to migrate Colombo suburbs from remote areas of the country. They are seeking temporary accommodation in lower middle income housing. However, the secondary middle income housing market relates with the migration from Colombo to major global cities. Therefore, final transaction price of this market may depend on migration related dates such as university deadlines, visa and other agreements. Hence, it creates a buyers’ market lowering the selling price. Also it was revealed that the buyers tend to trust more on this market as far as the quality of construction of houses is concerned than brand new houses which are built for selling purpose.

Keywords: Informal housing market, hidden motives of buyers and sellers, secondary housing market, socio-economic insight.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 642

7371 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2418

7370 Four Decades of Greek Artistic Presence in Paris (1970-2010): Theory and Interpretation

Authors: Sapfo A. Mortaki

Abstract:

This article examines the presence of Greek immigrant artists (painters and sculptors) in Paris during 1970-2010. The aim is to highlight their presence in the French capital through archival research in the daily and periodical press as well as present the impact of their artistic activity on the French intellectual life and society. At the same time, their contribution to the development of cultural life in Greece becomes apparent. The integration of those migrant artists into an environment of cultural coexistence and the understanding of the social phenomenon of their migration, in the context of postmodernity, are being investigated. The cultural relations between the two countries are studied in the context of support mechanisms, such as the Greek community, cultural institutions, museums and galleries. The recognition of the Greek artists by the French society and the social dimension in the context of their activity in Paris, are discussed in terms of the assimilation theory. Since the 1970s, and especially since the fall of the dictatorship in Greece, in opposition to the prior situation, artists' contacts with their homeland have been significantly enhanced, with most of them now travelling to Paris, while others work in parallel in both countries. As a result, not only do the stages of the development of their work through their pursuits become visible, but, most importantly, the artistic world becomes informed about the multifaceted expression of art through the succession of various contemporary currents. Thus, the participation of Greek artists in the international cultural landscape is demonstrated.

Keywords: Artistic migration, cultural impact, Greek artists, postmodernity, theory of assimilation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 750

7369 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, there have been a lot of efforts and studies are carried out in growing proficient tools for performing various tasks in big data. Recently big data have gotten a lot of publicity for their good reasons. Due to the large and complex collection of datasets it is difficult to process on traditional data processing applications. This concern turns to be further mandatory for producing various tools in big data. Moreover, the main aim of big data analytics is to utilize the advanced analytic techniques besides very huge, different datasets which contain diverse sizes from terabytes to zettabytes and diverse types such as structured or unstructured and batch or streaming. Big data is useful for data sets where their size or type is away from the capability of traditional relational databases for capturing, managing and processing the data with low-latency. Thus the out coming challenges tend to the occurrence of powerful big data tools. In this survey, a various collection of big data tools are illustrated and also compared with the salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3727

7368 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is efficient approach to organize data. When data is classified to multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels and shows that there are four types of orders, which are characterized by whether the labels of expressed data includes every label of the given set of labels within the range of the set. Desirable properties of the orders, data is also expressed by the higher set of labels and different sets of labels express different data, are discussed for the orders.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classificaiton, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1256

7367 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of ever-increasing use of distributed data in computer networks is obvious for all. One technique that is performed on the distributed data for increasing of efficiency and reliablity is data rplication. In this paper, after introducing this technique and its advantages, we will examine some dynamic data replication. We will examine their characteristies for some overus scenario and the we will propose some suggestion for their improvement.

Keywords: data replication, data hiding, consistency, dynamicdata replication strategy

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1593

7366 Evaluation of Aquifer Protective Capacity and Soil Corrosivity Using Geoelectrical Method

Authors: M. T. Tsepav, Y. Adamu, M. A. Umar

Abstract:

A geoelectric survey was carried out in some parts of Angwan Gwari, an outskirt of Lapai Local Government Area on Niger State which belongs to the Nigerian Basement Complex, with the aim of evaluating the soil corrosivity, aquifer transmissivity and protective capacity of the area from which aquifer characterisation was made. The G41 Resistivity Meter was employed to obtain fifteen Schlumberger Vertical Electrical Sounding data along profiles in a square grid network. The data were processed using interpex 1-D sounding inversion software, which gives vertical electrical sounding curves with layered model comprising of the apparent resistivities, overburden thicknesses, and depth. This information was used to evaluate longitudinal conductance and transmissivities of the layers. The results show generally low resistivities across the survey area and an average longitudinal conductance variation from 0.0237Siemens in VES 6 to 0.1261Siemens in VES 15 with almost the entire area giving values less than 1.0 Siemens. The average transmissivity values range from 96.45 Ω.m2 in VES 4 to 299070 Ω.m2 in VES 1. All but VES 4 and VES14 had an average overburden greater than 400 Ω.m2, these results suggest that the aquifers are highly permeable to fluid movement within, leading to the possibility of enhanced migration and circulation of contaminants in the groundwater system and that the area is generally corrosive.

Keywords: Geoelectric survey, corrosivity, protective capacity, transmissivity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2189

7365 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: Clustering, data mining, DBSCAN, k-means, k-medoids, sensor data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1958

7364 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1986

7363 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analyzing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2724

7362 Automatic Real-Patient Medical Data De-Identification for Research Purposes

Authors: Petr Vcelak, Jana Kleckova

Abstract:

Our Medicine-oriented research is based on a medical data set of real patients. It is a security problem to share patient private data with peoples other than clinician or hospital staff. We have to remove person identification information from medical data. The medical data without private data are available after a de-identification process for any research purposes. In this paper, we introduce an universal automatic rule-based de-identification application to do all this stuff on an heterogeneous medical data. A patient private identification is replaced by an unique identification number, even in burnedin annotation in pixel data. The identical identification is used for all patient medical data, so it keeps relationships in a data. Hospital can take an advantage of a research feedback based on results.

Keywords: DASTA, De-identification, DICOM, Health Level Seven, Medical data, OCR, Personal data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1589

7361 Analyzing Multi-Labeled Data Based on the Roll of a Concept against a Semantic Range

Authors: Masahiro Kuzunishi, Tetsuya Furukawa, Ke Lu

Abstract:

Classifying data hierarchically is an efficient approach to analyze data. Data is usually classified into multiple categories, or annotated with a set of labels. To analyze multi-labeled data, such data must be specified by giving a set of labels as a semantic range. There are some certain purposes to analyze data. This paper shows which multi-labeled data should be the target to be analyzed for those purposes, and discusses the role of a label against a set of labels by investigating the change when a label is added to the set of labels. These discussions give the methods for the advanced analysis of multi-labeled data, which are based on the role of a label against a semantic range.

Keywords: Classification Hierarchies, Data Analysis, Multilabeled Data, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1164

7360 Women's Employment Issues in Georgia and Solutions Based on European Experience

Authors: N. Damenia, E. Kharaishvili, N. Sagareishvili, M. Saghareishvili

Abstract:

Women's Employment is one of the most important issues in the global economy. The article discusses the stated topic in Georgia, through historical content, Soviet experience, and modern perspectives. The paper discusses segmentation insa terms of employment and related problems. Based on statistical analysis, women's unemployment rate and its factors are analyzed. The level of employment of women in Transcaucasia (Georgia, Armenia, and Azerbaijan) is discussed and is compared with Baltic countries (Lithuania, Latvia, and Estonia). The study analyzes women’s level of development, according to the average age of marriage and migration level. The focus is on Georgia's Association Agreement with the EU in 2014, which includes economic, social, trade and political issues. One part of it is gender equality at workplaces. According to the research, the average monthly remuneration of women managers in the financial and insurance sector equaled to 1044.6 Georgian Lari, while in overall business sector average monthly remuneration equaled to 961.1 GEL. Average salaries are increasing; however, the employment rate remains problematic. For example, in 2017, 74.6% of men and 50.8% of women were employed from a total workforce. It is also interesting that the proportion of men and women at managerial positions is 29% (women) to 71% (men). Based on the results, the main recommendation for government and civil society is to consider women as a part of the country’s economic development. In this aspect, the experience of developed countries should be considered. It is important to create additional jobs in urban or rural areas and help migrant women return and use their working resources properly.

Keywords: Employment of women, segregation in terms of employment, women's employment level in Transcaucasia, migration level.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 630

7359 Steganalysis of Data Hiding via Halftoning and Coordinate Projection

Authors: Woong Hee Kim, Ilhwan Park

Abstract:

Steganography is the art of hiding and transmitting data through apparently innocuous carriers in an effort to conceal the existence of the data. A lot of steganography algorithms have been proposed recently. Many of them use the digital image data as a carrier. In data hiding scheme of halftoning and coordinate projection, still image data is used as a carrier, and the data of carrier image are modified for data embedding. In this paper, we present three features for analysis of data hiding via halftoning and coordinate projection. Also, we present a classifier using the proposed three features.

Keywords: Steganography, steganalysis, digital halftoning, data hiding.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1550

7358 Biological Data Integration using SOA

Authors: Noura Meshaan Al-Otaibi, Amin Yousef Noaman

Abstract:

Nowadays scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data accessing, analyzing, and visualization tools. This research suggests the use of Service Oriented Architecture (SOA) to integrate biological data from different data sources. This work shows SOA will solve the problems that facing integration process and if the biologist scientists can access the biological data in easier way. There are several methods to implement SOA but web service is the most popular method. The Microsoft .Net Framework used to implement proposed architecture.

Keywords: Bioinformatics, Biological data, Data Integration, SOA and Web Services.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2416

7357 STATISTICA Software: A State of the Art Review

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, P. Ranjetha

Abstract:

Data mining idea is mounting rapidly in admiration and also in their popularity. The foremost aspire of data mining method is to extract data from a huge data set into several forms that could be comprehended for additional use. The data mining is a technology that contains with rich potential resources which could be supportive for industries and businesses that pay attention to collect the necessary information of the data to discover their customer’s performances. For extracting data there are several methods are available such as Classification, Clustering, Association, Discovering, and Visualization… etc., which has its individual and diverse algorithms towards the effort to fit an appropriate model to the data. STATISTICA mostly deals with excessive groups of data that imposes vast rigorous computational constraints. These results trials challenge cause the emergence of powerful STATISTICA Data Mining technologies. In this survey an overview of the STATISTICA software is illustrated along with their significant features.

Keywords: Data Mining, STATISTICA Data Miner, Text Miner, Enterprise Server, Classification, Association, Clustering, Regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2557

7356 A Risk Assessment for the Small Hive Beetle Based on Meteorological Standard Measurements

Authors: J. Junk, M. Eickermann

Abstract:

The Small Hive Beetle, Aethina tumida (Coleoptera: Nitidulidae) is a parasite for honey bee colonies, Apis mellifera, and was recently introduced to the European continent, accidentally. Based on the literature, a model was developed by using regional meteorological variables (daily values of minimum, maximum and mean air temperature as well as mean soil temperature at 50 mm depth) to calculate the time-point of hive invasion by A. tumida in springtime, the development duration of pupae as well as the number of generations of A. tumida per year. Luxembourg was used as a test region for our model for 2005 to 2013. The model output indicates a successful surviving of the Small Hive Beetle in Luxembourg with two up to three generations per year. Additionally, based on our meteorological data sets a first migration of SHB to apiaries can be expected from mid of March up to April. Our approach can be transferred easily to other countries to estimate the risk potential for a successful introduction and spreading of A. tumida in Western Europe.

Keywords: Aethina tumida, air temperature, larval development, soil temperature.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 703

7355 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Considering the numerous obstacles e.g. forests, hills, urban areas, the data collection is realized in several ways. The collection of data uses connection via wireless communication, LAN network, GSM network and in certain areas data are collected by using vehicles. In order to ensure the connection to the server most of the probes have ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that will allow the probes to select suitable communication channel.

Keywords: Communication, computer network, data collection, probe.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1742

7354 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data have an increasingly important role in economic growth, innovation, technical advantages and business strategies and even in countries competitions. Analyzing of patent data is crucial since patents cover large part of all technological information of the world. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: Data mining, fuzzy sets, linguistic summarization, patent data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1164

7353 Metadata Update Mechanism Improvements in Data Grid

Authors: S. Farokhzad, M. Reza Salehnamadi

Abstract:

Grid environments include aggregation of geographical distributed resources. Grid is put forward in three types of computational, data and storage. This paper presents a research on data grid. Data grid is used for covering and securing accessibility to data from among many heterogeneous sources. Users are not worry on the place where data is located in it, provided that, they should get access to the data. Metadata is used for getting access to data in data grid. Presently, application metadata catalogue and SRB middle-ware package are used in data grids for management of metadata. At this paper, possibility of updating, streamlining and searching is provided simultaneously and rapidly through classified table of preserving metadata and conversion of each table to numerous tables. Meanwhile, with regard to the specific application, the most appropriate and best division is set and determined. Concurrency of implementation of some of requests and execution of pipeline is adaptability as a result of this technique.

Keywords: Grids, data grid, metadata, update.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1651

7352 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies have a large amount of data. As the data become larger, access to data for some particular information is becoming slower day by day. Faster data processing to shape it in the form of information is the biggest issue. The major problems in distributed databases are the efficiency of data distribution and response time of data distribution. The security of data distribution is also a big issue. For these problems, we proposed a strategy that can maximize the efficiency of data distribution and also increase its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique facilitates the companies for secure data sharing efficiently and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1017

7351 Using Data Clustering in Oral Medicine

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Olof Torgersson

Abstract:

The vast amount of information hidden in huge databases has created tremendous interests in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and classification of similar patient examinations. Commonly used data clustering algorithms have been reviewed and as a result several interesting results have been gathered.

Keywords: Oral Medicine, Cluto, Data Clustering, Data Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1925

7350 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper delves into the data mining technology to determine its application in the analysis of building energy consumption data including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature are reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: Data mining, data analysis, prediction, optimization, building operational performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3645

7349 Query Algebra for Semistuctured Data

Authors: Ei Ei Myat, Ni Lar Thein

Abstract:

With the tremendous growth of World Wide Web (WWW) data, there is an emerging need for effective information retrieval at the document level. Several query languages such as XML-QL, XPath, XQL, Quilt and XQuery are proposed in recent years to provide faster way of querying XML data, but they still lack of generality and efficiency. Our approach towards evolving a framework for querying semistructured documents is based on formal query algebra. Two elements are introduced in the proposed framework: first, a generic and flexible data model for logical representation of semistructured data and second, a set of operators for the manipulation of objects defined in the data model. In additional to accommodating several peculiarities of semistructured data, our model offers novel features such as bidirectional paths for navigational querying and partitions for data transformation that are not available in other proposals.

Keywords: Algebra, Semistructured data, Query Algebra.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1331

7348 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.

Keywords: Simulation data, data summarization, spatial histograms, exploration and visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 693

7347 Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis

Authors: Reza Nadimi, Fariborz Jolai

Abstract:

This article combines two techniques: data envelopment analysis (DEA) and Factor analysis (FA) to data reduction in decision making units (DMU). Data envelopment analysis (DEA), a popular linear programming technique is useful to rate comparatively operational efficiency of decision making units (DMU) based on their deterministic (not necessarily stochastic) input–output data and factor analysis techniques, have been proposed as data reduction and classification technique, which can be applied in data envelopment analysis (DEA) technique for reduction input – output data. Numerical results reveal that the new approach shows a good consistency in ranking with DEA.

Keywords: Effectiveness, Decision Making, Data EnvelopmentAnalysis, Factor Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2377

7346 Regionalism and Regionalization in Central Asia

Authors: L. Delovarova, A. Davar, S. Asanov, F. Kukeyeva

Abstract:

This article is dedicated to the question of regionalism and regionalization in contemporary international relations, with a specific focus on Central Asia. The article addresses the question of whether or not Central Asia can be referred to as a true geopolitical region. In addressing this question, the authors examine particular factors that are essential for the formation of a region, including those tied to the economy, energy, culture, and labor migration.

Keywords: Central Asia, integration, regionalization, regionalism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2742

7345 A Two Level Load Balancing Approach for Cloud Environment

Authors: Anurag Jain, Rajneesh Kumar

Abstract:

Cloud computing is the outcome of rapid growth of internet. Due to elastic nature of cloud computing and unpredictable behavior of user, load balancing is the major issue in cloud computing paradigm. An efficient load balancing technique can improve the performance in terms of efficient resource utilization and higher customer satisfaction. Load balancing can be implemented through task scheduling, resource allocation and task migration. Various parameters to analyze the performance of load balancing approach are response time, cost, data processing time and throughput. This paper demonstrates a two level load balancer approach by combining join idle queue and join shortest queue approach. Authors have used cloud analyst simulator to test proposed two level load balancer approach. The results are analyzed and compared with the existing algorithms and as observed, proposed work is one step ahead of existing techniques.

Keywords: Cloud Analyst, Cloud Computing, Join Idle Queue, Join Shortest Queue, Load balancing, Task Scheduling.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1951