Search results for: Data Partition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7417

Search results for: Data Partition

7387 A Decision Boundary based Discretization Technique using Resampling

Authors: Taimur Qureshi, Djamel A Zighed

Abstract:

Many supervised induction algorithms require discrete data, even while real data often comes in a discrete and continuous formats. Quality discretization of continuous attributes is an important problem that has effects on speed, accuracy and understandability of the induction models. Usually, discretization and other types of statistical processes are applied to subsets of the population as the entire population is practically inaccessible. For this reason we argue that the discretization performed on a sample of the population is only an estimate of the entire population. Most of the existing discretization methods, partition the attribute range into two or several intervals using a single or a set of cut points. In this paper, we introduce a technique by using resampling (such as bootstrap) to generate a set of candidate discretization points and thus, improving the discretization quality by providing a better estimation towards the entire population. Thus, the goal of this paper is to observe whether the resampling technique can lead to better discretization points, which opens up a new paradigm to construction of soft decision trees.

Keywords: Bootstrap, discretization, resampling, soft decision trees.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1397
7386 Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data

Authors: Sana Hamdi, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, real-time spatial applications, like location-aware services and traffic monitoring, have become more and more important. Such applications result dynamic environments where data as well as queries are continuously moving. As a result, there is a tremendous amount of real-time spatial data generated every day. The growth of the data volume seems to outspeed the advance of our computing infrastructure. For instance, in real-time spatial Big Data, users expect to receive the results of each query within a short time period without holding in account the load of the system. But with a huge amount of real-time spatial data generated, the system performance degrades rapidly especially in overload situations. To solve this problem, we propose the use of data partitioning as an optimization technique. Traditional horizontal and vertical partitioning can increase the performance of the system and simplify data management. But they remain insufficient for real-time spatial Big data; they can’t deal with real-time and stream queries efficiently. Thus, in this paper, we propose a novel data partitioning approach for real-time spatial Big data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big data). This contribution is an implementation of the Matching algorithm for traditional vertical partitioning. We find, firstly, the optimal attribute sequence by the use of Matching algorithm. Then, we propose a new cost model used for database partitioning, for keeping the data amount of each partition more balanced limit and for providing a parallel execution guarantees for the most frequent queries. VPA-RTSBD aims to obtain a real-time partitioning scheme and deals with stream data. It improves the performance of query execution by maximizing the degree of parallel execution. This affects QoS (Quality Of Service) improvement in real-time spatial Big Data especially with a huge volume of stream data. The performance of our contribution is evaluated via simulation experiments. The results show that the proposed algorithm is both efficient and scalable, and that it outperforms comparable algorithms.

Keywords: Real-Time Spatial Big Data, Quality Of Service, Vertical partitioning, Horizontal partitioning, Matching algorithm, Hamming distance, Stream query.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1016
7385 BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis

Authors: Mohamed A. Mahfouz, M. A. Ismail

Abstract:

Biclustering is a very useful data mining technique for identifying patterns where different genes are co-related based on a subset of conditions in gene expression analysis. Association rules mining is an efficient approach to achieve biclustering as in BIMODULE algorithm but it is sensitive to the value given to its input parameters and the discretization procedure used in the preprocessing step, also when noise is present, classical association rules miners discover multiple small fragments of the true bicluster, but miss the true bicluster itself. This paper formally presents a generalized noise tolerant bicluster model, termed as μBicluster. An iterative algorithm termed as BIDENS based on the proposed model is introduced that can discover a set of k possibly overlapping biclusters simultaneously. Our model uses a more flexible method to partition the dimensions to preserve meaningful and significant biclusters. The proposed algorithm allows discovering biclusters that hard to be discovered by BIMODULE. Experimental study on yeast, human gene expression data and several artificial datasets shows that our algorithm offers substantial improvements over several previously proposed biclustering algorithms.

Keywords: Machine learning, biclustering, bi-dimensional clustering, gene expression analysis, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1922
7384 Effect of Clustering on Energy Efficiency and Network Lifetime in Wireless Sensor Networks

Authors: Prakash G L, Chaitra K Meti, Poojitha K, Divya R.K.

Abstract:

Wireless Sensor Network is Multi hop Self-configuring Wireless Network consisting of sensor nodes. The deployment of wireless sensor networks in many application areas, e.g., aggregation services, requires self-organization of the network nodes into clusters. Efficient way to enhance the lifetime of the system is to partition the network into distinct clusters with a high energy node as cluster head. The different methods of node clustering techniques have appeared in the literature, and roughly fall into two families; those based on the construction of a dominating set and those which are based solely on energy considerations. Energy optimized cluster formation for a set of randomly scattered wireless sensors is presented. Sensors within a cluster are expected to be communicating with cluster head only. The energy constraint and limited computing resources of the sensor nodes present the major challenges in gathering the data. In this paper we propose a framework to study how partially correlated data affect the performance of clustering algorithms. The total energy consumption and network lifetime can be analyzed by combining random geometry techniques and rate distortion theory. We also present the relation between compression distortion and data correlation.

Keywords: Clusters, multi hop, random geometry, rate distortion.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1596
7383 Removal of Polycyclic Aromatic Hydrocarbons Present in Tyre Pyrolytic Oil Using Low Cost Natural Adsorbents

Authors: Neha Budhwani

Abstract:

Polycyclic aromatic hydrocarbons (PAHs) are formed during the pyrolysis of scrap tyres to produce tyre pyrolytic oil (TPO). Due to carcinogenic, mutagenic, and toxic properties PAHs are priority pollutants. Hence it is essential to remove PAHs from TPO before utilising TPO as a petroleum fuel alternative (to run the engine). Agricultural wastes have promising future to be utilized as biosorbent due to their cost effectiveness, abundant availability, high biosorption capacity and renewability. Various low cost adsorbents were prepared from natural sources. Uptake of PAHs present in tyre pyrolytic oil was investigated using various low-cost adsorbents of natural origin including sawdust (shisham), coconut fiber, neem bark, chitin, activated charcoal. Adsorption experiments of different PAHs viz. naphthalene, acenaphthalene, biphenyl and anthracene have been carried out at ambient temperature (25°C) and at pH 7. It was observed that for any given PAH, the adsorption capacity increases with the lignin content. Freundlich constant Kf and 1/n have been evaluated and it was found that the adsorption isotherms of PAHs were in agreement with a Freundlich model, while the uptake capacity of PAHs followed the order: activated charcoal> saw dust (shisham) > coconut fiber > chitin. The partition coefficients in acetone-water, and the adsorption constants at equilibrium, could be linearly correlated with octanol–water partition coefficients. It is observed that natural adsorbents are good alternative for PAHs removal. Sawdust of Dalbergia sissoo, a by-product of sawmills was found to be a promising adsorbent for the removal of PAHs present in TPO. It is observed that adsorbents studied were comparable to those of some conventional adsorbents.

Keywords: Acenaphthene, anthracene, biphenyl, Coconut fiber, naphthalene, natural adsorbent, PAHs, TPO and wood powder (shisham).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4021
7382 Fuzzy Clustering Analysis in Real Estate Companies in China

Authors: Jianfeng Li, Feng Jin, Xiaoyu Yang

Abstract:

This paper applies fuzzy clustering algorithm in classifying real estate companies in China according to some general financial indexes, such as income per share, share accumulation fund, net profit margins, weighted net assets yield and shareholders' equity. By constructing and normalizing initial partition matrix, getting fuzzy similar matrix with Minkowski metric and gaining the transitive closure, the dynamic fuzzy clustering analysis for real estate companies is shown clearly that different clustered result change gradually with the threshold reducing, and then, it-s shown there is the similar relationship with the prices of those companies in stock market. In this way, it-s great valuable in contrasting the real estate companies- financial condition in order to grasp some good chances of investment, and so on.

Keywords: Fuzzy clustering algorithm, data mining, real estate company, financial analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1881
7381 Clustering of Variables Based On a Probabilistic Approach Defined on the Hypersphere

Authors: Paulo Gomes, Adelaide Figueiredo

Abstract:

We consider n individuals described by p standardized variables, represented by points of the surface of the unit hypersphere Sn-1. For a previous choice of n individuals we suppose that the set of observables variables comes from a mixture of bipolar Watson distribution defined on the hypersphere. EM and Dynamic Clusters algorithms are used for identification of such mixture. We obtain estimates of parameters for each Watson component and then a partition of the set of variables into homogeneous groups of variables. Additionally we will present a factor analysis model where unobservable factors are just the maximum likelihood estimators of Watson directional parameters, exactly the first principal component of data matrix associated to each group previously identified. Such alternative model it will yield us to directly interpretable solutions (simple structure), avoiding factors rotations.

Keywords: Dynamic Clusters algorithm, EM algorithm, Factor analysis model, Hierarchical Clustering, Watson distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1587
7380 Modeling of Cross Flow Classifier with Water Injection

Authors: E. Pikushchak, J. Dueck, L. Minkov

Abstract:

In hydrocyclones, the particle separation efficiency is limited by the suspended fine particles, which are discharged with the coarse product in the underflow. It is well known that injecting water in the conical part of the cyclone reduces the fine particle fraction in the underflow. This paper presents a mathematical model that simulates the water injection in the conical component. The model accounts for the fluid flow and the particle motion. Particle interaction, due to hindered settling caused by increased density and viscosity of the suspension, and fine particle entrainment by settling coarse particles are included in the model. Water injection in the conical part of the hydrocyclone is performed to reduce fine particle discharge in the underflow. The model demonstrates the impact of the injection rate, injection velocity, and injection location on the shape of the partition curve. The simulations are compared with experimental data of a 50-mm cyclone.

Keywords: Classification, fine particle processing, hydrocyclone, water injection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1923
7379 New Class of Chaotic Mappings in Symbol Space

Authors: Inese Bula

Abstract:

Symbolic dynamics studies dynamical systems on the basis of the symbol sequences obtained for a suitable partition of the state space. This approach exploits the property that system dynamics reduce to a shift operation in symbol space. This shift operator is a chaotic mapping. In this article we show that in the symbol space exist other chaotic mappings.

Keywords: Infinite symbol space, prefix metric, chaotic mapping, generator function, jump mapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1486
7378 Democracy in Pakistan: A Critical Review Through the Lens of Dr. Israr Ahmed and Western Philosophers

Authors: Zoaib Mirza

Abstract:

Pakistan is an Islamic country that got its partition from India in 1947 so that the people could practice the religion of Islam. The political slogan to strive for independence was “What does Pakistan mean? There is no God but Allah”. The ideology of Pakistan was based on the notion that sovereignty only belonged to God Almighty (in Arabic, God means “Allah”), and Muslims will live in accordance with Islam principles. The Quran (Holy Book) and Sunnah (authentic practices of Prophet Mohammad, Peace Be Upon Him, that explains the application of the Quran) are foundations of the Islamic principles. It has been over 75 years, but unfortunately, Pakistan, due to its own political, social, and economic mistakes, is responsible for not being able to become a true Islamic nation to justify its partition from India. The rationale for writing this paper is to analyze the factors that led to changes in the democratic movements impacting the country's political, social, and economic growth. The methodology to examine the historical and political context of Pakistan’s history is by referencing the scholarly work of Israr Ahmed. He focused on Islamic theology, philosophy, and studies, offering insights into the historical and political context of the country. While from a Western perspective, Karl Marx, Mar Weber, Hannah Arendt, Sheldon Wolin, Paulo Freire, and Jacques Ranciere's philosophies specific to totalitarianism, politics, military rule, religion, capitalism, and superpower are used as the framework to analyze Pakistan’s democracy. The study's findings conclude that Pakistan's democracy is unstable and has been impacted by military and civilian governance, which led to political, social, and economic downfall. To improve the current situation, the citizens of Pakistan have to realize that the success of a nation is only dependent on the level of consciousness of the leader and not the political system. Therefore, it is the responsibility of every citizen to be conscious of how they select their leader and take responsibility for the current situation in Pakistan.

Keywords: Pakistan, Islam, democracy, totalitarianism, military, religion, capitalism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 298
7377 Comparison of Router Intelligent and Cooperative Host Intelligent Algorithms in a Continuous Model of Fixed Telecommunication Networks

Authors: Dávid Csercsik, Sándor Imre

Abstract:

The performance of state of the art worldwide telecommunication networks strongly depends on the efficiency of the applied routing mechanism. Game theoretical approaches to this problem offer new solutions. In this paper a new continuous network routing model is defined to describe data transfer in fixed telecommunication networks of multiple hosts. The nodes of the network correspond to routers whose latency is assumed to be traffic dependent. We propose that the whole traffic of the network can be decomposed to a finite number of tasks, which belong to various hosts. To describe the different latency-sensitivity, utility functions are defined for each task. The model is used to compare router and host intelligent types of routing methods, corresponding to various data transfer protocols. We analyze host intelligent routing as a transferable utility cooperative game with externalities. The main aim of the paper is to provide a framework in which the efficiency of various routing algorithms can be compared and the transferable utility game arising in the cooperative case can be analyzed.

Keywords: Routing, Telecommunication networks, Performance evaluation, Cooperative game theory, Partition function form games

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1816
7376 N-Sun Decomposition of Complete, Complete Bipartite and Some Harary Graphs

Authors: R. Anitha, R. S. Lekshmi

Abstract:

Graph decompositions are vital in the study of combinatorial design theory. A decomposition of a graph G is a partition of its edge set. An n-sun graph is a cycle Cn with an edge terminating in a vertex of degree one attached to each vertex. In this paper, we define n-sun decomposition of some even order graphs with a perfect matching. We have proved that the complete graph K2n, complete bipartite graph K2n, 2n and the Harary graph H4, 2n have n-sun decompositions. A labeling scheme is used to construct the n-suns.

Keywords: Decomposition, Hamilton cycle, n-sun graph, perfect matching, spanning tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2357
7375 Algebraic Quantum Error Correction Codes

Authors: Ming-Chung Tsai, Kuan-Peng Chen, Zheng-Yao

Abstract:

A systematic and exhaustive method based on the group structure of a unitary Lie algebra is proposed to generate an enormous number of quantum codes. With respect to the algebraic structure, the orthogonality condition, which is the central rule of generating quantum codes, is proved to be fully equivalent to the distinguishability of the elements in this structure. In addition, four types of quantum codes are classified according to the relation of the codeword operators and some initial quantum state. By linking the unitary Lie algebra with the additive group, the classical correspondences of some of these quantum codes can be rendered.

Keywords: Quotient-Algebra Partition, Codeword Spinors, Basis Codewords, Syndrome Spinors

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1380
7374 Bernstein-Galerkin Approach for Perturbed Constant-Coefficient Differential Equations, One-Dimensional Analysis

Authors: Diego Garijo

Abstract:

A numerical approach for solving constant-coefficient differential equations whose solutions exhibit boundary layer structure is built by inserting Bernstein Partition of Unity into Galerkin variational weak form. Due to the reproduction capability of Bernstein basis, such implementation shows excellent accuracy at boundaries and is able to capture sharp gradients of the field variable by p-refinement using regular distributions of equi-spaced evaluation points. The approximation is subjected to convergence experimentation and a procedure to assemble the discrete equations without a background integration mesh is proposed.

Keywords: Bernstein polynomials, Galerkin, differential equation, boundary layer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1807
7373 Influence of Non-Structural Elements on Dynamic Response of Multi-Storey Rc Building to Mining Shock

Authors: Joanna M. Dulińska, Maria Fabijańska

Abstract:

In the paper the results of calculations of the dynamic response of a multi-storey reinforced concrete building to a strong mining shock originated from the main region of mining activity in Poland (i.e. the Legnica-Glogow Copper District) are presented. The representative time histories of accelerations registered in three directions were used as ground motion data in calculations of the dynamic response of the structure. Two variants of a numerical model were applied: the model including only structural elements of the building and the model including both structural and non-structural elements (i.e. partition walls and ventilation ducts made of brick). It turned out that non-structural elements of multi-storey RC buildings have a small impact of about 10 % on natural frequencies of these structures. It was also proved that the dynamic response of building to mining shock obtained in case of inclusion of all non-structural elements in the numerical model is about 20 % smaller than in case of consideration of structural elements only. The principal stresses obtained in calculations of dynamic response of multi-storey building to strong mining shock are situated on the level of about 30% of values obtained from static analysis (dead load).

Keywords: Dynamic characteristics of buildings, mining shocks, dynamic response of buildings, non-structural elements

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1848
7372 Big Data: Big Challenges to Privacy and Data Protection

Authors: Abu Bakar Munir, Siti Hajar Mohd Yasin, Firdaus Muhammad-Sukki

Abstract:

This paper seeks to analyse the benefits of big data and more importantly the challenges it pose to the subject of privacy and data protection. First, the nature of big data will be briefly deliberated before presenting the potential of big data in the present days. Afterwards, the issue of privacy and data protection is highlighted before discussing the challenges of implementing this issue in big data. In conclusion, the paper will put forward the debate on the adequacy of the existing legal framework in protecting personal data in the era of big data.

Keywords: Big data, data protection, information, privacy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3876
7371 Color Constancy using Superpixel

Authors: Xingsheng Yuan, Zhengzhi Wang

Abstract:

Color constancy algorithms are generally based on the simplified assumption about the spectral distribution or the reflection attributes of the scene surface. However, in reality, these assumptions are too restrictive. The methodology is proposed to extend existing algorithm to applying color constancy locally to image patches rather than globally to the entire images. In this paper, a method based on low-level image features using superpixels is proposed. Superpixel segmentation partition an image into regions that are approximately uniform in size and shape. Instead of using entire pixel set for estimating the illuminant, only superpixels with the most valuable information are used. Based on large scale experiments on real-world scenes, it can be derived that the estimation is more accurate using superpixels than when using the entire image.

Keywords: color constancy, illuminant estimation, superpixel

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1429
7370 Effect of Flowrate and Coolant Temperature on the Efficiency of Progressive Freeze Concentration on Simulated Wastewater

Authors: M. Jusoh, R. Mohd Yunus, M. A. Abu Hassan

Abstract:

Freeze concentration freezes or crystallises the water molecules out as ice crystals and leaves behind a highly concentrated solution. In conventional suspension freeze concentration where ice crystals formed as a suspension in the mother liquor, separation of ice is difficult. The size of the ice crystals is still very limited which will require usage of scraped surface heat exchangers, which is very expensive and accounted for approximately 30% of the capital cost. This research is conducted using a newer method of freeze concentration, which is progressive freeze concentration. Ice crystals were formed as a layer on the designed heat exchanger surface. In this particular research, a helical structured copper crystallisation chamber was designed and fabricated. The effect of two operating conditions on the performance of the newly designed crystallisation chamber was investigated, which are circulation flowrate and coolant temperature. The performance of the design was evaluated by the effective partition constant, K, calculated from the volume and concentration of the solid and liquid phase. The system was also monitored by a data acquisition tool in order to see the temperature profile throughout the process. On completing the experimental work, it was found that higher flowrate resulted in a lower K, which translated into high efficiency. The efficiency is the highest at 1000 ml/min. It was also found that the process gives the highest efficiency at a coolant temperature of -6 °C.

Keywords: Freeze concentration, progressive freeze concentration, freeze wastewater treatment, ice crystals.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2133
7369 System Survivability in Networks in the Context of Defense/Attack Strategies: The Large Scale

Authors: A. Ben Yaghlane, M. N. Azaiez, M. Mrad

Abstract:

We investigate the large scale of networks in the context of network survivability under attack. We use appropriate techniques to evaluate and the attacker-based- and the defenderbased- network survivability. The attacker is unaware of the operated links by the defender. Each attacked link has some pre-specified probability to be disconnected. The defender choice is so that to maximize the chance of successfully sending the flow to the destination node. The attacker however will select the cut-set with the highest chance to be disabled in order to partition the network. Moreover, we extend the problem to the case of selecting the best p paths to operate by the defender and the best k cut-sets to target by the attacker, for arbitrary integers p,k>1. We investigate some variations of the problem and suggest polynomial-time solutions.

Keywords: Defense/attack strategies, large scale, networks, partitioning a network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1444
7368 Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs

Authors: Weizhi Xu, Zhiyong Liu, Dongrui Fan, Shuai Jiao, Xiaochun Ye, Fenglong Song, Chenggang Yan

Abstract:

Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned recursively into a ELL portion and a COO portion in the process of creating HYB-R format to ensure that there are as many non-zeros as possible in ELL format. The method of partitioning the matrix is an important problem for HYB-R kernel, so we also try to tune the parameters to partition the matrix for higher performance. Experimental results show that our method can get better performance than the fastest kernel (HYB) in NVIDIA-s SpMV library with as high as 17% speedup.

Keywords: GPU, HYB-R, Many-core, Performance Tuning, SpMV

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1954
7367 Data Preprocessing for Supervised Leaning

Authors: S. B. Kotsiantis, D. Kanellopoulos, P. E. Pintelas

Abstract:

Many factors affect the success of Machine Learning (ML) on a given task. The representation and quality of the instance data is first and foremost. If there is much irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase is more difficult. It is well known that data preparation and filtering steps take considerable amount of processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, etc. The product of data pre-processing is the final training set. It would be nice if a single sequence of data pre-processing algorithms had the best performance for each data set but this is not happened. Thus, we present the most well know algorithms for each step of data pre-processing so that one achieves the best performance for their data set.

Keywords: Data mining, feature selection, data cleaning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5960
7366 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: Analytics, Big Data in Education, Hadoop, Learning Analytics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4832
7365 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, Wang Qun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSql), and gives 6 data cleaning methods based on these algorithms.

Keywords: Data cleaning, dependency rules, violation data discovery, data repair.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2572
7364 Coalescing Data Marts

Authors: N. Parimala, P. Pahwa

Abstract:

OLAP uses multidimensional structures, to provide access to data for analysis. Traditionally, OLAP operations are more focused on retrieving data from a single data mart. An exception is the drill across operator. This, however, is restricted to retrieving facts on common dimensions of the multiple data marts. Our concern is to define further operations while retrieving data from multiple data marts. Towards this, we have defined six operations which coalesce data marts. While doing so we consider the common as well as the non-common dimensions of the data marts.

Keywords: Data warehouse, Dimension, OLAP, Star Schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523
7363 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: Mining Big Data, Big Data, Machine learning, Data Streams, Telecommunication.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2432
7362 Graph-based High Level Motion Segmentation using Normalized Cuts

Authors: Sungju Yun, Anjin Park, Keechul Jung

Abstract:

Motion capture devices have been utilized in producing several contents, such as movies and video games. However, since motion capture devices are expensive and inconvenient to use, motions segmented from captured data was recycled and synthesized to utilize it in another contents, but the motions were generally segmented by contents producers in manual. Therefore, automatic motion segmentation is recently getting a lot of attentions. Previous approaches are divided into on-line and off-line, where on-line approaches segment motions based on similarities between neighboring frames and off-line approaches segment motions by capturing the global characteristics in feature space. In this paper, we propose a graph-based high-level motion segmentation method. Since high-level motions consist of several repeated frames within temporal distances, we consider all similarities among all frames within the temporal distance. This is achieved by constructing a graph, where each vertex represents a frame and the edges between the frames are weighted by their similarity. Then, normalized cuts algorithm is used to partition the constructed graph into several sub-graphs by globally finding minimum cuts. In the experiments, the results using the proposed method showed better performance than PCA-based method in on-line and GMM-based method in off-line, as the proposed method globally segment motions from the graph constructed based similarities between neighboring frames as well as similarities among all frames within temporal distances.

Keywords: Capture Devices, High-Level Motion, Motion Segmentation, Normalized Cuts

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1282
7361 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, there have been a lot of efforts and studies are carried out in growing proficient tools for performing various tasks in big data. Recently big data have gotten a lot of publicity for their good reasons. Due to the large and complex collection of datasets it is difficult to process on traditional data processing applications. This concern turns to be further mandatory for producing various tools in big data. Moreover, the main aim of big data analytics is to utilize the advanced analytic techniques besides very huge, different datasets which contain diverse sizes from terabytes to zettabytes and diverse types such as structured or unstructured and batch or streaming. Big data is useful for data sets where their size or type is away from the capability of traditional relational databases for capturing, managing and processing the data with low-latency. Thus the out coming challenges tend to the occurrence of powerful big data tools. In this survey, a various collection of big data tools are illustrated and also compared with the salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3743
7360 Numerical Investigation on the Progressive Collapse Resistance of an RC Building with Brick Infills under Column Loss

Authors: Meng-Hao Tsai, Tsuei-Chiang Huang

Abstract:

Interior brick-infill partitions are usually considered as non-structural components and only their weight is accounted for in practical structural design. In this study, their effect on the progressive collapse resistance of an RC building subjected to sudden column loss is investigated. Three notional column loss conditions with four different brick-infill locations are considered. Column-loss response analyses of the RC building with and without brick infills are carried out. Analysis results indicate that the collapse resistance is only slightly influenced by the brick infills due to their brittle failure characteristic. Even so, they may help to reduce the inelastic displacement response under column loss. For practical engineering, it is reasonably conservative to only consider the weight of brick-infill partitions in the structural analysis.

Keywords: Progressive collapse, column loss, brick-infill partition, compression strut.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2095
7359 Forecasting US Dollar/Euro Exchange Rate with Genetic Fuzzy Predictor

Authors: R. Mechgoug, A. Titaouine

Abstract:

Fuzzy systems have been successfully used for exchange rate forecasting. However, fuzzy system is very confusing and complex to be designed by an expert, as there is a large set of parameters (fuzzy knowledge base) that must be selected, it is not a simple task to select the appropriate fuzzy knowledge base for an exchange rate forecasting. The researchers often look the effect of fuzzy knowledge base on the performances of fuzzy system forecasting. This paper proposes a genetic fuzzy predictor to forecast the future value of daily US Dollar/Euro exchange rate time’s series. A range of methodologies based on a set of fuzzy predictor’s which allow the forecasting of the same time series, but with a different fuzzy partition. Each fuzzy predictor is built from two stages, where each stage is performed by a real genetic algorithm.

Keywords: Foreign exchange rate, time series forecasting, Fuzzy System, and Genetic Algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1966
7358 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is efficient approach to organize data. When data is classified to multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels and shows that there are four types of orders, which are characterized by whether the labels of expressed data includes every label of the given set of labels within the range of the set. Desirable properties of the orders, data is also expressed by the higher set of labels and different sets of labels express different data, are discussed for the orders.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classificaiton, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1269