Search results for: data stream
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25005

Search results for: data stream

24525 Generating Swarm Satellite Data Using Long Short-Term Memory and Generative Adversarial Networks for the Detection of Seismic Precursors

Authors: Yaxin Bi

Abstract:

Accurate prediction and understanding of the evolution mechanisms of earthquakes remain challenging in the fields of geology, geophysics, and seismology. This study leverages Long Short-Term Memory (LSTM) networks and Generative Adversarial Networks (GANs), a generative model tailored to time-series data, for generating synthetic time series data based on Swarm satellite data, which will be used for detecting seismic anomalies. LSTMs demonstrated commendable predictive performance in generating synthetic data across multiple countries. In contrast, the GAN models struggled to generate synthetic data, often producing non-informative values, although they were able to capture the data distribution of the time series. These findings highlight both the promise and challenges associated with applying deep learning techniques to generate synthetic data, underscoring the potential of deep learning in generating synthetic electromagnetic satellite data.

Keywords: LSTM, GAN, earthquake, synthetic data, generative AI, seismic precursors

Procedia PDF Downloads 17
24524 Generation of Quasi-Measurement Data for On-Line Process Data Analysis

Authors: Hyun-Woo Cho

Abstract:

For ensuring the safety of a manufacturing process one should quickly identify an assignable cause of a fault in an on-line basis. To this end, many statistical techniques including linear and nonlinear methods have been frequently utilized. However, such methods possessed a major problem of small sample size, which is mostly attributed to the characteristics of empirical models used for reference models. This work presents a new method to overcome the insufficiency of measurement data in the monitoring and diagnosis tasks. Some quasi-measurement data are generated from existing data based on the two indices of similarity and importance. The performance of the method is demonstrated using a real data set. The results turn out that the presented methods are able to handle the insufficiency problem successfully. In addition, it is shown to be quite efficient in terms of computational speed and memory usage, and thus on-line implementation of the method is straightforward for monitoring and diagnosis purposes.

Keywords: data analysis, diagnosis, monitoring, process data, quality control

Procedia PDF Downloads 469
24523 Geomorphologic Evolution of the Southern Habble-Rud River Basin, North of Iran

Authors: Maryam Jaberi, Siavosh Shayan, Mojtaba Yamani

Abstract:

Habble-Rud River basin (HR), up to 100 km length, one of the largest watersheds which drain into deserts to the north of Central Iran (Dasht-e Kavir). This stream is oblique with the NE-SW trending, flow in the southern range of central Alborz Mountains and the northern border of Central Iran. The end of the ~17 km suddenly change direction and with the southern trending to have a morphology which meanders passes through the Alborz Mountain ridge and flows into the Garmsar plain where it forms one of the largest alluvial fans in Iran, i.e. the vast Garmsar alluvial fan with an area of 476 km2. This study was carried out through morphometric analyses, longitudinal river profiles, and study of geomorpholic evidence such as fluvial terraces, gypsum-salt domes, seismic data, and satellite images. This study aimed to investigate the changes in the pattern of rivers in the southern part of the HR river basin. The southern part of HR river basin located at the southern foothills of the Central Alborz is characterized the thrust faults (Sorkheh-Kalut and Garmsar faults), folds,diapirs and arid climate. The activity of more than 10 salt domes that belong to the Oligocene-Miocene period has considerably influenced the pattern of streams in this region. Dissolution of these domes has not only reduced the quality of water and soil resources, but also has led to the formation of badlands and gullies.Our results indicated that the pattern of rivers in the southern part of HR river basin was influenced by discharge of the HR river in Quaternary, geological structure, subsidence of Central Iran and vertical uplift of Alborz mountain. These agents caused the formation meanders in the southern part of the HR River and evaluation of the seasonal rivers like Shoor-Darre and Garmabsar.

Keywords: geomorphologic evaluation, rivers pattern, Habble-Rud River basin, seasonal rivers

Procedia PDF Downloads 493
24522 Evaluation of Water Quality for the Kurtbogazi Dam Outlet and the Streams Feeding the Dam (Ankara, Turkey)

Authors: Gulsen Tozsin, Fatma Bakir, Cemil Acar, Ercument Koc

Abstract:

Kurtbogazi Dam has gained special meaning for Ankara, Turkey for the last decade due to the rapid depletion of nearby resources of drinking water. In this study, the results of the analyses of Kurtbogazi Dam outlet water and the rivers flowing into the Kurtbogazi Dam were discussed for the period of last five years between 2008 and 2012. The quality of these surface water resources were evaluated in terms of pH, temperature, biochemical oxygen demand (BOD5), nitrate, phosphate and chlorine. They were classified according to the Council Directive (75/440/EEC). Moreover, the properties of these surface waters were assessed to determine the quality of water for drinking and irrigation purposes using Piper, US Salinity Laboratory and Wilcox diagrams. The results revealed that the quality of all the investigated water sources are generally at satisfactory level as surface water except for Pazar Stream in terms of ortho-phosphate and BOD5 concentration for 2008.

Keywords: Kurtbogazi dam, water quality assessment, Ankara water, water supply

Procedia PDF Downloads 359
24521 Emerging Technology for Business Intelligence Applications

Authors: Hsien-Tsen Wang

Abstract:

Business Intelligence (BI) has long helped organizations make informed decisions based on data-driven insights and gain competitive advantages in the marketplace. In the past two decades, businesses witnessed not only the dramatically increasing volume and heterogeneity of business data but also the emergence of new technologies, such as Artificial Intelligence (AI), Semantic Web (SW), Cloud Computing, and Big Data. It is plausible that the convergence of these technologies would bring more value out of business data by establishing linked data frameworks and connecting in ways that enable advanced analytics and improved data utilization. In this paper, we first review and summarize current BI applications and methodology. Emerging technologies that can be integrated into BI applications are then discussed. Finally, we conclude with a proposed synergy framework that aims at achieving a more flexible, scalable, and intelligent BI solution.

Keywords: business intelligence, artificial intelligence, semantic web, big data, cloud computing

Procedia PDF Downloads 84
24520 Temporal Estimation of Hydrodynamic Parameter Variability in Constructed Wetlands

Authors: Mohammad Moezzibadi, Isabelle Charpentier, Adrien Wanko, Robert Mosé

Abstract:

The calibration of hydrodynamic parameters for subsurface constructed wetlands (CWs) is a sensitive process since highly non-linear equations are involved in unsaturated flow modeling. CW systems are engineered systems designed to favour natural treatment processes involving wetland vegetation, soil, and their microbial flora. Their significant efficiency at reducing the ecological impact of urban runoff has been recently proved in the field. Numerical flow modeling in a vertical variably saturated CW is here carried out by implementing the Richards model by means of a mixed hybrid finite element method (MHFEM), particularly well adapted to the simulation of heterogeneous media, and the van Genuchten-Mualem parametrization. For validation purposes, MHFEM results were compared to those of HYDRUS (a software based on a finite element discretization). As van Genuchten-Mualem soil hydrodynamic parameters depend on water content, their estimation is subject to considerable experimental and numerical studies. In particular, the sensitivity analysis performed with respect to the van Genuchten-Mualem parameters reveals a predominant influence of the shape parameters α, n and the saturated conductivity of the filter on the piezometric heads, during saturation and desaturation. Modeling issues arise when the soil reaches oven-dry conditions. A particular attention should also be brought to boundary condition modeling (surface ponding or evaporation) to be able to tackle different sequences of rainfall-runoff events. For proper parameter identification, large field datasets would be needed. As these are usually not available, notably due to the randomness of the storm events, we thus propose a simple, robust and low-cost numerical method for the inverse modeling of the soil hydrodynamic properties. Among the methods, the variational data assimilation technique introduced by Le Dimet and Talagrand is applied. To that end, a variational data assimilation technique is implemented by applying automatic differentiation (AD) to augment computer codes with derivative computations. Note that very little effort is needed to obtain the differentiated code using the on-line Tapenade AD engine. Field data are collected for a three-layered CW located in Strasbourg (Alsace, France) at the water edge of the urban water stream Ostwaldergraben, during several months. Identification experiments are conducted by comparing measured and computed piezometric head by means of the least square objective function. The temporal variability of hydrodynamic parameter is then assessed and analyzed.

Keywords: automatic differentiation, constructed wetland, inverse method, mixed hybrid FEM, sensitivity analysis

Procedia PDF Downloads 146
24519 Using Equipment Telemetry Data for Condition-Based maintenance decisions

Authors: John Q. Todd

Abstract:

Given that modern equipment can provide comprehensive health, status, and error condition data via built-in sensors, maintenance organizations have a new and valuable source of insight to take advantage of. This presentation will expose what these data payloads might look like and how they can be filtered, visualized, calculated into metrics, used for machine learning, and generate alerts for further action.

Keywords: condition based maintenance, equipment data, metrics, alerts

Procedia PDF Downloads 169
24518 Ethics Can Enable Open Source Data Research

Authors: Dragana Calic

Abstract:

The openness, availability and the sheer volume of big data have provided, what some regard as, an invaluable and rich dataset. Researchers, businesses, advertising agencies, medical institutions, to name only a few, collect, share, and analyze this data to enable their processes and decision making. However, there are important ethical considerations associated with the use of big data. The rapidly evolving nature of online technologies has overtaken the many legislative, privacy, and ethical frameworks and principles that exist. For example, should we obtain consent to use people’s online data, and under what circumstances can privacy considerations be overridden? Current guidance on how to appropriately and ethically handle big data is inconsistent. Consequently, this paper focuses on two quite distinct but related ethical considerations that are at the core of the use of big data for research purposes. They include empowering the producers of data and empowering researchers who want to study big data. The first consideration focuses on informed consent which is at the core of empowering producers of data. In this paper, we discuss some of the complexities associated with informed consent and consider studies of producers’ perceptions to inform research ethics guidelines and practice. The second consideration focuses on the researcher. Similarly, we explore studies that focus on researchers’ perceptions and experiences.

Keywords: big data, ethics, producers’ perceptions, researchers’ perceptions

Procedia PDF Downloads 277
24517 Peristaltic Transport of a Jeffrey Fluid with Double-Diffusive Convection in Nanofluids in the Presence of Inclined Magnetic Field

Authors: Safia Akram

Abstract:

In this article, the effects of peristaltic transport with double-diffusive convection in nanofluids through an asymmetric channel with different waveforms is presented. Mathematical modelling for two-dimensional and two directional flows of a Jeffrey fluid model along with double-diffusive convection in nanofluids are given. Exact solutions are obtained for nanoparticle fraction field, concentration field, temperature field, stream functions, pressure gradient and pressure rise in terms of axial and transverse coordinates under the restrictions of long wavelength and low Reynolds number. With the help of computational and graphical results the effects of Brownian motion, thermophoresis, Dufour, Soret, and Grashof numbers (thermal, concentration, nanoparticles) on peristaltic flow patterns with double-diffusive convection are discussed.

Keywords: nanofluid particles, peristaltic flow, Jeffrey fluid, magnetic field, asymmetric channel, different waveforms

Procedia PDF Downloads 370
24516 The Proposal of Modification of California Pipe Method for Inclined Pipe

Authors: Wojciech Dąbrowski, Joanna Bąk, Laurent Solliec

Abstract:

Nowadays technical and technological progress and constant development of methods and devices applied to sanitary engineering is indispensable. Issues related to sanitary engineering involve flow measurements for water and wastewater. The precise measurement is very important and pivotal for further actions, like monitoring. There are many methods and techniques of flow measurement in the area of sanitary engineering. Weirs and flumes are well–known methods and common used. But also there are alternative methods. Some of them are very simple methods, others are solutions using high technique. The old–time method combined with new technique could be more useful than earlier. Paper describes substitute method of flow gauging (California pipe method) and proposal of modification of this method used for inclined pipe. Examination of possibility of improving and developing old–time methods is direction of the investigation.

Keywords: California pipe, sewerage, flow rate measurement, water, wastewater, improve, modification, hydraulic monitoring, stream

Procedia PDF Downloads 426
24515 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification

Procedia PDF Downloads 456
24514 Seismic Data Scaling: Uncertainties, Potential and Applications in Workstation Interpretation

Authors: Ankur Mundhra, Shubhadeep Chakraborty, Y. R. Singh, Vishal Das

Abstract:

Seismic data scaling affects the dynamic range of a data and with present day lower costs of storage and higher reliability of Hard Disk data, scaling is not suggested. However, in dealing with data of different vintages, which perhaps were processed in 16 bits or even 8 bits and are need to be processed with 32 bit available data, scaling is performed. Also, scaling amplifies low amplitude events in deeper region which disappear due to high amplitude shallow events that saturate amplitude scale. We have focused on significance of scaling data to aid interpretation. This study elucidates a proper seismic loading procedure in workstations without using default preset parameters as available in most software suites. Differences and distribution of amplitude values at different depth for seismic data are probed in this exercise. Proper loading parameters are identified and associated steps are explained that needs to be taken care of while loading data. Finally, the exercise interprets the un-certainties which might arise when correlating scaled and unscaled versions of seismic data with synthetics. As, seismic well tie correlates the seismic reflection events with well markers, for our study it is used to identify regions which are enhanced and/or affected by scaling parameter(s).

Keywords: clipping, compression, resolution, seismic scaling

Procedia PDF Downloads 460
24513 Avoiding Packet Drop for Improved through Put in the Multi-Hop Wireless N/W

Authors: Manish Kumar Rajak, Sanjay Gupta

Abstract:

Mobile ad hoc networks (MANETs) are infrastructure less and intercommunicate using single-hop and multi-hop paths. Network based congestion avoidance which involves managing the queues in the network devices is an integral part of any network. QoS: A set of service requirements that are met by the network while transferring a packet stream from a source to a destination. Especially in MANETs, packet loss results in increased overheads. This paper presents a new algorithm to avoid congestion using one or more queue on nodes and corresponding flow rate decided in advance for each node. When any node attains an initial value of queue then it sends this status to its downstream nodes which in turn uses the pre-decided flow rate of packet transfer to its upstream nodes. The flow rate on each node is adjusted according to the status received from its upstream nodes. This proposed algorithm uses the existing infrastructure to inform to other nodes about its current queue status.

Keywords: mesh networks, MANET, packet count, threshold, throughput

Procedia PDF Downloads 460
24512 The Issues of Irrigation and Drainage in Kebbi State and Their Effective Solution for a Sustainable Agriculture in Kebbi State, Nigeria

Authors: Mumtaz Ahmed Sohag, Ishaq Ahmed Sohag

Abstract:

Kebbi State, located in the Nort-West of Nigeria, is rich in water resources as the major rivers viz. Niger and Rima irrigate a vast majority of land. Besides, there is significant amount of groundwater, which farmers use for agriculture purpose. The groundwater is also a major source of agricultural and domestic water as wells are installed in almost all parts of the region. Although Kebbi State is rich in water, however, there are some pertinent issues which are hampering its agricultural productivity. The low lands (locally called Fadama), has spread out to a vast area. It is inundated every year during the rainy season which lasts from June to September every year. The farmers grow rice during the rainy season when water is standing. They cannot do further agricultural activity for almost two months due to high standing water. This has resulted in widespread waterlogging problem. Besides, the impact of climate change is resulting in rapid variation in river/stream flows. The information about water bodies regarding the availability of water for agricultural and other uses and the behavior of rivers at different flows is seldom available. Furthermore, sediment load (suspended and bedload) is not measured due to which land erosion cannot be countered effectively. This study, carried out in seven different irrigation regions of Kebbi state, found that diversion structures need to be constructed at some strategic locations for the supply of surface water to the farmers. The water table needs to be lowered through an effective drainage system. The monitoring of water bodies is crucial for sound data to help efficient regulation and management of water. Construction of embankments is necessary to control frequent floods in the rivers of Niger and Rima. Furthermore, farmers need capacity and awareness for participatory irrigation management.

Keywords: water bodies, floods, agriculture, waterlogging

Procedia PDF Downloads 226
24511 Association of Social Data as a Tool to Support Government Decision Making

Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias

Abstract:

Based on data on child labor, this work arises questions about how to understand and locate the factors that make up the child labor rates, and which properties are important to analyze these cases. Using data mining techniques to discover valid patterns on Brazilian social databases were evaluated data of child labor in the State of Tocantins (located north of Brazil with a territory of 277000 km2 and comprises 139 counties). This work aims to detect factors that are deterministic for the practice of child labor and their relationships with financial indicators, educational, regional and social, generating information that is not explicit in the government database, thus enabling better monitoring and updating policies for this purpose.

Keywords: social data, government decision making, association of social data, data mining

Procedia PDF Downloads 359
24510 Outlier Detection in Stock Market Data using Tukey Method and Wavelet Transform

Authors: Sadam Alwadi

Abstract:

Outlier values become a problem that frequently occurs in the data observation or recording process. Thus, the need for data imputation has become an essential matter. In this work, it will make use of the methods described in the prior work to detect the outlier values based on a collection of stock market data. In order to implement the detection and find some solutions that maybe helpful for investors, real closed price data were obtained from the Amman Stock Exchange (ASE). Tukey and Maximum Overlapping Discrete Wavelet Transform (MODWT) methods will be used to impute the detect the outlier values.

Keywords: outlier values, imputation, stock market data, detecting, estimation

Procedia PDF Downloads 74
24509 PEINS: A Generic Compression Scheme Using Probabilistic Encoding and Irrational Number Storage

Authors: P. Jayashree, S. Rajkumar

Abstract:

With social networks and smart devices generating a multitude of data, effective data management is the need of the hour for networks and cloud applications. Some applications need effective storage while some other applications need effective communication over networks and data reduction comes as a handy solution to meet out both requirements. Most of the data compression techniques are based on data statistics and may result in either lossy or lossless data reductions. Though lossy reductions produce better compression ratios compared to lossless methods, many applications require data accuracy and miniature details to be preserved. A variety of data compression algorithms does exist in the literature for different forms of data like text, image, and multimedia data. In the proposed work, a generic progressive compression algorithm, based on probabilistic encoding, called PEINS is projected as an enhancement over irrational number stored coding technique to cater to storage issues of increasing data volumes as a cost effective solution, which also offers data security as a secondary outcome to some extent. The proposed work reveals cost effectiveness in terms of better compression ratio with no deterioration in compression time.

Keywords: compression ratio, generic compression, irrational number storage, probabilistic encoding

Procedia PDF Downloads 278
24508 Iot Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework

Authors: Femi Elegbeleye, Omobayo Esan, Muienge Mbodila, Patrick Bowe

Abstract:

This paper focused on cost effective storage architecture using fog and cloud data storage gateway and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. The several results obtained from this study on data privacy model shows that when two or more data privacy model is combined we tend to have a more stronger privacy to our data, and when fog storage gateway have several advantages over using the traditional cloud storage, from our result shows fog has reduced latency/delay, low bandwidth consumption, and energy usage when been compare with cloud storage, therefore, fog storage will help to lessen excessive cost. This paper dwelt more on the system descriptions, the researchers focused on the research design and framework design for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, its structure, and its interrelationships.

Keywords: IoT, fog, cloud, data analysis, data privacy

Procedia PDF Downloads 87
24507 Comparison of Selected Pier-Scour Equations for Wide Piers Using Field Data

Authors: Nordila Ahmad, Thamer Mohammad, Bruce W. Melville, Zuliziana Suif

Abstract:

Current methods for predicting local scour at wide bridge piers, were developed on the basis of laboratory studies and very limited scour prediction were tested with field data. Laboratory wide pier scour equation from previous findings with field data were presented. A wide range of field data were used and it consists of both live-bed and clear-water scour. A method for assessing the quality of the data was developed and applied to the data set. Three other wide pier-scour equations from the literature were used to compare the performance of each predictive method. The best-performing scour equation were analyzed using statistical analysis. Comparisons of computed and observed scour depths indicate that the equation from the previous publication produced the smallest discrepancy ratio and RMSE value when compared with the large amount of laboratory and field data.

Keywords: field data, local scour, scour equation, wide piers

Procedia PDF Downloads 395
24506 The Maximum Throughput Analysis of UAV Datalink 802.11b Protocol

Authors: Inkyu Kim, SangMan Moon

Abstract:

This IEEE 802.11b protocol provides up to 11Mbps data rate, whereas aerospace industry wants to seek higher data rate COTS data link system in the UAV. The Total Maximum Throughput (TMT) and delay time are studied on many researchers in the past years This paper provides theoretical data throughput performance of UAV formation flight data link using the existing 802.11b performance theory. We operate the UAV formation flight with more than 30 quad copters with 802.11b protocol. We may be predicting that UAV formation flight numbers have to bound data link protocol performance limitations.

Keywords: UAV datalink, UAV formation flight datalink, UAV WLAN datalink application, UAV IEEE 802.11b datalink application

Procedia PDF Downloads 376
24505 Methods for Distinction of Cattle Using Supervised Learning

Authors: Radoslav Židek, Veronika Šidlová, Radovan Kasarda, Birgit Fuerst-Waltl

Abstract:

Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.

Keywords: genetic data, Pinzgau cattle, supervised learning, machine learning

Procedia PDF Downloads 536
24504 Router 1X3 - RTL Design and Verification

Authors: Nidhi Gopal

Abstract:

Routing is the process of moving a packet of data from source to destination and enables messages to pass from one computer to another and eventually reach the target machine. A router is a networking device that forwards data packets between computer networks. It is connected to two or more data lines from different networks (as opposed to a network switch, which connects data lines from one single network). This paper mainly emphasizes upon the study of router device, its top level architecture, and how various sub-modules of router i.e. Register, FIFO, FSM and Synchronizer are synthesized, and simulated and finally connected to its top module.

Keywords: data packets, networking, router, routing

Procedia PDF Downloads 790
24503 Assessment of the Water Quality of the Nhue River in Vietnam and its Suitability for Irrigation Water

Authors: Thi Lan Huong Nguyen, Motohei Kanayama, Takahiro Higashi, Van Chinh Le, Thu Ha Doan, Anh Dao Chu

Abstract:

The Nhue River in Vietnam is the main source of irrigation water for suburban agricultural land and fish farm. Wastewater from the industrial plants located along these rivers has been discharged, which has degraded the water quality of the rivers. The present paper describes the chemical properties of water from the river focusing on heavy metal pollution and the suitability of water quality for irrigation. Water from the river was heavily polluted with heavy metals such as Pb, Cu, Zn, Cr, Cd, and Ni. Dissolved oxygen, COD, and total suspended solids, and the concentrations of all heavy metals exceeded the Vietnamese standard for surface water quality in all investigated sites. The concentrations of some heavy metals such as Cu, Cd, Cr and Ni were over the internationally recommended WHO maximum limits for irrigation water. A wide variation in heavy metal concentration of water due to metal types is the result of wastewater discharged from different industrial sources.

Keywords: heavy metals, stream water, irrigation, industry

Procedia PDF Downloads 388
24502 Noise Reduction in Web Data: A Learning Approach Based on Dynamic User Interests

Authors: Julius Onyancha, Valentina Plekhanova

Abstract:

One of the significant issues facing web users is the amount of noise in web data which hinders the process of finding useful information in relation to their dynamic interests. Current research works consider noise as any data that does not form part of the main web page and propose noise web data reduction tools which mainly focus on eliminating noise in relation to the content and layout of web data. This paper argues that not all data that form part of the main web page is of a user interest and not all noise data is actually noise to a given user. Therefore, learning of noise web data allocated to the user requests ensures not only reduction of noisiness level in a web user profile, but also a decrease in the loss of useful information hence improves the quality of a web user profile. Noise Web Data Learning (NWDL) tool/algorithm capable of learning noise web data in web user profile is proposed. The proposed work considers elimination of noise data in relation to dynamic user interest. In order to validate the performance of the proposed work, an experimental design setup is presented. The results obtained are compared with the current algorithms applied in noise web data reduction process. The experimental results show that the proposed work considers the dynamic change of user interest prior to elimination of noise data. The proposed work contributes towards improving the quality of a web user profile by reducing the amount of useful information eliminated as noise.

Keywords: web log data, web user profile, user interest, noise web data learning, machine learning

Procedia PDF Downloads 252
24501 Data Mining and Knowledge Management Application to Enhance Business Operations: An Exploratory Study

Authors: Zeba Mahmood

Abstract:

The modern business organizations are adopting technological advancement to achieve competitive edge and satisfy their consumer. The development in the field of Information technology systems has changed the way of conducting business today. Business operations today rely more on the data they obtained and this data is continuously increasing in volume. The data stored in different locations is difficult to find and use without the effective implementation of Data mining and Knowledge management techniques. Organizations who smartly identify, obtain and then convert data in useful formats for their decision making and operational improvements create additional value for their customers and enhance their operational capabilities. Marketers and Customer relationship departments of firm use Data mining techniques to make relevant decisions, this paper emphasizes on the identification of different data mining and Knowledge management techniques that are applied to different business industries. The challenges and issues of execution of these techniques are also discussed and critically analyzed in this paper.

Keywords: knowledge, knowledge management, knowledge discovery in databases, business, operational, information, data mining

Procedia PDF Downloads 520
24500 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data

Authors: Adarsh Shroff

Abstract:

Big data is a collection of dataset so large and complex that it becomes difficult to process using data base management tools. To perform operations like search, analysis, visualization on big data by using data mining; which is the process of extraction of patterns or knowledge from large data set. In recent years, the data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to Map Reduce, the most widely used framework for mining big data. I2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, evaluate i2MapReduce using a one-step algorithm and three iterative algorithms with diverse computation characteristics for efficient mining.

Keywords: big data, map reduce, incremental processing, iterative computation

Procedia PDF Downloads 335
24499 Analyzing Large Scale Recurrent Event Data with a Divide-And-Conquer Approach

Authors: Jerry Q. Cheng

Abstract:

Currently, in analyzing large-scale recurrent event data, there are many challenges such as memory limitations, unscalable computing time, etc. In this research, a divide-and-conquer method is proposed using parametric frailty models. Specifically, the data is randomly divided into many subsets, and the maximum likelihood estimator from each individual data set is obtained. Then a weighted method is proposed to combine these individual estimators as the final estimator. It is shown that this divide-and-conquer estimator is asymptotically equivalent to the estimator based on the full data. Simulation studies are conducted to demonstrate the performance of this proposed method. This approach is applied to a large real dataset of repeated heart failure hospitalizations.

Keywords: big data analytics, divide-and-conquer, recurrent event data, statistical computing

Procedia PDF Downloads 150
24498 Small Entrepreneurs as Creators of Chaos: Increasing Returns Requires Scaling

Authors: M. B. Neace, Xin GAo

Abstract:

Small entrepreneurs are ubiquitous. Regardless of location their success depends on several behavioral characteristics and several market conditions. In this concept paper, we extend this paradigm to include elements from the science of chaos. Our observations, research findings, literature search and intuition lead us to the proposition that all entrepreneurs seek increasing returns, as did the many small entrepreneurs we have interviewed over the years. There will be a few whose initial perturbations may create tsunami-like waves of increasing returns over time resulting in very large market consequences–the butterfly impact. When small entrepreneurs perturb the market-place and their initial efforts take root a series of phase-space transitions begin to occur. They sustain the stream of increasing returns by scaling up. Chaos theory contributes to our understanding of this phenomenon. Sustaining and nourishing increasing returns of small entrepreneurs as complex adaptive systems requires scaling. In this paper we focus on the most critical element of the small entrepreneur scaling process–the mindset of the owner-operator.

Keywords: entrepreneur, increasing returns, scaling, chaos

Procedia PDF Downloads 446
24497 Adoption of Big Data by Global Chemical Industries

Authors: Ashiff Khan, A. Seetharaman, Abhijit Dasgupta

Abstract:

The new era of big data (BD) is influencing chemical industries tremendously, providing several opportunities to reshape the way they operate and help them shift towards intelligent manufacturing. Given the availability of free software and the large amount of real-time data generated and stored in process plants, chemical industries are still in the early stages of big data adoption. The industry is just starting to realize the importance of the large amount of data it owns to make the right decisions and support its strategies. This article explores the importance of professional competencies and data science that influence BD in chemical industries to help it move towards intelligent manufacturing fast and reliable. This article utilizes a literature review and identifies potential applications in the chemical industry to move from conventional methods to a data-driven approach. The scope of this document is limited to the adoption of BD in chemical industries and the variables identified in this article. To achieve this objective, government, academia, and industry must work together to overcome all present and future challenges.

Keywords: chemical engineering, big data analytics, industrial revolution, professional competence, data science

Procedia PDF Downloads 69
24496 Secure Multiparty Computations for Privacy Preserving Classifiers

Authors: M. Sumana, K. S. Hareesha

Abstract:

Secure computations are essential while performing privacy preserving data mining. Distributed privacy preserving data mining involve two to more sites that cannot pool in their data to a third party due to the violation of law regarding the individual. Hence in order to model the private data without compromising privacy and information loss, secure multiparty computations are used. Secure computations of product, mean, variance, dot product, sigmoid function using the additive and multiplicative homomorphic property is discussed. The computations are performed on vertically partitioned data with a single site holding the class value.

Keywords: homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data

Procedia PDF Downloads 403