Search results for: data stream mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7782

Search results for: data stream mining

7242 Comparative Analysis of Diverse Collection of Big Data Analytics Tools

Authors: S. Vidhya, S. Sarumathi, N. Shanthi

Abstract:

Over the past era, there have been a lot of efforts and studies are carried out in growing proficient tools for performing various tasks in big data. Recently big data have gotten a lot of publicity for their good reasons. Due to the large and complex collection of datasets it is difficult to process on traditional data processing applications. This concern turns to be further mandatory for producing various tools in big data. Moreover, the main aim of big data analytics is to utilize the advanced analytic techniques besides very huge, different datasets which contain diverse sizes from terabytes to zettabytes and diverse types such as structured or unstructured and batch or streaming. Big data is useful for data sets where their size or type is away from the capability of traditional relational databases for capturing, managing and processing the data with low-latency. Thus the out coming challenges tend to the occurrence of powerful big data tools. In this survey, a various collection of big data tools are illustrated and also compared with the salient features.

Keywords: Big data, Big data analytics, Business analytics, Data analysis, Data visualization, Data discovery.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3775
7241 Learning Classifier Systems Approach for Automated Discovery of Censored Production Rules

Authors: Suraiya Jabin, Kamal K. Bharadwaj

Abstract:

In the recent past Learning Classifier Systems have been successfully used for data mining. Learning Classifier System (LCS) is basically a machine learning technique which combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. All LCSs models more or less, comprise four main components; a finite population of condition–action rules, called classifiers; the performance component, which governs the interaction with the environment; the credit assignment component, which distributes the reward received from the environment to the classifiers accountable for the rewards obtained; the discovery component, which is responsible for discovering better rules and improving existing ones through a genetic algorithm. The concatenate of the production rules in the LCS form the genotype, and therefore the GA should operate on a population of classifier systems. This approach is known as the 'Pittsburgh' Classifier Systems. Other LCS that perform their GA at the rule level within a population are known as 'Mitchigan' Classifier Systems. The most predominant representation of the discovered knowledge is the standard production rules (PRs) in the form of IF P THEN D. The PRs, however, are unable to handle exceptions and do not exhibit variable precision. The Censored Production Rules (CPRs), an extension of PRs, were proposed by Michalski and Winston that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: IF P THEN D UNLESS C, where Censor C is an exception to the rule. Such rules are employed in situations, in which conditional statement IF P THEN D holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds or not. Thus, the IF P THEN D part of CPR expresses important information, while the UNLESS C part acts only as a switch and changes the polarity of D to ~D. In this paper Pittsburgh style LCSs approach is used for automated discovery of CPRs. An appropriate encoding scheme is suggested to represent a chromosome consisting of fixed size set of CPRs. Suitable genetic operators are designed for the set of CPRs and individual CPRs and also appropriate fitness function is proposed that incorporates basic constraints on CPR. Experimental results are presented to demonstrate the performance of the proposed learning classifier system.

Keywords: Censored Production Rule, Data Mining, GeneticAlgorithm, Learning Classifier System, Machine Learning, PittsburgApproach, , Reinforcement learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1530
7240 Application of a Similarity Measure for Graphs to Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Alexander Mehler, Jürgen Kilian, Max Mühlhauser

Abstract:

Due to the tremendous amount of information provided by the World Wide Web (WWW) developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well known technique of sequence alignments for solving a novel and challenging problem: Measuring the structural similarity of generalized trees. In other words: We first transform our graphs considered as high dimensional objects in linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem to a string similarity problem for developing a efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.

Keywords: Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892
7239 Learning an Overcomplete Dictionary using a Cauchy Mixture Model for Sparse Decay

Authors: E. S. Gower, M. O. J. Hawksford

Abstract:

An algorithm for learning an overcomplete dictionary using a Cauchy mixture model for sparse decomposition of an underdetermined mixing system is introduced. The mixture density function is derived from a ratio sample of the observed mixture signals where 1) there are at least two but not necessarily more mixture signals observed, 2) the source signals are statistically independent and 3) the sources are sparse. The basis vectors of the dictionary are learned via the optimization of the location parameters of the Cauchy mixture components, which is shown to be more accurate and robust than the conventional data mining methods usually employed for this task. Using a well known sparse decomposition algorithm, we extract three speech signals from two mixtures based on the estimated dictionary. Further tests with additive Gaussian noise are used to demonstrate the proposed algorithm-s robustness to outliers.

Keywords: expectation-maximization, Pitman estimator, sparsedecomposition

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1948
7238 Development of Subjective Measures of Interestingness: From Unexpectedness to Shocking

Authors: Eiad Yafi, M. A. Alam, Ranjit Biswas

Abstract:

Knowledge Discovery of Databases (KDD) is the process of extracting previously unknown but useful and significant information from large massive volume of databases. Data Mining is a stage in the entire process of KDD which applies an algorithm to extract interesting patterns. Usually, such algorithms generate huge volume of patterns. These patterns have to be evaluated by using interestingness measures to reflect the user requirements. Interestingness is defined in different ways, (i) Objective measures (ii) Subjective measures. Objective measures such as support and confidence extract meaningful patterns based on the structure of the patterns, while subjective measures such as unexpectedness and novelty reflect the user perspective. In this report, we try to brief the more widely spread and successful subjective measures and propose a new subjective measure of interestingness, i.e. shocking.

Keywords: Shocking rules (SHR).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1536
7237 Multi-labeled Data Expressed by a Set of Labels

Authors: Tetsuya Furukawa, Masahiro Kuzunishi

Abstract:

Collected data must be organized to be utilized efficiently, and hierarchical classification of data is efficient approach to organize data. When data is classified to multiple categories or annotated with a set of labels, users request multi-labeled data by giving a set of labels. There are several interpretations of the data expressed by a set of labels. This paper discusses which data is expressed by a set of labels by introducing orders for sets of labels and shows that there are four types of orders, which are characterized by whether the labels of expressed data includes every label of the given set of labels within the range of the set. Desirable properties of the orders, data is also expressed by the higher set of labels and different sets of labels express different data, are discussed for the orders.

Keywords: Classification Hierarchies, Multi-labeled Data, Multiple Classificaiton, Orders of Sets of Labels

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1304
7236 The Investigation of Enzymatic Activity in the Soils under the Impact of Metallurgical Industrial Activity in Lori Marz, Armenia

Authors: T. H. Derdzyan, K. A. Ghazaryan, G. A. Gevorgyan

Abstract:

Beta-glucosidase, chitinase, leucine-aminopeptidase, acid phosphomonoesterase and acetate-esterase enzyme activities in the soils under the impact of metallurgical industrial activity in Lori marz (district) were investigated. The results of the study showed that the activities of the investigated enzymes in the soils decreased with increasing distance from the Shamlugh copper mine, the Chochkan tailings storage facility and the ore transportation road. Statistical analysis revealed that the activities of the enzymes were positively correlated (significant) to each other according to the observation sites which indicated that enzyme activities were affected by the same anthropogenic factor. The investigations showed that the soils were polluted with heavy metals (Cu, Pb, As, Co, Ni, Zn) due to copper mining activity in this territory. The results of Pearson correlation analysis revealed a significant negative correlation between heavy metal pollution degree (Nemerow integrated pollution index) and soil enzyme activity. All of this indicated that copper mining activity in this territory causing the heavy metal pollution of the soils resulted in the inhabitation of the activities of the enzymes which are considered as biological catalysts to decompose organic materials and facilitate the cycling of nutrients.

Keywords: Armenia, metallurgical industrial activity, heavy metal pollutionl, soil enzyme activity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2566
7235 Application of Building Information Modeling in Energy Management of Individual Departments Occupying University Facilities

Authors: Kung-Jen Tu, Danny Vernatha

Abstract:

To assist individual departments within universities in their energy management tasks, this study explores the application of Building Information Modeling in establishing the ‘BIM based Energy Management Support System’ (BIM-EMSS). The BIM-EMSS consists of six components: (1) sensors installed for each occupant and each equipment, (2) electricity sub-meters (constantly logging lighting, HVAC, and socket electricity consumptions of each room), (3) BIM models of all rooms within individual departments’ facilities, (4) data warehouse (for storing occupancy status and logged electricity consumption data), (5) building energy management system that provides energy managers with various energy management functions, and (6) energy simulation tool (such as eQuest) that generates real time 'standard energy consumptions' data against which 'actual energy consumptions' data are compared and energy efficiency evaluated. Through the building energy management system, the energy manager is able to (a) have 3D visualization (BIM model) of each room, in which the occupancy and equipment status detected by the sensors and the electricity consumptions data logged are displayed constantly; (b) perform real time energy consumption analysis to compare the actual and standard energy consumption profiles of a space; (c) obtain energy consumption anomaly detection warnings on certain rooms so that energy management corrective actions can be further taken (data mining technique is employed to analyze the relation between space occupancy pattern with current space equipment setting to indicate an anomaly, such as when appliances turn on without occupancy); and (d) perform historical energy consumption analysis to review monthly and annually energy consumption profiles and compare them against historical energy profiles. The BIM-EMSS was further implemented in a research lab in the Department of Architecture of NTUST in Taiwan and implementation results presented to illustrate how it can be used to assist individual departments within universities in their energy management tasks.

Keywords: Sensor, electricity sub-meters, database, energy anomaly detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2284
7234 People Counting in Transport Vehicles

Authors: Sebastien Harasse, Laurent Bonnaud, Michel Desvignes

Abstract:

Counting people from a video stream in a noisy environment is a challenging task. This project aims at developing a counting system for transport vehicles, integrated in a video surveillance product. This article presents a method for the detection and tracking of multiple faces in a video by using a model of first and second order local moments. An iterative process is used to estimate the position and shape of multiple faces in images, and to track them. the trajectories are then processed to count people entering and leaving the vehicle.

Keywords: face detection, tracking, counting, local statistics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1766
7233 Gammarus:Asellus Ratio as an Index of Organic Pollution – (A Case Study in Markeaton, Kedleston Hall, and Allestree Park Lakes Derby) UK

Authors: U. Bawa

Abstract:

Macro invertebrates have been used to monitor organic pollution in rivers and streams. Several biotic indices based on macro invertebrates have been developed over the years including the Biological Monitoring Working Party (BMWP). A new biotic index, the Gammarus:Asellus ratio has been recently proposed as an index of organic pollution. This study tested the validity of the Gammarus:Asellus ratio as an index of organic pollution, by examining the relationship between the Gammarus:Asellus ratio and physical chemical parameters, and other biotic indices such as BMWP and, Average Score Per Taxon (ASPT) from lakes and streams at Markeaton Park, Allestree Park and Kedleston Hall, Derbyshire. Macro invertebrates were sampled using the standard five minute kick sampling techniques physical and chemical environmental variables were obtained based on standard sampling techniques. Eighteen sites were sampled, six sites from Markeaton Park (three sites across the stream and three sites across the lake). Six sites each were also sampled from Allestree Park and Kedleston Hall lakes. The Gammarus:Asellus ratio showed an opposite significant positive correlations with parameters indicative of organic pollution such as the level of nitrates, phosphates, and calcium and also revealed a negatively significant correlations with other biotic indices (BMWP/ASPT). The BMWP score correlated positively significantly with some water quality parameters such as dissolved oxygen and flow rate, but revealed no correlations with other chemical environmental variables. The BMWP score was significantly higher in the stream than the lake in Markeaton Park, also The ASPT scores appear to be significantly higher in the upper Lakes than the middle and lower lakes. This study has further strengthened the use of BMWP/ASPT score as an index of organic pollution. But additional application is required to validate the use of Gammarus:Asellus as a rapid bio monitoring tool.

Keywords: Asellus, Biotic index, Gammarus, Organic pollution, Macro invertebrate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2914
7232 Practical Issues for Real-Time Video Tracking

Authors: Vitaliy Tayanov

Abstract:

In this paper we present the algorithm which allows us to have an object tracking close to real time in Full HD videos. The frame rate (FR) of a video stream is considered to be between 5 and 30 frames per second. The real time track building will be achieved if the algorithm can follow 5 or more frames per second. The principle idea is to use fast algorithms when doing preprocessing to obtain the key points and track them after. The procedure of matching points during assignment is hardly dependent on the number of points. Because of this we have to limit pointed number of points using the most informative of them.

Keywords: video tracking, real-time, Hungarian algorithm, Full HD video.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1537
7231 Latent Semantic Inference for Agriculture FAQ Retrieval

Authors: Dawei Wang, Rujing Wang, Ying Li, Baozi Wei

Abstract:

FAQ system can make user find answer to the problem that puzzles them. But now the research on Chinese FAQ system is still on the theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To enhance the efficiency, a small pool of the candidate question-answering pairs retrieved from the system for the follow-up work according to the concept of the agriculture domain extracted from user input .Input queries or questions are converted into four parts, the question word segment (QWS), the verb segment (VS), the concept of agricultural areas segment (CS), the auxiliary segment (AS). A semantic matching method is presented to estimate the similarity between the semantic segments of the query and the questions in the pool of the candidate. A thesaurus constructed from the HowNet, a Chinese knowledge base, is adopted for word similarity measure in the matcher. The questions are classified into eleven intension categories using predefined question stemming keywords. For FAQ mining, given a query, the question part and answer part in an FAQ question-answer pair is matched with the input query, respectively. Finally, the probabilities estimated from these two parts are integrated and used to choose the most likely answer for the input query. These approaches are experimented on an agriculture FAQ system. Experimental results indicate that the proposed approach outperformed the FAQ-Finder system in agriculture FAQ retrieval.

Keywords: FAQ, Semantic Inference, Ontology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1379
7230 The Comparison of Data Replication in Distributed Systems

Authors: Iman Zangeneh, Mostafa Moradi, Ali Mokhtarbaf

Abstract:

The necessity of ever-increasing use of distributed data in computer networks is obvious for all. One technique that is performed on the distributed data for increasing of efficiency and reliablity is data rplication. In this paper, after introducing this technique and its advantages, we will examine some dynamic data replication. We will examine their characteristies for some overus scenario and the we will propose some suggestion for their improvement.

Keywords: data replication, data hiding, consistency, dynamicdata replication strategy

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1635
7229 La promoted Ni/α-Al2O3 Catalysts for Syngas Methanation

Authors: Anmin Zhao, Weiyong Yingı , Haitao Zhang, Hongfang Ma, Dingye Fang

Abstract:

The Ni/α-Al2O3 catalysts with different amounts of La as promoter from 0 to 4 wt % were prepared, characterized and their catalytic activity was investigated in syngas methanation reaction. Effects of reaction temperature and lanthanum loading on carbon oxides conversion and methane selectivity were also studied. Adding certain amount of lanthanum to 10Ni /α-Al2O3 catalysts can decrease the average NiO crystallite diameter which leads to higher activity and stability while excessive addition would cause deactivation quickly. Stability on stream towards deactivation was observed up to 800 min at 500 °C, 0.1MPa and 600000 mL·g-1·h-1.

Keywords: Methanation; Nickel catalysts; Syngas methanation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3645
7228 Genetic Programming Approach to Hierarchical Production Rule Discovery

Authors: Basheer M. Al-Maqaleh, Kamal K. Bharadwaj

Abstract:

Automated discovery of hierarchical structures in large data sets has been an active research area in the recent past. This paper focuses on the issue of mining generalized rules with crisp hierarchical structure using Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented in this work uses flat rules as initial individuals of GP and discovers hierarchical structure. Suitable genetic operators are proposed for the suggested encoding. Based on the Subsumption Matrix(SM), an appropriate fitness function is suggested. Finally, Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy. Experimental results are presented to demonstrate the performance of the proposed algorithm.

Keywords: Genetic Programming, Hierarchy, Knowledge Discovery in Database, Subsumption Matrix.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1451
7227 Video Data Mining based on Information Fusion for Tamper Detection

Authors: Girija Chetty, Renuka Biswas

Abstract:

In this paper, we propose novel algorithmic models based on information fusion and feature transformation in crossmodal subspace for different types of residue features extracted from several intra-frame and inter-frame pixel sub-blocks in video sequences for detecting digital video tampering or forgery. An evaluation of proposed residue features – the noise residue features and the quantization features, their transformation in cross-modal subspace, and their multimodal fusion, for emulated copy-move tamper scenario shows a significant improvement in tamper detection accuracy as compared to single mode features without transformation in cross-modal subspace.

Keywords: image tamper detection, digital forensics, correlation features image fusion

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1899
7226 Instability of Electron Plasma Waves in an Electron-Hole Bounded Quantum Dusty Plasma

Authors: Basudev Ghosh, Sailendranath Paul, Sreyasi Banerjee

Abstract:

Using quantum hydrodynamical (QHD) model the linear dispersion relation for the electron plasma waves propagating in a cylindrical waveguide filled with a dense plasma containing streaming electron, hole and stationary charged dust particles has been derived. It is shown that the effect of finite boundary and stream velocity of electrons and holes make some of the possible modes of propagation linearly unstable. The growth rate of this instability is shown to depend significantly on different plasma parameters.

Keywords: Electron Plasma wave, Quantum plasma, Quantum Hydrodynamical model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1702
7225 Effects of Livestream Affordances on Consumer Purchase Willingness: Explicit IT Affordances Perspective

Authors: Isaac O. Asante, Yushi Jiang, Hailin Tao

Abstract:

Livestreaming marketing, the new electronic commerce element, has become an optional marketing channel following the COVID-19 pandemic, and many sellers are leveraging the features presented by livestreaming to increase sales. This study was conducted to measure real-time observable interactions between consumers and sellers. Based on the affordance theory, this study conceptualized constructs representing the interactive features and examined how they drive consumers’ purchase willingness during livestreaming sessions using 1238 datasets from Amazon Live, following the manual observation of transaction records. Using structural equation modeling, the ordinary least square regression suggests that live viewers, new followers, live chats, and likes positively affect purchase willingness. The Sobel and Monte Carlo tests show that new followers, live chats, and likes significantly mediate the relationship between live viewers and purchase willingness. The study presents a way of measuring interactions in livestreaming commerce and proposes a way to manually gather data on consumer behaviors in livestreaming platforms when the application programming interface (API) of such platforms does not support data mining algorithms.

Keywords: Livestreaming marketing, live chats, live viewers, likes, new followers, purchase willingness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 149
7224 Analyzing Factors Impacting COVID-19 Vaccination Rates

Authors: Dongseok Cho, Mitchell Driedger, Sera Han, Noman Khan, Mohammed Elmorsy, Mohamad El-Hajj

Abstract:

Since the approval of the COVID-19 vaccine in late 2020, vaccination rates have varied around the globe. Access to a vaccine supply, mandated vaccination policy, and vaccine hesitancy contribute to these rates. This study used COVID-19 vaccination data from Our World in Data and the Multilateral Leaders Task Force on COVID-19 to create two COVID-19 vaccination indices. The first index is the Vaccine Utilization Index (VUI), which measures how effectively each country has utilized its vaccine supply to doubly vaccinate its population. The second index is the Vaccination Acceleration Index (VAI), which evaluates how efficiently each country vaccinated their populations within their first 150 days. Pearson correlations were created between these indices and country indicators obtained from the World Bank. Results of these correlations identify countries with stronger Health indicators such as lower mortality rates, lower age-dependency ratios, and higher rates of immunization to other diseases display higher VUI and VAI scores than countries with lesser values. VAI scores are also positively correlated to Governance and Economic indicators, such as regulatory quality, control of corruption, and GDP per capita. As represented by the VUI, proper utilization of the COVID-19 vaccine supply by country is observed in countries that display excellence in health practices. A country’s motivation to accelerate its vaccination rates within the first 150 days of vaccinating, as represented by the VAI, was largely a product of the governing body’s effectiveness and economic status, as well as overall excellence in health practises.

Keywords: Data mining, Pearson Correlation, COVID-19, vaccination rates, hesitancy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 345
7223 Does Practice Reflect Theory? An Exploratory Study of a Successful Knowledge Management System

Authors: Janet L. Kourik, Peter E. Maher

Abstract:

To investigate the correspondence of theory and practice, a successfully implemented Knowledge Management System (KMS) is explored through the lens of Alavi and Leidner-s proposed KMS framework for the analysis of an information system in knowledge management (Framework-AISKM). The applied KMS system was designed to manage curricular knowledge in a distributed university environment. The motivation for the KMS is discussed along with the types of knowledge necessary in an academic setting. Elements of the KMS involved in all phases of capturing and disseminating knowledge are described. As the KMS matures the resulting data stores form the precursor to and the potential for knowledge mining. The findings from this exploratory study indicate substantial correspondence between the successful KMS and the theory-based framework providing provisional confirmation for the framework while suggesting factors that contributed to the system-s success. Avenues for future work are described.

Keywords: Applied KMS, education, knowledge management (KM), KM framework, knowledge management system (KMS).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1037
7222 A Robust Audio Fingerprinting Algorithm in MP3 Compressed Domain

Authors: Ruili Zhou, Yuesheng Zhu

Abstract:

In this paper, a new robust audio fingerprinting algorithm in MP3 compressed domain is proposed with high robustness to time scale modification (TSM). Instead of simply employing short-term information of the MP3 stream, the new algorithm extracts the long-term features in MP3 compressed domain by using the modulation frequency analysis. Our experiment has demonstrated that the proposed method can achieve a hit rate of above 95% in audio retrieval and resist the attack of 20% TSM. It has lower bit error rate (BER) performance compared to the other algorithms. The proposed algorithm can also be used in other compressed domains, such as AAC.

Keywords: Audio Fingerprinting, MP3, Modulation Frequency, TSM

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2196
7221 Control of Pressure Gradient in the Contraction of a Wind Tunnel

Authors: Dehghan Manshadi M., Mirzaei M., Soltani M. R., Ghorbanian K.

Abstract:

Subsonic wind tunnel experiments were conducted to study the effect of tripped boundary layer on the pressure distribution in the contraction region of the tunnel. Measurements were performed by installing trip strip at two different positions in the concave portion of the contraction. The results show that installation of the trip strips, have significant effects on both turbulence and pressure distribution. The reduction in the free stream turbulence and reduction of the wall static pressure distribution deferred signified with the location of the trip strip.

Keywords: Contraction, pressure distribution, trip strip, turbulence intensity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3038
7220 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method

Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas

Abstract:

To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural network (ANN) and support vector machine (SVM). All prediction models are built in Python, with tool Scikit-learn and Pybrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day-ahead, is predicted at first, and then this estimation is used as an input to predict consumption and demand. Models to predict consumption and demand are trained in both SVM and ANN, and depend on cooling or heating, weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve 15.50% to 20.03% coefficient of variance of root mean square error (CVRMSE) for consumption prediction and 22.89% to 32.42% CVRMSE for demand prediction, respectively. To conclude, the presented models have potential to help building owners to purchase electricity at the wholesale market, but they are not robust when used in demand response control.

Keywords: Building energy prediction, data mining, demand response, electricity market.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2205
7219 CFD Modeling of Air Stream Pressure Drop inside Combustion Air Duct of Coal-Fired Power Plant with and without Airfoil

Authors: Pakawhat Khumkhreung, Yottana Khunatorn

Abstract:

The flow pattern inside rectangular intake air duct of 300 MW lignite coal-fired power plant is investigated in order to analyze and reduce overall inlet system pressure drop. The system consists of the 45-degree inlet elbow, the flow instrument, the 90-degree mitered elbow and fans, respectively. The energy loss in each section can be determined by Bernoulli’s equation and ASHRAE standard table. Hence, computational fluid dynamics (CFD) is used in this study based on Navier-Stroke equation and the standard k-epsilon turbulence modeling. Input boundary condition is 175 kg/s mass flow rate inside the 11-m2 cross sectional duct. According to the inlet air flow rate, the Reynolds number of airstream is 2.7x106 (based on the hydraulic duct diameter), thus the flow behavior is turbulence. The numerical results are validated with the real operation data. It is found that the numerical result agrees well with the operating data, and dominant loss occurs at the flow rate measurement device. Normally, the air flow rate is measured by the airfoil and it gets high pressure drop inside the duct. To overcome this problem, the airfoil is planned to be replaced with the other type measuring instrument, such as the average pitot tube which generates low pressure drop of airstream. The numerical result in case of average pitot tube shows that the pressure drop inside the inlet airstream duct is decreased significantly. It should be noted that the energy consumption of inlet air system is reduced too.

Keywords: Airfoil, average pitot tube, combustion air, CFD, pressure drop, rectangular duct.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1080
7218 Using Pattern Search Methods for Minimizing Clustering Problems

Authors: Parvaneh Shabanzadeh, Malik Hj Abu Hassan, Leong Wah June, Maryam Mohagheghtabar

Abstract:

Clustering is one of an interesting data mining topics that can be applied in many fields. Recently, the problem of cluster analysis is formulated as a problem of nonsmooth, nonconvex optimization, and an algorithm for solving the cluster analysis problem based on nonsmooth optimization techniques is developed. This optimization problem has a number of characteristics that make it challenging: it has many local minimum, the optimization variables can be either continuous or categorical, and there are no exact analytical derivatives. In this study we show how to apply a particular class of optimization methods known as pattern search methods to address these challenges. These methods do not explicitly use derivatives, an important feature that has not been addressed in previous studies. Results of numerical experiments are presented which demonstrate the effectiveness of the proposed method.

Keywords: Clustering functions, Non-smooth Optimization, Nonconvex Optimization, Pattern Search Method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1641
7217 Measuring Heterogeneous Traffic Density

Authors: V. Thamizh Arasan, G. Dhivya

Abstract:

Traffic Density provides an indication of the level of service being provided to the road users. Hence, there is a need to study the traffic flow characteristics with specific reference to density in detail. When the length and speed of the vehicles in a traffic stream vary significantly, the concept of occupancy, rather than density, is more appropriate to describe traffic concentration. When the concept of occupancy is applied to heterogeneous traffic condition, it is necessary to consider the area of the road space and the area of the vehicles as the bases. Hence, a new concept named, 'area-occupancy' is proposed here. It has been found that the estimated area-occupancy gives consistent values irrespective of change in traffic composition.

Keywords: Density Measurement, Heterogeneity, Occupancy, Traffic Flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3239
7216 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: Big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2143
7215 GIS-Based Spatial Distribution and Evaluation of Selected Heavy Metals Contamination in Topsoil around Ecton Mining Area, Derbyshire, UK

Authors: Zahid O. Alibrahim, Craig D. Williams, Clive L. Roberts

Abstract:

The study area (Ecton mining area) is located in the southern part of the Peak District in Derbyshire, England. It is bounded by the River Manifold from the west. This area has been mined for a long period. As a result, huge amounts of potentially toxic metals were released into the surrounding area and are most likely to be a significant source of heavy metal contamination to the local soil, water and vegetation. In order to appraise the potential heavy metal pollution in this area, 37 topsoil samples (5-20 cm depth) were collected and analysed for their total content of Cu, Pb, Zn, Mn, Cr, Ni and V using ICP (Inductively Coupled Plasma) optical emission spectroscopy. Multivariate Geospatial analyses using the GIS technique were utilised to draw geochemical maps of the metals of interest over the study area. A few hotspot points, areas of elevated concentrations of metals, were specified, which are presumed to be the results of anthropogenic activities. In addition, the soil’s environmental quality was evaluated by calculating the Mullers’ Geoaccumulation index (I geo), which suggests that the degree of contamination of the investigated heavy metals has the following trend: Pb > Zn > Cu > Mn > Ni = Cr = V. Furthermore, the potential ecological risk, using the enrichment factor (EF), was also specified. On the basis of the calculated amount or the EF, the levels of pollution for the studied metals in the study area have the following order: Pb>Zn>Cu>Cr>V>Ni>Mn.

Keywords: Heavy metals, GIS, multivariate analysis, geoaccumulation index, enrichment factor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1241
7214 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The big demand for image annotation and archiving in the web attracts the researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: The description of the elementary characteristics from the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP), which simultaneously handles these limitations. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics comparing with the two existing techniques.

Keywords: Data mining, information retrieval system, multi-label, problem transformation, histogram of gradients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1315
7213 Development of a Technology Assessment Model by Patents and Customers' Review Data

Authors: Kisik Song, Sungjoo Lee

Abstract:

Recent years have seen an increasing number of patent disputes due to excessive competition in the global market and a reduced technology life-cycle; this has increased the risk of investment in technology development. While many global companies have started developing a methodology to identify promising technologies and assess for decisions, the existing methodology still has some limitations. Post hoc assessments of the new technology are not being performed, especially to determine whether the suggested technologies turned out to be promising. For example, in existing quantitative patent analysis, a patent’s citation information has served as an important metric for quality assessment, but this analysis cannot be applied to recently registered patents because such information accumulates over time. Therefore, we propose a new technology assessment model that can replace citation information and positively affect technological development based on post hoc analysis of the patents for promising technologies. Additionally, we collect customer reviews on a target technology to extract keywords that show the customers’ needs, and we determine how many keywords are covered in the new technology. Finally, we construct a portfolio (based on a technology assessment from patent information) and a customer-based marketability assessment (based on review data), and we use them to visualize the characteristics of the new technologies.

Keywords: Technology assessment, patents, citation information, opinion mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 992