Search results for: data comparison
26997 GPU-Based Back-Projection of Synthetic Aperture Radar (SAR) Data onto 3D Reference Voxels
Authors: Joshua Buli, David Pietrowski, Samuel Britton
Abstract:
Processing SAR data usually requires constraints in extent in the Fourier domain as well as approximations and interpolations onto a planar surface to form an exploitable image. This results in a potential loss of data, requires several interpolative techniques, and restricts visualization to two-dimensional plane imagery. The data can be interpolated into a ground plane projection, with or without terrain as a component, to better view SAR data in an image domain comparable to what a human would view and so ease interpretation. An alternate but computationally heavy method that makes use of more of the data is the basis of this research. Pre-processing of the SAR data is completed first (matched filtering, motion compensation, etc.), the data is then range compressed, and lastly, the contribution from each pulse is determined for each specific point in space by searching the time history data for the reflectivity values for each pulse, summed over the entire collection. This results in a per-3D-point reflectivity using the entire collection domain. New advances in GPU processing have finally allowed this rapid projection of acquired SAR data onto any desired reference surface (called backprojection). Mathematically, the computations are fast and easy to implement, despite limitations in SAR phase history data size and 3D-point cloud size. Backprojection processing algorithms are embarrassingly parallel since each 3D point in the scene has the same reflectivity calculation applied for all pulses, independent of all other 3D points and pulse data under consideration. Therefore, given the simplicity of the single backprojection calculation, the work can be spread across thousands of GPU threads, allowing for accurate reflectivity representation of a scene.
Furthermore, because reflectivity values are associated with individual three-dimensional points, a plane is no longer the sole permissible mapping base; a digital elevation model or even a cloud of points (collected from any sensor capable of measuring ground topography) can be used as a basis for the backprojection technique. This technique minimizes any interpolations and modifications of the raw data, maintaining maximum data integrity. This innovative processing will allow for SAR data to be rapidly brought into a common reference frame for immediate exploitation and data fusion with other three-dimensional data and representations.
Keywords: backprojection, data fusion, exploitation, three-dimensional, visualization
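The per-point summation described in this abstract can be illustrated with a minimal CPU sketch in plain Python. It is not the authors' GPU implementation: the function name `backproject`, the nearest-bin lookup, the parameters `r0`, `dr`, and `fc`, and the sign of the phase correction are all assumptions made for illustration.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def backproject(points, antenna_positions, range_profiles, r0, dr, fc):
    """Sum, for every 3D point, the range-compressed sample of each pulse.

    points            -- list of (x, y, z) reference points (plane, DEM, or cloud)
    antenna_positions -- one (x, y, z) antenna position per pulse
    range_profiles    -- one list of complex range-compressed samples per pulse
    r0, dr            -- range of the first bin and bin spacing (assumed uniform)
    fc                -- carrier frequency in Hz (assumed phase convention)
    """
    image = []
    for p in points:  # each point is independent, hence the easy parallelism
        acc = 0j
        for pos, profile in zip(antenna_positions, range_profiles):
            rng = math.dist(p, pos)
            idx = round((rng - r0) / dr)  # nearest range bin, no interpolation
            if 0 <= idx < len(profile):
                # Undo the two-way propagation phase before accumulating.
                acc += profile[idx] * cmath.exp(1j * 4 * math.pi * fc * rng / C)
        image.append(acc)
    return image
```

On a GPU, the outer loop over points would typically map to one thread per point, which is the independence property the abstract relies on.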
Procedia PDF Downloads 86
26996 Comparison of Interactive Performance of Clicking Tasks Using Cursor Control Devices under Different Feedback Modes
Authors: Jinshou Shi, Xiaozhou Zhou, Yingwei Zhou, Tuoyang Zhou, Ning Li, Chi Zhang, Zhanshuo Zhang, Ziang Chen
Abstract:
To select the optimal interaction method for common computer click tasks, a click experiment following the ISO 9241-9 task paradigm was run with four common input methods (mouse, trackball, touch, and eye control) under visual feedback, auditory feedback, and no feedback. Analysis of movement time, throughput, and accuracy shows that touch control has the shortest movement time, higher accuracy and throughput than the other methods, and the best overall performance. In addition, the movement time of click operations with auditory feedback is significantly lower than with the other two feedback modes for every input method. Regarding click target size, click performance is reduced in all respects when the target is too small (less than 14px), so it is proposed that interface buttons should not be smaller than 28px. The article discusses in detail the advantages and disadvantages of the input and feedback methods, and the results can be applied to the design of buttons in interactive interfaces.
Keywords: cursor control performance, feedback, human computer interaction, throughput
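Throughput in ISO 9241-9 style pointing studies is commonly computed from an effective index of difficulty divided by movement time. The sketch below shows that calculation under stated assumptions: the function name is hypothetical, the nominal target distance stands in for the effective distance, and the abstract does not say which variant the authors used.

```python
import math
import statistics


def effective_throughput(distance, endpoint_errors, movement_times):
    """Return throughput in bits per second for one condition.

    distance        -- nominal movement distance (assumed in place of effective distance)
    endpoint_errors -- signed selection errors along the movement axis
    movement_times  -- movement time of each trial, in seconds
    """
    # Effective target width: 4.133 times the standard deviation of endpoints.
    effective_width = 4.133 * statistics.stdev(endpoint_errors)
    # Shannon formulation of the index of difficulty, in bits.
    index_of_difficulty = math.log2(distance / effective_width + 1)
    return index_of_difficulty / statistics.mean(movement_times)
```

Comparing this value across input devices and feedback modes gives a single figure that combines speed and accuracy.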
Procedia PDF Downloads 197
26995 Integration of Knowledge and Metadata for Complex Data Warehouses and Big Data
Authors: Jean Christian Ralaivao, Fabrice Razafindraibe, Hasina Rakotonirainy
Abstract:
This paper resumes work carried out in the field of complex data warehouses (DW) on the management and formalization of knowledge and metadata. It offers a methodological approach for integrating two concepts, knowledge and metadata, within the framework of a complex DW architecture. The work considers the use of knowledge representation by description logics and the extension of the Common Warehouse Metamodel (CWM) specifications, which is expected to affect the performance of a complex DW. Three essential aspects of this work are expected: the representation of knowledge in description logics, the translation of this knowledge into consistent UML diagrams while respecting or extending the CWM specifications, and the use of XML as a pivot. The field of application is large but will be adapted to systems with heterogeneous, complex, and unstructured content that moreover require extensive (re)use of knowledge, such as medical data warehouses.
Keywords: data warehouse, description logics, integration, knowledge, metadata
Procedia PDF Downloads 138
26994 Comparison of Unit Hydrograph Models to Simulate Flood Events at the Field Scale
Authors: Imene Skhakhfa, Lahbaci Ouerdachi
Abstract:
To ensure the overall coherence of simulated results, it is necessary to develop a robust validation process. In many applications, it is no longer sufficient to calibrate and validate the model only against the hydrograph measured at the outlet; we also try to better simulate the functioning of the watershed in space. Calibration is therefore also performed against other variables, such as water level measurements at intermediate stations or groundwater levels. In this work, we limit ourselves to modeling floods of short duration, for which evapotranspiration is negligible. The main parameters identifying the models are related to the unit hydrograph (HU) method. Three different models were tested: SNYDER, CLARK, and SCS. These models differ in their mathematical structure and in the parameters to be calibrated, while the hydrological data are the same: the initial water content and precipitation. The models are compared on the basis of their performance in terms of six objective criteria: three global criteria and three criteria representing volume, peak flow, and the mean square error. The first type of criteria gives more weight to strong events, whereas the second considers all events to be of equal weight. The results show that the calibrated parameter values are dependent and also highlight the problems associated with the simulation of low flow events and intermittent precipitation.
Keywords: model calibration, intensity, runoff, hydrograph
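All three models named above supply a unit hydrograph, which is then combined with excess rainfall by discrete convolution to obtain direct runoff. The sketch below shows only that shared convolution step. The function name and the example ordinates are illustrative assumptions, and the model-specific derivation of the ordinates (Snyder, Clark, SCS) is not reproduced.

```python
def direct_runoff(excess_rainfall, unit_hydrograph):
    """Convolve excess rainfall pulses with unit hydrograph ordinates.

    excess_rainfall -- rainfall excess depth per time step
    unit_hydrograph -- runoff response per unit depth, same time step
    Returns the direct runoff hydrograph, of length len(P) + len(U) - 1.
    """
    runoff = [0.0] * (len(excess_rainfall) + len(unit_hydrograph) - 1)
    for m, depth in enumerate(excess_rainfall):
        for k, ordinate in enumerate(unit_hydrograph):
            # Each rainfall pulse produces a scaled, time-shifted copy of the UH.
            runoff[m + k] += depth * ordinate
    return runoff


# Illustrative values only: two rainfall pulses and a three-ordinate UH.
simulated = direct_runoff([1.0, 2.0], [0.5, 0.3, 0.2])
peak_flow = max(simulated)
```

Criteria such as peak flow, volume, and mean square error can then be evaluated by comparing `simulated` with the measured hydrograph.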
Procedia PDF Downloads 486
26993 Data Analytics in Energy Management
Authors: Sanjivrao Katakam, Thanumoorthi I., Antony Gerald, Ratan Kulkarni, Shaju Nair
Abstract:
With increasing energy costs and their impact on business, sustainability today has evolved from a social expectation to an economic imperative. Therefore, finding methods to reduce cost has become a critical directive for industry leaders. Effective energy management is the only way to cut costs. However, energy management has been a challenge because it requires a change in old habits and legacy systems followed for decades. Today, exorbitant volumes of energy and operational data are being captured and stored by industries, but they are unable to convert these structured and unstructured data sets into meaningful business intelligence. To make quick decisions, organizations must learn to cope with large volumes of operational data in different formats. Energy analytics not only helps in extracting inferences from these data sets but is also instrumental in the transformation from old approaches to energy management to new ones. This in turn assists in effective decision making for implementation. Organizations need an established corporate strategy for reducing operational costs through visibility and optimization of energy usage, and energy analytics plays a key role in the optimization of operations. The paper describes how energy data analytics is extensively used today in scenarios such as reducing operational costs, predicting energy demand, optimizing network efficiency, asset maintenance, and improving customer insights and device data insights. The paper also highlights how analytics helps transform insights obtained from energy data into sustainable solutions. The paper utilizes data from an array of segments such as the retail, transportation, and water sectors.
Keywords: energy analytics, energy management, operational data, business intelligence, optimization
Procedia PDF Downloads 364
26992 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data
Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz
Abstract:
In recent years, there has been a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data is clear, and real-time stream data management has become a hot topic. The sliding window model and frequent itemset mining over dynamic data are among the most important problems in data mining. The sliding window model for frequent itemset mining is widely used for data stream mining because of its emphasis on recent data and its bounded memory requirement. Existing methods use the traditional transaction-based sliding window model, where the window size is a fixed number of transactions. This model assumes that transactions arrive at a constant rate, which is not suited to real-time applications, and using it in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequent and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution, and the preliminary results are quite promising.
Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query
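The difference between a transaction-based and a timestamp-based window can be made concrete with a small sketch. The class below is a hypothetical illustration, not the authors' algorithm: it tracks support for single items with a plain counter instead of the tree structure the paper describes, and the class and method names are assumptions.

```python
from collections import Counter, deque


class TimestampWindow:
    """Keep only the transactions that arrived within the last `duration` time units."""

    def __init__(self, duration):
        self.duration = duration
        self.window = deque()    # (timestamp, frozenset of items), oldest first
        self.counts = Counter()  # number of transactions in the window per item

    def add(self, timestamp, items):
        items = frozenset(items)
        self.window.append((timestamp, items))
        self.counts.update(items)
        self._evict(timestamp)

    def _evict(self, now):
        # Expiry depends on elapsed time, not on how many transactions arrived.
        while self.window and self.window[0][0] <= now - self.duration:
            _, expired = self.window.popleft()
            self.counts.subtract(expired)

    def frequent_items(self, min_support):
        """Return items whose relative support in the window meets min_support."""
        size = len(self.window)
        if size == 0:
            return set()
        return {item for item, count in self.counts.items()
                if count / size >= min_support}
```

With a bursty stream, the number of transactions held in the window varies over time, which is the behaviour a fixed-count window cannot express.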
Procedia PDF Downloads 162
26991 The Extent of Big Data Analysis by the External Auditors
Authors: Iyad Ismail, Fathilatul Abdul Hamid
Abstract:
This research was conducted mainly to determine the extent of big data analysis by external auditors. The paper adopts grounded theory as a framework for conducting a series of semi-structured interviews with eighteen external auditors. The research findings cover the extent of big data availability and of big data analysis usage by external auditors in the Gaza Strip, Palestine. The study's outcomes lead to a series of auditing procedures intended to improve external auditing techniques and, in turn, the quality of the audit process. The research is also important for auditing firms, giving insight into their mechanisms in order to identify the most important strategies that help in achieving competitive audit quality. The results aim to guide academic and professional auditing institutions in developing techniques that enable external auditors to perform big data analysis. This paper provides appropriate information for the decision-making process and a source of future information on technological auditing.
Keywords: big data analysis, external auditors, audit reliance, internal audit function
Procedia PDF Downloads 70
26990 High-Accuracy Satellite Image Analysis and Rapid DSM Extraction for Urban Environment Evaluations (Tripoli-Libya)
Authors: Abdunaser Abduelmula, Maria Luisa M. Bastos, José A. Gonçalves
Abstract:
The modeling of the earth's surface and the evaluation of the urban environment with 3D models is an important research topic. New stereo capabilities of high-resolution optical satellite images, such as the tri-stereo mode of Pleiades, combined with new image matching algorithms, are now available and can be applied in urban area analysis. In addition, photogrammetry software packages have gained new, more efficient matching algorithms, such as SGM, as well as improved filters to deal with shadow areas, and can achieve denser and more precise results. This paper describes a comparison between 3D data extracted from tri-stereo and dual-stereo satellite images, combined with pixel-based matching and the Wallis filter. The aim was to improve the accuracy of 3D models, especially in urban areas, in order to assess whether satellite images are appropriate for a rapid evaluation of urban environments. The results showed that 3D models obtained from Pleiades tri-stereo outperformed, both in accuracy and in detail, the result obtained from a GeoEye pair. The assessment was made with reference digital surface models derived from high-resolution aerial photography. This could mean that tri-stereo images can be successfully used for the proposed urban change analyses.
Keywords: 3D models, environment, matching, pleiades
Procedia PDF Downloads 330
26989 A Model of Teacher Leadership in History Instruction
Authors: Poramatdha Chutimant
Abstract:
The objective of the research was to propose a model of teacher leadership in history instruction for practical use. Everett M. Rogers’ Diffusion of Innovations Theory is applied as the theoretical framework. A qualitative method is used in the study, with an interview protocol as the instrument for collecting primary data from best practitioners awarded by the Office of National Education Commission (ONEC). Open-ended questions are used in the interview protocol in order to gather varied data. Information on the international context of history instruction is then used as secondary data to support the summarizing process (content analysis). A dendrogram is the key to interpreting and synthesizing the primary data, with the secondary data serving to support explanation and elaboration. In-depth interviews are used to collect information from seven experts in the educational field. The final focus is to validate a draft model with a view to future use.
Keywords: history study, nationalism, patriotism, responsible citizenship, teacher leadership
Procedia PDF Downloads 280
26988 The Effect of Institutions on Economic Growth: An Analysis Based on Bayesian Panel Data Estimation
Authors: Mohammad Anwar, Shah Waliullah
Abstract:
This study investigated panel data regression models. The paper used Bayesian and classical methods to study the impact of institutions on economic growth, especially in developing countries, using data from 1990 to 2014. Under both the classical and the Bayesian methodology, two panel data models were estimated: common effects and fixed effects. For the Bayesian approach, prior information is used, with a normal-gamma prior for the panel data models. The analysis was done with the WinBUGS14 software. The estimated results showed that panel data models are valid models in the Bayesian methodology. In the Bayesian approach, all independent variables had positive and significant effects on the dependent variable. Based on the standard errors of all models, we conclude that the fixed effect model is the best model in the Bayesian estimation of panel data models, since it has the lowest standard error compared with the other models.
Keywords: Bayesian approach, common effect, fixed effect, random effect, Dynamic Random Effect Model
Procedia PDF Downloads 68
26987 Regularizing Software for Aerosol Particles
Authors: Christine Böckmann, Julia Rosemann
Abstract:
We present an inversion algorithm that is used in the European Aerosol Lidar Network for the inversion of data collected with multi-wavelength Raman lidar. These instruments measure backscatter coefficients at 355, 532, and 1064 nm, and extinction coefficients at 355 and 532 nm. The algorithm is based on manually controlled inversion of optical data, which allows for detailed sensitivity studies and thus provides us with comparably high quality of the derived data products. The algorithm allows us to derive particle effective radius, volume, and surface-area concentration with comparably high confidence. The retrieval of the real and imaginary parts of the complex refractive index is still a challenge in view of the accuracy required for these parameters in climate change studies, in which light absorption needs to be known with high accuracy. Single-scattering albedo (SSA) can be computed from the retrieved microphysical parameters and allows us to categorize aerosols into high and low absorbing aerosols. From a mathematical point of view, the algorithm is based on the concept of using truncated singular value decomposition as the regularization method. This method was adapted to work for the retrieval of the particle size distribution function (PSD) and is called a hybrid regularization technique since it uses a triple of regularization parameters. The inversion of an ill-posed problem, such as the retrieval of the PSD, is always a challenging task because very small measurement errors are usually hugely amplified during the solution process unless an appropriate regularization method is used. Even using a regularization method is difficult, since appropriate regularization parameters have to be determined. Therefore, in a next stage of our work, we decided to use two regularization techniques in parallel for comparison purposes. The second method is an iterative regularization method based on Padé iteration.
Here, the number of iteration steps serves as the regularization parameter. We successfully developed semi-automated software for spherical particles which is able to run even on a parallel processor machine. From a mathematical point of view, it is also very important (as a selection criterion for an appropriate regularization method) to investigate the degree of ill-posedness of the problem, which we found to be moderate. We computed the optical data from mono-modal logarithmic PSD and investigated particles of spherical shape in our simulations. We considered particle radii as large as 6 nm, which not only covers the size range of particles in the fine-mode fraction of naturally occurring PSD but also covers a part of the coarse-mode fraction of PSD. We considered errors of 15% in the simulation studies. For the SSA, 100% of all cases achieve relative errors below 12%. In more detail, 87% of all cases for 355 nm and 88% of all cases for 532 nm are well below 6%. With respect to the absolute error for non- and weak-absorbing particles with real parts 1.5 and 1.6, the accuracy limit of +/-0.03 is achieved in all modes. In sum, 70% of all cases stay below +/-0.03, which is sufficient for climate change studies.
Keywords: aerosol particles, inverse problem, microphysical particle properties, regularization
Procedia PDF Downloads 343
26986 Evaluation of the Boiling Liquid Expanding Vapor Explosion Thermal Effects in Hassi R'Mel Gas Processing Plant Using Fire Dynamics Simulator
Authors: Brady Manescau, Ilyas Sellami, Khaled Chetehouna, Charles De Izarra, Rachid Nait-Said, Fati Zidani
Abstract:
During a fire in an oil and gas refinery, several thermal accidents can occur and cause serious damage to people and the environment. Among these accidents, the BLEVE (Boiling Liquid Expanding Vapor Explosion) is the most frequently observed and remains a major concern for risk decision-makers. It corresponds to a violent vaporization of explosive nature following the rupture of a vessel containing a liquid at a temperature significantly higher than its normal boiling point at atmospheric pressure. Its effects on the environment generally appear in three ways: blast overpressure, radiation from the fireball if the liquid involved is flammable, and fragment hazards. In order to estimate the potential damage that would be caused by such an explosion, risk decision-makers often use quantitative risk analysis (QRA). This analysis is a rigorous and advanced approach that requires reliable data in order to obtain a good estimate and control of risks. However, in most cases, the data used in QRA are obtained from empirical correlations. These empirical correlations generally overestimate BLEVE effects because they are based on simplifications and do not take into account real parameters such as the geometry effect. Considering that these risk analyses are based on an assessment of BLEVE effects on human life and plant equipment, more precise and reliable data should be provided. From this point of view, CFD modeling of BLEVE effects appears as a solution to the limitations of the empirical laws. In this context, the main objective is to develop a numerical tool to predict BLEVE thermal effects using the CFD code FDS version 6. Simulations are carried out with a mesh size of 1 m. The fireball source is modeled as a vertical release of hot fuel in a short time. The modeling of fireball dynamics is based on single-step combustion using an EDC model coupled with the default LES turbulence model.
Fireball characteristics (diameter, height, heat flux, and lifetime) taken from the large-scale BAM experiment are used to demonstrate the ability of FDS to simulate the various steps of the BLEVE phenomenon from ignition up to total burnout. The influence of release parameters such as the injection rate and the radiative fraction on the fireball heat flux is also presented. Predictions are very encouraging and show good agreement with the BAM experiment data. In addition, a numerical study is carried out on an operational propane accumulator in an Algerian gas processing plant of the SONATRACH company located in the Hassi R’Mel Gas Field (the largest gas field in Algeria).
Keywords: BLEVE effects, CFD, FDS, fireball, LES, QRA
Procedia PDF Downloads 186
26985 A Nonlinear Visco-Hyper Elastic Constitutive Model for Modelling Behavior of Polyurea at Large Deformations
Authors: Shank Kulkarni, Alireza Tabarraei
Abstract:
The favorable properties of polyurea, such as flexibility, durability, and chemical resistance, have brought it a wide range of applications in various industries. Effective prediction of the response of polyurea under different loading and environmental conditions necessitates the development of an accurate constitutive model. As with most polymers, the behavior of polyurea depends on both strain and strain rate. Therefore, the constitutive model should be able to capture both of these effects on the response of polyurea. To achieve this objective, in this paper, a nonlinear hyper-viscoelastic constitutive model is developed by the superposition of a hyperelastic and a viscoelastic model. The proposed constitutive model can capture the behavior of polyurea under compressive loading conditions at various strain rates. A four-parameter Ogden model and a Mooney-Rivlin model are used to model the hyperelastic behavior of polyurea. The viscoelastic behavior is modeled using both a three-parameter standard linear solid (SLS) model and a K-BKZ model. Comparison of the modeling results with experiments shows that the combination of the Ogden and SLS models predicts the behavior of polyurea more accurately. The material parameters of the model are found by curve fitting the proposed model to uniaxial compression test data. The proposed model can closely reproduce the stress-strain behavior of polyurea for strain rates up to 6500 /s.
Keywords: constitutive modelling, ogden model, polyurea, SLS model, uniaxial compression test
Procedia PDF Downloads 244
26984 Diagnosis of the Heart Rhythm Disorders by Using Hybrid Classifiers
Authors: Sule Yucelbas, Gulay Tezel, Cuneyt Yucelbas, Seral Ozsen
Abstract:
This study attempts to identify some heart rhythm disorders from electrocardiography (ECG) data taken from the MIT-BIH arrhythmia database by extracting the required features and presenting them to artificial neural network (ANN), artificial immune system (AIS), artificial neural network based on artificial immune system (AIS-ANN), and particle swarm optimization based artificial neural network (PSO-ANN) classifier systems. The main purpose of the study is to evaluate the performance of the hybrid AIS-ANN and PSO-ANN classifiers relative to ANN and AIS. For this purpose, the normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK), and atrial fibrillation (AF) data for each of the RR intervals were found. These data were then combined into pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK, and NSR-AF), the discrete wavelet transform was applied to each of the two groups of data in a pair, and two different data sets with 9 and 27 features were obtained from each pair after data reduction. Afterwards, the data were first shuffled randomly, and then the 4-fold cross-validation method was applied to create the training and testing data. The training and testing accuracy rates and the training times were compared. As a result, the performance of the hybrid classification systems, AIS-ANN and PSO-ANN, was seen to be close to that of the ANN system, and the results of the hybrid systems were much better than those of AIS. However, ANN had a much shorter training time than the other systems. In terms of training time, ANN was followed by the PSO-ANN, AIS-ANN, and AIS systems, respectively. The features extracted from the data also affected the classification results significantly.
Keywords: AIS, ANN, ECG, hybrid classifiers, PSO
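The shuffle-then-split evaluation protocol mentioned in this abstract can be sketched in a few lines of plain Python. The function name, the fixed seed, and the interleaved fold assignment are illustrative assumptions; the abstract does not specify how the folds were formed.

```python
import random


def k_fold_splits(n_samples, k=4, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds; each fold serves
    as the test set exactly once while the remaining folds form the training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle before partitioning
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Each classifier would be trained and timed on every `train` split and scored on the matching `test` split, with accuracies averaged over the four folds.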
Procedia PDF Downloads 442
26983 Topic Modelling Using Latent Dirichlet Allocation and Latent Semantic Indexing on SA Telco Twitter Data
Authors: Phumelele Kubheka, Pius Owolawi, Gbolahan Aiyetoro
Abstract:
Twitter is one of the most popular social media platforms where users can share their opinions on different subjects. As of 2010, the Twitter platform generates more than 12 terabytes of data daily, ~4.3 petabytes in a single year. For this reason, Twitter is a great source for big data mining. Many industries, such as telecommunication companies, can leverage the availability of Twitter data to better understand their markets and make appropriate business decisions. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked against another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics from a Twitter dataset containing user tweets on South African telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results, with topic coherence higher by 8% for the best-performing model represented in Table 1. A higher topic coherence score indicates better performance of the model.
Keywords: big data, latent Dirichlet allocation, latent semantic indexing, telco, topic modeling, twitter
Procedia PDF Downloads 151
26982 Enhance the Power of Sentiment Analysis
Authors: Yu Zhang, Pedro Desouza
Abstract:
Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.
Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining
Procedia PDF Downloads 353
26981 Method for Selecting and Prioritising Smart Services in Manufacturing Companies
Authors: Till Gramberg, Max Kellner, Erwin Gross
Abstract:
This paper presents a comprehensive investigation into the topic of smart services and IIoT-Platforms, focusing on their selection and prioritization in manufacturing organizations. First, a literature review is conducted to provide a basic understanding of the current state of research in the area of smart services. Based on discussed and established definitions, a definition approach for this paper is developed. In addition, value propositions for smart services are identified based on the literature and expert interviews. Furthermore, the general requirements for the provision of smart services are presented. Subsequently, existing approaches for the selection and development of smart services are identified and described. In order to determine the requirements for the selection of smart services, expert opinions from successful companies that have already implemented smart services are collected through semi-structured interviews. Based on the results, criteria for the evaluation of existing methods are derived. The existing methods are then evaluated according to the identified criteria. Furthermore, a novel method for the selection of smart services in manufacturing companies is developed, taking into account the identified criteria and the existing approaches. The developed concept for the method is verified in expert interviews. The method includes a collection of relevant smart services identified in the literature. The actual relevance of the use cases in the industrial environment was validated in an online survey. The required data and sensors are assigned to the smart service use cases. The value proposition of the use cases is evaluated in an expert workshop using different indicators. Based on this, a comparison is made between the identified value proposition and the required data, leading to a prioritization process. The prioritization process follows an established procedure for evaluating technical decision-making processes. 
In addition to the technical requirements, the prioritization process includes other evaluation criteria such as the economic benefit, the conformity of the new service offering with the company strategy, or the customer retention enabled by the smart service. Finally, the method is applied and validated in an industrial environment. The results of these experiments are critically reflected upon and an outlook on future developments in the area of smart services is given. This research contributes to a deeper understanding of the selection and prioritization process as well as the technical considerations associated with smart service implementation in manufacturing organizations. The proposed method serves as a valuable guide for decision makers, helping them to effectively select the most appropriate smart services for their specific organizational needs.
Keywords: smart services, IIoT, industrie 4.0, IIoT-platform, big data
Procedia PDF Downloads 8926980 Real-Time Big-Data Warehouse a Next-Generation Enterprise Data Warehouse and Analysis Framework
Authors: Abbas Raza Ali
Abstract:
Big Data technology is gradually becoming a pressing need for large enterprises. These enterprises generate massive amounts of off-line and streaming data in both structured and unstructured formats on a daily basis. Extracting useful insights from such large-scale datasets is challenging, and technology constraints sometimes make it difficult to manage more than a few months of transactional data history. This paper presents a framework to efficiently manage massively large and complex datasets. The framework has been tested on a communication service provider producing massively large, complex streaming data in binary format. The communication industry is bound by regulators to maintain the history of subscribers’ call records, where every action of a subscriber generates a record. Also, managing and analyzing transactional data allows service providers to better understand their customers’ behavior; for example, deep packet inspection requires transactional internet usage data to explain the internet usage behaviour of subscribers. However, current relational database systems limit service providers to maintaining history only at the semantic level, aggregated per subscriber. The framework addresses these challenges by leveraging Big Data technology, which optimally manages and allows deep analysis of complex datasets. The framework has been applied to offload the service provider's existing Intelligent Network Mediation and relational Data Warehouse onto Big Data. The service provider has a subscriber base of 50+ million with yearly growth of 7-10%. The end-to-end process takes no more than 10 minutes and involves binary-to-ASCII decoding of call detail records, stitching of all the interrogations against a call (transformations), and aggregation of all the call records of a subscriber.
Keywords: big data, communication service providers, enterprise data warehouse, stream computing, Telco IN Mediation
Procedia PDF Downloads 175
26979 Programming with Grammars
Authors: Peter M. Maurer
Abstract:
DGL is a context-free grammar-based tool for generating random data. Many types of simulator input data require some computation to be placed in the proper format. For example, it might be necessary to generate ordered triples in which the third element is the sum of the first two elements, or it might be necessary to generate random numbers in some sorted order. Although DGL is universal in computational power, generating these types of data is extremely difficult. To overcome this problem, we have enhanced DGL to include features that permit direct computation within the structure of a context-free grammar. The features have been implemented as special types of productions, preserving the context-free flavor of DGL specifications.
Keywords: DGL, Enhanced Context Free Grammars, Programming Constructs, Random Data Generation
Procedia PDF Downloads 147
26978 A Model Architecture Transformation with Approach by Modeling: From UML to Multidimensional Schemas of Data Warehouses
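The ordered-triple example in this abstract can be illustrated with a minimal Python sketch. It is not DGL syntax and not the authors' implementation: the `GRAMMAR` table, the `@sum` marker, and the `expand` function are hypothetical names chosen only to show how a "computed" production can sit inside an otherwise context-free random generator.

```python
import random

# Hypothetical mini-grammar (NOT DGL syntax). Each nonterminal maps to a list
# of alternative productions; each production is a list of symbols.
GRAMMAR = {
    "triple": [["number", " ", "number", " ", "@sum"]],
    "number": [["digit"], ["digit", "number"]],
    "digit": [[d] for d in "0123456789"],
}


def _sum_of_siblings(values):
    """Computed production: add up the numeric siblings generated so far."""
    return str(sum(int(v) for v in values if v.strip()))


# Symbols listed here are computed from their left siblings instead of
# being expanded at random (an assumption made for this illustration).
COMPUTED = {"@sum": _sum_of_siblings}


def expand(symbol, rng):
    """Randomly expand a symbol into a string of terminals."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    values = []
    for part in production:
        if part in COMPUTED:
            values.append(COMPUTED[part](values))
        else:
            values.append(expand(part, rng))
    return "".join(values)


if __name__ == "__main__":
    rng = random.Random(2024)
    for _ in range(3):
        print(expand("triple", rng))
```

Every string produced from `triple` has the form `a b c` with `c = a + b`, which pure random expansion of a context-free grammar would not guarantee.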
Authors: Ouzayr Rabhi, Ibtissam Arrassen
Abstract:
To provide a complete analysis of the organization and to help decision-making, leaders need to have relevant data; Data Warehouses (DW) are designed to meet such needs. However, designing a DW is not trivial, and there is no formal method to derive a multidimensional schema from heterogeneous databases. In this article, we present a model-driven approach to the design of data warehouses. We describe a multidimensional meta-model and also specify a set of transformations starting from a Unified Modeling Language (UML) metamodel. In this approach, the UML metamodel and the multidimensional one are both considered platform-independent models (PIMs). The first meta-model is mapped into the second one through transformation rules written in the Query/View/Transformation (QVT) language. This proposal is validated by applying our approach to generate a multidimensional schema of a Balanced Scorecard (BSC) DW. We are interested in the BSC perspectives, which are highly linked to the vision and the strategies of an organization.
Keywords: data warehouse, meta-model, model-driven architecture, transformation, UML
Procedia PDF Downloads 160
26977 A Convolution Neural Network PM-10 Prediction System Based on a Dense Measurement Sensor Network in Poland
Authors: Piotr A. Kowalski, Kasper Sapala, Wiktor Warchalowski
Abstract:
PM10 is a suspended dust that primarily has a negative effect on the respiratory system. PM10 is responsible for attacks of coughing and wheezing, asthma, or acute, violent bronchitis. Indirectly, PM10 also negatively affects the rest of the body, including increasing the risk of heart attack and stroke. Unfortunately, Poland is a country that cannot boast of good air quality, in particular due to high PM concentration levels. Therefore, based on the dense network of Airly sensors, it was decided to address the problem of predicting suspended particulate matter concentration. Due to the very complicated nature of this issue, a Machine Learning approach was used. For this purpose, Convolutional Neural Networks (CNNs) have been adopted, these currently being the leading information processing methods in the field of computational intelligence. The aim of this research is to show the influence of particular CNN parameters on the quality of the obtained forecast. The forecast itself is made on the basis of parameters measured by Airly sensors and is carried out for the subsequent day, hour by hour. The evaluation of the learning process for the investigated models was mostly based upon the mean square error criterion; however, during model validation, a number of other methods of quantitative evaluation were taken into account. The presented model of pollution prediction has been verified using real weather and air pollution data taken from the Airly sensor network. The dense and distributed network of Airly measurement devices enables access to current and archival data on air pollution, temperature, suspended particulate matter PM1.0, PM2.5, and PM10, CAQI levels, as well as atmospheric pressure and air humidity. In this investigation, PM2.5 and PM10 concentrations, temperature and wind information, as well as external forecasts of temperature and wind for the next 24 h served as input data. 
Due to the specificity of the CNN type network, this data is transformed into tensors and then processed. This network consists of an input layer, an output layer, and many hidden layers. In the hidden layers, convolutional and pooling operations are performed. The output of this system is a vector of 24 elements containing the predicted PM10 concentration for each hour of the upcoming 24-hour period. Over 1000 models based on CNN methodology were tested during the study. The models that gave the best results were selected and then compared with models based on linear regression. The numerical tests carried out fully confirmed the positive properties of the presented method. These were carried out using real ‘big’ data. Models based on the CNN technique allow prediction of PM10 dust concentration with a much smaller mean square error than currently used methods based on linear regression. Moreover, the use of neural networks increased the squared Pearson correlation coefficient (R²) by about 5 percent compared to the linear model. During the simulation, the R² coefficient was 0.92, 0.76, 0.75, 0.73, and 0.73 for the 1st, 6th, 12th, 18th, and 24th hour of prediction, respectively.
Keywords: air pollution prediction (forecasting), machine learning, regression task, convolution neural networks
Procedia PDF Downloads 149
26976 Secured Embedding of Patient’s Confidential Data in Electrocardiogram Using Chaotic Maps
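The per-hour evaluation described in this abstract (mean square error and R² reported separately for each forecast hour) can be sketched as follows. This is an illustrative sketch only, assuming that R² means the squared Pearson correlation and that forecasts are stored as one 24-element list per day; the function names are hypothetical and do not come from the paper.

```python
def mse(actual, predicted):
    """Mean square error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r2(actual, predicted):
    """Squared Pearson correlation coefficient of two sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def per_horizon_scores(actual_days, predicted_days):
    """Return one (MSE, R^2) pair per forecast hour.

    Both arguments are lists of days; each day is a list with one PM10
    value per forecast hour (24 values in the setting described above).
    """
    horizons = len(actual_days[0])
    scores = []
    for h in range(horizons):
        actual_h = [day[h] for day in actual_days]
        predicted_h = [day[h] for day in predicted_days]
        scores.append((mse(actual_h, predicted_h),
                       pearson_r2(actual_h, predicted_h)))
    return scores
```

Scoring each hour separately is what makes a statement such as "R² was 0.92 for the 1st hour and 0.73 for the 24th hour" possible, since a single pooled score would hide how accuracy decays with the forecast horizon.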
Authors: Butta Singh
Abstract:
This paper presents a chaotic map based approach for secured embedding of a patient’s confidential data in an electrocardiogram (ECG) signal. The chaotic map generates predefined locations through the use of selective control parameters. The sample value difference method effectively hides the confidential data in ECG sample pairs at these predefined locations. Evaluation of the proposed method on all 48 records of the MIT-BIH arrhythmia ECG database demonstrates that the embedding does not alter the diagnostic features of the cover ECG. The imperceptibility of the secret data in the stego-ECG is evident through various statistical and clinical performance measures. Statistical metrics comprise Percentage Root Mean Square Difference (PRD) and Peak Signal to Noise Ratio (PSNR). Further, a comparative analysis between the proposed method and existing approaches was also performed. The results clearly demonstrated the superiority of the proposed method.
Keywords: chaotic maps, ECG steganography, data embedding, electrocardiogram
Procedia PDF Downloads 196
26975 Microgrid Design Under Optimal Control With Batch Reinforcement Learning
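The two statistical metrics named in this abstract can be sketched in Python as below. The abstract does not give its exact formulas, so the sketch assumes commonly used definitions: PRD as 100 times the root of the squared-error energy over the cover-signal energy, and PSNR with the peak taken as the largest absolute cover sample. The function names are hypothetical.

```python
import math


def prd(cover, stego):
    """Percentage Root Mean Square Difference (assumed definition)."""
    error_energy = sum((c - s) ** 2 for c, s in zip(cover, stego))
    cover_energy = sum(c ** 2 for c in cover)
    return 100.0 * math.sqrt(error_energy / cover_energy)


def psnr(cover, stego):
    """Peak Signal to Noise Ratio in dB (assumed definition)."""
    mean_sq_error = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mean_sq_error == 0:
        return math.inf  # identical signals: no distortion at all
    peak = max(abs(c) for c in cover)
    return 10.0 * math.log10(peak ** 2 / mean_sq_error)
```

A lower PRD and a higher PSNR both indicate that the stego-ECG stays closer to the cover ECG, which is the sense in which these metrics measure imperceptibility.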
Authors: Valentin Père, Mathieu Milhé, Fabien Baillon, Jean-Louis Dirion
Abstract:
Microgrids offer potential solutions to meet the need for local grid stability and increase the autonomy of isolated networks through the integration of intermittent renewable energy production and storage facilities. In such a context, sizing production and storage for a given network is a complex task that depends heavily on input data such as the power load profile and renewable resource availability. This work aims at developing an operating cost computation methodology for different microgrid designs based on the use of deep reinforcement learning (RL) algorithms to tackle the optimal operation problem in stochastic environments. RL is a data-based sequential decision control method based on Markov decision processes that enables the consideration of random variables for control at a chosen time scale. Agents trained via RL constitute a promising class of Energy Management Systems (EMS) for the operation of microgrids with energy storage. Microgrid sizing (or design) is generally performed by minimizing investment costs and operational costs arising from the EMS behavior. The latter might include economic aspects (power purchase, facilities aging), social aspects (load curtailment), and ecological aspects (carbon emissions). Sizing variables are related to major constraints on the optimal operation of the network by the EMS. In this work, an islanded mode microgrid is considered. Renewable generation is done with photovoltaic panels; an electrochemical battery ensures short-term electricity storage. The controllable unit is a hydrogen tank that is used as a long-term storage unit. The proposed approach focuses on the transfer of agent learning for the near-optimal operating cost approximation with deep RL for each microgrid size. As with most data-based algorithms, the training step in RL requires substantial computation time. 
The objective of this work is thus to study the potential of Batch-Constrained Q-learning (BCQ) for the optimal sizing of microgrids and especially to reduce the computation time of operating cost estimation in several microgrid configurations. BCQ is an off-line RL algorithm that is known to be data efficient and can learn better policies than on-line RL algorithms on the same buffer. The general idea is to use the learned policy of agents trained in similar environments to constitute a buffer. The latter is used to train BCQ, and thus the agent learning can be performed without update during interaction sampling. A comparison between online RL and the presented method is performed based on the score by environment and on the computation time.
Keywords: batch-constrained reinforcement learning, control, design, optimal
Procedia PDF Downloads 123
26974 Detection Efficient Enterprises via Data Envelopment Analysis
Authors: S. Turkan
Abstract:
In this paper, data on Turkey’s Top 500 Industrial Enterprises in 2014 were analyzed by data envelopment analysis. Data envelopment analysis is used to detect efficient decision-making units such as universities, hospitals, and schools by using inputs and outputs. The decision-making units in this study are enterprises. To detect efficient enterprises, some financial ratios are determined as inputs and outputs. For this reason, financial indicators related to the productivity of enterprises are considered. The efficient foreign weighted owned capital enterprises are detected via a super efficiency model. According to the results, Mercedes-Benz is the most efficient foreign weighted owned capital enterprise in Turkey.
Keywords: data envelopment analysis, super efficiency, logistic regression, financial ratios
Procedia PDF Downloads 324
26973 Intelligent Process Data Mining for Monitoring for Fault-Free Operation of Industrial Processes
Authors: Hyun-Woo Cho
Abstract:
Real-time fault monitoring and diagnosis of large-scale production processes is necessary in order to operate industrial processes safely and efficiently and to ensure good final product quality. Unusual and abnormal events may have a serious impact on the process, such as malfunctions or breakdowns. This work utilizes process measurement data obtained on-line for the safe and fault-free operation of industrial processes. To this end, this work evaluated the proposed intelligent process data monitoring framework based on a simulation process. The monitoring scheme extracts the fault pattern in the reduced space for reliable data representation. Moreover, this work shows the results of using linear and nonlinear techniques for monitoring. The results show that the nonlinear technique produced more reliable monitoring results and outperformed linear methods. The adoption of the qualitative monitoring model helps to reduce the sensitivity of the fault pattern to noise.
Keywords: process data, data mining, process operation, real-time monitoring
Procedia PDF Downloads 640
26972 Comparison of Overall Sensitivity of Meloidogyne incognita to Pure Cucurbitacins and Cucurbitacin-Containing Crude Extracts
Authors: Zakheleni P. Dube, Phatu W. Mashela
Abstract:
The Curve-fitting Allelochemical Response Data (CARD) model had been adopted as a valuable tool in enhancing the understanding of the efficacy of cucurbitacin-containing phytonematicides on the suppression of nematodes. In most cases, for registration purposes, the active ingredients should be in purified form. Evidence in other phytonematicides suggested that purified active ingredients were less effective in suppression of nematodes. The objective of this study was to use the CARD model to compare the overall sensitivities of Meloidogyne incognita J2 hatch, mobility and mortality to Nemarioc-AL phytonematicide, cucurbitacin A, Nemafric-BL phytonematicide and cucurbitacin B. Meloidogyne incognita eggs and J2 were exposed to 0.00, 0.50, 1.00, 1.50, 2.00, 2.50, 3.00, 3.50, 4.00, 4.50 and 5.00% of each phytonematicide, whereas in purified form the concentrations were 0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25 and 2.50 μg.mL⁻¹. The exposure periods for each concentration were 24, 48 and 72 h. The overall sensitivities of J2 hatch to Nemarioc-AL phytonematicide, cucurbitacin A, Nemafric-BL phytonematicide and cucurbitacin B were 1, 30, 5 and 2 units, respectively, whereas those of J2 mobility were 3, 17, 3 and 6 units, respectively. For J2 mortality, overall sensitivities to Nemarioc-AL phytonematicide, cucurbitacin A, Nemafric-BL phytonematicide and cucurbitacin B were 2, 4, 1 and 4 units, respectively. In conclusion, the two crude extracts, Nemarioc-AL and Nemafric-BL phytonematicides, were generally more potent against M. incognita than their pure active ingredients. Crude plant extracts are easy to prepare and could be an ideal tactic for the management of nematodes in resource-poor farming communities.
Keywords: Botanicals, cucumin, leptodermin, plant extracts, triterpenoids
Procedia PDF Downloads 210
26971 Cultural Impact on Fairness Perception of Inequality: A Study on People With Chinese Roots Living in Germany
Authors: Yanping He-Ulbricht, Marc Oliver Rieger
Abstract:
Based on survey data collected from people with Chinese roots living in Germany, this paper examines the impact of assimilation degree and language priming (Chinese or German) on individuals’ perceived fairness of economic and social differences and their attitude towards these. The results show that both the language used and the length of time spent in a foreign culture have a significant impact. Subjects who had spent less than 10 years in Germany demonstrated a higher readiness to accept government intervention in markets with price limits than those who had lived there longer. Subjects who were asked and answered in German perceived the current economic situation as less fair and were also less inclined to accept inequality, even when it leads to a Pareto improvement. While the difference in fairness perception of inequality was a cultural effect, the difference in attitudes towards government intervention was rather a result of a learning process. The findings imply that both learning processes of individuals and culture play an important role in perception and preferences regarding social and economic differences.
Keywords: assimilation, bilingualism, cross-cultural comparison, income inequality, language priming, price fairness
Procedia PDF Downloads 87
26970 Self-Attention Mechanism for Target Hiding Based on Satellite Images
Authors: Hao Yuan, Yongjian Shen, Xiangjun He, Yuheng Li, Zhouzhou Zhang, Pengyu Zhang, Minkang Cai
Abstract:
Remote sensing data can provide support for decision-making in disaster assessment or disaster relief. The traditional processing methods for sensitive targets in remote sensing mapping are mainly based on manual retrieval and image editing tools, which are inefficient. Methods based on deep learning for sensitive target hiding are faster and more flexible, but they have disadvantages in training time and computational cost. This paper proposes a target hiding model, Self-Attention (SA) Deepfill, which uses self-attention modules to replace part of the gated convolution layers in image inpainting. This reduces the model's computational cost and improves its performance. Free-form masks are also added to the model’s training to enhance its generality. An experiment on an open remote sensing dataset demonstrated the efficiency of our method. Moreover, experimental comparison shows that the proposed method can train for a longer time without over-fitting. Finally, compared with existing methods, the proposed model has a lower computational cost and better performance.
Keywords: remote sensing mapping, image inpainting, self-attention mechanism, target hiding
Procedia PDF Downloads 136
26969 Statistically Accurate Synthetic Data Generation for Enhanced Traffic Predictive Modeling Using Generative Adversarial Networks and Long Short-Term Memory
Authors: Srinivas Peri, Siva Abhishek Sirivella, Tejaswini Kallakuri, Uzair Ahmad
Abstract:
Effective traffic management and infrastructure planning are crucial for the development of smart cities and intelligent transportation systems. This study addresses the challenge of data scarcity by generating realistic synthetic traffic data using the PeMS-Bay dataset, improving the accuracy and reliability of predictive modeling. Advanced synthetic data generation techniques, including TimeGAN, GaussianCopula, and PAR Synthesizer, are employed to produce synthetic data that replicates the statistical and structural characteristics of real-world traffic. Future integration of Spatial-Temporal Generative Adversarial Networks (ST-GAN) is planned to capture both spatial and temporal correlations, further improving data quality and realism. The performance of each synthetic data generation model is evaluated against real-world data to identify the best models for accurately replicating traffic patterns. Long Short-Term Memory (LSTM) networks are utilized to model and predict complex temporal dependencies within traffic patterns. This comprehensive approach aims to pinpoint areas with low vehicle counts, uncover underlying traffic issues, and inform targeted infrastructure interventions. By combining GAN-based synthetic data generation with LSTM-based traffic modeling, this study supports data-driven decision-making that enhances urban mobility, safety, and the overall efficiency of city planning initiatives.
Keywords: GAN, long short-term memory, synthetic data generation, traffic management
Procedia PDF Downloads 28
26968 Rule-Based Mamdani Type Fuzzy Modeling of Performances of Anode Side of Proton Exchange Membrane Fuel Cell Spin-Coated with Yttria-Stabilized Zirconia
Authors: Sadık Ata, Kevser Dincer
Abstract:
In this study, the performance of a proton exchange membrane (PEM) fuel cell was experimentally investigated and modelled with the Rule-Based Mamdani-Type Fuzzy (RBMTF) modelling technique. The anode side of the PEM fuel cell was coated with yttria-stabilized zirconia (YSZ) by the spin method. The input parameters, voltage density (V/cm²), current density (A/cm²), temperature (°C) and time (s), and the output parameter, power density (W/cm²), were described by RBMTF if-then rules. Numerical values of the input and output variables were fuzzified into nine linguistic classes: Very Very Low (L1), Very Low (L2), Low (L3), Negative Medium (L4), Medium (L5), Positive Medium (L6), High (L7), Very High (L8) and Very Very High (L9). The comparison between experimental data and RBMTF results was done using statistical methods such as the absolute fraction of variance (R²). The actual values and RBMTF results indicated that RBMTF can be successfully used for the analysis of the performance of a PEM fuel cell.
Keywords: proton exchange membrane (PEM), fuel cell, rule-based Mamdani-type fuzzy (RBMTF) modeling, yttria-stabilized zirconia (YSZ)
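The fuzzification step mentioned in this abstract can be illustrated with a short Python sketch. The abstract names the nine linguistic classes but not the membership functions, so the sketch assumes evenly spaced triangular memberships over a chosen value range; `LABELS`, `triangular`, and `fuzzify` are hypothetical names, not the authors' implementation.

```python
# The nine linguistic classes named in the abstract, from lowest to highest.
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8", "L9"]


def triangular(x, left, peak, right):
    """Degree of membership of x in a triangle (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(x, lo, hi):
    """Map a crisp value in [lo, hi] to membership degrees for L1..L9.

    Assumption: class peaks are evenly spaced between lo and hi and each
    triangle reaches zero at the peaks of its two neighbours.
    """
    step = (hi - lo) / (len(LABELS) - 1)
    degrees = {}
    for i, label in enumerate(LABELS):
        peak = lo + i * step
        degrees[label] = triangular(x, peak - step, peak, peak + step)
    return degrees
```

With this partition, a crisp value belongs to at most two neighbouring classes at once, and the if-then rules of a Mamdani system would then be evaluated on these membership degrees.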
Procedia PDF Downloads 362