Search results for: logical data model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 34486

Search results for: logical data model

34426 Reliability Prediction of Tires Using Linear Mixed-Effects Model

Authors: Myung Hwan Na, Ho- Chun Song, EunHee Hong

Abstract:

We widely use normal linear mixed-effects model to analysis data in repeated measurement. In case of detecting heteroscedasticity and the non-normality of the population distribution at the same time, normal linear mixed-effects model can give improper result of analysis. To achieve more robust estimation, we use heavy tailed linear mixed-effects model which gives more exact and reliable analysis conclusion than standard normal linear mixed-effects model.

Keywords: reliability, tires, field data, linear mixed-effects model

Procedia PDF Downloads 534
34425 Detecting Logical Errors in Haskell

Authors: Vanessa Vasconcelos, Mariza A. S. Bigonha

Abstract:

In order to facilitate both processes, this paper presents HaskellFL, a tool that uses fault localization techniques to locate a logical error in Haskell code. The Haskell subset used in this work is sufficiently expressive for those studying functional programming to get immediate help debugging their code and to answer questions about key concepts associated with the functional paradigm. HaskellFL was tested against functional programming assignments submitted by students enrolled at the functional programming class at the Federal University of Minas Gerais and against exercises from the Exercism Haskell track that are publicly available on GitHub. Furthermore, the EXAM score was chosen to evaluate the tool’s effectiveness, and results showed that HaskellFL reduced the effort needed to locate an error for all tested scenarios. Results also showed that the Ochiai method was more effective than Tarantula.

Keywords: debug, fault localization, functional programming, Haskell

Procedia PDF Downloads 274
34424 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, there is a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data seems clear and real-time stream data management becomes a hot topic. Sliding window model and frequent itemset mining over dynamic data are the most important problems in the context of data mining. Thus, sliding window model for frequent itemset mining is a widely used model for data stream mining due to its emphasis on recent data and its bounded memory requirement. These methods use the traditional transaction-based sliding window model where the window size is based on a fixed number of transactions. Actually, this model supposes that all transactions have a constant rate which is not suited for real-time applications. And the use of this model in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequents and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution. The preliminary results are quite promising.

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 130
34423 Establishment of Bit Selective Mode Storage Covert Channel in VANETs

Authors: Amarpreet Singh, Kimi Manchanda

Abstract:

Intended for providing the security in the VANETS (Vehicular Ad hoc Network) scenario, the covert storage channel is implemented through data transmitted between the sender and the receiver. Covert channels are the logical links which are used for the communication purpose and hiding the secure data from the intruders. This paper refers to the Establishment of bit selective mode covert storage channels in VANETS. In this scenario, the data is being transmitted with two modes i.e. the normal mode and the covert mode. During the communication between vehicles in this scenario, the controlling of bits is possible through the optional bits of IPV6 Header Format. This implementation is fulfilled with the help of Network simulator.

Keywords: covert mode, normal mode, VANET, OBU, on-board unit

Procedia PDF Downloads 338
34422 A Methodology for Investigating Public Opinion Using Multilevel Text Analysis

Authors: William Xiu Shun Wong, Myungsu Lim, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we indicate several limitations of the traditional approaches to analyze public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit with a given subject, we develop a new logical filter concept that consists of not only mere keywords but also a logical relationship among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology thorough its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.

Keywords: big data, social network analysis, text mining, topic modeling

Procedia PDF Downloads 264
34421 Mixture statistical modeling for predecting mortality human immunodeficiency virus (HIV) and tuberculosis(TB) infection patients

Authors: Mohd Asrul Affendi Bi Abdullah, Nyi Nyi Naing

Abstract:

The purpose of this study was to identify comparable manner between negative binomial death rate (NBDR) and zero inflated negative binomial death rate (ZINBDR) with died patients with (HIV + T B+) and (HIV + T B−). HIV and TB is a serious world wide problem in the developing country. Data were analyzed with applying NBDR and ZINBDR to make comparison which a favorable model is better to used. The ZINBDR model is able to account for the disproportionately large number of zero within the data and is shown to be a consistently better fit than the NBDR model. Hence, as a results ZINBDR model is a superior fit to the data than the NBDR model and provides additional information regarding the died mechanisms HIV+TB. The ZINBDR model is shown to be a use tool for analysis death rate according age categorical.

Keywords: zero inflated negative binomial death rate, HIV and TB, AIC and BIC, death rate

Procedia PDF Downloads 401
34420 Commercial Automobile Insurance: A Practical Approach of the Generalized Additive Model

Authors: Nicolas Plamondon, Stuart Atkinson, Shuzi Zhou

Abstract:

The insurance industry is usually not the first topic one has in mind when thinking about applications of data science. However, the use of data science in the finance and insurance industry is growing quickly for several reasons, including an abundance of reliable customer data, ferocious competition requiring more accurate pricing, etc. Among the top use cases of data science, we find pricing optimization, customer segmentation, customer risk assessment, fraud detection, marketing, and triage analytics. The objective of this paper is to present an application of the generalized additive model (GAM) on a commercial automobile insurance product: an individually rated commercial automobile. These are vehicles used for commercial purposes, but for which there is not enough volume to apply pricing to several vehicles at the same time. The GAM model was selected as an improvement over GLM for its ease of use and its wide range of applications. The model was trained using the largest split of the data to determine model parameters. The remaining part of the data was used as testing data to verify the quality of the modeling activity. We used the Gini coefficient to evaluate the performance of the model. For long-term monitoring, commonly used metrics such as RMSE and MAE will be used. Another topic of interest in the insurance industry is to process of producing the model. We will discuss at a high level the interactions between the different teams with an insurance company that needs to work together to produce a model and then monitor the performance of the model over time. Moreover, we will discuss the regulations in place in the insurance industry. Finally, we will discuss the maintenance of the model and the fact that new data does not come constantly and that some metrics can take a long time to become meaningful.

Keywords: insurance, data science, modeling, monitoring, regulation, processes

Procedia PDF Downloads 52
34419 Optimizing Microgrid Operations: A Framework of Adaptive Model Predictive Control

Authors: Ruben Lopez-Rodriguez

Abstract:

In a microgrid, diverse energy sources (both renewable and non-renewable) are combined with energy storage units to form a localized power system. Microgrids function as independent entities, capable of meeting the energy needs of specific areas or communities. This paper introduces a Model Predictive Control (MPC) approach tailored for grid-connected microgrids, aiming to optimize their operation. The formulation employs Mixed-Integer Programming (MIP) to find optimal trajectories. This entails the fulfillment of continuous and binary constraints, all while accounting for commutations between various operating conditions such as storage unit charge/discharge, import/export from/towards the main grid, as well as asset connection/disconnection. To validate the proposed approach, a microgrid case study is conducted, and the simulation results are compared with those obtained using a rule-based strategy.

Keywords: microgrids, mixed logical dynamical systems, mixed-integer optimization, model predictive control

Procedia PDF Downloads 12
34418 An Overview of New Era in Food Science and Technology

Authors: Raana Babadi Fathipour

Abstract:

Strict prerequisites of logical diaries united ought to demonstrate the exploratory information is (in)significant from the statistical point of view and has driven a soak increment within the utilization and advancement of the factual program. It is essential that the utilization of numerical and measurable strategies, counting chemometrics and many other factual methods/algorithms in nourishment science and innovation has expanded steeply within the final 20 a long time. Computational apparatuses accessible can be utilized not as it were to run factual investigations such as univariate and bivariate tests as well as multivariate calibration and improvement of complex models but also to run reenactments of distinctive scenarios considering a set of inputs or essentially making expectations for particular information sets or conditions. Conducting a fast look within the most legitimate logical databases (Pubmed, ScienceDirect, Scopus), it is conceivable to watch that measurable strategies have picked up a colossal space in numerous regions.

Keywords: food science, food technology, food safety, computational tools

Procedia PDF Downloads 38
34417 Online Learning for Modern Business Models: Theoretical Considerations and Algorithms

Authors: Marian Sorin Ionescu, Olivia Negoita, Cosmin Dobrin

Abstract:

This scientific communication reports and discusses learning models adaptable to modern business problems and models specific to digital concepts and paradigms. In the PAC (probably approximately correct) learning model approach, in which the learning process begins by receiving a batch of learning examples, the set of learning processes is used to acquire a hypothesis, and when the learning process is fully used, this hypothesis is used in the prediction of new operational examples. For complex business models, a lot of models should be introduced and evaluated to estimate the induced results so that the totality of the results are used to develop a predictive rule, which anticipates the choice of new models. In opposition, for online learning-type processes, there is no separation between the learning (training) and predictive phase. Every time a business model is approached, a test example is considered from the beginning until the prediction of the appearance of a model considered correct from the point of view of the business decision. After choosing choice a part of the business model, the label with the logical value "true" is known. Some of the business models are used as examples of learning (training), which helps to improve the prediction mechanisms for future business models.

Keywords: machine learning, business models, convex analysis, online learning

Procedia PDF Downloads 115
34416 Procedure Model for Data-Driven Decision Support Regarding the Integration of Renewable Energies into Industrial Energy Management

Authors: M. Graus, K. Westhoff, X. Xu

Abstract:

The climate change causes a change in all aspects of society. While the expansion of renewable energies proceeds, industry could not be convinced based on general studies about the potential of demand side management to reinforce smart grid considerations in their operational business. In this article, a procedure model for a case-specific data-driven decision support for industrial energy management based on a holistic data analytics approach is presented. The model is executed on the example of the strategic decision problem, to integrate the aspect of renewable energies into industrial energy management. This question is induced due to considerations of changing the electricity contract model from a standard rate to volatile energy prices corresponding to the energy spot market which is increasingly more affected by renewable energies. The procedure model corresponds to a data analytics process consisting on a data model, analysis, simulation and optimization step. This procedure will help to quantify the potentials of sustainable production concepts based on the data from a factory. The model is validated with data from a printer in analogy to a simple production machine. The overall goal is to establish smart grid principles for industry via the transformation from knowledge-driven to data-driven decisions within manufacturing companies.

Keywords: data analytics, green production, industrial energy management, optimization, renewable energies, simulation

Procedia PDF Downloads 412
34415 Leveraging Natural Language Processing for Legal Artificial Intelligence: A Longformer Approach for Taiwanese Legal Cases

Authors: Hsin Lee, Hsuan Lee

Abstract:

Legal artificial intelligence (LegalAI) has been increasing applications within legal systems, propelled by advancements in natural language processing (NLP). Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures. Most existing language models have difficulty understanding the long-distance dependencies between different structures. Another unique challenge is that while the Judiciary of Taiwan has released legal judgments from various levels of courts over the years, there remains a significant obstacle in the lack of labeled datasets. This deficiency makes it difficult to train models with strong generalization capabilities, as well as accurately evaluate model performance. To date, models in Taiwan have yet to be specifically trained on judgment data. Given these challenges, this research proposes a Longformer-based pre-trained language model explicitly devised for retrieving similar judgments in Taiwanese legal documents. This model is trained on a self-constructed dataset, which this research has independently labeled to measure judgment similarities, thereby addressing a void left by the lack of an existing labeled dataset for Taiwanese judgments. This research adopts strategies such as early stopping and gradient clipping to prevent overfitting and manage gradient explosion, respectively, thereby enhancing the model's performance. The model in this research is evaluated using both the dataset and the Average Entropy of Offense-charged Clustering (AEOC) metric, which utilizes the notion of similar case scenarios within the same type of legal cases. Our experimental results illustrate our model's significant advancements in handling similarity comparisons within extensive legal judgments. By enabling more efficient retrieval and analysis of legal case documents, our model holds the potential to facilitate legal research, aid legal decision-making, and contribute to the further development of LegalAI in Taiwan.

Keywords: legal artificial intelligence, computation and language, language model, Taiwanese legal cases

Procedia PDF Downloads 45
34414 Machine Learning Data Architecture

Authors: Neerav Kumar, Naumaan Nayyar, Sharath Kashyap

Abstract:

Most companies see an increase in the adoption of machine learning (ML) applications across internal and external-facing use cases. ML applications vend output either in batch or real-time patterns. A complete batch ML pipeline architecture comprises data sourcing, feature engineering, model training, model deployment, model output vending into a data store for downstream application. Due to unclear role expectations, we have observed that scientists specializing in building and optimizing models are investing significant efforts into building the other components of the architecture, which we do not believe is the best use of scientists’ bandwidth. We propose a system architecture created using AWS services that bring industry best practices to managing the workflow and simplifies the process of model deployment and end-to-end data integration for an ML application. This narrows down the scope of scientists’ work to model building and refinement while specialized data engineers take over the deployment, pipeline orchestration, data quality, data permission system, etc. The pipeline infrastructure is built and deployed as code (using terraform, cdk, cloudformation, etc.) which makes it easy to replicate and/or extend the architecture to other models that are used in an organization.

Keywords: data pipeline, machine learning, AWS, architecture, batch machine learning

Procedia PDF Downloads 31
34413 An Extended Inverse Pareto Distribution, with Applications

Authors: Abdel Hadi Ebraheim

Abstract:

This paper introduces a new extension of the Inverse Pareto distribution in the framework of Marshal-Olkin (1997) family of distributions. This model is capable of modeling various shapes of aging and failure data. The statistical properties of the new model are discussed. Several methods are used to estimate the parameters involved. Explicit expressions are derived for different types of moments of value in reliability analysis are obtained. Besides, the order statistics of samples from the new proposed model have been studied. Finally, the usefulness of the new model for modeling reliability data is illustrated using two real data sets with simulation study.

Keywords: pareto distribution, marshal-Olkin, reliability, hazard functions, moments, estimation

Procedia PDF Downloads 49
34412 Data-Driven Dynamic Overbooking Model for Tour Operators

Authors: Kannapha Amaruchkul

Abstract:

We formulate a dynamic overbooking model for a tour operator, in which most reservations contain at least two people. The cancellation rate and the timing of the cancellation may depend on the group size. We propose two overbooking policies, namely economic- and service-based. In an economic-based policy, we want to minimize the expected oversold and underused cost, whereas, in a service-based policy, we ensure that the probability of an oversold situation does not exceed the pre-specified threshold. To illustrate the applicability of our approach, we use tour package data in 2016-2018 from a tour operator in Thailand to build a data-driven robust optimization model, and we tested the proposed overbooking policy in 2019. We also compare the data-driven approach to the conventional approach of fitting data into a probability distribution.

Keywords: applied stochastic model, data-driven robust optimization, overbooking, revenue management, tour operator

Procedia PDF Downloads 104
34411 Modeling and Statistical Analysis of a Soap Production Mix in Bejoy Manufacturing Industry, Anambra State, Nigeria

Authors: Okolie Chukwulozie Paul, Iwenofu Chinwe Onyedika, Sinebe Jude Ebieladoh, M. C. Nwosu

Abstract:

The research work is based on the statistical analysis of the processing data. The essence is to analyze the data statistically and to generate a design model for the production mix of soap manufacturing products in Bejoy manufacturing company Nkpologwu, Aguata Local Government Area, Anambra state, Nigeria. The statistical analysis shows the statistical analysis and the correlation of the data. T test, Partial correlation and bi-variate correlation were used to understand what the data portrays. The design model developed was used to model the data production yield and the correlation of the variables show that the R2 is 98.7%. However, the results confirm that the data is fit for further analysis and modeling. This was proved by the correlation and the R-squared.

Keywords: General Linear Model, correlation, variables, pearson, significance, T-test, soap, production mix and statistic

Procedia PDF Downloads 410
34410 Elemental Graph Data Model: A Semantic and Topological Representation of Building Elements

Authors: Yasmeen A. S. Essawy, Khaled Nassar

Abstract:

With the rapid increase of complexity in the building industry, professionals in the A/E/C industry were forced to adopt Building Information Modeling (BIM) in order to enhance the communication between the different project stakeholders throughout the project life cycle and create a semantic object-oriented building model that can support geometric-topological analysis of building elements during design and construction. This paper presents a model that extracts topological relationships and geometrical properties of building elements from an existing fully designed BIM, and maps this information into a directed acyclic Elemental Graph Data Model (EGDM). The model incorporates BIM-based search algorithms for automatic deduction of geometrical data and topological relationships for each building element type. Using graph search algorithms, such as Depth First Search (DFS) and topological sortings, all possible construction sequences can be generated and compared against production and construction rules to generate an optimized construction sequence and its associated schedule. The model is implemented in a C# platform.

Keywords: building information modeling (BIM), elemental graph data model (EGDM), geometric and topological data models, graph theory

Procedia PDF Downloads 345
34409 Iot Device Cost Effective Storage Architecture and Real-Time Data Analysis/Data Privacy Framework

Authors: Femi Elegbeleye, Omobayo Esan, Muienge Mbodila, Patrick Bowe

Abstract:

This paper focused on cost effective storage architecture using fog and cloud data storage gateway and presented the design of the framework for the data privacy model and data analytics framework on a real-time analysis when using machine learning method. The paper began with the system analysis, system architecture and its component design, as well as the overall system operations. The several results obtained from this study on data privacy model shows that when two or more data privacy model is combined we tend to have a more stronger privacy to our data, and when fog storage gateway have several advantages over using the traditional cloud storage, from our result shows fog has reduced latency/delay, low bandwidth consumption, and energy usage when been compare with cloud storage, therefore, fog storage will help to lessen excessive cost. This paper dwelt more on the system descriptions, the researchers focused on the research design and framework design for the data privacy model, data storage, and real-time analytics. This paper also shows the major system components and their framework specification. And lastly, the overall research system architecture was shown, its structure, and its interrelationships.

Keywords: IoT, fog, cloud, data analysis, data privacy

Procedia PDF Downloads 66
34408 Oil Reservoir Asphalting Precipitation Estimating during CO2 Injection

Authors: I. Alhajri, G. Zahedi, R. Alazmi, A. Akbari

Abstract:

In this paper, an Artificial Neural Network (ANN) was developed to predict Asphaltene Precipitation (AP) during the injection of carbon dioxide into crude oil reservoirs. In this study, the experimental data from six different oil fields were collected. Seventy percent of the data was used to develop the ANN model, and different ANN architectures were examined. A network with the Trainlm training algorithm was found to be the best network to estimate the AP. To check the validity of the proposed model, the model was used to predict the AP for the thirty percent of the data that was unevaluated. The Mean Square Error (MSE) of the prediction was 0.0018, which confirms the excellent prediction capability of the proposed model. In the second part of this study, the ANN model predictions were compared with modified Hirschberg model predictions. The ANN was found to provide more accurate estimates compared to the modified Hirschberg model. Finally, the proposed model was employed to examine the effect of different operating parameters during gas injection on the AP. It was found that the AP is mostly sensitive to the reservoir temperature. Furthermore, the carbon dioxide concentration in liquid phase increases the AP.

Keywords: artificial neural network, asphaltene, CO2 injection, Hirschberg model, oil reservoirs

Procedia PDF Downloads 343
34407 Vibration-Based Data-Driven Model for Road Health Monitoring

Authors: Guru Prakash, Revanth Dugalam

Abstract:

A road’s condition often deteriorates due to harsh loading such as overload due to trucks, and severe environmental conditions such as heavy rain, snow load, and cyclic loading. In absence of proper maintenance planning, this results in potholes, wide cracks, bumps, and increased roughness of roads. In this paper, a data-driven model will be developed to detect these damages using vibration and image signals. The key idea of the proposed methodology is that the road anomaly manifests in these signals, which can be detected by training a machine learning algorithm. The use of various machine learning techniques such as the support vector machine and Radom Forest method will be investigated. The proposed model will first be trained and tested with artificially simulated data, and the model architecture will be finalized by comparing the accuracies of various models. Once a model is fixed, the field study will be performed, and data will be collected. The field data will be used to validate the proposed model and to predict the future road’s health condition. The proposed will help to automate the road condition monitoring process, repair cost estimation, and maintenance planning process.

Keywords: SVM, data-driven, road health monitoring, pot-hole

Procedia PDF Downloads 54
34406 Assessment of Students Skills in Error Detection in SQL Classes using Rubric Framework - An Empirical Study

Authors: Dirson Santos De Campos, Deller James Ferreira, Anderson Cavalcante Gonçalves, Uyara Ferreira Silva

Abstract:

Rubrics to learning research provide many evaluation criteria and expected performance standards linked to defined student activity for learning and pedagogical objectives. Despite the rubric being used in education at all levels, academic literature on rubrics as a tool to support research in SQL Education is quite rare. There is a large class of SQL queries is syntactically correct, but certainly, not all are semantically correct. Detecting and correcting errors is a recurring problem in SQL education. In this paper, we usthe Rubric Abstract Framework (RAF), which consists of steps, that allows us to map the information to measure student performance guided by didactic objectives defined by the teacher as long as it is contextualized domain modeling by rubric. An empirical study was done that demonstrates how rubrics can mitigate student difficulties in finding logical errors and easing teacher workload in SQL education. Detecting and correcting logical errors is an important skill for students. Researchers have proposed several ways to improve SQL education because understanding this paradigm skills are crucial in software engineering and computer science. The RAF instantiation was using in an empirical study developed during the COVID-19 pandemic in database course. The pandemic transformed face-to-face and remote education, without presential classes. The lab activities were conducted remotely, which hinders the teaching-learning process, in particular for this research, in verifying the evidence or statements of knowledge, skills, and abilities (KSAs) of students. Various research in academia and industry involved databases. The innovation proposed in this paper is the approach used where the results obtained when using rubrics to map logical errors in query formulation have been analyzed with gains obtained by students empirically verified. The research approach can be used in the post-pandemic period in both classroom and distance learning.

Keywords: rubric, logical error, structured query language (SQL), empirical study, SQL education

Procedia PDF Downloads 157
34405 WhatsApp as Part of a Blended Learning Model to Help Programming Novices

Authors: Tlou J. Ramabu

Abstract:

Programming is one of the challenging subjects in the field of computing. In the higher education sphere, some programming novices’ performance, retention rate, and success rate are not improving. Most of the time, the problem is caused by the slow pace of learning, difficulty in grasping the syntax of the programming language and poor logical skills. More importantly, programming forms part of major subjects within the field of computing. As a result, specialized pedagogical methods and innovation are highly recommended. Little research has been done on the potential productivity of the WhatsApp platform as part of a blended learning model. In this article, the authors discuss the WhatsApp group as a part of blended learning model incorporated for a group of programming novices. We discuss possible administrative activities for productive utilisation of the WhatsApp group on the blended learning overview. The aim is to take advantage of the popularity of WhatsApp and the time students spend on it for their educational purpose. We believe that blended learning featuring a WhatsApp group may ease novices’ cognitive load and strengthen their foundational programming knowledge and skills. This is a work in progress as the proposed blended learning model with WhatsApp incorporated is yet to be implemented.

Keywords: blended learning, higher education, WhatsApp, programming, novices, lecturers

Procedia PDF Downloads 134
34404 Model Averaging for Poisson Regression

Authors: Zhou Jianhong

Abstract:

Model averaging is a desirable approach to deal with model uncertainty, which, however, has rarely been explored for Poisson regression. In this paper, we propose a model averaging procedure based on an unbiased estimator of the expected Kullback-Leibler distance for the Poisson regression. Simulation study shows that the proposed model average estimator outperforms some other commonly used model selection and model average estimators in some situations. Our proposed methods are further applied to a real data example and the advantage of this method is demonstrated again.

Keywords: model averaging, poission regression, Kullback-Leibler distance, statistics

Procedia PDF Downloads 484
34403 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach

Authors: Dongkwon Han, Sangho Kim, Sunil Kwon

Abstract:

Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenges in unconventional gas have been the requirement that applies to advanced approaches for production forecasting due to uncertainty and complexity of fluid flow. In this study, artificial neural network (ANN) model which integrates machine learning and data-driven approach was developed to predict productivity in shale gas. The database of 129 wells of Eagle Ford shale basin used for testing and training of the ANN model. The Input data related to hydraulic fracturing, well completion and productivity of shale gas were selected and the output data is a cumulative production. The performance of the ANN using all data sets, clustering and variables importance (VI) models were compared in the mean absolute percentage error (MAPE). ANN model using all data sets, clustering, and VI were obtained as 44.22%, 10.08% (cluster 1), 5.26% (cluster 2), 6.35%(cluster 3), and 32.23% (ANN VI), 23.19% (SVM VI), respectively. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.

Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance

Procedia PDF Downloads 172
34402 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

The logistic regression and Cox regression models (proportional hazard model) at present are being employed in the analysis of prospective epidemiologic research looking into risk factors in their application on chronic diseases. However, a theoretical relationship between the two models has been studied. By definition, Cox regression model also called Cox proportional hazard model is a procedure that is used in modeling data regarding time leading up to an event where censored cases exist. Whereas the Logistic regression model is mostly applicable in cases where the independent variables consist of numerical as well as nominal values while the resultant variable is binary (dichotomous). Arguments and findings of many researchers focused on the overview of Cox and Logistic regression models and their different applications in different areas. In this work, the analysis is done on secondary data whose source is SPSS exercise data on BREAST CANCER with a sample size of 1121 women where the main objective is to show the application difference between Cox regression model and logistic regression model based on factors that cause women to die due to breast cancer. Thus we did some analysis manually i.e. on lymph nodes status, and SPSS software helped to analyze the mentioned data. This study found out that there is an application difference between Cox and Logistic regression models which is Cox regression model is used if one wishes to analyze data which also include the follow-up time whereas Logistic regression model analyzes data without follow-up-time. Also, they have measurements of association which is different: hazard ratio and odds ratio for Cox and logistic regression models respectively. A similarity between the two models is that they are both applicable in the prediction of the upshot of a categorical variable i.e. a variable that can accommodate only a restricted number of categories. In conclusion, Cox regression model differs from logistic regression by assessing a rate instead of proportion. The two models can be applied in many other researches since they are suitable methods for analyzing data but the more recommended is the Cox, regression model.

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

Procedia PDF Downloads 423
34401 Data Collection with Bounded-Sized Messages in Wireless Sensor Networks

Authors: Min Kyung An

Abstract:

In this paper, we study the data collection problem in Wireless Sensor Networks (WSNs) adopting the two interference models: The graph model and the more realistic physical interference model known as Signal-to-Interference-Noise-Ratio (SINR). The main issue of the problem is to compute schedules with the minimum number of timeslots, that is, to compute the minimum latency schedules, such that data from every node can be collected without any collision or interference to a sink node. While existing works studied the problem with unit-sized and unbounded-sized message models, we investigate the problem with the bounded-sized message model, and introduce a constant factor approximation algorithm. To the best known of our knowledge, our result is the first result of the data collection problem with bounded-sized model in both interference models.

Keywords: data collection, collision-free, interference-free, physical interference model, SINR, approximation, bounded-sized message model, wireless sensor networks

Procedia PDF Downloads 188
34400 Companies’ Internationalization: Multi-Criteria-Based Prioritization Using Fuzzy Logic

Authors: Jorge Anibal Restrepo Morales, Sonia Martín Gómez

Abstract:

A model based on a logical framework was developed to quantify SMEs' internationalization capacity. To do so, linguistic variables, such as human talent, infrastructure, innovation strategies, FTAs, marketing strategies, finance, etc. were integrated. It is argued that a company’s management of international markets depends on internal factors, especially capabilities and resources available. This study considers internal factors as the biggest business challenge because they force companies to develop an adequate set of capabilities. At this stage, importance and strategic relevance have to be defined in order to build competitive advantages. A fuzzy inference system is proposed to model the resources, skills, and capabilities that determine the success of internationalization. Data: 157 linguistic variables were used. These variables were defined by international trade entrepreneurs, experts, consultants, and researchers. Using expert judgment, the variables were condensed into18 factors that explain SMEs’ export capacity. The proposed model is applied by means of a case study of the textile and clothing cluster in Medellin, Colombia. In the model implementation, a general index of 28.2 was obtained for internationalization capabilities. The result confirms that the sector’s current capabilities and resources are not sufficient for a successful integration into the international market. The model specifies the factors and variables, which need to be worked on in order to improve export capability. In the case of textile companies, the lack of a continuous recording of information stands out. Likewise, there are very few studies directed towards developing long-term plans, and., there is little consistency in exports criteria. This method emerges as an innovative management tool linked to internal organizational spheres and their different abilities.

Keywords: business strategy, exports, internationalization, fuzzy set methods

Procedia PDF Downloads 269
34399 Anomaly Detection in a Data Center with a Reconstruction Method Using a Multi-Autoencoders Model

Authors: Victor Breux, Jérôme Boutet, Alain Goret, Viviane Cattin

Abstract:

Early detection of anomalies in data centers is important to reduce downtimes and the costs of periodic maintenance. However, there is little research on this topic and even fewer on the fusion of sensor data for the detection of abnormal events. The goal of this paper is to propose a method for anomaly detection in data centers by combining sensor data (temperature, humidity, power) and deep learning models. The model described in the paper uses one autoencoder per sensor to reconstruct the inputs. The auto-encoders contain Long-Short Term Memory (LSTM) layers and are trained using the normal samples of the relevant sensors selected by correlation analysis. The difference signal between the input and its reconstruction is then used to classify the samples using feature extraction and a random forest classifier. The data measured by the sensors of a data center between January 2019 and May 2020 are used to train the model, while the data between June 2020 and May 2021 are used to assess it. Performances of the model are assessed a posteriori through F1-score by comparing detected anomalies with the data center’s history. The proposed model outperforms the state-of-the-art reconstruction method, which uses only one autoencoder taking multivariate sequences and detects an anomaly with a threshold on the reconstruction error, with an F1-score of 83.60% compared to 24.16%.

Keywords: anomaly detection, autoencoder, data centers, deep learning

Procedia PDF Downloads 160
34398 Enhancement Method of Network Traffic Anomaly Detection Model Based on Adversarial Training With Category Tags

Authors: Zhang Shuqi, Liu Dan

Abstract:

For the problems in intelligent network anomaly traffic detection models, such as low detection accuracy caused by the lack of training samples, poor effect with small sample attack detection, a classification model enhancement method, F-ACGAN(Flow Auxiliary Classifier Generative Adversarial Network) which introduces generative adversarial network and adversarial training, is proposed to solve these problems. Generating adversarial data with category labels could enhance the training effect and improve classification accuracy and model robustness. FACGAN consists of three steps: feature preprocess, which includes data type conversion, dimensionality reduction and normalization, etc.; A generative adversarial network model with feature learning ability is designed, and the sample generation effect of the model is improved through adversarial iterations between generator and discriminator. The adversarial disturbance factor of the gradient direction of the classification model is added to improve the diversity and antagonism of generated data and to promote the model to learn from adversarial classification features. The experiment of constructing a classification model with the UNSW-NB15 dataset shows that with the enhancement of FACGAN on the basic model, the classification accuracy has improved by 8.09%, and the score of F1 has improved by 6.94%.

Keywords: data imbalance, GAN, ACGAN, anomaly detection, adversarial training, data augmentation

Procedia PDF Downloads 75
34397 The Role of Synthetic Data in Aerial Object Detection

Authors: Ava Dodd, Jonathan Adams

Abstract:

The purpose of this study is to explore the characteristics of developing a machine learning application using synthetic data. The study is structured to develop the application for the purpose of deploying the computer vision model. The findings discuss the realities of attempting to develop a computer vision model for practical purpose, and detail the processes, tools, and techniques that were used to meet accuracy requirements. The research reveals that synthetic data represents another variable that can be adjusted to improve the performance of a computer vision model. Further, a suite of tools and tuning recommendations are provided.

Keywords: computer vision, machine learning, synthetic data, YOLOv4

Procedia PDF Downloads 194