Search results for: panel data models
28949 Evaluation and Compression of Different Language Transformer Models for Semantic Textual Similarity Binary Task Using Minority Language Resources
Authors: Ma. Gracia Corazon Cayanan, Kai Yuen Cheong, Li Sha
Abstract:
Training a language model for a minority language is a challenging task. The lack of corpora available for training and fine-tuning state-of-the-art language models remains an open problem in Natural Language Processing (NLP), and the need for high computational resources and bulk data further limits this task. In this paper, we present the following contributions: (1) we introduce and use a Tagalog–English (TL-EN) translation pair set to pre-train a language model on a minority language resource; (2) we fine-tune and evaluate top-ranking, pre-trained semantic textual similarity binary task (STSB) models on both the TL-EN and STS dataset pairs; and (3) we reduce the size of the model to offset the need for high computational resources. Based on our results, models pre-trained on translation pairs and STS pairs perform well on the STSB task. Reducing the model to a smaller dimension has no negative effect on performance; rather, it yields a notable increase in the similarity scores. Moreover, pre-training on a similar dataset has a strong positive effect on the model's performance scores.
Keywords: semantic matching, semantic textual similarity binary task, low resource minority language, fine-tuning, dimension reduction, transformer models
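The workflow described above — encoding sentence pairs with a pre-trained transformer, reducing the embedding dimension, and scoring similarity — can be sketched as follows. This is a minimal illustration only: the model name, the PCA-based dimension reduction, and the 0.5 decision threshold are assumptions introduced here, not the authors' setup.

```python
# Hedged sketch: binary semantic textual similarity with a reduced embedding dimension.
# Assumes the sentence-transformers and scikit-learn packages; the model name,
# PCA reduction, and threshold are illustrative choices, not the paper's configuration.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

pairs = [("Ang bata ay kumakain.", "The child is eating."),
         ("Umuulan ngayon.", "The stock market fell today.")]

emb_a = model.encode([a for a, _ in pairs])      # (n, d) embeddings
emb_b = model.encode([b for _, b in pairs])

# Dimension reduction: project both sides onto a smaller space fitted on all embeddings.
emb_all = np.vstack([emb_a, emb_b])
n_comp = min(64, emb_all.shape[0])               # PCA rank cannot exceed the sample count
pca = PCA(n_components=n_comp).fit(emb_all)
red_a, red_b = pca.transform(emb_a), pca.transform(emb_b)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

for (a, b), u, v in zip(pairs, red_a, red_b):
    score = cosine(u, v)
    print(f"{score:.3f}  similar={score > 0.5}  |  {a} / {b}")
```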
Procedia PDF Downloads 211
28948 Study and Conservation of Cultural and Natural Heritages with the Use of Laser Scanner and Processing System for 3D Modeling Spatial Data
Authors: Julia Desiree Velastegui Caceres, Luis Alejandro Velastegui Caceres, Oswaldo Padilla, Eduardo Kirby, Francisco Guerrero, Theofilos Toulkeridis
Abstract:
It is fundamental to conserve sites of natural and cultural heritage with any available preservation technique or methodology in order to sustain them for the following generations. We propose an additional means of protecting the current state of such sites, in which high-technology instrumentation is used to digitally preserve natural and cultural heritage, applied here in Ecuador. In this project, laser technology is used to build three-dimensional models with high accuracy in a relatively short period of time. So far there are no records in Ecuador of the use and processing of data obtained with this new technology. The contribution of the project is the description of the methodology of the laser scanner system using the Faro Laser Scanner Focus 3D 120, the method for 3D modeling of geospatial data, and the development of virtual environments in the areas of cultural and natural heritage. To inform users of this technological trend in which three-dimensional models are generated, the outputs have been prepared so that they can be displayed in a wide range of digital formats. The resulting 3D models demonstrate that this technology is extremely useful in these areas, but they also indicate that each data campaign requires a slightly different, individual procedure, from data capture and processing through to the final virtual environments.
Keywords: laser scanner system, 3D model, cultural heritage, natural heritage
Procedia PDF Downloads 306
28947 Marginalized Two-Part Joint Models for Generalized Gamma Family of Distributions
Authors: Mohadeseh Shojaei Shahrokhabadi, Ding-Geng (Din) Chen
Abstract:
Positive continuous outcomes with a substantial number of zero values and incomplete longitudinal follow-up are quite common in medical cost data. To jointly model semi-continuous longitudinal cost data and survival data and to provide marginalized covariate effect estimates, a marginalized two-part joint model (MTJM) has been developed for outcome variables with lognormal distributions. In this paper, we propose MTJM models for outcome variables from a generalized gamma (GG) family of distributions. The GG distribution constitutes a general family that includes approximately all of the most frequently used distributions like the Gamma, Exponential, Weibull, and Log Normal. In the proposed MTJM-GG model, the conditional mean from a conventional two-part model with a three-parameter GG distribution is parameterized to provide the marginal interpretation for regression coefficients. In addition, MTJM-gamma and MTJM-Weibull are developed as special cases of MTJM-GG. To illustrate the applicability of the MTJM-GG, we applied the model to a set of real electronic health record data recently collected in Iran, and we provided SAS code for application. The simulation results showed that when the outcome distribution is unknown or misspecified, which is usually the case in real data sets, the MTJM-GG consistently outperforms other models. The GG family of distribution facilitates estimating a model with improved fit over the MTJM-gamma, standard Weibull, or Log-Normal distributions.Keywords: marginalized two-part model, zero-inflated, right-skewed, semi-continuous, generalized gamma
Procedia PDF Downloads 176
28946 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0
Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini
Abstract:
Nowadays, companies face many challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which creates challenges around data management and governance, and companies are further challenged to integrate data from multiple systems and technologies. Despite these pains, companies still pursue digitalization because, by embracing advanced technologies, they can improve efficiency, quality, decision-making, and customer experience while also creating new business models and revenue streams. This paper focuses on the issue that data are stored in data silos with different schemas and structures. Conventional approaches to this issue rely on data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily address the grammar and structure of the data and neglect semantic modeling and semantic standardization, which are essential for achieving data interoperability. Here, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models as an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates the Asset Administration Shell technology to model and map the company’s data and utilizes a knowledge graph for data storage and exploration.
Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling
Procedia PDF Downloads 94
28945 On-Line Data-Driven Multivariate Statistical Prediction Approach to Production Monitoring
Authors: Hyun-Woo Cho
Abstract:
Detection of incipient abnormal events in production processes is important for improving the safety and reliability of manufacturing operations and reducing losses caused by failures. The construction of calibration models for predicting faulty conditions is essential for deciding when to perform preventive maintenance. This paper presents a multivariate calibration monitoring approach based on the statistical analysis of process measurement data. The calibration model is used to predict faulty conditions from historical reference data. The approach utilizes variable selection techniques, and the predictive performance of several prediction methods is evaluated using real data. The results show that the calibration model based on a supervised probabilistic model yielded the best performance in this work. By adopting a proper variable selection scheme in calibration models, prediction performance can be improved by excluding non-informative variables from the model-building steps.
Keywords: calibration model, monitoring, quality improvement, feature selection
Procedia PDF Downloads 356
28944 An Infinite Mixture Model for Modelling Stutter Ratio in Forensic Data Analysis
Authors: M. A. C. S. Sampath Fernando, James M. Curran, Renate Meyer
Abstract:
Forensic DNA analysis has received much attention over the last three decades due to its incredible usefulness in human identification. The statistical interpretation of DNA evidence is recognised as one of the most mature fields in forensic science. Peak heights in an electropherogram (EPG) are approximately proportional to the amount of template DNA in the original sample being tested. A stutter is a minor peak in an EPG, which is not masking as an allele of a potential contributor, and is considered an artefact presumed to arise from miscopying or slippage during the PCR. Stutter peaks are mostly analysed in terms of the stutter ratio, which is calculated relative to the corresponding parent allele height. The analysis of mixture profiles has always been problematic in evidence interpretation, especially in the presence of PCR artefacts like stutters. Unlike binary and semi-continuous models, continuous models assign a probability (as a continuous weight) to each possible genotype combination and make significantly greater use of continuous peak height information, resulting in more efficient and reliable interpretations. Therefore, a sound methodology to distinguish between stutters and real alleles is essential for the accuracy of the interpretation, and any such method has to be able to focus on modelling stutter peaks. Bayesian nonparametric methods provide increased flexibility in applied statistical modelling. Mixture models are frequently employed as fundamental data analysis tools in clustering and classification and assume unidentified heterogeneous sources for the data. In model-based clustering, each unknown source is reflected by a cluster, and the clusters are modelled using parametric models. Specifying the number of components in finite mixture models, however, is practically difficult even though the calculations are relatively simple. Infinite mixture models, in contrast, do not require the user to specify the number of components. Instead, a Dirichlet process, which is an infinite-dimensional generalization of the Dirichlet distribution, is used to deal with the problem of choosing the number of components. The Chinese restaurant process (CRP), the stick-breaking process, and the Pólya urn scheme are frequently used as Dirichlet process priors in Bayesian mixture models. In this study, we illustrate an infinite mixture of simple linear regression models for modelling the stutter ratio and introduce some modifications to overcome weaknesses associated with the CRP.
Keywords: Chinese restaurant process, Dirichlet prior, infinite mixture model, PCR stutter
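As a quick illustration of why the Chinese restaurant process sidesteps the choice of the number of components, the sketch below simulates CRP cluster assignments: each new observation joins an existing cluster with probability proportional to its size, or opens a new cluster with probability proportional to the concentration parameter. This is a generic illustration of the prior only, with an assumed concentration value; it is not the authors' stutter-ratio model.

```python
# Minimal sketch of Chinese restaurant process (CRP) cluster assignments.
# The concentration parameter alpha and the number of observations are
# illustrative; the number of clusters is not fixed in advance.
import numpy as np

def crp_assignments(n_obs, alpha, rng):
    """Draw cluster labels for n_obs observations from a CRP(alpha) prior."""
    labels = [0]                      # first customer sits at the first table
    counts = [1]                      # customers per table
    for _ in range(1, n_obs):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()          # existing tables ~ size, new table ~ alpha
        table = int(rng.choice(len(probs), p=probs))
        if table == len(counts):      # a brand-new cluster is opened
            counts.append(1)
        else:
            counts[table] += 1
        labels.append(table)
    return np.array(labels), counts

rng = np.random.default_rng(7)
labels, counts = crp_assignments(n_obs=200, alpha=1.5, rng=rng)
print("number of clusters found:", len(counts))
print("cluster sizes:", counts)
```

In an infinite mixture of linear regressions of the kind described in the abstract, each cluster drawn this way would carry its own regression coefficients for the stutter ratio.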
Procedia PDF Downloads 330
28943 Application of Signature Verification Models for Document Recognition
Authors: Boris M. Fedorov, Liudmila P. Goncharenko, Sergey A. Sybachin, Natalia A. Mamedova, Ekaterina V. Makarenkova, Saule Rakhimova
Abstract:
In modern economic conditions, the question of the possibility of correct recognition of a signature on digital documents in order to verify the expression of will or confirm a certain operation is relevant. The additional complexity of processing lies in the dynamic variability of the signature for each individual, as well as in the way information is processed because the signature refers to biometric data. The article discusses the issues of using artificial intelligence models in order to improve the quality of signature confirmation in document recognition. The analysis of several possible options for using the model is carried out. The results of the study are given, in which it is possible to correctly determine the authenticity of the signature on small samples.Keywords: signature recognition, biometric data, artificial intelligence, neural networks
Procedia PDF Downloads 148
28942 Contrasted Mean and Median Models in Egyptian Stock Markets
Authors: Mai A. Ibrahim, Mohammed El-Beltagy, Motaz Khorshid
Abstract:
Emerging market return distributions show a significant departure from normality: they are characterized by fatter tails relative to the normal distribution and exhibit levels of skewness and kurtosis inconsistent with a normal distribution. Therefore, the classical Markowitz mean-variance model is not applicable to emerging markets, since it assumes normally distributed returns (with zero skewness and excess kurtosis) and a quadratic utility function. Markowitz mean-variance analysis can be used in cases of moderate non-normality, where it still provides a good approximation of the expected utility, but it may be ineffective under large departures from normality. Higher-moment models and median models have been suggested in the literature for asset allocation in this case. Higher-moment models were introduced to account for the insufficiency of describing a portfolio by only its first two moments, while the median model was introduced as a robust statistic that is less affected by outliers than the mean. Tail risk measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) have been introduced instead of variance to capture the effect of risk. In this research, higher-moment models including Mean-Variance-Skewness (MVS) and Mean-Variance-Skewness-Kurtosis (MVSK) are formulated as single-objective non-linear programming (NLP) problems, and median models including Median-Value-at-Risk (MedVaR) and Median-Mean Absolute Deviation (MedMAD) are formulated as single-objective mixed-integer linear programming (MILP) problems. The higher-moment models and median models are compared to some benchmark portfolios and tested on real financial data from the Egyptian main index EGX30. The results show that all the median models outperform the higher-moment models, as they provide higher final wealth for the investor over the entire period of study. In addition, the results confirm the inapplicability of the classical Markowitz mean-variance model to the Egyptian stock market, as it resulted in very low realized profits.
Keywords: Egyptian stock exchange, emerging markets, higher moment models, median models, mixed-integer linear programming, non-linear programming
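To make the higher-moment formulation concrete, the sketch below solves a toy mean-variance-skewness (MVS) portfolio problem as a single-objective non-linear program: maximize a weighted combination of portfolio mean, variance, and skewness subject to full investment and no short selling. The simulated returns, the preference weights, and the use of SciPy's SLSQP solver are assumptions for illustration; they are not the paper's data or exact formulation.

```python
# Hedged sketch: a toy mean-variance-skewness (MVS) portfolio as a single-objective NLP.
# Returns are simulated fat-tailed data; lam (variance aversion) and gam (skewness
# preference) are illustrative weights, not values used in the study.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import skew

rng = np.random.default_rng(0)
n_assets, n_obs = 5, 500
returns = rng.standard_t(df=4, size=(n_obs, n_assets)) * 0.01

lam, gam = 2.0, 1.0   # risk aversion and skewness preference (assumed)

def neg_mvs_utility(w):
    port = returns @ w
    return -(port.mean() - lam * port.var() + gam * skew(port))

cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]   # fully invested
bounds = [(0.0, 1.0)] * n_assets                          # no short selling
w0 = np.full(n_assets, 1.0 / n_assets)

res = minimize(neg_mvs_utility, w0, method="SLSQP", bounds=bounds, constraints=cons)
print("optimal weights:", np.round(res.x, 3))
```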
Procedia PDF Downloads 314
28941 Livestock Production in Vietnam: Technical Efficiency and Productivity Performance Based on Regional Differences
Authors: Diep Thanh Tung
Abstract:
This study aims to measure technical efficiency and examine the productivity performance of livestock production in the regions of Vietnam based on panel data for 2008–2012. Although some regions improved in efficiency over the four years, low technical efficiency and poor performance in productivity and its components remain dominant features in almost all regions. Households that depend heavily on livestock income within agricultural income, or on agricultural income within total income, are more vulnerable than others in terms of livestock production.
Keywords: data envelopment analysis, meta-frontier, Malmquist, technical efficiency, livestock production
Procedia PDF Downloads 706
28940 Recurrent Neural Networks for Complex Survival Models
Authors: Pius Marthin, Nihal Ata Tutkun
Abstract:
Survival analysis has become one of the paramount procedures in the modeling of time-to-event data. When we encounter complex survival problems, the traditional approach remains limited in accounting for the complex correlational structure between the covariates and the outcome, due to strong assumptions that limit the inference and prediction ability of the resulting models. Several studies exist on deep learning approaches to survival modeling; however, their application to complex survival problems still needs improvement. In addition, the existing models do not fully address the complexity of the data structure and are subject to noise and redundant information. In this study, we design a deep learning technique (CmpXRnnSurv_AE) that overcomes the limitations imposed by traditional approaches and addresses the above issues to jointly predict the risk-specific probabilities and the survival function for recurrent events with competing risks. We introduce a component termed Risks Information Weights (RIW) as an attention mechanism to compute the weighted cumulative incidence function (WCIF) and an external auto-encoder (ExternalAE) as a feature selector to extract complex characteristics among the set of covariates responsible for the cause-specific events. We train our model using synthetic and real data sets and employ the appropriate metrics for complex survival models for evaluation. As benchmarks, we selected both traditional and machine learning models, and our model demonstrates better performance across all datasets.
Keywords: cumulative incidence function (CIF), risk information weight (RIW), autoencoders (AE), survival analysis, recurrent events with competing risks, recurrent neural networks (RNN), long short-term memory (LSTM), self-attention, multilayer perceptrons (MLPs)
Procedia PDF Downloads 89
28939 Comparative Operating Speed and Speed Differential Day and Night Time Models for Two Lane Rural Highways
Authors: Vinayak Malaghan, Digvijay Pawar
Abstract:
Speed is the independent parameter that plays a vital role in highway design. The design consistency of highways is checked based on the variation in operating speed, and the design often fails to meet drivers' expectations, which results in a difference between operating and design speed. Literature reviews have shown that a significant number of crashes take place on horizontal curves due to a lack of design consistency. This paper focuses on a continuous speed profile study of the tangent-to-curve transition for both daytime and nighttime. Data were collected using a GPS device, which gives a continuous speed profile; other parameters such as acceleration and deceleration were analyzed along with the tangent-to-curve transition. In the present study, models were developed to predict operating speed on tangents and horizontal curves, as well as a model indicating the speed reduction from tangent to curve, based on the continuous speed profile data. It is observed from the study that vehicles tend to decelerate from the approach tangent until a point between the beginning and the midpoint of the curve and then accelerate through the curve-to-tangent transition. The models generated were compared for day and night and can be used for road safety improvement by evaluating geometric design consistency.
Keywords: operating speed, design consistency, continuous speed profile data, day and night time
Procedia PDF Downloads 157
28938 Innovative Methods of Improving Train Formation in Freight Transport
Authors: Jaroslav Masek, Juraj Camaj, Eva Nedeliakova
Abstract:
The paper is focused on an operational model for transporting single wagon consignments on a railway network using two different models of train formation. The paper gives an overview of possibilities for improving the quality of transport services and deals with two models used in the train formation problem: time-continuous and time-discrete. By applying these models in practice, a transport company can guarantee a higher quality of service and expect an increase in transport performance. The models are also applicable to other transport networks. They supplement the theoretical problem of train formation with new ways of looking at and influencing the organization of wagon flows.
Keywords: train formation, wagon flows, marshalling yard, railway technology
Procedia PDF Downloads 437
28937 Wind Power Forecast Error Simulation Model
Authors: Josip Vasilj, Petar Sarajcev, Damir Jakus
Abstract:
One of the major difficulties introduced by wind power penetration is the inherent uncertainty in production originating from uncertain wind conditions. This uncertainty impacts many aspects of power system operation, especially the balancing power requirements. For this reason, in power system development planning, it is necessary to evaluate the potential uncertainty in future wind power generation. For this purpose, simulation models are required that reproduce the performance of wind power forecasts. This paper presents wind power forecast error simulation models based on stochastic process simulation. The proposed models capture the most important statistical parameters recognized in wind power forecast error time series. Furthermore, two distinct models are presented depending on data availability: the first model uses wind speed measurements at potential or existing wind power plant locations, while the second model uses the statistical distribution of wind speeds.
Keywords: wind power, uncertainty, stochastic process, Monte Carlo simulation
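A common way to reproduce the statistics of a forecast error time series is to simulate it as an autoregressive stochastic process and run Monte Carlo repetitions. The sketch below does this with a simple AR(1) error model; the persistence and noise parameters, and the AR(1) choice itself, are assumptions made here for illustration rather than the specific processes fitted in the paper.

```python
# Hedged sketch: Monte Carlo simulation of wind power forecast error as an AR(1) process.
# phi (persistence) and sigma (innovation std, in per-unit of installed capacity) are
# illustrative values, not parameters estimated in the study.
import numpy as np

def simulate_forecast_error(n_hours, phi=0.85, sigma=0.05, n_runs=1000, seed=1):
    rng = np.random.default_rng(seed)
    errors = np.zeros((n_runs, n_hours))
    for r in range(n_runs):
        e = 0.0
        for t in range(n_hours):
            e = phi * e + rng.normal(0.0, sigma)   # AR(1) recursion
            errors[r, t] = e
    return errors

err = simulate_forecast_error(n_hours=24 * 7)
print("std of simulated error [p.u.]:", err.std().round(3))
print("95th percentile of |error| [p.u.]:", np.quantile(np.abs(err), 0.95).round(3))
```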
Procedia PDF Downloads 483
28936 Improving Predictions of Coastal Benthic Invertebrate Occurrence and Density Using a Multi-Scalar Approach
Authors: Stephanie Watson, Fabrice Stephenson, Conrad Pilditch, Carolyn Lundquist
Abstract:
Spatial data detailing both the distribution and density of functionally important marine species are needed to inform management decisions. Species distribution models (SDMs) have proven helpful in this regard; however, models often focus only on species occurrences derived from spatially expansive datasets and lack the resolution and detail required to inform regional management decisions. Boosted regression trees (BRT) were used to produce high-resolution SDMs (250 m) at two spatial scales predicting probability of occurrence, abundance (count per sample unit), density (count per km2) and uncertainty for seven coastal seafloor taxa that vary in habitat usage and distribution to examine prediction differences and implications for coastal management. We investigated if small scale regionally focussed models (82,000 km2) can provide improved predictions compared to data-rich national scale models (4.2 million km2). We explored the variability in predictions across model type (occurrence vs abundance) and model scale to determine if specific taxa models or model types are more robust to geographical variability. National scale occurrence models correlated well with broad-scale environmental predictors, resulting in higher AUC (Area under the receiver operating curve) and deviance explained scores; however, they tended to overpredict in the coastal environment and lacked spatially differentiated detail for some taxa. Regional models had lower overall performance, but for some taxa, spatial predictions were more differentiated at a localised ecological scale. National density models were often spatially refined and highlighted areas of ecological relevance producing more useful outputs than regional-scale models. The utility of a two-scale approach aids the selection of the most optimal combination of models to create a spatially informative density model, as results contrasted for specific taxa between model type and scale. However, it is vital that robust predictions of occurrence and abundance are generated as inputs for the combined density model as areas that do not spatially align between models can be discarded. This study demonstrates the variability in SDM outputs created over different geographical scales and highlights implications and opportunities for managers utilising these tools for regional conservation, particularly in data-limited environments.Keywords: Benthic ecology, spatial modelling, multi-scalar modelling, marine conservation.
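Boosted regression trees of the kind described above can be prototyped with a gradient boosting classifier: environmental layers become predictor columns, presence/absence becomes the response, and the fitted model returns a probability of occurrence per location. The sketch below is a generic illustration with simulated predictors; the feature names, hyperparameters, and the scikit-learn implementation are assumptions, not the configuration used in the study.

```python
# Hedged sketch: an occurrence (presence/absence) boosted regression tree model.
# Simulated environmental predictors stand in for real gridded layers; the
# hyperparameters are illustrative, not tuned values from the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([
    rng.uniform(0, 200, n),      # depth (m), assumed predictor
    rng.uniform(5, 20, n),       # bottom temperature (deg C), assumed predictor
    rng.uniform(0, 1, n),        # sediment mud fraction, assumed predictor
])
# Synthetic occurrence signal: the toy taxon prefers shallow, muddy sites.
p = 1 / (1 + np.exp(-(1.5 - 0.02 * X[:, 0] + 2.0 * X[:, 2])))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
brt = GradientBoostingClassifier(n_estimators=500, learning_rate=0.01,
                                 max_depth=3, subsample=0.75)
brt.fit(X_tr, y_tr)

prob_occurrence = brt.predict_proba(X_te)[:, 1]       # predicted probability of occurrence
print("AUC:", round(roc_auc_score(y_te, prob_occurrence), 3))
```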
Procedia PDF Downloads 77
28935 European Food Safety Authority (EFSA) Safety Assessment of Food Additives: Data and Methodology Used for the Assessment of Dietary Exposure for Different European Countries and Population Groups
Authors: Petra Gergelova, Sofia Ioannidou, Davide Arcella, Alexandra Tard, Polly E. Boon, Oliver Lindtner, Christina Tlustos, Jean-Charles Leblanc
Abstract:
Objectives: To assess chronic dietary exposure to food additives in different European countries and population groups. Method and Design: The European Food Safety Authority’s (EFSA) Panel on Food Additives and Nutrient Sources added to Food (ANS) estimates chronic dietary exposure to food additives with the purpose of re-evaluating food additives that were previously authorized in Europe. For this, EFSA uses concentration values (usage and/or analytical occurrence data) reported through regular public calls for data by food industry and European countries. These are combined, at individual level, with national food consumption data from the EFSA Comprehensive European Food Consumption Database including data from 33 dietary surveys from 19 European countries and considering six different population groups (infants, toddlers, children, adolescents, adults and the elderly). EFSA ANS Panel estimates dietary exposure for each individual in the EFSA Comprehensive Database by combining the occurrence levels per food group with their corresponding consumption amount per kg body weight. An individual average exposure per day is calculated, resulting in distributions of individual exposures per survey and population group. Based on these distributions, the average and 95th percentile of exposure is calculated per survey and per population group. Dietary exposure is assessed based on two different sets of data: (a) Maximum permitted levels (MPLs) of use set down in the EU legislation (defined as regulatory maximum level exposure assessment scenario) and (b) usage levels and/or analytical occurrence data (defined as refined exposure assessment scenario). The refined exposure assessment scenario is sub-divided into the brand-loyal consumer scenario and the non-brand-loyal consumer scenario. For the brand-loyal consumer scenario, the consumer is considered to be exposed on long-term basis to the highest reported usage/analytical level for one food group, and at the mean level for the remaining food groups. For the non-brand-loyal consumer scenario, the consumer is considered to be exposed on long-term basis to the mean reported usage/analytical level for all food groups. An additional exposure from sources other than direct addition of food additives (i.e. natural presence, contaminants, and carriers of food additives) is also estimated, as appropriate. Results: Since 2014, this methodology has been applied in about 30 food additive exposure assessments conducted as part of scientific opinions of the EFSA ANS Panel. For example, under the non-brand-loyal scenario, the highest 95th percentile of exposure to α-tocopherol (E 307) and ammonium phosphatides (E 442) was estimated in toddlers up to 5.9 and 8.7 mg/kg body weight/day, respectively. The same estimates under the brand-loyal scenario in toddlers resulted in exposures of 8.1 and 20.7 mg/kg body weight/day, respectively. For the regulatory maximum level exposure assessment scenario, the highest 95th percentile of exposure to α-tocopherol (E 307) and ammonium phosphatides (E 442) was estimated in toddlers up to 11.9 and 30.3 mg/kg body weight/day, respectively. Conclusions: Detailed and up-to-date information on food additive concentration values (usage and/or analytical occurrence data) and food consumption data enable the assessment of chronic dietary exposure to food additives to more realistic levels.Keywords: α-tocopherol, ammonium phosphatides, dietary exposure assessment, European Food Safety Authority, food additives, food consumption data
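The exposure arithmetic described above — combining, per individual, the consumed amount of each food group with the additive's concentration in that group, then summarizing by population group — can be expressed compactly with a data frame join. The sketch below uses invented numbers purely to show the calculation; the food groups, concentrations, and consumption figures are not EFSA data.

```python
# Hedged sketch of the chronic dietary exposure calculation:
# exposure (mg/kg bw/day) = sum over food groups of
#     consumption (g/kg bw/day) * occurrence level (mg/kg food) / 1000.
# All numbers below are invented for illustration.
import pandas as pd

consumption = pd.DataFrame({           # one row per individual x food group
    "individual": [1, 1, 2, 2, 3],
    "group": ["toddlers", "toddlers", "toddlers", "toddlers", "adults"],
    "food_group": ["fine bakery wares", "flavoured drinks",
                   "fine bakery wares", "confectionery", "flavoured drinks"],
    "g_per_kg_bw_day": [8.0, 20.0, 5.0, 3.0, 6.0],
})
occurrence = pd.DataFrame({            # mean reported use level per food group
    "food_group": ["fine bakery wares", "flavoured drinks", "confectionery"],
    "mg_per_kg_food": [300.0, 120.0, 500.0],
})

merged = consumption.merge(occurrence, on="food_group")
merged["exposure_mg_kg_bw_day"] = (merged["g_per_kg_bw_day"]
                                   * merged["mg_per_kg_food"] / 1000.0)

per_individual = (merged.groupby(["group", "individual"])["exposure_mg_kg_bw_day"]
                  .sum().reset_index())
summary = per_individual.groupby("group")["exposure_mg_kg_bw_day"].agg(
    mean="mean", p95=lambda s: s.quantile(0.95))
print(summary)
```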
Procedia PDF Downloads 325
28934 Microclimate Impacts on Solar Panel Power Generation in Midlands Area, UK
Authors: Stamatis Zoras, Boris Ceranic, Ashley Redfern
Abstract:
Greenhouse gas (GHG) emissions from domestic properties currently account for a substantial part of the UK's total carbon emissions and are a priority area for the UK in reaching zero carbon emissions. However, GHG emissions of urban complexes depend on the building, road, and other structural surfaces that form the urban microclimate. This, in turn, may influence the power generation of renewable energy systems that depend on solar or wind potential. Moreover, urban climatic conditions are themselves influenced by the installation of those power generation systems, which may then affect the systems' own generation efficiency: increased air temperature attributed to densely installed roof-based solar panels consequently reduces their production efficiency. The installation of roof-based solar panels therefore requires adequate guidance to enable housing businesses, councils, and organisations to implement sufficient measures for improved power generation in relation to the local urban microclimate — how the microclimate is affected and how, in return, it affects solar power productivity. Derby Council and Derby Homes have been collecting solar panel power generation data for a large number of properties. The different building areas and system operation performance will be studied against microclimate conditions over time. It is envisaged that the outcomes of the study will support a working strategy for Derby city to ensure that owned homes can access information and data on the potential of solar photovoltaic (PV) and solar thermal panels on social housing, helping residents on low incomes create their own green energy to power their homes and heat their homes' hot water.
Keywords: microclimate, solar power, urban climatology, urban morphology
Procedia PDF Downloads 69
28933 Production Optimization under Geological Uncertainty Using Distance-Based Clustering
Authors: Byeongcheol Kang, Junyi Kim, Hyungsik Jung, Hyungjun Yang, Jaewoo An, Jonggeun Choe
Abstract:
It is important to figure out reservoir properties for better production management. Due to the limited information, there are geological uncertainties on very heterogeneous or channel reservoir. One of the solutions is to generate multiple equi-probable realizations using geostatistical methods. However, some models have wrong properties, which need to be excluded for simulation efficiency and reliability. We propose a novel method of model selection scheme, based on distance-based clustering for reliable application of production optimization algorithm. Distance is defined as a degree of dissimilarity between the data. We calculate Hausdorff distance to classify the models based on their similarity. Hausdorff distance is useful for shape matching of the reservoir models. We use multi-dimensional scaling (MDS) to describe the models on two dimensional space and group them by K-means clustering. Rather than simulating all models, we choose one representative model from each cluster and find out the best model, which has the similar production rates with the true values. From the process, we can select good reservoir models near the best model with high confidence. We make 100 channel reservoir models using single normal equation simulation (SNESIM). Since oil and gas prefer to flow through the sand facies, it is critical to characterize pattern and connectivity of the channels in the reservoir. After calculating Hausdorff distances and projecting the models by MDS, we can see that the models assemble depending on their channel patterns. These channel distributions affect operation controls of each production well so that the model selection scheme improves management optimization process. We use one of useful global search algorithms, particle swarm optimization (PSO), for our production optimization. PSO is good to find global optimum of objective function, but it takes too much time due to its usage of many particles and iterations. In addition, if we use multiple reservoir models, the simulation time for PSO will be soared. By using the proposed method, we can select good and reliable models that already matches production data. Considering geological uncertainty of the reservoir, we can get well-optimized production controls for maximum net present value. The proposed method shows one of novel solutions to select good cases among the various probabilities. The model selection schemes can be applied to not only production optimization but also history matching or other ensemble-based methods for efficient simulations.Keywords: distance-based clustering, geological uncertainty, particle swarm optimization (PSO), production optimization
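The model-selection pipeline sketched in the abstract — pairwise Hausdorff distances between realizations, multi-dimensional scaling to two dimensions, K-means clustering, and one representative per cluster — maps directly onto standard SciPy/scikit-learn calls. The sketch below works on small random binary channel maps; the grid size, ensemble size, and number of clusters are assumed for illustration and are not the study's settings.

```python
# Hedged sketch: distance-based clustering of equi-probable reservoir realizations.
# Each realization is a binary facies grid (1 = channel sand); distances are
# symmetric Hausdorff distances between the channel-cell coordinate sets.
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(11)
n_models, nx, ny = 30, 50, 50                    # assumed ensemble size and grid
models = rng.random((n_models, nx, ny)) < 0.25   # toy stand-ins for SNESIM realizations

coords = [np.argwhere(m).astype(float) for m in models]   # channel-cell coordinates
D = np.zeros((n_models, n_models))
for i in range(n_models):
    for j in range(i + 1, n_models):
        d = max(directed_hausdorff(coords[i], coords[j])[0],
                directed_hausdorff(coords[j], coords[i])[0])
        D[i, j] = D[j, i] = d

xy = MDS(n_components=2, dissimilarity="precomputed",
         random_state=0).fit_transform(D)        # 2-D map of the ensemble
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(xy)

# Representative model per cluster: the realization closest to its cluster centre.
reps = [int(np.argmin(np.where(km.labels_ == k,
                               np.linalg.norm(xy - c, axis=1), np.inf)))
        for k, c in enumerate(km.cluster_centers_)]
print("representative realizations:", reps)
```

Only the representatives (or the cluster nearest the best history-matched model) would then be passed to the PSO-based production optimization, which is what keeps the simulation cost manageable.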
Procedia PDF Downloads 143
28932 Prediction of Gully Erosion with Stochastic Modeling by using Geographic Information System and Remote Sensing Data in North of Iran
Authors: Reza Zakerinejad
Abstract:
Gully erosion is a serious problem that threatens the sustainability of agricultural land, rangeland, and water resources in a large part of Iran. This type of water erosion is the main source of sedimentation in many catchment areas in the north of Iran. Since many national assessment approaches have applied only qualitative models, the aim of this study is to predict the spatial distribution of gully erosion processes by means of detailed terrain analysis and GIS-based logistic regression on the loess deposits in a case study in Golestan Province. In this study, a DEM with 25-meter resolution derived from ASTER data has been used, and Landsat ETM data have been used for land use mapping. The TreeNet model was applied as a stochastic model to predict the areas susceptible to gully erosion; for this model, evaluated with ROC analysis, 20% of the data were set aside as learning data. GIS and satellite image analysis techniques were thus applied to derive the input information for these stochastic models. The result of this study is a highly accurate map of gully erosion potential.
Keywords: TreeNet model, terrain analysis, Golestan Province, Iran
Procedia PDF Downloads 535
28931 Damage of Laminated Corrugated Sandwich Panels under Inclined Impact Loading
Authors: Muhammad Kamran, Xue Pu, Naveed Ahmed
Abstract:
Sandwich foam structures are efficient in impact energy absorption and making components lightweight; however their efficient use require a detailed understanding of its mechanical response. In this study, the foam core, laminated facings’ sandwich panel with internal triangular rib configuration is impacted by a spherical steel projectile at different angles using ABAQUS finite element package and damage mechanics is studied. Laminated ribs’ structure is sub-divided into three formations; all zeros, all 45 and optimized combination of zeros and 45 degrees. Impact velocity is varied from 250 m/s to 500 m/s with an increment of 50 m/s. The impact damage can significantly demolish the structural integrity and energy absorption due to fiber breakage, matrix cracking, and de-bonding. Macroscopic fracture study of the panel and core along with load-displacement responses and failure modes are the key parameters in the design of smart ballistic resistant structures. Ballistic impact characteristics of panels are studied on different speed, different inclination angles and its dependency on the base, and core materials, ribs formation, and cross-sectional spaces among them are determined. Impact momentum, penetration and kinetic energy absorption data and curves are compiled to predict the first and proximity impact in an effort to enhance the dynamic energy absorption.Keywords: dynamic energy absorption, proximity impact, sandwich panels, impact momentum
Procedia PDF Downloads 388
28930 Non-Linear Regression Modeling for Composite Distributions
Authors: Mostafa Aminzadeh, Min Deng
Abstract:
Modeling loss data is an important part of actuarial science. Actuaries use models to predict future losses and manage financial risk, which can be beneficial for marketing purposes. In the insurance industry, small claims happen frequently while large claims are rare. Traditional distributions such as Normal, Exponential, and inverse-Gaussian are not suitable for describing insurance data, which often show skewness and fat tails. Several authors have studied classical and Bayesian inference for parameters of composite distributions, such as Exponential-Pareto, Weibull-Pareto, and Inverse Gamma-Pareto. These models separate small to moderate losses from large losses using a threshold parameter. This research introduces a computational approach using a nonlinear regression model for loss data that relies on multiple predictors. Simulation studies were conducted to assess the accuracy of the proposed estimation method. The simulations confirmed that the proposed method provides precise estimates for regression parameters. It's important to note that this approach can be applied to datasets if goodness-of-fit tests confirm that the composite distribution under study fits the data well. To demonstrate the computations, a real data set from the insurance industry is analyzed. A Mathematica code uses the Fisher information algorithm as an iteration method to obtain the maximum likelihood estimation (MLE) of regression parameters.Keywords: maximum likelihood estimation, fisher scoring method, non-linear regression models, composite distributions
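The Fisher scoring iteration mentioned above updates the regression coefficients by beta ← beta + I(beta)⁻¹ U(beta), where U is the score vector and I the Fisher information. The sketch below illustrates that iteration for a simple exponential severity regression with a log link (mean μᵢ = exp(xᵢᵀβ)), for which U(β) = Xᵀ(y/μ − 1) and I(β) = XᵀX; the composite-distribution likelihood of the paper would replace these expressions, and the simulated data are purely illustrative (the paper itself uses Mathematica).

```python
# Hedged sketch: Fisher scoring for an exponential regression with log link.
# For y_i ~ Exponential(mean mu_i), mu_i = exp(x_i' beta):
#   score       U(beta) = X' (y / mu - 1)
#   information I(beta) = X' X
# A composite severity distribution would supply its own U and I.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
beta_true = np.array([1.0, 0.5])
y = rng.exponential(np.exp(X @ beta_true))

beta = np.zeros(2)
for it in range(50):
    mu = np.exp(X @ beta)
    score = X.T @ (y / mu - 1.0)
    info = X.T @ X
    step = np.linalg.solve(info, score)                  # Fisher scoring step
    beta += step
    if np.max(np.abs(step)) < 1e-8:                      # convergence check
        break

print("estimated beta:", beta.round(3), " (true:", beta_true, ")")
```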
Procedia PDF Downloads 33
28929 SOM Map vs Hopfield Neural Network: A Comparative Study in Microscopic Evacuation Application
Authors: Zouhour Neji Ben Salem
Abstract:
Microscopic evacuation focuses on evacuee behaviour and the way evacuees search for a place of safety in an egress situation. In recent years, several models have handled the microscopic evacuation problem. Among them, we have proposed Artificial Neural Networks (ANN) as an alternative to mathematical models for dealing with such problems. In this paper, we present two ANN models, a SOM map and a Hopfield network, used to predict evacuee behaviour in a disaster situation. These models are tested on a real case: the evacuation of the second floor of a Tunisian children's hospital in case of fire. The two models are studied and compared in order to evaluate their performance.
Keywords: artificial neural networks, self-organization map, hopfield network, microscopic evacuation, fire building evacuation
Procedia PDF Downloads 404
28928 Modeling Of The Random Impingement Erosion Due To The Impact Of The Solid Particles
Authors: Siamack A. Shirazi, Farzin Darihaki
Abstract:
Solid particles can be found in many multiphase flows, including transport pipelines and pipe fittings. Such particles interact with the pipe material and cause erosion, which threatens the integrity of the system. Therefore, predicting the erosion rate is an important factor in the design and monitoring of such systems. Mechanistic models can provide reliable predictions for many conditions while demanding only relatively low computational cost. Mechanistic models utilize a representative particle trajectory to predict the impact characteristics of the majority of the particle impacts that cause the maximum erosion rate in the domain. The erosion caused by particle impacts is due not only to direct impacts but also to random impingements. In the present study, an alternative model is introduced to describe the erosion due to random impingement of particles. The present model provides a realistic trend for erosion with changes in particle size and particle Stokes number. The model is examined against experimental data and CFD simulation results and indicates better agreement with the data in comparison to the models available in the literature.
Keywords: erosion, mechanistic modeling, particles, multiphase flow, gas-liquid-solid
Procedia PDF Downloads 168
28927 Possibility of Making Ceramic Models from Condemned Plaster of Paris (Pop) Moulds for Ceramics Production in Edo State Nigeria
Authors: Osariyekemwen, Daniel Nosakhare
Abstract:
Some ceramic wastes, such as discarded (condemned) Plaster of Paris (POP) moulds at Auchi Polytechnic, Edo State, constitute environmental hazards. This study, therefore, bridges the foregoing gap by using these discarded POP moulds to produce ceramic models for making casting moulds for mass production. This is in line with the possibility of using this medium to properly manage the condemned POP that litters the immediate environment; at present, these moulds are a major waste disposal problem in the department. Hence, the study fabricated sanitary miniature models and contract fuse models, respectively. Findings arising from this study show that condemned POP can be carved and that, when set, it neither shrinks nor expands; hence, warping is quite unusual. Above all, it also gives a good finish with little deterioration over time when compared to clay models.
Keywords: plaster of Paris, condemn, moulds, models, production
Procedia PDF Downloads 189
28926 Short Review on Models to Estimate the Risk in the Financial Area
Authors: Tiberiu Socaciu, Tudor Colomeischi, Eugenia Iancu
Abstract:
Business failure affects, in various proportions, shareholders, managers, lenders (banks), suppliers, customers, the financial community, government, and society as a whole. In an era of telecommunications networks and interdependent markets, the effect of a company's failure is felt almost instantly. Effectively managing risk exposure thus requires sophisticated support systems, backed by analytical tools to measure, monitor, manage, and control the operational risks that may arise. As we know, bankruptcy is a phenomenon that managers want to avoid no matter what stage of life their company is in. In our analysis, given the nature of the economic models reviewed (Altman, Conan-Holder, etc.), estimating the risk of bankruptcy of a company corresponds to some extent with tracing the company's own business cycle. Various models for predicting bankruptcy take into account direct and indirect aspects such as market position, company growth trend, competition structure, customer characteristics and retention, organization and distribution, location, etc. From the perspective of our research, we review the economic models known in theory and practice for estimating the risk of bankruptcy; such models are based on indicators drawn from major accounting firms.
Keywords: Anglo-Saxon models, continental models, national models, statistical models
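Among the models named above, Altman's original Z-score is the simplest to state: it is a linear combination of five accounting ratios with fixed coefficients, and the resulting score is read against distress/grey/safe zones. The sketch below implements that published 1968 formulation for public manufacturing firms; the example balance-sheet figures are invented for illustration.

```python
# Altman (1968) Z-score for public manufacturing firms:
#   Z = 1.2*X1 + 1.4*X2 + 3.3*X3 + 0.6*X4 + 1.0*X5
# X1 = working capital / total assets        X2 = retained earnings / total assets
# X3 = EBIT / total assets                   X4 = market value of equity / total liabilities
# X5 = sales / total assets
# The firm figures below are invented for illustration.

def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

z = altman_z(working_capital=120, retained_earnings=300, ebit=90,
             market_value_equity=800, sales=1100,
             total_assets=1000, total_liabilities=400)

# Commonly cited cut-offs: Z > 2.99 "safe", 1.81-2.99 "grey", Z < 1.81 "distress".
zone = "safe" if z > 2.99 else "grey" if z > 1.81 else "distress"
print(f"Z = {z:.2f} -> {zone} zone")
```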
Procedia PDF Downloads 405
28925 On the Existence of Homotopic Mapping Between Knowledge Graphs and Graph Embeddings
Authors: Jude K. Safo
Abstract:
Knowledge graphs (KG) and their relation to graph embeddings (GE) represent a unique data structure in the landscape of machine learning (relative to image, text, and acoustic data). Unlike the latter, GEs are the only data structure sufficient for representing the hierarchically dense, semantic information needed for use cases like supply chain data and protein folding, where the search space exceeds the limits of traditional search methods (e.g., PageRank, Dijkstra, etc.). While GEs are effective for compressing low-rank tensor data, at scale they begin to introduce a new problem of 'data retrieval', which we observe in large language models. Notable attempts such as TransE, TransR, and other prominent industry standards have shown peak performance just north of 57% on the WN18 and FB15K benchmarks, insufficient for practical industry applications. They are also limited in scope to next node/link predictions. Traditional linear methods like Tucker, CP, PARAFAC, and CANDECOMP quickly hit memory limits on tensors exceeding 6.4 million nodes. This paper outlines a topological framework for linear mappings between concepts in KG space and GE space that preserve cardinality. Most importantly, we introduce a traceable framework for composing dense linguistic structures, and we demonstrate the performance this model achieves on the WN18 benchmark. The model does not rely on large language models (LLM), though the applications are certainly relevant there as well.
Keywords: representation theory, large language models, graph embeddings, applied algebraic topology, applied knot theory, combinatorics
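For readers unfamiliar with the baselines mentioned, TransE scores a (head, relation, tail) triple by how closely head + relation lands on tail in embedding space and is trained with a margin ranking loss against corrupted (negative) triples. The sketch below shows that scoring and loss in plain NumPy with random vectors; it is a generic illustration of TransE, not the paper's homotopic-mapping framework.

```python
# Hedged sketch of TransE triple scoring and margin ranking loss.
# Embeddings are random; a real model would learn them by gradient descent.
import numpy as np

rng = np.random.default_rng(2)
dim, n_entities, n_relations = 50, 100, 10
E = rng.normal(size=(n_entities, dim))    # entity embeddings
R = rng.normal(size=(n_relations, dim))   # relation embeddings

def score(h, r, t):
    """TransE plausibility: smaller ||h + r - t|| means a more plausible triple."""
    return np.linalg.norm(E[h] + R[r] - E[t])

def margin_loss(pos, neg, margin=1.0):
    """Hinge loss pushing positive triples at least `margin` closer than negatives."""
    return max(0.0, margin + score(*pos) - score(*neg))

positive = (3, 1, 7)                               # observed triple (head, relation, tail)
negative = (3, 1, int(rng.integers(n_entities)))   # corrupted tail
print("positive score:", round(score(*positive), 3))
print("loss:", round(margin_loss(positive, negative), 3))
```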
Procedia PDF Downloads 68
28924 Seafloor and Sea Surface Modelling in the East Coast Region of North America
Authors: Magdalena Idzikowska, Katarzyna Pająk, Kamil Kowalczyk
Abstract:
Seafloor topography is a fundamental issue in geological, geophysical, and oceanographic studies. Single-beam or multibeam sonars attached to the hulls of ships are used to emit a hydroacoustic signal from transducers and reproduce the topography of the seabed. This solution provides relevant accuracy and spatial resolution. Bathymetric data from ships surveys provides National Centers for Environmental Information – National Oceanic and Atmospheric Administration. Unfortunately, most of the seabed is still unidentified, as there are still many gaps to be explored between ship survey tracks. Moreover, such measurements are very expensive and time-consuming. The solution is raster bathymetric models shared by The General Bathymetric Chart of the Oceans. The offered products are a compilation of different sets of data - raw or processed. Indirect data for the development of bathymetric models are also measurements of gravity anomalies. Some forms of seafloor relief (e.g. seamounts) increase the force of the Earth's pull, leading to changes in the sea surface. Based on satellite altimetry data, Sea Surface Height and marine gravity anomalies can be estimated, and based on the anomalies, it’s possible to infer the structure of the seabed. The main goal of the work is to create regional bathymetric models and models of the sea surface in the area of the east coast of North America – a region of seamounts and undulating seafloor. The research includes an analysis of the methods and techniques used, an evaluation of the interpolation algorithms used, model thickening, and the creation of grid models. Obtained data are raster bathymetric models in NetCDF format, survey data from multibeam soundings in MB-System format, and satellite altimetry data from Copernicus Marine Environment Monitoring Service. The methodology includes data extraction, processing, mapping, and spatial analysis. Visualization of the obtained results was carried out with Geographic Information System tools. The result is an extension of the state of the knowledge of the quality and usefulness of the data used for seabed and sea surface modeling and knowledge of the accuracy of the generated models. Sea level is averaged over time and space (excluding waves, tides, etc.). Its changes, along with knowledge of the topography of the ocean floor - inform us indirectly about the volume of the entire water ocean. The true shape of the ocean surface is further varied by such phenomena as tides, differences in atmospheric pressure, wind systems, thermal expansion of water, or phases of ocean circulation. Depending on the location of the point, the higher the depth, the lower the trend of sea level change. Studies show that combining data sets, from different sources, with different accuracies can affect the quality of sea surface and seafloor topography models.Keywords: seafloor, sea surface height, bathymetry, satellite altimetry
Procedia PDF Downloads 79
28923 Dynamic Modeling of Advanced Wastewater Treatment Plants Using BioWin
Authors: Komal Rathore, Aydin Sunol, Gita Iranipour, Luke Mulford
Abstract:
Advanced wastewater treatment plants have complex biological kinetics, time variant influent flow rates and long processing times. Due to these factors, the modeling and operational control of advanced wastewater treatment plants become complicated. However, development of a robust model for advanced wastewater treatment plants has become necessary in order to increase the efficiency of the plants, reduce energy costs and meet the discharge limits set by the government. A dynamic model was designed using the Envirosim (Canada) platform software called BioWin for several wastewater treatment plants in Hillsborough County, Florida. Proper control strategies for various parameters such as mixed liquor suspended solids, recycle activated sludge and waste activated sludge were developed for models to match the plant performance. The models were tuned using both the influent and effluent data from the plant and their laboratories. The plant SCADA was used to predict the influent wastewater rates and concentration profiles as a function of time. The kinetic parameters were tuned based on sensitivity analysis and trial and error methods. The dynamic models were validated by using experimental data for influent and effluent parameters. The dissolved oxygen measurements were taken to validate the model by coupling them with Computational Fluid Dynamics (CFD) models. The Biowin models were able to exactly mimic the plant performance and predict effluent behavior for extended periods. The models are useful for plant engineers and operators as they can take decisions beforehand by predicting the plant performance with the use of BioWin models. One of the important findings from the model was the effects of recycle and wastage ratios on the mixed liquor suspended solids. The model was also useful in determining the significant kinetic parameters for biological wastewater treatment systems.Keywords: BioWin, kinetic modeling, flowsheet simulation, dynamic modeling
Procedia PDF Downloads 154
28922 Drying Kinetics of Soybean Seeds
Authors: Amanda Rithieli Pereira Dos Santos, Rute Quelvia De Faria, Álvaro De Oliveira Cardoso, Anderson Rodrigo Da Silva, Érica Leão Fernandes Araújo
Abstract:
The study of drying kinetics is of great importance for mathematical modelling, since it provides knowledge of the heat and mass transfer processes between the products and the drying air and allows dryers to be adjusted and new technologies to be applied to these processes. The present work had the objective of studying the drying kinetics of soybean seeds and fitting different statistical models to the experimental data, varying cultivar and temperature. Soybean seeds were pre-dried in a natural environment in order to reduce and homogenize the water content to a level of 14% (dry basis). Then, drying was carried out in a forced air circulation oven at controlled temperatures of 38, 43, 48, 53, and 58 ± 1 °C, using two soybean cultivars, BRS 8780 and Sambaíba, until hygroscopic equilibrium was reached. The experimental design was completely randomized in a 5 x 2 factorial (temperature x cultivar) with 3 replicates. Eleven statistical models used to explain the drying process of agricultural products were fitted to the experimental data. Regression analysis was performed using the least squares Gauss-Newton algorithm to estimate the parameters. The goodness of fit was evaluated through the coefficient of determination (R²), the adjusted coefficient of determination (adjusted R²), and the standard error (SE). The models that best represent the drying kinetics of soybean seeds are the Midilli and Logarithmic models.
Keywords: curve of drying seeds, Glycine max L., moisture ratio, statistical models
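Fitting one of the named thin-layer models to a drying curve is a small nonlinear least squares problem. The sketch below fits the Midilli model, MR = a·exp(−k·tⁿ) + b·t, with SciPy's curve_fit (rather than the Gauss-Newton routine used in the study) and reports R² and standard error; the moisture-ratio data are simulated for illustration and are not the experimental measurements.

```python
# Hedged sketch: fitting the Midilli thin-layer drying model
#   MR(t) = a * exp(-k * t**n) + b * t
# to a simulated moisture-ratio curve. Real data would replace `mr_obs`.
import numpy as np
from scipy.optimize import curve_fit

def midilli(t, a, k, n, b):
    return a * np.exp(-k * t**n) + b * t

t = np.linspace(0.1, 10, 40)                      # drying time (h), assumed grid
true = (1.0, 0.35, 1.1, -0.005)                   # invented "true" parameters
rng = np.random.default_rng(8)
mr_obs = midilli(t, *true) + rng.normal(0, 0.01, t.size)

popt, pcov = curve_fit(midilli, t, mr_obs, p0=(1.0, 0.1, 1.0, 0.0))

resid = mr_obs - midilli(t, *popt)
ss_res = np.sum(resid**2)
ss_tot = np.sum((mr_obs - mr_obs.mean())**2)
r2 = 1 - ss_res / ss_tot
se = np.sqrt(ss_res / (t.size - len(popt)))       # residual standard error

print("a, k, n, b =", np.round(popt, 4))
print(f"R2 = {r2:.4f}, SE = {se:.4f}")
```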
Procedia PDF Downloads 627
28921 Comparison of Methods of Estimation for Use in Goodness of Fit Tests for Binary Multilevel Models
Authors: I. V. Pinto, M. R. Sooriyarachchi
Abstract:
It can be frequently observed that the data arising in our environment have a hierarchical or a nested structure attached with the data. Multilevel modelling is a modern approach to handle this kind of data. When multilevel modelling is combined with a binary response, the estimation methods get complex in nature and the usual techniques are derived from quasi-likelihood method. The estimation methods which are compared in this study are, marginal quasi-likelihood (order 1 & order 2) (MQL1, MQL2) and penalized quasi-likelihood (order 1 & order 2) (PQL1, PQL2). A statistical model is of no use if it does not reflect the given dataset. Therefore, checking the adequacy of the fitted model through a goodness-of-fit (GOF) test is an essential stage in any modelling procedure. However, prior to usage, it is also equally important to confirm that the GOF test performs well and is suitable for the given model. This study assesses the suitability of the GOF test developed for binary response multilevel models with respect to the method used in model estimation. An extensive set of simulations was conducted using MLwiN (v 2.19) with varying number of clusters, cluster sizes and intra cluster correlations. The test maintained the desirable Type-I error for models estimated using PQL2 and it failed for almost all the combinations of MQL. Power of the test was adequate for most of the combinations in all estimation methods except MQL1. Moreover, models were fitted using the four methods to a real-life dataset and performance of the test was compared for each model.Keywords: goodness-of-fit test, marginal quasi-likelihood, multilevel modelling, penalized quasi-likelihood, power, quasi-likelihood, type-I error
Procedia PDF Downloads 142
28920 The Effect of Political Characteristics on the Budget Balance of Local Governments: A Dynamic System Generalized Method of Moments Data Approach
Authors: Stefanie M. Vanneste, Stijn Goeminne
Abstract:
This paper studies the effect of political characteristics of 308 Flemish municipalities on their budget balance in the period 1995-2011. All local governments experience the same economic and financial setting, however some governments have high budget balances, while others have low budget balances. The aim of this paper is to explain the differences in municipal budget balances by a number of economic, socio-demographic and political variables. The economic and socio-demographic variables will be used as control variables, while the focus of this paper will be on the political variables. We test four hypotheses resulting from the literature, namely (i) the partisan hypothesis tests if left wing governments have lower budget balances, (ii) the fragmentation hypothesis stating that more fragmented governments have lower budget balances, (iii) the hypothesis regarding the power of the government, higher powered governments would resolve in higher budget balances, and (iv) the opportunistic budget cycle to test whether politicians manipulate the economic situation before elections in order to maximize their reelection possibilities and therefore have lower budget balances before elections. The contributions of our paper to the existing literature are multiple. First, we use the whole array of political variables and not just a selection of them. Second, we are dealing with a homogeneous database with the same budget and election rules, making it easier to focus on the political factors without having to control for the impact of differences in the political systems. Third, our research extends the existing literature on Flemish municipalities as this is the first dynamic research on local budget balances. We use a dynamic panel data model. Because of the two lagged dependent variables as explanatory variables, we employ the system GMM (Generalized Method of Moments) estimator. This is the best possible estimator as we are dealing with political panel data that is rather persistent. Our empirical results show that the effect of the ideological position and the power of the coalition are of less importance to explain the budget balance. The political fragmentation of the government on the other hand has a negative and significant effect on the budget balance. The more parties in a coalition the worse the budget balance is ceteris paribus. Our results also provide evidence of an opportunistic budget cycle, the budget balances are lower in pre-election years relative to the other years to try and increase the incumbents reelection possibilities. An additional finding is that the incremental effect of the budget balance is very important and should not be ignored like is being done in a lot of empirical research. The coefficients of the lagged dependent variables are always positive and very significant. This proves that the budget balance is subject to incrementalism. It is not possible to change the entire policy from one year to another so the actions taken in recent past years still have an impact on the current budget balance. Only a relatively small amount of research concerning the budget balance takes this considerable incremental effect into account. Our findings survive several robustness checks.Keywords: budget balance, fragmentation, ideology, incrementalism, municipalities, opportunistic budget cycle, panel data, political characteristics, power, system GMM
Procedia PDF Downloads 299