Search results for: rounding error
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1900

Search results for: rounding error

1420 Students' Errors in Translating Algebra Word Problems to Mathematical Structure

Authors: Ledeza Jordan Babiano

Abstract:

Translating statements into mathematical notations is one of the processes in word problem-solving. However, based on the literature, students still have difficulties with this skill. The purpose of this study was to investigate the translation errors of the students when they translate algebraic word problems into mathematical structures and locate the errors via the lens of the Translation-Verification Model. Moreover, this qualitative research study employed content analysis. During the data-gathering process, the students were asked to answer a six-item algebra word problem questionnaire, and their answers were analyzed by experts through blind coding using the Translation-Verification Model to determine their translation errors. After this, a focus group discussion was conducted, and the data gathered was analyzed through thematic analysis to determine the causes of the students’ translation errors. It was found out that students’ prevalent error in translation was the interpretation error, which was situated in the Attribute construct. The emerging themes during the FGD were: (1) The procedure of translation is strategically incorrect; (2) Lack of comprehension; (3) Algebra concepts related to difficulty; (4) Lack of spatial skills; (5) Unprepared for independent learning; and (6) The content of the problem is developmentally inappropriate. These themes boiled down to the major concept of independent learning preparedness in solving mathematical problems. This concept has subcomponents, which include contextual and conceptual factors in translation. Consequently, the results provided implications for instructors and professors in Mathematics to innovate their teaching pedagogies and strategies to address translation gaps among students.

Keywords: mathematical structure, algebra word problems, translation, errors

Procedia PDF Downloads 49
1419 Comparison of Methods of Estimation for Use in Goodness of Fit Tests for Binary Multilevel Models

Authors: I. V. Pinto, M. R. Sooriyarachchi

Abstract:

It can be frequently observed that the data arising in our environment have a hierarchical or a nested structure attached with the data. Multilevel modelling is a modern approach to handle this kind of data. When multilevel modelling is combined with a binary response, the estimation methods get complex in nature and the usual techniques are derived from quasi-likelihood method. The estimation methods which are compared in this study are, marginal quasi-likelihood (order 1 & order 2) (MQL1, MQL2) and penalized quasi-likelihood (order 1 & order 2) (PQL1, PQL2). A statistical model is of no use if it does not reflect the given dataset. Therefore, checking the adequacy of the fitted model through a goodness-of-fit (GOF) test is an essential stage in any modelling procedure. However, prior to usage, it is also equally important to confirm that the GOF test performs well and is suitable for the given model. This study assesses the suitability of the GOF test developed for binary response multilevel models with respect to the method used in model estimation. An extensive set of simulations was conducted using MLwiN (v 2.19) with varying number of clusters, cluster sizes and intra cluster correlations. The test maintained the desirable Type-I error for models estimated using PQL2 and it failed for almost all the combinations of MQL. Power of the test was adequate for most of the combinations in all estimation methods except MQL1. Moreover, models were fitted using the four methods to a real-life dataset and performance of the test was compared for each model.

Keywords: goodness-of-fit test, marginal quasi-likelihood, multilevel modelling, penalized quasi-likelihood, power, quasi-likelihood, type-I error

Procedia PDF Downloads 142
1418 Multi-Point Dieless Forming Product Defect Reduction Using Reliability-Based Robust Process Optimization

Authors: Misganaw Abebe Baye, Ji-Woo Park, Beom-Soo Kang

Abstract:

The product quality of multi-point dieless forming (MDF) is identified to be dependent on the process parameters. Moreover, a certain variation of friction and material properties may have a substantially worse influence on the final product quality. This study proposed on how to compensate the MDF product defects by minimizing the sensitivity of noise parameter variations. This can be attained by reliability-based robust optimization (RRO) technique to obtain the optimal process setting of the controllable parameters. Initially two MDF Finite Element (FE) simulations of AA3003-H14 saddle shape showed a substantial amount of dimpling, wrinkling, and shape error. FE analyses are consequently applied on ABAQUS commercial software to obtain the correlation between the control process setting and noise variation with regard to the product defects. The best prediction models are chosen from the family of metamodels to swap the computational expensive FE simulation. Genetic algorithm (GA) is applied to determine the optimal process settings of the control parameters. Monte Carlo Analysis (MCA) is executed to determine how the noise parameter variation affects the final product quality. Finally, the RRO FE simulation and the experimental result show that the amendment of the control parameters in the final forming process leads to a considerably better-quality product.

Keywords: dimpling, multi-point dieless forming, reliability-based robust optimization, shape error, variation, wrinkling

Procedia PDF Downloads 254
1417 Government Final Consumption Expenditure and Household Consumption Expenditure NPISHS in Nigeria

Authors: Usman A. Usman

Abstract:

Undeniably, unlike the Classical side, the Keynesian perspective of the aggregate demand side indeed has a significant position in the policy, growth, and welfare of Nigeria due to government involvement and ineffective demand of the population living with poor per capita income. This study seeks to investigate the effect of Government Final Consumption Expenditure, Financial Deepening on Households, and NPISHs Final consumption expenditure using data on Nigeria from 1981 to 2019. This study employed the ADF stationarity test, Johansen Cointegration test, and Vector Error Correction Model. The results of the study revealed that the coefficient of Government final consumption expenditure has a positive effect on household consumption expenditure in the long run. There is a long-run and short-run relationship between gross fixed capital formation and household consumption expenditure. The coefficients cpsgdp (financial deepening and gross fixed capital formation posit a negative impact on household final consumption expenditure. The coefficients money supply lm2gdp, which is another proxy for financial deepening, and the coefficient FDI have a positive effect on household final consumption expenditure in the long run. Therefore, this study recommends that Gross fixed capital formation stimulates household consumption expenditure; a legal framework to support investment is a panacea to increasing hoodmold income and consumption and reducing poverty in Nigeria. Therefore, this should be a key central component of policy.

Keywords: government final consumption expenditure, household consumption expenditure, vector error correction model, cointegration

Procedia PDF Downloads 52
1416 Approximation of Geodesics on Meshes with Implementation in Rhinoceros Software

Authors: Marian Sagat, Mariana Remesikova

Abstract:

In civil engineering, there is a problem how to industrially produce tensile membrane structures that are non-developable surfaces. Nondevelopable surfaces can only be developed with a certain error and we want to minimize this error. To that goal, the non-developable surfaces are cut into plates along to the geodesic curves. We propose a numerical algorithm for finding approximations of open geodesics on meshes and surfaces based on geodesic curvature flow. For practical reasons, it is important to automatize the choice of the time step. We propose a method for automatic setting of the time step based on the diagonal dominance criterion for the matrix of the linear system obtained by discretization of our partial differential equation model. Practical experiments show reliability of this method. Because approximation of the model is made by numerical method based on classic derivatives, it is necessary to solve obstacles which occur for meshes with sharp corners. We solve this problem for big family of meshes with sharp corners via special rotations which can be seen as partial unfolding of the mesh. In practical applications, it is required that the approximation of geodesic has its vertices only on the edges of the mesh. This problem is solved by a specially designed pointing tracking algorithm. We also partially solve the problem of finding geodesics on meshes with holes. We implemented the whole algorithm in Rhinoceros (commercial 3D computer graphics and computer-aided design software ). It is done by using C# language as C# assembly library for Grasshopper, which is plugin in Rhinoceros.

Keywords: geodesic, geodesic curvature flow, mesh, Rhinoceros software

Procedia PDF Downloads 150
1415 Simulation of Optimal Runoff Hydrograph Using Ensemble of Radar Rainfall and Blending of Runoffs Model

Authors: Myungjin Lee, Daegun Han, Jongsung Kim, Soojun Kim, Hung Soo Kim

Abstract:

Recently, the localized heavy rainfall and typhoons are frequently occurred due to the climate change and the damage is becoming bigger. Therefore, we may need a more accurate prediction of the rainfall and runoff. However, the gauge rainfall has the limited accuracy in space. Radar rainfall is better than gauge rainfall for the explanation of the spatial variability of rainfall but it is mostly underestimated with the uncertainty involved. Therefore, the ensemble of radar rainfall was simulated using error structure to overcome the uncertainty and gauge rainfall. The simulated ensemble was used as the input data of the rainfall-runoff models for obtaining the ensemble of runoff hydrographs. The previous studies discussed about the accuracy of the rainfall-runoff model. Even if the same input data such as rainfall is used for the runoff analysis using the models in the same basin, the models can have different results because of the uncertainty involved in the models. Therefore, we used two models of the SSARR model which is the lumped model, and the Vflo model which is a distributed model and tried to simulate the optimum runoff considering the uncertainty of each rainfall-runoff model. The study basin is located in Han river basin and we obtained one integrated runoff hydrograph which is an optimum runoff hydrograph using the blending methods such as Multi-Model Super Ensemble (MMSE), Simple Model Average (SMA), Mean Square Error (MSE). From this study, we could confirm the accuracy of rainfall and rainfall-runoff model using ensemble scenario and various rainfall-runoff model and we can use this result to study flood control measure due to climate change. Acknowledgements: This work is supported by the Korea Agency for Infrastructure Technology Advancement(KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant 18AWMP-B083066-05).

Keywords: radar rainfall ensemble, rainfall-runoff models, blending method, optimum runoff hydrograph

Procedia PDF Downloads 280
1414 An Improved Robust Algorithm Based on Cubature Kalman Filter for Single-Frequency Global Navigation Satellite System/Inertial Navigation Tightly Coupled System

Authors: Hao Wang, Shuguo Pan

Abstract:

The Global Navigation Satellite System (GNSS) signal received by the dynamic vehicle in the harsh environment will be frequently interfered with and blocked, which generates gross error affecting the positioning accuracy of the GNSS/Inertial Navigation System (INS) integrated navigation. Therefore, this paper put forward an improved robust Cubature Kalman filter (CKF) algorithm for single-frequency GNSS/INS tightly coupled system ambiguity resolution. Firstly, the dynamic model and measurement model of a single-frequency GNSS/INS tightly coupled system was established, and the method for GNSS integer ambiguity resolution with INS aided is studied. Then, we analyzed the influence of pseudo-range observation with gross error on GNSS/INS integrated positioning accuracy. To reduce the influence of outliers, this paper improved the CKF algorithm and realized an intelligent selection of robust strategies by judging the ill-conditioned matrix. Finally, a field navigation test was performed to demonstrate the effectiveness of the proposed algorithm based on the double-differenced solution mode. The experiment has proved the improved robust algorithm can greatly weaken the influence of separate, continuous, and hybrid observation anomalies for enhancing the reliability and accuracy of GNSS/INS tightly coupled navigation solutions.

Keywords: GNSS/INS integrated navigation, ambiguity resolution, Cubature Kalman filter, Robust algorithm

Procedia PDF Downloads 98
1413 Reasons for the Selection of Information-Processing Framework and the Philosophy of Mind as a General Account for an Error Analysis and Explanation on Mathematics

Authors: Michael Lousis

Abstract:

This research study is concerned with learner’s errors on Arithmetic and Algebra. The data resulted from a broader international comparative research program called Kassel Project. However, its conceptualisation differed from and contrasted with that of the main program, which was mostly based on socio-demographic data. The way in which the research study was conducted, was not dependent on the researcher’s discretion, but was absolutely dictated by the nature of the problem under investigation. This is because the phenomenon of learners’ mathematical errors is due neither to the intentions of learners nor to institutional processes, rules and norms, nor to the educators’ intentions and goals; but rather to the way certain information is presented to learners and how their cognitive apparatus processes this information. Several approaches for the study of learners’ errors have been developed from the beginning of the 20th century, encompassing different belief systems. These approaches were based on the behaviourist theory, on the Piagetian- constructivist research framework, the perspective that followed the philosophy of science and the information-processing paradigm. The researcher of the present study was forced to disclose the learners’ course of thinking that led them in specific observable actions with the result of showing particular errors in specific problems, rather than analysing scripts with the students’ thoughts presented in a written form. This, in turn, entailed that the choice of methods would have to be appropriate and conducive to seeing and realising the learners’ errors from the perspective of the participants in the investigation. This particular fact determined important decisions to be made concerning the selection of an appropriate framework for analysing the mathematical errors and giving explanations. Thus the rejection of the belief systems concerning behaviourism, the Piagetian-constructivist, and philosophy of science perspectives took place, and the information-processing paradigm in conjunction with the philosophy of mind were adopted as a general account for the elaboration of data. This paper explains why these decisions were appropriate and beneficial for conducting the present study and for the establishment of the ensued thesis. Additionally, the reasons for the adoption of the information-processing paradigm in conjunction with the philosophy of mind give sound and legitimate bases for the development of future studies concerning mathematical error analysis are explained.

Keywords: advantages-disadvantages of theoretical prospects, behavioral prospect, critical evaluation of theoretical prospects, error analysis, information-processing paradigm, opting for the appropriate approach, philosophy of science prospect, Piagetian-constructivist research frameworks, review of research in mathematical errors

Procedia PDF Downloads 190
1412 Government Final Consumption Expenditure Financial Deepening and Household Consumption Expenditure NPISHs in Nigeria

Authors: Usman A. Usman

Abstract:

Undeniably, unlike the Classical side, the Keynesian perspective of the aggregate demand side indeed has a significant position in the policy, growth, and welfare of Nigeria due to government involvement and ineffective demand of the population living with poor per capita income. This study seeks to investigate the effect of Government Final Consumption Expenditure, Financial Deepening on Households, and NPISHs Final consumption expenditure using data on Nigeria from 1981 to 2019. This study employed the ADF stationarity test, Johansen Cointegration test, and Vector Error Correction Model. The results of the study revealed that the coefficient of Government final consumption expenditure has a positive effect on household consumption expenditure in the long run. There is a long-run and short-run relationship between gross fixed capital formation and household consumption expenditure. The coefficients cpsgdp financial deepening and gross fixed capital formation posit a negative impact on household final consumption expenditure. The coefficients money supply lm2gdp, which is another proxy for financial deepening, and the coefficient FDI have a positive effect on household final consumption expenditure in the long run. Therefore, this study recommends that Gross fixed capital formation stimulates household consumption expenditure; a legal framework to support investment is a panacea to increasing hoodmold income and consumption and reducing poverty in Nigeria. Therefore, this should be a key central component of policy.

Keywords: household, government expenditures, vector error correction model, johansen test

Procedia PDF Downloads 61
1411 The Bayesian Premium Under Entropy Loss

Authors: Farouk Metiri, Halim Zeghdoudi, Mohamed Riad Remita

Abstract:

Credibility theory is an experience rating technique in actuarial science which can be seen as one of quantitative tools that allows the insurers to perform experience rating, that is, to adjust future premiums based on past experiences. It is used usually in automobile insurance, worker's compensation premium, and IBNR (incurred but not reported claims to the insurer) where credibility theory can be used to estimate the claim size amount. In this study, we focused on a popular tool in credibility theory which is the Bayesian premium estimator, considering Lindley distribution as a claim distribution. We derive this estimator under entropy loss which is asymmetric and squared error loss which is a symmetric loss function with informative and non-informative priors. In a purely Bayesian setting, the prior distribution represents the insurer’s prior belief about the insured’s risk level after collection of the insured’s data at the end of the period. However, the explicit form of the Bayesian premium in the case when the prior is not a member of the exponential family could be quite difficult to obtain as it involves a number of integrations which are not analytically solvable. The paper finds a solution to this problem by deriving this estimator using numerical approximation (Lindley approximation) which is one of the suitable approximation methods for solving such problems, it approaches the ratio of the integrals as a whole and produces a single numerical result. Simulation study using Monte Carlo method is then performed to evaluate this estimator and mean squared error technique is made to compare the Bayesian premium estimator under the above loss functions.

Keywords: bayesian estimator, credibility theory, entropy loss, monte carlo simulation

Procedia PDF Downloads 334
1410 Accurate Calculation of the Penetration Depth of a Bullet Using ANSYS

Authors: Eunsu Jang, Kang Park

Abstract:

In developing an armored ground combat vehicle (AGCV), it is a very important step to analyze the vulnerability (or the survivability) of the AGCV against enemy’s attack. In the vulnerability analysis, the penetration equations are usually used to get the penetration depth and check whether a bullet can penetrate the armor of the AGCV, which causes the damage of internal components or crews. The penetration equations are derived from penetration experiments which require long time and great efforts. However, they usually hold only for the specific material of the target and the specific type of the bullet used in experiments. Thus, penetration simulation using ANSYS can be another option to calculate penetration depth. However, it is very important to model the targets and select the input parameters in order to get an accurate penetration depth. This paper performed a sensitivity analysis of input parameters of ANSYS on the accuracy of the calculated penetration depth. Two conflicting objectives need to be achieved in adopting ANSYS in penetration analysis: maximizing the accuracy of calculation and minimizing the calculation time. To maximize the calculation accuracy, the sensitivity analysis of the input parameters for ANSYS was performed and calculated the RMS error with the experimental data. The input parameters include mesh size, boundary condition, material properties, target diameter are tested and selected to minimize the error between the calculated result from simulation and the experiment data from the papers on the penetration equation. To minimize the calculation time, the parameter values obtained from accuracy analysis are adjusted to get optimized overall performance. As result of analysis, the followings were found: 1) As the mesh size gradually decreases from 0.9 mm to 0.5 mm, both the penetration depth and calculation time increase. 2) As diameters of the target decrease from 250mm to 60 mm, both the penetration depth and calculation time decrease. 3) As the yield stress which is one of the material property of the target decreases, the penetration depth increases. 4) The boundary condition with the fixed side surface of the target gives more penetration depth than that with the fixed side and rear surfaces. By using above finding, the input parameters can be tuned to minimize the error between simulation and experiments. By using simulation tool, ANSYS, with delicately tuned input parameters, penetration analysis can be done on computer without actual experiments. The data of penetration experiments are usually hard to get because of security reasons and only published papers provide them in the limited target material. The next step of this research is to generalize this approach to anticipate the penetration depth by interpolating the known penetration experiments. This result may not be accurate enough to be used to replace the penetration experiments, but those simulations can be used in the early stage of the design process of AGCV in modelling and simulation stage.

Keywords: ANSYS, input parameters, penetration depth, sensitivity analysis

Procedia PDF Downloads 401
1409 Application of Grey Theory in the Forecast of Facility Maintenance Hours for Office Building Tenants and Public Areas

Authors: Yen Chia-Ju, Cheng Ding-Ruei

Abstract:

This study took case office building as subject and explored the responsive work order repair request of facilities and equipment in offices and public areas by gray theory, with the purpose of providing for future related office building owners, executive managers, property management companies, mechanical and electrical companies as reference for deciding and assessing forecast model. Important conclusions of this study are summarized as follows according to the study findings: 1. Grey Relational Analysis discusses the importance of facilities repair number of six categories, namely, power systems, building systems, water systems, air conditioning systems, fire systems and manpower dispatch in order. In terms of facilities maintenance importance are power systems, building systems, water systems, air conditioning systems, manpower dispatch and fire systems in order. 2. GM (1,N) and regression method took maintenance hours as dependent variables and repair number, leased area and tenants number as independent variables and conducted single month forecast based on 12 data from January to December 2011. The mean absolute error and average accuracy of GM (1,N) from verification results were 6.41% and 93.59%; the mean absolute error and average accuracy of regression model were 4.66% and 95.34%, indicating that they have highly accurate forecast capability.

Keywords: rey theory, forecast model, Taipei 101, office buildings, property management, facilities, equipment

Procedia PDF Downloads 444
1408 Microwave Dielectric Constant Measurements of Titanium Dioxide Using Five Mixture Equations

Authors: Jyh Sheen, Yong-Lin Wang

Abstract:

This research dedicates to find a different measurement procedure of microwave dielectric properties of ceramic materials with high dielectric constants. For the composite of ceramic dispersed in the polymer matrix, the dielectric constants of the composites with different concentrations can be obtained by various mixture equations. The other development of mixture rule is to calculate the permittivity of ceramic from measurements on composite. To do this, the analysis method and theoretical accuracy on six basic mixture laws derived from three basic particle shapes of ceramic fillers have been reported for dielectric constants of ceramic less than 40 at microwave frequency. Similar researches have been done for other well-known mixture rules. They have shown that both the physical curve matching with experimental results and low potential theory error are important to promote the calculation accuracy. Recently, a modified of mixture equation for high dielectric constant ceramics at microwave frequency has also been presented for strontium titanate (SrTiO3) which was selected from five more well known mixing rules and has shown a good accuracy for high dielectric constant measurements. However, it is still not clear the accuracy of this modified equation for other high dielectric constant materials. Therefore, the five more well known mixing rules are selected again to understand their application to other high dielectric constant ceramics. The other high dielectric constant ceramic, TiO2 with dielectric constant 100, was then chosen for this research. Their theoretical error equations are derived. In addition to the theoretical research, experimental measurements are always required. Titanium dioxide is an interesting ceramic for microwave applications. In this research, its powder is adopted as the filler material and polyethylene powder is like the matrix material. The dielectric constants of those ceramic-polyethylene composites with various compositions were measured at 10 GHz. The theoretical curves of the five published mixture equations are shown together with the measured results to understand the curve matching condition of each rule. Finally, based on the experimental observation and theoretical analysis, one of the five rules was selected and modified to a new powder mixture equation. This modified rule has show very good curve matching with the measurement data and low theoretical error. We can then calculate the dielectric constant of pure filler medium (titanium dioxide) by those mixing equations from the measured dielectric constants of composites. The accuracy on the estimating dielectric constant of pure ceramic by various mixture rules will be compared. This modified mixture rule has also shown good measurement accuracy on the dielectric constant of titanium dioxide ceramic. This study can be applied to the microwave dielectric properties measurements of other high dielectric constant ceramic materials in the future.

Keywords: microwave measurement, dielectric constant, mixture rules, composites

Procedia PDF Downloads 367
1407 Attention States in the Sustained Attention to Response Task: Effects of Trial Duration, Mind-Wandering and Focus

Authors: Aisling Davies, Ciara Greene

Abstract:

Over the past decade the phenomenon of mind-wandering in cognitive tasks has attracted widespread scientific attention. Research indicates that mind-wandering occurrences can be detected through behavioural responses in the Sustained Attention to Response Task (SART) and several studies have attributed a specific pattern of responding around an error in this task to an observable effect of a mind-wandering state. SART behavioural responses are also widely accepted as indices of sustained attention and of general attention lapses. However, evidence suggests that these same patterns of responding may be attributable to other factors associated with more focused states and that it may also be possible to distinguish the two states within the same task. To use behavioural responses in the SART to study mind-wandering, it is essential to establish both the SART parameters that would increase the likelihood of errors due to mind-wandering, and exactly what type of responses are indicative of mind-wandering, neither of which have yet been determined. The aims of this study were to compare different versions of the SART to establish which task would induce the most mind-wandering episodes and to determine whether mind-wandering related errors can be distinguished from errors during periods of focus, by behavioural responses in the SART. To achieve these objectives, 25 Participants completed four modified versions of the SART that differed from the classic paradigm in several ways so to capture more instances of mind-wandering. The duration that trials were presented for was increased proportionately across each of the four versions of the task; Standard, Medium Slow, Slow, and Very Slow and participants intermittently responded to thought probes assessing their level of focus and degree of mind-wandering throughout. Error rates, reaction times and variability in reaction times decreased in proportion to the decrease in trial duration rate and the proportion of mind-wandering related errors increased, until the Very Slow condition where the extra decrease in duration no longer had an effect. Distinct reaction time patterns around an error, dependent on level of focus (high/low) and level of mind-wandering (high/low) were also observed indicating four separate attention states occurring within the SART. This study establishes the optimal duration of trial presentation for inducing mind-wandering in the SART, provides evidence supporting the idea that different attention states can be observed within the SART and highlights the importance of addressing other factors contributing to behavioural responses when studying mind-wandering during this task. A notable finding in relation to the standard SART, was that while more errors were observed in this version of the task, most of these errors were during periods of focus, raising significant questions about our current understanding of mind-wandering and associated failures of attention.

Keywords: attention, mind-wandering, trial duration rate, Sustained Attention to Response Task (SART)

Procedia PDF Downloads 182
1406 Near Optimal Closed-Loop Guidance Gains Determination for Vector Guidance Law, from Impact Angle Errors and Miss Distance Considerations

Authors: Karthikeyan Kalirajan, Ashok Joshi

Abstract:

An optimization problem is to setup to maximize the terminal kinetic energy of a maneuverable reentry vehicle (MaRV). The target location, the impact angle is given as constraints. The MaRV uses an explicit guidance law called Vector guidance. This law has two gains which are taken as decision variables. The problem is to find the optimal value of these gains which will result in minimum miss distance and impact angle error. Using a simple 3DOF non-rotating flat earth model and Lockheed martin HP-MARV as the reentry vehicle, the nature of solutions of the optimization problem is studied. This is achieved by carrying out a parametric study for a range of closed loop gain values and the corresponding impact angle error and the miss distance values are generated. The results show that there are well defined lower and upper bounds on the gains that result in near optimal terminal guidance solution. It is found from this study, that there exist common permissible regions (values of gains) where all constraints are met. Moreover, the permissible region lies between flat regions and hence the optimization algorithm has to be chosen carefully. It is also found that, only one of the gain values is independent and that the other dependent gain value is related through a simple straight-line expression. Moreover, to reduce the computational burden of finding the optimal value of two gains, a guidance law called Diveline guidance is discussed, which uses single gain. The derivation of the Diveline guidance law from Vector guidance law is discussed in this paper.

Keywords: Marv guidance, reentry trajectory, trajectory optimization, guidance gain selection

Procedia PDF Downloads 427
1405 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest

Procedia PDF Downloads 231
1404 Evaluating the Validity of CFD Model of Dispersion in a Complex Urban Geometry Using Two Sets of Experimental Measurements

Authors: Mohammad R. Kavian Nezhad, Carlos F. Lange, Brian A. Fleck

Abstract:

This research presents the validation study of a computational fluid dynamics (CFD) model developed to simulate the scalar dispersion emitted from rooftop sources around the buildings at the University of Alberta North Campus. The ANSYS CFX code was used to perform the numerical simulation of the wind regime and pollutant dispersion by solving the 3D steady Reynolds-averaged Navier-Stokes (RANS) equations on a building-scale high-resolution grid. The validation study was performed in two steps. First, the CFD model performance in 24 cases (eight wind directions and three wind speeds) was evaluated by comparing the predicted flow fields with the available data from the previous measurement campaign designed at the North Campus, using the standard deviation method (SDM), while the estimated results of the numerical model showed maximum average percent errors of approximately 53% and 37% for wind incidents from the North and Northwest, respectively. Good agreement with the measurements was observed for the other six directions, with an average error of less than 30%. In the second step, the reliability of the implemented turbulence model, numerical algorithm, modeling techniques, and the grid generation scheme was further evaluated using the Mock Urban Setting Test (MUST) dispersion dataset. Different statistical measures, including the fractional bias (FB), the geometric mean bias (MG), and the normalized mean square error (NMSE), were used to assess the accuracy of the predicted dispersion field. Our CFD results are in very good agreement with the field measurements.

Keywords: CFD, plume dispersion, complex urban geometry, validation study, wind flow

Procedia PDF Downloads 135
1403 Forecasting Nokoué Lake Water Levels Using Long Short-Term Memory Network

Authors: Namwinwelbere Dabire, Eugene C. Ezin, Adandedji M. Firmin

Abstract:

The prediction of hydrological flows (rainfall-depth or rainfall-discharge) is becoming increasingly important in the management of hydrological risks such as floods. In this study, the Long Short-Term Memory (LSTM) network, a state-of-the-art algorithm dedicated to time series, is applied to predict the daily water level of Nokoue Lake in Benin. This paper aims to provide an effective and reliable method enable of reproducing the future daily water level of Nokoue Lake, which is influenced by a combination of two phenomena: rainfall and river flow (runoff from the Ouémé River, the Sô River, the Porto-Novo lagoon, and the Atlantic Ocean). Performance analysis based on the forecasting horizon indicates that LSTM can predict the water level of Nokoué Lake up to a forecast horizon of t+10 days. Performance metrics such as Root Mean Square Error (RMSE), coefficient of correlation (R²), Nash-Sutcliffe Efficiency (NSE), and Mean Absolute Error (MAE) agree on a forecast horizon of up to t+3 days. The values of these metrics remain stable for forecast horizons of t+1 days, t+2 days, and t+3 days. The values of R² and NSE are greater than 0.97 during the training and testing phases in the Nokoué Lake basin. Based on the evaluation indices used to assess the model's performance for the appropriate forecast horizon of water level in the Nokoué Lake basin, the forecast horizon of t+3 days is chosen for predicting future daily water levels.

Keywords: forecasting, long short-term memory cell, recurrent artificial neural network, Nokoué lake

Procedia PDF Downloads 64
1402 Acceleration-Based Motion Model for Visual Simultaneous Localization and Mapping

Authors: Daohong Yang, Xiang Zhang, Lei Li, Wanting Zhou

Abstract:

Visual Simultaneous Localization and Mapping (VSLAM) is a technology that obtains information in the environment for self-positioning and mapping. It is widely used in computer vision, robotics and other fields. Many visual SLAM systems, such as OBSLAM3, employ a constant-speed motion model that provides the initial pose of the current frame to improve the speed and accuracy of feature matching. However, in actual situations, the constant velocity motion model is often difficult to be satisfied, which may lead to a large deviation between the obtained initial pose and the real value, and may lead to errors in nonlinear optimization results. Therefore, this paper proposed a motion model based on acceleration, which can be applied on most SLAM systems. In order to better describe the acceleration of the camera pose, we decoupled the pose transformation matrix, and calculated the rotation matrix and the translation vector respectively, where the rotation matrix is represented by rotation vector. We assume that, in a short period of time, the changes of rotating angular velocity and translation vector remain the same. Based on this assumption, the initial pose of the current frame is estimated. In addition, the error of constant velocity model was analyzed theoretically. Finally, we applied our proposed approach to the ORBSLAM3 system and evaluated two sets of sequences on the TUM dataset. The results showed that our proposed method had a more accurate initial pose estimation and the accuracy of ORBSLAM3 system is improved by 6.61% and 6.46% respectively on the two test sequences.

Keywords: error estimation, constant acceleration motion model, pose estimation, visual SLAM

Procedia PDF Downloads 94
1401 The Impact of Introspective Models on Software Engineering

Authors: Rajneekant Bachan, Dhanush Vijay

Abstract:

The visualization of operating systems has refined the Turing machine, and current trends suggest that the emulation of 32 bit architectures will soon emerge. After years of technical research into Web services, we demonstrate the synthesis of gigabit switches, which embodies the robust principles of theory. Loam, our new algorithm for forward-error correction, is the solution to all of these challenges.

Keywords: software engineering, architectures, introspective models, operating systems

Procedia PDF Downloads 538
1400 The Per Capita Income, Energy production and Environmental Degradation: A Comprehensive Assessment of the existence of the Environmental Kuznets Curve Hypothesis in Bangladesh

Authors: Ashique Mahmud, MD. Ataul Gani Osmani, Shoria Sharmin

Abstract:

In the first quarter of the twenty-first century, the most substantial global concern is environmental contamination, and it has gained the prioritization of both the national and international community. Keeping in mind this crucial fact, this study conducted different statistical and econometrical methods to identify whether the gross national income of the country has a significant impact on electricity production from nonrenewable sources and different air pollutants like carbon dioxide, nitrous oxide, and methane emissions. Besides, the primary objective of this research was to analyze whether the environmental Kuznets curve hypothesis holds for the examined variables. After analyzing different statistical properties of the variables, this study came to the conclusion that the environmental Kuznets curve hypothesis holds for gross national income and carbon dioxide emission in Bangladesh in the short run as well as the long run. This study comes to this conclusion based on the findings of ordinary least square estimations, ARDL bound tests, short-run causality analysis, the Error Correction Model, and other pre-diagnostic and post-diagnostic tests that have been employed in the structural model. Moreover, this study wants to demonstrate that the outline of gross national income and carbon dioxide emissions is in its initial stage of development and will increase up to the optimal peak. The compositional effect will then force the emission to decrease, and the environmental quality will be restored in the long run.

Keywords: environmental Kuznets curve hypothesis, carbon dioxide emission in Bangladesh, gross national income in Bangladesh, autoregressive distributed lag model, granger causality, error correction model

Procedia PDF Downloads 150
1399 Lamb Waves Wireless Communication in Healthy Plates Using Coherent Demodulation

Authors: Rudy Bahouth, Farouk Benmeddour, Emmanuel Moulin, Jamal Assaad

Abstract:

Guided ultrasonic waves are used in Non-Destructive Testing (NDT) and Structural Health Monitoring (SHM) for inspection and damage detection. Recently, wireless data transmission using ultrasonic waves in solid metallic channels has gained popularity in some industrial applications such as nuclear, aerospace and smart vehicles. The idea is to find a good substitute for electromagnetic waves since they are highly attenuated near metallic components due to Faraday shielding. The proposed solution is to use ultrasonic guided waves such as Lamb waves as an information carrier due to their capability of propagation for long distances. In addition to this, valuable information about the health of the structure could be extracted simultaneously. In this work, the reliable frequency bandwidth for communication is extracted experimentally from dispersion curves at first. Then, an experimental platform for wireless communication using Lamb waves is described and built. After this, coherent demodulation algorithm used in telecommunications is tested for Amplitude Shift Keying, On-Off Keying and Binary Phase Shift Keying modulation techniques. Signal processing parameters such as threshold choice, number of cycles per bit and Bit Rate are optimized. Experimental results are compared based on the average Bit Error Rate. Results have shown high sensitivity to threshold selection for Amplitude Shift Keying and On-Off Keying techniques resulting a Bit Rate decrease. Binary Phase Shift Keying technique shows the highest stability and data rate between all tested modulation techniques.

Keywords: lamb waves communication, wireless communication, coherent demodulation, bit error rate

Procedia PDF Downloads 260
1398 A Pilot Study to Investigate the Use of Machine Translation Post-Editing Training for Foreign Language Learning

Authors: Hong Zhang

Abstract:

The main purpose of this study is to show that machine translation (MT) post-editing (PE) training can help our Chinese students learn Spanish as a second language. Our hypothesis is that they might make better use of it by learning PE skills specific for foreign language learning. We have developed PE training materials based on the data collected in a previous study. Training material included the special error types of the output of MT and the error types that our Chinese students studying Spanish could not detect in the experiment last year. This year we performed a pilot study in order to evaluate the PE training materials effectiveness and to what extent PE training helps Chinese students who study the Spanish language. We used screen recording to record these moments and made note of every action done by the students. Participants were speakers of Chinese with intermediate knowledge of Spanish. They were divided into two groups: Group A performed PE training and Group B did not. We prepared a Chinese text for both groups, and participants translated it by themselves (human translation), and then used Google Translate to translate the text and asked them to post-edit the raw MT output. Comparing the results of PE test, Group A could identify and correct the errors faster than Group B students, Group A did especially better in omission, word order, part of speech, terminology, mistranslation, official names, and formal register. From the results of this study, we can see that PE training can help Chinese students learn Spanish as a second language. In the future, we could focus on the students’ struggles during their Spanish studies and complete the PE training materials to teach Chinese students learning Spanish with machine translation.

Keywords: machine translation, post-editing, post-editing training, Chinese, Spanish, foreign language learning

Procedia PDF Downloads 144
1397 Modeling Search-And-Rescue Operations by Autonomous Mobile Robots at Sea

Authors: B. Kriheli, E. Levner, T. C. E. Cheng, C. T. Ng

Abstract:

During the last decades, research interest in planning, scheduling, and control of emergency response operations, especially people rescue and evacuation from the dangerous zone of marine accidents, has increased dramatically. Until the survivors (called ‘targets’) are found and saved, it may cause loss or damage whose extent depends on the location of the targets and the search duration. The problem is to efficiently search for and detect/rescue the targets as soon as possible with the help of intelligent mobile robots so as to maximize the number of saved people and/or minimize the search cost under restrictions on the amount of saved people within the allowable response time. We consider a special situation when the autonomous mobile robots (AMR), e.g., unmanned aerial vehicles and remote-controlled robo-ships have no operator on board as they are guided and completely controlled by on-board sensors and computer programs. We construct a mathematical model for the search process in an uncertain environment and provide a new fast algorithm for scheduling the activities of the autonomous robots during the search-and rescue missions after an accident at sea. We presume that in the unknown environments, the AMR’s search-and-rescue activity is subject to two types of error: (i) a 'false-negative' detection error where a target object is not discovered (‘overlooked') by the AMR’s sensors in spite that the AMR is in a close neighborhood of the latter and (ii) a 'false-positive' detection error, also known as ‘a false alarm’, in which a clean place or area is wrongly classified by the AMR’s sensors as a correct target. As the general resource-constrained discrete search problem is NP-hard, we restrict our study to finding local-optimal strategies. A specificity of the considered operational research problem in comparison with the traditional Kadane-De Groot-Stone search models is that in our model the probability of the successful search outcome depends not only on cost/time/probability parameters assigned to each individual location but, as well, on parameters characterizing the entire history of (unsuccessful) search before selecting any next location. We provide a fast approximation algorithm for finding the AMR route adopting a greedy search strategy in which, in each step, the on-board computer computes a current search effectiveness value for each location in the zone and sequentially searches for a location with the highest search effectiveness value. Extensive experiments with random and real-life data provide strong evidence in favor of the suggested operations research model and corresponding algorithm.

Keywords: disaster management, intelligent robots, scheduling algorithm, search-and-rescue at sea

Procedia PDF Downloads 171
1396 Design of the Compliant Mechanism of a Biomechanical Assistive Device for the Knee

Authors: Kevin Giraldo, Juan A. Gallego, Uriel Zapata, Fanny L. Casado

Abstract:

Compliant mechanisms are designed to deform in a controlled manner in response to external forces, utilizing the flexibility of their components to store potential elastic energy during deformation, gradually releasing it upon returning to its original form. This article explores the design of a knee orthosis intended to assist users during stand-up motion. The orthosis makes use of a compliant mechanism to balance the user’s weight, thereby minimizing the strain on leg muscles during standup motion. The primary function of the compliant mechanism is to store and exchange potential energy, so when coupled with the gravitational potential of the user, the total potential energy variation is minimized. The design process for the semi-rigid knee orthosis involved material selection and the development of a numerical model for the compliant mechanism seen as a spring. Geometric properties are obtained through the numerical modeling of the spring once the desired stiffness and safety factor values have been attained. Subsequently, a 3D finite element analysis was conducted. The study demonstrates a strong correlation between the maximum stress in the mathematical model (250.22 MPa) and the simulation (239.8 MPa), with a 4.16% error. Both analyses safety factors: 1.02 for the mathematical approach and 1.1 for the simulation, with a consistent 7.84% margin of error. The spring’s stiffness, calculated at 90.82 Nm/rad analytically and 85.71 Nm/rad in the simulation, exhibits a 5.62% difference. These results suggest significant potential for the proposed device in assisting patients with knee orthopedic restrictions, contributing to ongoing efforts in advancing the understanding and treatment of knee osteoarthritis.

Keywords: biomechanics, complaint mechanisms, gonarthrosis, orthoses

Procedia PDF Downloads 36
1395 Identifying Protein-Coding and Non-Coding Regions in Transcriptomes

Authors: Angela U. Makolo

Abstract:

Protein-coding and Non-coding regions determine the biology of a sequenced transcriptome. Research advances have shown that Non-coding regions are important in disease progression and clinical diagnosis. Existing bioinformatics tools have been targeted towards Protein-coding regions alone. Therefore, there are challenges associated with gaining biological insights from transcriptome sequence data. These tools are also limited to computationally intensive sequence alignment, which is inadequate and less accurate to identify both Protein-coding and Non-coding regions. Alignment-free techniques can overcome the limitation of identifying both regions. Therefore, this study was designed to develop an efficient sequence alignment-free model for identifying both Protein-coding and Non-coding regions in sequenced transcriptomes. Feature grouping and randomization procedures were applied to the input transcriptomes (37,503 data points). Successive iterations were carried out to compute the gradient vector that converged the developed Protein-coding and Non-coding Region Identifier (PNRI) model to the approximate coefficient vector. The logistic regression algorithm was used with a sigmoid activation function. A parameter vector was estimated for every sample in 37,503 data points in a bid to reduce the generalization error and cost. Maximum Likelihood Estimation (MLE) was used for parameter estimation by taking the log-likelihood of six features and combining them into a summation function. Dynamic thresholding was used to classify the Protein-coding and Non-coding regions, and the Receiver Operating Characteristic (ROC) curve was determined. The generalization performance of PNRI was determined in terms of F1 score, accuracy, sensitivity, and specificity. The average generalization performance of PNRI was determined using a benchmark of multi-species organisms. The generalization error for identifying Protein-coding and Non-coding regions decreased from 0.514 to 0.508 and to 0.378, respectively, after three iterations. The cost (difference between the predicted and the actual outcome) also decreased from 1.446 to 0.842 and to 0.718, respectively, for the first, second and third iterations. The iterations terminated at the 390th epoch, having an error of 0.036 and a cost of 0.316. The computed elements of the parameter vector that maximized the objective function were 0.043, 0.519, 0.715, 0.878, 1.157, and 2.575. The PNRI gave an ROC of 0.97, indicating an improved predictive ability. The PNRI identified both Protein-coding and Non-coding regions with an F1 score of 0.970, accuracy (0.969), sensitivity (0.966), and specificity of 0.973. Using 13 non-human multi-species model organisms, the average generalization performance of the traditional method was 74.4%, while that of the developed model was 85.2%, thereby making the developed model better in the identification of Protein-coding and Non-coding regions in transcriptomes. The developed Protein-coding and Non-coding region identifier model efficiently identified the Protein-coding and Non-coding transcriptomic regions. It could be used in genome annotation and in the analysis of transcriptomes.

Keywords: sequence alignment-free model, dynamic thresholding classification, input randomization, genome annotation

Procedia PDF Downloads 68
1394 Finite Element Modeling of Mass Transfer Phenomenon and Optimization of Process Parameters for Drying of Paddy in a Hybrid Solar Dryer

Authors: Aprajeeta Jha, Punyadarshini P. Tripathy

Abstract:

Drying technologies for various food processing operations shares an inevitable linkage with energy, cost and environmental sustainability. Hence, solar drying of food grains has become imperative choice to combat duo challenges of meeting high energy demand for drying and to address climate change scenario. But performance and reliability of solar dryers depend hugely on sunshine period, climatic conditions, therefore, offer a limited control over drying conditions and have lower efficiencies. Solar drying technology, supported by Photovoltaic (PV) power plant and hybrid type solar air collector can potentially overpower the disadvantages of solar dryers. For development of such robust hybrid dryers; to ensure quality and shelf-life of paddy grains the optimization of process parameter becomes extremely critical. Investigation of the moisture distribution profile within the grains becomes necessary in order to avoid over drying or under drying of food grains in hybrid solar dryer. Computational simulations based on finite element modeling can serve as potential tool in providing a better insight of moisture migration during drying process. Hence, present work aims at optimizing the process parameters and to develop a 3-dimensional (3D) finite element model (FEM) for predicting moisture profile in paddy during solar drying. COMSOL Multiphysics was employed to develop a 3D finite element model for predicting moisture profile. Furthermore, optimization of process parameters (power level, air velocity and moisture content) was done using response surface methodology in design expert software. 3D finite element model (FEM) for predicting moisture migration in single kernel for every time step has been developed and validated with experimental data. The mean absolute error (MAE), mean relative error (MRE) and standard error (SE) were found to be 0.003, 0.0531 and 0.0007, respectively, indicating close agreement of model with experimental results. Furthermore, optimized process parameters for drying paddy were found to be 700 W, 2.75 m/s at 13% (wb) with optimum temperature, milling yield and drying time of 42˚C, 62%, 86 min respectively, having desirability of 0.905. Above optimized conditions can be successfully used to dry paddy in PV integrated solar dryer in order to attain maximum uniformity, quality and yield of product. PV-integrated hybrid solar dryers can be employed as potential and cutting edge drying technology alternative for sustainable energy and food security.

Keywords: finite element modeling, moisture migration, paddy grain, process optimization, PV integrated hybrid solar dryer

Procedia PDF Downloads 150
1393 Theory of the Optimum Signal Approximation Clarifying the Importance in the Recognition of Parallel World and Application to Secure Signal Communication with Feedback

Authors: Takuro Kida, Yuichi Kida

Abstract:

In this paper, it is shown a base of the new trend of algorithm mathematically that treats a historical reason of continuous discrimination in the world as well as its solution by introducing new concepts of parallel world that includes an invisible set of errors as its companion. With respect to a matrix operator-filter bank that the matrix operator-analysis-filter bank H and the matrix operator-sampling-filter bank S are given, firstly, we introduce the detail algorithm to derive the optimum matrix operator-synthesis-filter bank Z that minimizes all the worst-case measures of the matrix operator-error-signals E(ω) = F(ω) − Y(ω) between the matrix operator-input-signals F(ω) and the matrix operator-output-signals Y(ω) of the matrix operator-filter bank at the same time. Further, feedback is introduced to the above approximation theory, and it is indicated that introducing conversations with feedback do not superior automatically to the accumulation of existing knowledge of signal prediction. Secondly, the concept of category in the field of mathematics is applied to the above optimum signal approximation and is indicated that the category-based approximation theory is applied to the set-theoretic consideration of the recognition of humans. Based on this discussion, it is shown naturally why the narrow perception that tends to create isolation shows an apparent advantage in the short term and, often, why such narrow thinking becomes intimate with discriminatory action in a human group. Throughout these considerations, it is presented that, in order to abolish easy and intimate discriminatory behavior, it is important to create a parallel world of conception where we share the set of invisible error signals, including the words and the consciousness of both worlds.

Keywords: matrix filterbank, optimum signal approximation, category theory, simultaneous minimization

Procedia PDF Downloads 143
1392 Application of Particle Swarm Optimization to Thermal Sensor Placement for Smart Grid

Authors: Hung-Shuo Wu, Huan-Chieh Chiu, Xiang-Yao Zheng, Yu-Cheng Yang, Chien-Hao Wang, Jen-Cheng Wang, Chwan-Lu Tseng, Joe-Air Jiang

Abstract:

Dynamic Thermal Rating (DTR) provides crucial information by estimating the ampacity of transmission lines to improve power dispatching efficiency. To perform the DTR, it is necessary to install on-line thermal sensors to monitor conductor temperature and weather variables. A simple and intuitive strategy is to allocate a thermal sensor to every span of transmission lines, but the cost of sensors might be too high to bear. To deal with the cost issue, a thermal sensor placement problem must be solved. This research proposes and implements a hybrid algorithm which combines proper orthogonal decomposition (POD) with particle swarm optimization (PSO) methods. The proposed hybrid algorithm solves a multi-objective optimization problem that concludes the minimum number of sensors and the minimum error on conductor temperature, and the optimal sensor placement is determined simultaneously. The data of 345 kV transmission lines and the hourly weather data from the Taiwan Power Company and Central Weather Bureau (CWB), respectively, are used by the proposed method. The simulated results indicate that the number of sensors could be reduced using the optimal placement method proposed by the study and an acceptable error on conductor temperature could be achieved. This study provides power companies with a reliable reference for efficiently monitoring and managing their power grids.

Keywords: dynamic thermal rating, proper orthogonal decomposition, particle swarm optimization, sensor placement, smart grid

Procedia PDF Downloads 432
1391 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between class imbalance. The traditional classifiers fail to classify the minority class examples correctly due to its bias towards the majority class. Apart from between-class imbalance, imbalance within classes where classes are composed of a different number of sub-clusters with these sub-clusters containing different number of examples also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithmic based, cost-based methods and ensemble of classifier. Data preprocessing techniques have shown great potential as they attempt to improve data distribution rather than the classifier. Data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples lead to loss of information and also when minority class has an absolute rarity, removing the majority class examples is generally not recommended. Existing methods available for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between class imbalance and within class imbalance simultaneously for binary classification problem. Removing between class imbalance and within class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of sub-clusters. The method also takes into consideration the scatter of the data in the feature space and also adaptively copes up with unseen test data using Lowner-John ellipsoid for increasing the accuracy of the classifier. In this study, neural network is being used as this is one such classifier where the total error is minimized and removing the between-class imbalance and within class imbalance simultaneously help the classifier in giving equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative to handle various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 418