Search results for: data sets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24671

Search results for: data sets

24551 A New Learning Automata-Based Algorithm to the Priority-Based Target Coverage Problem in Directional Sensor Networks

Authors: Shaharuddin Salleh, Sara Marouf, Hosein Mohammadi

Abstract:

Directional sensor networks (DSNs) have recently attracted a great deal of attention due to their extensive applications in a wide range of situations. One of the most important problems associated with DSNs is covering a set of targets in a given area and, at the same time, maximizing the network lifetime. This is due to limitation in sensing angle and battery power of the directional sensors. This problem gets more complicated by the possibility that targets may have different coverage requirements. In the present study, this problem is referred to as priority-based target coverage (PTC). As sensors are often densely deployed, organizing the sensors into several cover sets and then activating these cover sets successively is a promising solution to this problem. In this paper, we propose a learning automata-based algorithm to organize the directional sensors into several cover sets in such a way that each cover set could satisfy coverage requirements of all the targets. Several experiments are conducted to evaluate the performance of the proposed algorithm. The results demonstrated that the algorithms were able to contribute to solving the problem.

Keywords: directional sensor networks, target coverage problem, cover set formation, learning automata

Procedia PDF Downloads 382
24550 Investigating Data Normalization Techniques in Swarm Intelligence Forecasting for Energy Commodity Spot Price

Authors: Yuhanis Yusof, Zuriani Mustaffa, Siti Sakira Kamaruddin

Abstract:

Data mining is a fundamental technique in identifying patterns from large data sets. The extracted facts and patterns contribute in various domains such as marketing, forecasting, and medical. Prior to that, data are consolidated so that the resulting mining process may be more efficient. This study investigates the effect of different data normalization techniques, which are Min-max, Z-score, and decimal scaling, on Swarm-based forecasting models. Recent swarm intelligence algorithms employed includes the Grey Wolf Optimizer (GWO) and Artificial Bee Colony (ABC). Forecasting models are later developed to predict the daily spot price of crude oil and gasoline. Results showed that GWO works better with Z-score normalization technique while ABC produces better accuracy with the Min-Max. Nevertheless, the GWO is more superior that ABC as its model generates the highest accuracy for both crude oil and gasoline price. Such a result indicates that GWO is a promising competitor in the family of swarm intelligence algorithms.

Keywords: artificial bee colony, data normalization, forecasting, Grey Wolf optimizer

Procedia PDF Downloads 445
24549 Characterization of the Groundwater Aquifers at El Sadat City by Joint Inversion of VES and TEM Data

Authors: Usama Massoud, Abeer A. Kenawy, El-Said A. Ragab, Abbas M. Abbas, Heba M. El-Kosery

Abstract:

Vertical Electrical Sounding (VES) and Transient Electro Magnetic (TEM) survey have been applied for characterizing the groundwater aquifers at El Sadat industrial area. El-Sadat city is one of the most important industrial cities in Egypt. It has been constructed more than three decades ago at about 80 km northwest of Cairo along the Cairo–Alexandria desert road. Groundwater is the main source of water supplies required for domestic, municipal, and industrial activities in this area due to the lack of surface water sources. So, it is important to maintain this vital resource in order to sustain the development plans of this city. In this study, VES and TEM data were identically measured at 24 stations along three profiles trending NE–SW with the elongation of the study area. The measuring points were arranged in a grid like pattern with both inter-station spacing and line–line distance of about 2 km. After performing the necessary processing steps, the VES and TEM data sets were inverted individually to multi-layer models, followed by a joint inversion of both data sets. Joint inversion process has succeeded to overcome the model-equivalence problem encountered in the inversion of individual data set. Then, the joint models were used for the construction of a number of cross sections and contour maps showing the lateral and vertical distribution of the geo-electrical parameters in the subsurface medium. Interpretation of the obtained results and correlation with the available geological and hydrogeological information revealed TWO aquifer systems in the area. The shallow Pleistocene aquifer consists of sand and gravel saturated with fresh water and exhibits large thickness exceeding 200 m. The deep Pliocene aquifer is composed of clay and sand and shows low resistivity values. The water bearing layer of the Pleistocene aquifer and the upper surface of Pliocene aquifer are continuous and no structural features have cut this continuity through the investigated area.

Keywords: El Sadat city, joint inversion, VES, TEM

Procedia PDF Downloads 338
24548 Enhancing Understanding and Engagement in Linear Motion Using 7R-Based Module

Authors: Mary Joy C. Montenegro, Voltaire M. Mistades

Abstract:

This action research was implemented to enhance the teaching of linear motion and to improve students' conceptual understanding and engagement using a developed 7R-based module called 'module on vectors and one-dimensional kinematics' (MVOK). MVOK was validated in terms of objectives, contents, format, and language used, presentation, usefulness, and overall presentation. The validation process revealed a value of 4.7 interpreted as 'Very Acceptable' with a substantial agreement (0. 60) from the validators. One intact class of 46 Grade 12 STEM students from one of the public schools in Paranaque City served as the participants of this study. The students were taught using the module during the first semester of the academic year 2019–2020. Employing the mixed-method approach, quantitative data were gathered using pretest/posttest, activity sheets, problem sets, and survey form, while qualitative data were obtained from surveys, interviews, observations, and reflection log. After the implementation, there was a significant difference of 18.4 on students’ conceptual understanding as shown in their pre-test and post-test scores on the 24-item test with a moderate Hake gain equal to 0.45 and an effect size of 0.83. Moreover, the scores on activity and problem sets have a 'very good' to 'excellent' rating, which signifies an increase in the level of students’ conceptual understanding. There also exists a significant difference between the mean scores of students’ engagement overall (t= 4.79, p = 0.000, p < 0.05) and in the dimension of emotion (t = 2.51, p = 0.03) and participation/interaction (t = 5.75, p = 0.001). These findings were supported by gathered qualitative data. Positive views were elicited from the students since it is an accessible tool for learning and has well-detailed explanations and examples. The results of this study may substantiate that using MVOK will lead to better physics content understanding and higher engagement.

Keywords: conceptual understanding, engagement, linear motion, module

Procedia PDF Downloads 101
24547 Mixtures of Length-Biased Weibull Distributions for Loss Severity Modelling

Authors: Taehan Bae

Abstract:

In this paper, a class of length-biased Weibull mixtures is presented to model loss severity data. The proposed model generalizes the Erlang mixtures with the common scale parameter, and it shares many important modelling features, such as flexibility to fit various data distribution shapes and weak-denseness in the class of positive continuous distributions, with the Erlang mixtures. We show that the asymptotic tail estimate of the length-biased Weibull mixture is Weibull-type, which makes the model effective to fit loss severity data with heavy-tailed observations. A method of statistical estimation is discussed with applications on real catastrophic loss data sets.

Keywords: Erlang mixture, length-biased distribution, transformed gamma distribution, asymptotic tail estimate, EM algorithm, expectation-maximization algorithm

Procedia PDF Downloads 195
24546 Exploring Counting Methods for the Vertices of Certain Polyhedra with Uncertainties

Authors: Sammani Danwawu Abdullahi

Abstract:

Vertex Enumeration Algorithms explore the methods and procedures of generating the vertices of general polyhedra formed by system of equations or inequalities. These problems of enumerating the extreme points (vertices) of general polyhedra are shown to be NP-Hard. This lead to exploring how to count the vertices of general polyhedra without listing them. This is also shown to be #P-Complete. Some fully polynomial randomized approximation schemes (fpras) of counting the vertices of some special classes of polyhedra associated with Down-Sets, Independent Sets, 2-Knapsack problems and 2 x n transportation problems are presented together with some discovered open problems.

Keywords: counting with uncertainties, mathematical programming, optimization, vertex enumeration

Procedia PDF Downloads 319
24545 Comparison of Statistical Methods for Estimating Missing Precipitation Data in the River Subbasin Lenguazaque, Colombia

Authors: Miguel Cañon, Darwin Mena, Ivan Cabeza

Abstract:

In this work was compared and evaluated the applicability of statistical methods for the estimation of missing precipitations data in the basin of the river Lenguazaque located in the departments of Cundinamarca and Boyacá, Colombia. The methods used were the method of simple linear regression, distance rate, local averages, mean rates, correlation with nearly stations and multiple regression method. The analysis used to determine the effectiveness of the methods is performed by using three statistical tools, the correlation coefficient (r2), standard error of estimation and the test of agreement of Bland and Altmant. The analysis was performed using real rainfall values removed randomly in each of the seasons and then estimated using the methodologies mentioned to complete the missing data values. So it was determined that the methods with the highest performance and accuracy in the estimation of data according to conditions that were counted are the method of multiple regressions with three nearby stations and a random application scheme supported in the precipitation behavior of related data sets.

Keywords: statistical comparison, precipitation data, river subbasin, Bland and Altmant

Procedia PDF Downloads 443
24544 Application of a New Efficient Normal Parameter Reduction Algorithm of Soft Sets in Online Shopping

Authors: Xiuqin Ma, Hongwu Qin

Abstract:

A new efficient normal parameter reduction algorithm of soft set in decision making was proposed. However, up to the present, few documents have focused on real-life applications of this algorithm. Accordingly, we apply a New Efficient Normal Parameter Reduction algorithm into real-life datasets of online shopping, such as Blackberry Mobile Phone Dataset. Experimental results show that this algorithm is not only suitable but feasible for dealing with the online shopping.

Keywords: soft sets, parameter reduction, normal parameter reduction, online shopping

Procedia PDF Downloads 484
24543 Wage Differentiation Patterns of Households Revisited for Turkey in Same Industry Employment: A Pseudo-Panel Approach

Authors: Yasin Kutuk, Bengi Yanik Ilhan

Abstract:

Previous studies investigate the wage differentiations among regions in Turkey between couples who work in the same industry and those who work in different industries by using the models that is appropriate for cross sectional data. However, since there is no available panel data for this investigation in Turkey, pseudo panels using repeated cross-section data sets of the Household Labor Force Surveys 2004-2014 are employed in order to open a new way to examine wage differentiation patterns. For this purpose, household heads are separated into groups with respect to their household composition. These groups’ membership is assumed to be fixed over time such as age groups, education, gender, and NUTS1 (12 regions) Level. The average behavior of them can be tracked overtime same as in the panel data. Estimates using the pseudo panel data would be consistent with the estimates using genuine panel data on individuals if samples are representative of the population which has fixed composition, characteristics. With controlling the socioeconomic factors, wage differentiation of household income is affected by social, cultural and economic changes after global economic crisis emerged in US. It is also revealed whether wage differentiation is changing among the birth cohorts.

Keywords: wage income, same industry, pseudo panel, panel data econometrics

Procedia PDF Downloads 368
24542 Metabolic Predictive Model for PMV Control Based on Deep Learning

Authors: Eunji Choi, Borang Park, Youngjae Choi, Jinwoo Moon

Abstract:

In this study, a predictive model for estimating the metabolism (MET) of human body was developed for the optimal control of indoor thermal environment. Human body images for indoor activities and human body joint coordinated values were collected as data sets, which are used in predictive model. A deep learning algorithm was used in an initial model, and its number of hidden layers and hidden neurons were optimized. Lastly, the model prediction performance was analyzed after the model being trained through collected data. In conclusion, the possibility of MET prediction was confirmed, and the direction of the future study was proposed as developing various data and the predictive model.

Keywords: deep learning, indoor quality, metabolism, predictive model

Procedia PDF Downloads 228
24541 Ranking All of the Efficient DMUs in DEA

Authors: Elahe Sarfi, Esmat Noroozi, Farhad Hosseinzadeh Lotfi

Abstract:

One of the important issues in Data Envelopment Analysis is the ranking of Decision Making Units. In this paper, a method for ranking DMUs is presented through which the weights related to efficient units should be chosen in a way that the other units preserve a certain percentage of their efficiency with the mentioned weights. To this end, a model is presented for ranking DMUs on the base of their superefficiency by considering the mentioned restrictions related to weights. This percentage can be determined by decision Maker. If the specific percentage is unsuitable, we can find a suitable and feasible one for ranking DMUs accordingly. Furthermore, the presented model is capable of ranking all of the efficient units including nonextreme efficient ones. Finally, the presented models are utilized for two sets of data and related results are reported.

Keywords: data envelopment analysis, efficiency, ranking, weight

Procedia PDF Downloads 426
24540 A Neural Network Modelling Approach for Predicting Permeability from Well Logs Data

Authors: Chico Horacio Jose Sambo

Abstract:

Recently neural network has gained popularity when come to solve complex nonlinear problems. Permeability is one of fundamental reservoir characteristics system that are anisotropic distributed and non-linear manner. For this reason, permeability prediction from well log data is well suited by using neural networks and other computer-based techniques. The main goal of this paper is to predict reservoir permeability from well logs data by using neural network approach. A multi-layered perceptron trained by back propagation algorithm was used to build the predictive model. The performance of the model on net results was measured by correlation coefficient. The correlation coefficient from testing, training, validation and all data sets was evaluated. The results show that neural network was capable of reproducing permeability with accuracy in all cases, so that the calculated correlation coefficients for training, testing and validation permeability were 0.96273, 0.89991 and 0.87858, respectively. The generalization of the results to other field can be made after examining new data, and a regional study might be possible to study reservoir properties with cheap and very fast constructed models.

Keywords: neural network, permeability, multilayer perceptron, well log

Procedia PDF Downloads 363
24539 Empowering a New Frontier in Heart Disease Detection: Unleashing Quantum Machine Learning

Authors: Sadia Nasrin Tisha, Mushfika Sharmin Rahman, Javier Orduz

Abstract:

Machine learning is applied in a variety of fields throughout the world. The healthcare sector has benefited enormously from it. One of the most effective approaches for predicting human heart diseases is to use machine learning applications to classify data and predict the outcome as a classification. However, with the rapid advancement of quantum technology, quantum computing has emerged as a potential game-changer for many applications. Quantum algorithms have the potential to execute substantially faster than their classical equivalents, which can lead to significant improvements in computational performance and efficiency. In this study, we applied quantum machine learning concepts to predict coronary heart diseases from text data. We experimented thrice with three different features; and three feature sets. The data set consisted of 100 data points. We pursue to do a comparative analysis of the two approaches, highlighting the potential benefits of quantum machine learning for predicting heart diseases.

Keywords: quantum machine learning, SVM, QSVM, matrix product state

Procedia PDF Downloads 55
24538 Optimizing Electric Vehicle Charging with Charging Data Analytics

Authors: Tayyibah Khanam, Mohammad Saad Alam, Sanchari Deb, Yasser Rafat

Abstract:

Electric vehicles are considered as viable replacements to gasoline cars since they help in reducing harmful emissions and stimulate power generation through renewable energy sources, hence contributing to sustainability. However, one of the significant obstacles in the mass deployment of electric vehicles is the charging time anxiety among users and, thus, the subsequent large waiting times for available chargers at charging stations. Data analytics, on the other hand, has revolutionized the decision-making tasks of management and operating systems since its arrival. In this paper, we attempt to optimize the choice of EV charging stations for users in their vicinity by minimizing the time taken to reach the charging stations and the waiting times for available chargers. Time taken to travel to the charging station is calculated by the Google Maps API and the waiting times are predicted by polynomial regression of the historical data stored. The proposed framework utilizes real-time data and historical data from all operating charging stations in the city and assists the user in finding the best suitable charging station for their current situation and can be implemented in a mobile phone application. The algorithm successfully predicts the most optimal choice of a charging station and the minimum required time for various sample data sets.

Keywords: charging data, electric vehicles, machine learning, waiting times

Procedia PDF Downloads 148
24537 A Probabilistic View of the Spatial Pooler in Hierarchical Temporal Memory

Authors: Mackenzie Leake, Liyu Xia, Kamil Rocki, Wayne Imaino

Abstract:

In the Hierarchical Temporal Memory (HTM) paradigm the effect of overlap between inputs on the activation of columns in the spatial pooler is studied. Numerical results suggest that similar inputs are represented by similar sets of columns and dissimilar inputs are represented by dissimilar sets of columns. It is shown that the spatial pooler produces these results under certain conditions for the connectivity and proximal thresholds. Following the discussion of the initialization of parameters for the thresholds, corresponding qualitative arguments about the learning dynamics of the spatial pooler are discussed.

Keywords: hierarchical temporal memory, HTM, learning algorithms, machine learning, spatial pooler

Procedia PDF Downloads 312
24536 The Phenomena of False Cognates and Deceptive Cognates: Issues to Foreign Language Learning and Teaching Methodology Based on Set Theory

Authors: Marilei Amadeu Sabino

Abstract:

The aim of this study is to establish differences between the terms ‘false cognates’, ‘false friends’ and ‘deceptive cognates’, usually considered to be synonyms. It will be shown they are not synonyms, since they do not designate the same linguistic process or phenomenon. Despite their differences in meaning, many pairs of formally similar words in two (or more) different languages are true cognates, although they are usually known as ‘false’ cognates – such as, for instance, the English and Italian lexical items ‘assist x assistere’; ‘attend x attendere’; ‘argument x argomento’; ‘apology x apologia’; ‘camera x camera’; ‘cucumber x cocomero’; ‘fabric x fabbrica’; ‘factory x fattoria’; ‘firm x firma’; ‘journal x giornale’; ‘library x libreria’; ‘magazine x magazzino’; ‘parent x parente’; ‘preservative x preservativo’; ‘pretend x pretendere’; ‘vacancy x vacanza’, to name but a few examples. Thus, one of the theoretical objectives of this paper is firstly to elaborate definitions establishing a distinction between the words that are definitely ‘false cognates’ (derived from different etyma) and those that are just ‘deceptive cognates’ (derived from the same etymon). Secondly, based on Set Theory and on the concepts of equal sets, subsets, intersection of sets and disjoint sets, this study is intended to elaborate some theoretical and practical questions that will be useful in identifying more precisely similarities and differences between cognate words of different languages, and according to graphic interpretation of sets it will be possible to classify them and provide discernment about the processes of semantic changes. Therefore, these issues might be helpful not only to the Learning of Second and Foreign Languages, but they could also give insights into Foreign and Second Language Teaching Methodology. Acknowledgements: FAPESP – São Paulo State Research Support Foundation – the financial support offered (proc. n° 2017/02064-7).

Keywords: deceptive cognates, false cognates, foreign language learning, teaching methodology

Procedia PDF Downloads 305
24535 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 301
24534 DEMs: A Multivariate Comparison Approach

Authors: Juan Francisco Reinoso Gordo, Francisco Javier Ariza-López, José Rodríguez Avi, Domingo Barrera Rosillo

Abstract:

The evaluation of the quality of a data product is based on the comparison of the product with a reference of greater accuracy. In the case of MDE data products, quality assessment usually focuses on positional accuracy and few studies consider other terrain characteristics, such as slope and orientation. The proposal that is made consists of evaluating the similarity of two DEMs (a product and a reference), through the joint analysis of the distribution functions of the variables of interest, for example, elevations, slopes and orientations. This is a multivariable approach that focuses on distribution functions, not on single parameters such as mean values or dispersions (e.g. root mean squared error or variance). This is considered to be a more holistic approach. The use of the Kolmogorov-Smirnov test is proposed due to its non-parametric nature, since the distributions of the variables of interest cannot always be adequately modeled by parametric models (e.g. the Normal distribution model). In addition, its application to the multivariate case is carried out jointly by means of a single test on the convolution of the distribution functions of the variables considered, which avoids the use of corrections such as Bonferroni when several statistics hypothesis tests are carried out together. In this work, two DEM products have been considered, DEM02 with a resolution of 2x2 meters and DEM05 with a resolution of 5x5 meters, both generated by the National Geographic Institute of Spain. DEM02 is considered as the reference and DEM05 as the product to be evaluated. In addition, the slope and aspect derived models have been calculated by GIS operations on the two DEM datasets. Through sample simulation processes, the adequate behavior of the Kolmogorov-Smirnov statistical test has been verified when the null hypothesis is true, which allows calibrating the value of the statistic for the desired significance value (e.g. 5%). Once the process has been calibrated, the same process can be applied to compare the similarity of different DEM data sets (e.g. the DEM05 versus the DEM02). In summary, an innovative alternative for the comparison of DEM data sets based on a multinomial non-parametric perspective has been proposed by means of a single Kolmogorov-Smirnov test. This new approach could be extended to other DEM features of interest (e.g. curvature, etc.) and to more than three variables

Keywords: data quality, DEM, kolmogorov-smirnov test, multivariate DEM comparison

Procedia PDF Downloads 87
24533 Diagnosis of the Heart Rhythm Disorders by Using Hybrid Classifiers

Authors: Sule Yucelbas, Gulay Tezel, Cuneyt Yucelbas, Seral Ozsen

Abstract:

In this study, it was tried to identify some heart rhythm disorders by electrocardiography (ECG) data that is taken from MIT-BIH arrhythmia database by subtracting the required features, presenting to artificial neural networks (ANN), artificial immune systems (AIS), artificial neural network based on artificial immune system (AIS-ANN) and particle swarm optimization based artificial neural network (PSO-NN) classifier systems. The main purpose of this study is to evaluate the performance of hybrid AIS-ANN and PSO-ANN classifiers with regard to the ANN and AIS. For this purpose, the normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data for each of the RR intervals were found. Then these data in the form of pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF) is created by combining discrete wavelet transform which is applied to each of these two groups of data and two different data sets with 9 and 27 features were obtained from each of them after data reduction. Afterwards, the data randomly was firstly mixed within themselves, and then 4-fold cross validation method was applied to create the training and testing data. The training and testing accuracy rates and training time are compared with each other. As a result, performances of the hybrid classification systems, AIS-ANN and PSO-ANN were seen to be close to the performance of the ANN system. Also, the results of the hybrid systems were much better than AIS, too. However, ANN had much shorter period of training time than other systems. In terms of training times, ANN was followed by PSO-ANN, AIS-ANN and AIS systems respectively. Also, the features that extracted from the data affected the classification results significantly.

Keywords: AIS, ANN, ECG, hybrid classifiers, PSO

Procedia PDF Downloads 407
24532 An Ab Initio Molecular Orbital Theory and Density Functional Theory Study of Fluorous 1,3-Dion Compounds

Authors: S. Ghammamy, M. Mirzaabdollahiha

Abstract:

Quantum mechanical calculations of energies, geometries, and vibrational wavenumbers of fluorous 1,3-dion compounds are carried out using density functional theory (DFT/B3LYP) method with LANL2DZ basis sets. The calculated HOMO and LUMO energies show that charge transfer occurs in the molecules. The thermodynamic functions of fluorous 1,3-dion compounds have been performed at B3LYP/LANL2DZ basis sets. The theoretical spectrograms for F NMR spectra of fluorous 1,3-dion compounds have also been constructed. The F NMR nuclear shieldings of fluoride ligands in fluorous 1,3-dion compounds have been studied quantum chemical.

Keywords: density function theory, natural bond orbital, HOMO, LOMO, fluorous

Procedia PDF Downloads 359
24531 Efficiency, Effectiveness, and Technological Change in Armed Forces: Indonesian Case

Authors: Citra Pertiwi, Muhammad Fikruzzaman Rahawarin

Abstract:

Government of Indonesia had committed to increasing its national defense the budget up to 1,5 percent of GDP. However, the budget increase does not necessarily allocate efficiently and effectively. Using Data Envelopment Analysis (DEA), the operational units of Indonesian Armed Forces are considered as a proxy to measure those two aspects. The bootstrap technique is being used as well to reduce uncertainty in the estimation. Additionally, technological change is being measured as a nonstationary component. Nearly half of the units are being estimated as fully efficient, with less than a third is considered as effective. Longer and larger sets of data might increase the robustness of the estimation in the future.

Keywords: bootstrap, effectiveness, efficiency, DEA, military, Malmquist, technological change

Procedia PDF Downloads 276
24530 SQL Generator Based on MVC Pattern

Authors: Chanchai Supaartagorn

Abstract:

Structured Query Language (SQL) is the standard de facto language to access and manipulate data in a relational database. Although SQL is a language that is simple and powerful, most novice users will have trouble with SQL syntax. Thus, we are presenting SQL generator tool which is capable of translating actions and displaying SQL commands and data sets simultaneously. The tool was developed based on Model-View-Controller (MVC) pattern. The MVC pattern is a widely used software design pattern that enforces the separation between the input, processing, and output of an application. Developers take full advantage of it to reduce the complexity in architectural design and to increase flexibility and reuse of code. In addition, we use White-Box testing for the code verification in the Model module.

Keywords: MVC, relational database, SQL, White-Box testing

Procedia PDF Downloads 397
24529 An Exploratory Investigation into the Quality of Life of People with Multi-Drug Resistant Pulmonary Tuberculosis (MDR-PTB) Using the ICF Core Sets: A Preliminary Investigation

Authors: Shamila Manie, Soraya Maart, Ayesha Osman

Abstract:

Introduction: People diagnosed with multidrug resistant pulmonary tuberculosis (MDR-PTB) is subjected to prolonged hospitalization in South Africa. It has thus become essential for research to shift its focus from a purely medical approach, but to include social and environmental factors when looking at the impact of the disease on those affected. Aim: To explore the factors affecting individuals with multi-drug resistant pulmonary tuberculosis during long-term hospitalization using the comprehensive ICF core-sets for obstructive pulmonary disease (OPD) and cardiopulmonary (CPR) conditions at Brooklyn Chest Hospital (BCH). Methods: A quantitative descriptive, cross-sectional study design was utilized. A convenient sample of 19 adults at Brooklyn Chest Hospital were interviewed. Results: Most participants reported a decrease in exercise tolerance levels (b455: n=11). However it did not limit participation. Participants reported that a lack of privacy in the environment (e155) was a barrier to health. The presence of health professionals (e355) and the provision of skills development services (e585) are facilitators to health and well-being. No differences exist in the functional ability of HIV positive and negative participants in this sample. Conclusion: The ICF Core Sets appeared valid in identifying the barriers and facilitators experienced by individuals with MDR-PTB admitted to BCH. The hospital environment must be improved to add to the QoL of those admitted, especially improving privacy within the wards. Although the social grant is seen as a facilitator, greater emphasis must be placed on preparing individuals to be economically active in the labour for when they are discharged.

Keywords: multidrug resistant tuberculosis, MDR ICF core sets, health-related quality of life (HRQoL), hospitalization

Procedia PDF Downloads 315
24528 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between class imbalance. The traditional classifiers fail to classify the minority class examples correctly due to its bias towards the majority class. Apart from between-class imbalance, imbalance within classes where classes are composed of a different number of sub-clusters with these sub-clusters containing different number of examples also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithmic based, cost-based methods and ensemble of classifier. Data preprocessing techniques have shown great potential as they attempt to improve data distribution rather than the classifier. Data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples lead to loss of information and also when minority class has an absolute rarity, removing the majority class examples is generally not recommended. Existing methods available for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between class imbalance and within class imbalance simultaneously for binary classification problem. Removing between class imbalance and within class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of sub-clusters. The method also takes into consideration the scatter of the data in the feature space and also adaptively copes up with unseen test data using Lowner-John ellipsoid for increasing the accuracy of the classifier. In this study, neural network is being used as this is one such classifier where the total error is minimized and removing the between-class imbalance and within class imbalance simultaneously help the classifier in giving equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative to handle various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 388
24527 Anisotropic Total Fractional Order Variation Model in Seismic Data Denoising

Authors: Jianwei Ma, Diriba Gemechu

Abstract:

In seismic data processing, attenuation of random noise is the basic step to improve quality of data for further application of seismic data in exploration and development in different gas and oil industries. The signal-to-noise ratio of the data also highly determines quality of seismic data. This factor affects the reliability as well as the accuracy of seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ration while attenuating random noise effectively. To improve the signal-to-noise ration and attenuating seismic random noise by preserving important features and information about seismic signals, we introduce the concept of anisotropic total fractional order denoising algorithm. The anisotropic total fractional order variation model defined in fractional order bounded variation is proposed as a regularization in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional order variation model and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of theproposed method for synthetic and real seismic data sets and the denoised result is compared with F-X deconvolution and non-local means denoising algorithm.

Keywords: anisotropic total fractional order variation, fractional order bounded variation, seismic random noise attenuation, split Bregman algorithm

Procedia PDF Downloads 183
24526 EDM for Prediction of Academic Trends and Patterns

Authors: Trupti Diwan

Abstract:

Predicting student failure at school has changed into a difficult challenge due to both the large number of factors that can affect the reduced performance of students and the imbalanced nature of these kinds of data sets. This paper surveys the two elements needed to make prediction on Students’ Academic Performances which are parameters and methods. This paper also proposes a framework for predicting the performance of engineering students. Genetic programming can be used to predict student failure/success. Ranking algorithm is used to rank students according to their credit points. The framework can be used as a basis for the system implementation & prediction of students’ Academic Performance in Higher Learning Institute.

Keywords: classification, educational data mining, student failure, grammar-based genetic programming

Procedia PDF Downloads 397
24525 A Bivariate Inverse Generalized Exponential Distribution and Its Applications in Dependent Competing Risks Model

Authors: Fatemah A. Alqallaf, Debasis Kundu

Abstract:

The aim of this paper is to introduce a bivariate inverse generalized exponential distribution which has a singular component. The proposed bivariate distribution can be used when the marginals have heavy-tailed distributions, and they have non-monotone hazard functions. Due to the presence of the singular component, it can be used quite effectively when there are ties in the data. Since it has four parameters, it is a very flexible bivariate distribution, and it can be used quite effectively for analyzing various bivariate data sets. Several dependency properties and dependency measures have been obtained. The maximum likelihood estimators cannot be obtained in closed form, and it involves solving a four-dimensional optimization problem. To avoid that, we have proposed to use an EM algorithm, and it involves solving only one non-linear equation at each `E'-step. Hence, the implementation of the proposed EM algorithm is very straight forward in practice. Extensive simulation experiments and the analysis of one data set have been performed. We have observed that the proposed bivariate inverse generalized exponential distribution can be used for modeling dependent competing risks data. One data set has been analyzed to show the effectiveness of the proposed model.

Keywords: Block and Basu bivariate distributions, competing risks, EM algorithm, Marshall-Olkin bivariate exponential distribution, maximum likelihood estimators

Procedia PDF Downloads 108
24524 Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

Authors: Seyoung Kim, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

A database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning from this data the models that can predict future traffic speed would be beneficial for the applications such as the car navigation system, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as predictive models is that it does not require any explicit model building. Instead, k-NN takes a long time to make a prediction because it needs to search for the k-nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched for without a significant sacrifice of prediction accuracy. The rationale behind this is that we had a better look at only the recent data because the traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features which are the current and past traffic speeds of the target link and the neighbor links in its up/down-stream. The performances of these models are compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.

Keywords: big data, k-NN, machine learning, traffic speed prediction

Procedia PDF Downloads 325
24523 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behaviour research. Modern advances in digital equipment brought the possibility of continuously recording hours of speech in naturalistic environments and building rich sets of sound files. Speech analysis can then extract from these files multiple features for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extract relevant features from these files are often inaccessible to researchers that are not familiar with programming languages. Manual analysis is a common alternative, with a high time and efficiency cost. In the analysis of long sound files, the first step is the voice segmentation, i.e. to detect and label segments containing speech. We present a comprehensive methodology aiming to support researchers on voice segmentation, as the first step for data analysis of a big set of sound files. Praat, an open source software, is suggested as a tool to run a voice detection algorithm, label segments and files and extract other quantitative features on a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants with age over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster when compared to a manual analysis performed with two independent coders. Furthermore, the methodology presented allows manual adjustments of voiced segments with visualisation of the sound signal and the automatic extraction of quantitative information on speech. In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 257
24522 Stability Assessment of Chamshir Dam Based on DEM, South West Zagros

Authors: Rezvan Khavari

Abstract:

The Zagros fold-thrust belt in SW Iran is a part of the Alpine-Himalayan system which consists of a variety of structures with different sizes or geometries. The study area is Chamshir Dam, which is located on the Zohreh River, 20 km southeast of Gachsaran City (southwest Iran). The satellite images are valuable means available to geologists for locating geological or geomorphological features expressing regional fault or fracture systems, therefore, the satellite images were used for structural analysis of the Chamshir dam area. As well, using the DEM and geological maps, 3D Models of the area have been constructed. Then, based on these models, all the acquired fracture traces data were integrated in Geographic Information System (GIS) environment by using Arc GIS software. Based on field investigation and DEM model, main structures in the area consist of Cham Shir syncline and two fault sets, the main thrust faults with NW-SE direction and small normal faults in NE-SW direction. There are three joint sets in the study area, both of them (J1 and J3) are the main large fractures around the Chamshir dam. These fractures indeed consist with the normal faults in NE-SW direction. The third joint set in NW-SE is normal to the others. In general, according to topography, geomorphology and structural geology evidences, Chamshir dam has a potential for sliding in some parts of Gachsaran formation.

Keywords: DEM, chamshir dam, zohreh river, satellite images

Procedia PDF Downloads 458