Search results for: small data sets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27988

27718 DEMs: A Multivariate Comparison Approach

Authors: Juan Francisco Reinoso Gordo, Francisco Javier Ariza-López, José Rodríguez Avi, Domingo Barrera Rosillo

Abstract:

The evaluation of the quality of a data product is based on the comparison of the product with a reference of greater accuracy. In the case of DEM data products, quality assessment usually focuses on positional accuracy, and few studies consider other terrain characteristics, such as slope and orientation. This work proposes evaluating the similarity of two DEMs (a product and a reference) through the joint analysis of the distribution functions of the variables of interest, for example, elevations, slopes and orientations. This is a multivariate approach that focuses on distribution functions, not on single parameters such as mean values or dispersions (e.g. root mean squared error or variance), and is therefore considered more holistic. The use of the Kolmogorov-Smirnov test is proposed due to its non-parametric nature, since the distributions of the variables of interest cannot always be adequately modeled by parametric models (e.g. the Normal distribution model). In addition, its application to the multivariate case is carried out jointly by means of a single test on the convolution of the distribution functions of the variables considered, which avoids the use of corrections such as Bonferroni when several statistical hypothesis tests are carried out together. In this work, two DEM products have been considered, DEM02 with a resolution of 2x2 meters and DEM05 with a resolution of 5x5 meters, both generated by the National Geographic Institute of Spain. DEM02 is considered as the reference and DEM05 as the product to be evaluated. In addition, the slope and aspect derived models have been calculated by GIS operations on the two DEM datasets. Through sample simulation processes, the adequate behavior of the Kolmogorov-Smirnov statistical test has been verified when the null hypothesis is true, which allows calibrating the value of the statistic for the desired significance value (e.g. 5%).
Once the process has been calibrated, the same process can be applied to compare the similarity of different DEM data sets (e.g. the DEM05 versus the DEM02). In summary, an innovative alternative for the comparison of DEM data sets based on a multivariate non-parametric perspective has been proposed by means of a single Kolmogorov-Smirnov test. This new approach could be extended to other DEM features of interest (e.g. curvature) and to more than three variables.
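
The two-sample Kolmogorov-Smirnov comparison at the core of this abstract can be illustrated for a single variable. The sketch below is not the authors' joint multivariate procedure: it computes the standard two-sample statistic for one variable (elevation) and compares it with the usual large-sample critical value instead of a simulation-calibrated one. The sample values and function names are hypothetical.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    # The maximum gap can only occur at one of the observed values.
    return max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value; only approximate for small samples."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical elevation samples (meters) at the same locations.
reference = [101.2, 103.5, 98.7, 110.4, 105.1, 99.9, 107.3, 102.8]
product = [101.0, 103.9, 98.1, 110.9, 105.6, 99.2, 107.0, 102.1]

d = ks_statistic(reference, product)
reject = d > ks_critical_value(len(reference), len(product))
```

With these made-up values the statistic stays below the critical value, so the hypothesis that both samples share one distribution would not be rejected at the 5% level.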

Keywords: data quality, DEM, Kolmogorov-Smirnov test, multivariate DEM comparison

Procedia PDF Downloads 88
27717 Adaptation Mechanism and Planning Response to Resiliency Shrinking of Small Towns Based on Complex Adaptive System by Taking Wuhan as an Example

Authors: Yanqun Li, Hong Geng

Abstract:

Rapid urbanization centered on big cities leads to an unequal configuration of urban and rural areas in terms of land supply, industrial division of labor, service supply and space allocation, and induces shrinking of service energy, industrial system and population vitality in small towns. Because small towns are an important spatial unit in the spectrum of urbanization that serves, connects and couples urban and rural areas, their shrinking has an important influence on the healthy development of urbanization. Based on the census of small towns in the Wuhan metropolitan area, we have found that the shrinking of small towns is a passive contraction of elastic tension under the squeeze of cities. Once affected by external forces such as policy regulation, planning guidance, and population return, small towns will achieve expansion and growth. Based on the theory of complex adaptive systems, this paper constructs a development index evaluation system for small towns covering five aspects (population, economy, space, society and ecology), measures the shrinking level of small towns, analyzes their shrinking characteristics, and identifies whether the shrinking is elastic or not. The paper then measures the resilience ability index of small town contraction from the same five aspects. Finally, it proposes an adaptive mechanism of urban-rural interaction evolution under fine division of labor to respond to the passive shrinking of small towns in Wuhan. On this basis, the paper puts forward planning response measures for small towns in terms of spatial layout, function orientation and service support, which can provide a reference for other regions.

Keywords: complex adaptive systems, resiliency shrinking, adaptation mechanism, planning response

Procedia PDF Downloads 89
27716 Diagnosis of the Heart Rhythm Disorders by Using Hybrid Classifiers

Authors: Sule Yucelbas, Gulay Tezel, Cuneyt Yucelbas, Seral Ozsen

Abstract:

In this study, we attempted to identify some heart rhythm disorders from electrocardiography (ECG) data taken from the MIT-BIH arrhythmia database by extracting the required features and presenting them to artificial neural network (ANN), artificial immune system (AIS), artificial neural network based on artificial immune system (AIS-ANN) and particle swarm optimization based artificial neural network (PSO-ANN) classifier systems. The main purpose of this study is to evaluate the performance of the hybrid AIS-ANN and PSO-ANN classifiers relative to the ANN and AIS. For this purpose, the normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data for each of the RR intervals were found. These data were then combined into pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF), the discrete wavelet transform was applied to each pair, and two different data sets with 9 and 27 features were obtained from each of them after data reduction. Afterwards, the data were first shuffled, and then 4-fold cross validation was applied to create the training and testing data. The training and testing accuracy rates and the training times were compared. As a result, the performances of the hybrid classification systems, AIS-ANN and PSO-ANN, were seen to be close to the performance of the ANN system, and the results of the hybrid systems were much better than those of AIS. However, ANN had a much shorter training time than the other systems. In terms of training time, ANN was followed by the PSO-ANN, AIS-ANN and AIS systems, respectively. The features extracted from the data also affected the classification results significantly.
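
The shuffle-then-split step described above can be sketched as follows. This is a generic 4-fold cross-validation index generator, not the authors' code; the function name, the seed and the sample count are assumptions for illustration.

```python
import random

def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # mix the data before splitting
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical data set of 10 feature vectors.
splits = list(k_fold_indices(10, k=4))
```

Each sample appears in exactly one test fold, so every record is used for testing once and for training three times.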

Keywords: AIS, ANN, ECG, hybrid classifiers, PSO

Procedia PDF Downloads 410
27715 Transcriptional Profiling of Developing Ovules in Litchi chinensis

Authors: Ashish Kumar Pathak, Ritika Sharma, Vishal Nath, Sudhir Pratap Singh, Rakesh Tuli

Abstract:

Litchi is a sub-tropical fruit crop with genotypes bearing delicious juicy fruits with variable seed size (bold to rudimentary size). Small seed size is a desirable trait in litchi, as it increases consumer acceptance and eases fruit processing. The biochemical activities in mid-stage ovules (e.g. 16, 20, 24 and 28 days after anthesis) determine the fate of seed and fruit development in litchi. Comprehensive ovule-specific transcriptome analysis was performed in two litchi genotypes with contrasting seed size to gain molecular insight into the determinants of seed fate in litchi fruits. The transcriptomic data were assembled de novo into 139,608 Trinity transcripts, of which 6,325 were differentially expressed between the two contrasting genotypes. Differential transcriptional patterns were found among ovule development stages in the contrasting litchi genotypes. The putative genes for the salicylic acid, jasmonic acid and brassinosteroid pathways were down-regulated in ovules of small-seeded litchi. Trinity transcripts related to embryogenesis, cell expansion, seed size and stress exhibited altered expression in the small-seeded genotype. The putative regulators of seed maturation and seed storage were down-regulated in the small-seeded genotype.

Keywords: Litchi, seed, transcriptome, defence

Procedia PDF Downloads 211
27714 An Ab Initio Molecular Orbital Theory and Density Functional Theory Study of Fluorous 1,3-Dion Compounds

Authors: S. Ghammamy, M. Mirzaabdollahiha

Abstract:

Quantum mechanical calculations of energies, geometries, and vibrational wavenumbers of fluorous 1,3-dion compounds are carried out using the density functional theory (DFT/B3LYP) method with the LANL2DZ basis set. The calculated HOMO and LUMO energies show that charge transfer occurs in the molecules. The thermodynamic functions of fluorous 1,3-dion compounds have been computed at the B3LYP/LANL2DZ level. The theoretical spectrograms for F NMR spectra of fluorous 1,3-dion compounds have also been constructed. The F NMR nuclear shieldings of fluoride ligands in fluorous 1,3-dion compounds have been studied quantum chemically.

Keywords: density functional theory, natural bond orbital, HOMO, LUMO, fluorous

Procedia PDF Downloads 360
27713 Efficiency, Effectiveness, and Technological Change in Armed Forces: Indonesian Case

Authors: Citra Pertiwi, Muhammad Fikruzzaman Rahawarin

Abstract:

The Government of Indonesia has committed to increasing its national defense budget up to 1.5 percent of GDP. However, the increased budget is not necessarily allocated efficiently and effectively. Using Data Envelopment Analysis (DEA), the operational units of the Indonesian Armed Forces are considered as a proxy to measure those two aspects. The bootstrap technique is also used to reduce uncertainty in the estimation. Additionally, technological change is measured as a nonstationary component. Nearly half of the units are estimated to be fully efficient, while fewer than a third are considered effective. Longer and larger sets of data might increase the robustness of the estimation in the future.

Keywords: bootstrap, effectiveness, efficiency, DEA, military, Malmquist, technological change

Procedia PDF Downloads 278
27712 Problems and Needs Help of Frozen Shrimp Industry Small and Medium in the Central Region of the Lower Three Provinces

Authors: P. Thepnarintra

Abstract:

The frozen shrimp industry plays an important role in the development of the country's production industry. There has been continuing development to respond to increasing demand; however, there have been some problems in running the enterprises. The purposes of this study are to: 1) investigate problems related to basic factors in operating the frozen shrimp industry from the entrepreneurs’ points of view, where the enterprises involved were small and medium enterprises of the Thai Frozen Foods Association; and 2) compare the problems of the frozen shrimp industry according to size of operation in 3 provinces of the central region of Thailand. The population in this study consisted of 148 managers from 148 frozen shrimp enterprises of the Thai Frozen Foods Association, of which 77 were small and 71 were medium sized. The data were analyzed using percentage, arithmetic mean, standard deviation, and the independent-samples t-test at the .05 significance level. The results revealed that the problems of frozen shrimp enterprises of both sizes were at a high level, as were their needs for government support. The comparison of the problems and the basic factors between the small and medium enterprises showed no statistically significant difference. The problems mentioned included raw materials, labor, production, marketing, and the need for academic support from the government sector.

Keywords: frozen shrimp industry, problems, related to the enterprise, operation

Procedia PDF Downloads 518
27711 Predicting Personality and Psychological Distress Using Natural Language Processing

Authors: Jihee Jang, Seowon Yoon, Gaeun Son, Minjung Kang, Joon Yeon Choeh, Kee-Hong Choi

Abstract:

Background: Self-report multiple-choice questionnaires have been widely utilized to quantitatively measure one’s personality and psychological constructs. Despite several strengths (e.g., brevity and utility), self-report multiple-choice questionnaires have considerable inherent limitations. With the rise of machine learning (ML) and natural language processing (NLP), researchers in the field of psychology are widely adopting NLP to assess psychological constructs and predict human behaviors. However, there is a lack of connections between the work being performed in computer science and that in psychology due to small data sets and unvalidated modeling practices. Aims: The current article introduces the study method and procedure of phase II, which uses the interview questions for the five-factor model (FFM) of personality developed in phase I. This study aims to develop the interview (semi-structured) and open-ended questions for the FFM-based personality assessments, specifically designed with experts in the field of clinical and personality psychology (phase I), and to collect the personality-related text data using the interview questions and self-report measures on personality and psychological distress (phase II). The purpose of the study includes examining the relationship between natural language data obtained from the interview questions, measuring the FFM personality constructs, and psychological distress to demonstrate the validity of the natural language-based personality prediction. Methods: The phase I (pilot) study was conducted on fifty-nine native Korean adults to acquire the personality-related text data from the interview (semi-structured) and open-ended questions based on the FFM of personality. The interview questions were revised and finalized with the feedback from the external expert committee, consisting of personality and clinical psychologists.
Based on the established interview questions, a total of 425 Korean adults were recruited using a convenience sampling method via an online survey. The text data collected from interviews were analyzed using natural language processing. The results of the online survey, including demographic data, depression, anxiety, and personality inventories, were analyzed together in the model to predict individuals’ FFM of personality and the level of psychological distress (phase II).

Keywords: personality prediction, psychological distress prediction, natural language processing, machine learning, the five-factor model of personality

Procedia PDF Downloads 54
27710 SQL Generator Based on MVC Pattern

Authors: Chanchai Supaartagorn

Abstract:

Structured Query Language (SQL) is the de facto standard language for accessing and manipulating data in a relational database. Although SQL is simple and powerful, most novice users have trouble with SQL syntax. Thus, we present an SQL generator tool that is capable of translating actions and displaying SQL commands and data sets simultaneously. The tool was developed based on the Model-View-Controller (MVC) pattern. The MVC pattern is a widely used software design pattern that enforces the separation between the input, processing, and output of an application. Developers take advantage of it to reduce complexity in architectural design and to increase flexibility and reuse of code. In addition, we use white-box testing for code verification in the Model module.
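
The separation the abstract describes can be sketched with a toy generator that turns a "select these columns" action into an SQL command and shows the command together with the returned rows. The class names, the table and its contents are all hypothetical and are not taken from the tool itself; the sketch uses Python's built-in sqlite3 module.

```python
import sqlite3

class QueryModel:
    """Model: builds the SQL text for an action and runs it."""
    def __init__(self, connection):
        self.connection = connection

    def build_select(self, table, columns):
        # Identifiers cannot be bound as parameters, so accept plain names only.
        for name in [table, *columns]:
            if not name.isidentifier():
                raise ValueError(f"invalid identifier: {name!r}")
        return f"SELECT {', '.join(columns)} FROM {table}"

    def run(self, sql):
        return self.connection.execute(sql).fetchall()

class TextView:
    """View: shows the SQL command and the data set together."""
    def render(self, sql, rows):
        return "\n".join([sql, *(str(row) for row in rows)])

class Controller:
    """Controller: translates a user action into model and view calls."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def select(self, table, columns):
        sql = self.model.build_select(table, columns)
        return self.view.render(sql, self.model.run(sql))

# Hypothetical in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ann')")
output = Controller(QueryModel(conn), TextView()).select("student", ["id", "name"])
```

Because SQL construction lives only in the model, it can be tested on its own, which matches the abstract's choice of white-box testing for the Model module.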

Keywords: MVC, relational database, SQL, White-Box testing

Procedia PDF Downloads 400
27709 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation in providing accurate similarity with deep learning is the requirement for a huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy in measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly smaller amount of data: clustered data in which each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores, which are defined as zero for images in the same cluster. The proposed method outperforms state-of-the-art object similarity scoring techniques in an evaluation of finding exact items. The proposed method achieves an accuracy of 86.5%, compared to 59.9% for the state-of-the-art technique. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the remaining retrieved images may still be similar products. Therefore, the proposed method can reduce the amount of training data by an order of magnitude while providing a reliable similarity metric.

Keywords: visual search, deep learning, convolutional neural network, machine learning

Procedia PDF Downloads 188
27708 Retail Strategy to Reduce Waste Keeping High Profit Utilizing Taylor's Law in Point-of-Sales Data

Authors: Gen Sakoda, Hideki Takayasu, Misako Takayasu

Abstract:

Waste reduction is a fundamental problem for sustainability. Methods for waste reduction with point-of-sales (POS) data are proposed, utilizing the knowledge of a recent econophysics study on a statistical property of POS data. Concretely, a non-stationary time series analysis method based on the particle filter is developed, which considers the abnormal fluctuation scaling known as Taylor's law. This method is extended to handle sales data that are incomplete because of stock-outs by introducing maximum likelihood estimation for censored data. A way of determining the optimal stock by pricing the cost of waste reduction is also proposed. This study focuses on the examination of the methods for large sales numbers, where Taylor's law is obvious. Numerical analysis using aggregated POS data shows the effectiveness of the methods in reducing food waste while maintaining a high profit for large sales numbers. Moreover, pricing the cost of waste reduction reveals that a small profit loss realizes substantial waste reduction, especially when the proportionality constant of Taylor’s law is small. Specifically, a profit loss of around 1% halves disposal when this constant is 0.12, which is the actual value for the processed food items used in this research. The methods provide practical and effective solutions for waste reduction while keeping a high profit, especially with large sales numbers.

Keywords: food waste reduction, particle filter, point-of-sales, sustainable development goals, Taylor's law, time series analysis

Procedia PDF Downloads 104
27707 Culture of Manager of a Medium or Small Enterprises

Authors: Omar Bendjimaa, Karzabi Abdelatif

Abstract:

Small and medium enterprises have witnessed several developments in recent years thanks to the support policies and programs provided by the state, owing to their importance in local and national development. Nevertheless, the success and development of these firms depend on a number of factors, especially the human element. For instance, the culture of the manager has its origin in the culture of the community and has a crucial influence on these firms. In fact, this culture is nothing more than a set of repeated values, perceptions, beliefs, symbols and practices, in addition to the knowledge the manager has received from reading and from modern means of education. All these factors have an impact on the effectiveness of governance and on its resolutions and instructions. The performance of the manager of a medium or small enterprise is inevitably affected by these cultural values, since the manager is the driving force, the leader, and the observer at the same time.

Keywords: small and medium enterprises, the culture of the manager, the culture of the community, values, perceptions, beliefs, symbols, performance

Procedia PDF Downloads 353
27706 Application of Single Subject Experimental Designs in Adapted Physical Activity Research: A Descriptive Analysis

Authors: Jiabei Zhang, Ying Qi

Abstract:

The purpose of this study was to develop a descriptive profile of the adapted physical activity research using single subject experimental designs. All research articles using single subject experimental designs published in the journal Adapted Physical Activity Quarterly from 1984 to 2013 were employed as the data source. Each of the articles was coded in a subcategory of seven categories: (a) the size of sample; (b) the age of participants; (c) the type of disabilities; (d) the type of data analysis; (e) the type of designs; (f) the independent variable; and (g) the dependent variable. Frequencies, percentages, and trend inspection were used to analyze the data and develop a profile. The profile developed characterizes a small portion of research articles that used single subject designs, in which most researchers used a small sample size, recruited children as subjects, emphasized learning and behavior impairments, selected visual inspection with descriptive statistics, preferred a multiple baseline design, focused on effects of therapy, inclusion, and strategy, and measured desired behaviors more often, with a decreasing trend over the years.

Keywords: adapted physical activity research, single subject experimental designs, physical education, sport science

Procedia PDF Downloads 438
27705 An Exploratory Investigation into the Quality of Life of People with Multi-Drug Resistant Pulmonary Tuberculosis (MDR-PTB) Using the ICF Core Sets: A Preliminary Investigation

Authors: Shamila Manie, Soraya Maart, Ayesha Osman

Abstract:

Introduction: People diagnosed with multidrug resistant pulmonary tuberculosis (MDR-PTB) are subjected to prolonged hospitalization in South Africa. It has thus become essential for research to shift its focus from a purely medical approach to one that includes social and environmental factors when looking at the impact of the disease on those affected. Aim: To explore the factors affecting individuals with multi-drug resistant pulmonary tuberculosis during long-term hospitalization using the comprehensive ICF core-sets for obstructive pulmonary disease (OPD) and cardiopulmonary (CPR) conditions at Brooklyn Chest Hospital (BCH). Methods: A quantitative descriptive, cross-sectional study design was utilized. A convenience sample of 19 adults at Brooklyn Chest Hospital was interviewed. Results: Most participants reported a decrease in exercise tolerance levels (b455: n=11). However, it did not limit participation. Participants reported that a lack of privacy in the environment (e155) was a barrier to health. The presence of health professionals (e355) and the provision of skills development services (e585) were facilitators of health and well-being. No differences existed in the functional ability of HIV positive and negative participants in this sample. Conclusion: The ICF Core Sets appeared valid in identifying the barriers and facilitators experienced by individuals with MDR-PTB admitted to BCH. The hospital environment must be improved to add to the QoL of those admitted, especially by improving privacy within the wards. Although the social grant is seen as a facilitator, greater emphasis must be placed on preparing individuals to be economically active in the labour market when they are discharged.

Keywords: multidrug resistant tuberculosis, MDR ICF core sets, health-related quality of life (HRQoL), hospitalization

Procedia PDF Downloads 316
27704 Impact of Stack Caches: Locality Awareness and Cost Effectiveness

Authors: Abdulrahman K. Alshegaifi, Chun-Hsi Huang

Abstract:

Treating data based on its location in memory has received much attention in recent years because data in different regions have different properties, which matter for cache utilization. Stack data and non-stack data may interfere with each other’s locality in the data cache. One important property of stack data is its high spatial and temporal locality. In this work, we simulate a non-unified cache design that splits the data cache into stack and non-stack caches in order to keep stack data and non-stack data in separate caches. We observe that the overall hit rate of the non-unified cache design is sensitive to the size of the non-stack cache. Then, we investigate the appropriate size and associativity for the stack cache to achieve a high hit ratio, especially when over 99% of accesses are directed to the stack cache. The results show that, on average, a stack cache hit rate of more than 99% is achieved with 2KB of capacity and 1-way associativity. Further, we analyze the improvement in hit rate when adding a small, fixed-size stack cache at level 1 to a unified cache architecture. The results show that adding a 1KB stack cache improves the overall hit rate of the unified cache design by approximately 3.9% on average for the Rijndael benchmark. The stack cache is simulated using the SimpleScalar toolset.
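
The split design can be illustrated with a toy simulator of two 1-way (direct-mapped) caches. This is a minimal sketch, not the SimpleScalar setup used in the study: the 32-byte line size, the rule that addresses at or above `stack_floor` count as stack data, and the four-address trace are all assumptions made for illustration.

```python
class DirectMappedCache:
    """1-way associative cache that only tracks hits and misses."""
    def __init__(self, size_bytes, line_bytes=32):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.accesses = 0

    def access(self, address):
        block = address // self.line_bytes
        index = block % self.num_lines   # which cache line the block maps to
        tag = block // self.num_lines    # identifies the block within that line
        self.accesses += 1
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.tags[index] = tag       # miss: replace the resident block

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0

def split_hit_rate(trace, stack_floor, stack_cache, data_cache):
    """Route each access to the stack or non-stack cache; return overall hit rate."""
    for address in trace:
        cache = stack_cache if address >= stack_floor else data_cache
        cache.access(address)
    total = stack_cache.accesses + data_cache.accesses
    return (stack_cache.hits + data_cache.hits) / total

# Hypothetical trace: two non-stack accesses and two stack accesses.
trace = [0x1000, 0x8000, 0x8004, 0x1000]
overall = split_hit_rate(trace, 0x8000, DirectMappedCache(2048), DirectMappedCache(1024))
```

Keeping the two streams in separate structures means a stack access can never evict a non-stack line, which is the interference the abstract aims to avoid.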

Keywords: hit rate, locality of program, stack cache, stack data

Procedia PDF Downloads 277
27703 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits the class imbalance problem when one class has very few examples compared to the other class; this is also referred to as between-class imbalance. Traditional classifiers fail to classify the minority class examples correctly due to their bias towards the majority class. Apart from between-class imbalance, imbalance within classes, where classes are composed of a different number of sub-clusters and these sub-clusters contain different numbers of examples, also deteriorates the performance of the classifier. Many methods have previously been proposed for handling the imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithm-based methods, cost-based methods and ensembles of classifiers. Data preprocessing techniques have shown great potential as they attempt to improve the data distribution rather than the classifier. Data preprocessing techniques handle class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples leads to loss of information, and when the minority class is absolutely rare, removing majority class examples is generally not recommended. Existing methods for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between-class imbalance and within-class imbalance simultaneously for the binary classification problem. Removing between-class imbalance and within-class imbalance simultaneously eliminates the bias of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in the total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of the sub-clusters.
The method also takes into consideration the scatter of the data in the feature space and adaptively copes with unseen test data using the Lowner-John ellipsoid to increase the accuracy of the classifier. In this study, a neural network is used, as it is a classifier in which the total error is minimized, and removing the between-class imbalance and within-class imbalance simultaneously helps the classifier give equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the Euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to the other methods in terms of various accuracy measures. Thus, the proposed method can serve as a good alternative for handling various problem domains, such as credit scoring, customer churn prediction and financial distress, that typically involve imbalanced data sets.
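
For orientation, the simplest form of minority oversampling is sketched below. It is a plain random-oversampling baseline that only removes between-class imbalance by duplicating examples; it is not the proposed adaptive method, which additionally balances sub-clusters found by model-based clustering and uses the Lowner-John ellipsoid. The function name and the toy data are hypothetical.

```python
import random

def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples until every class is equally large."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement to fill the gap up to the largest class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical binary data set: four majority examples, one minority example.
features = [[0.0], [1.0], [2.0], [3.0], [9.0]]
classes = [0, 0, 0, 0, 1]
balanced_x, balanced_y = random_oversample(features, classes)
```

Applying the same balancing per sub-cluster instead of per class would be one step towards the within-class balancing the abstract describes, though the paper allocates samples by sub-cluster complexity rather than uniformly.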

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 389
27702 Business Continuity Opportunities in the Cloud a Small to Medium Business Perspective

Authors: Donald Zullick, Cihan Varol

Abstract:

This research paper begins with a look at current work in business continuity as it relates to the cloud and small to medium business (SMB). While cloud services are an emerging paradigm that is quickly making an impact on business, there has been no substantive research applied to SMB. Seeing this gap, we have fused continuity and cloud research and applied it to the SMB market. It is an initial reflection with base framework guidelines as a starting point for implementation. In this approach, our research ties together existing work and fills the gap with an SMB outlook.

Keywords: business continuity, cloud services, medium size business, risk assessment, small business

Procedia PDF Downloads 371
27701 A Small-Scale Flexible Test Bench for the Investigation of Fertigation Strategies in Soilless Culture

Authors: Giacomo Barbieri

Abstract:

In soilless culture, the management of the nutrient solution is the most important aspect of crop growing. Fertigation dose, frequency and nutrient concentration must be planned with the objective of reaching optimal crop growth while limiting the resources used and the associated costs. The definition of efficient fertigation strategies is a complex problem since fertigation requirements vary on the basis of different factors, and crops are sensitive to small variations in fertigation parameters. To the best of the author's knowledge, a small-scale test bench that is flexible for both nutrient solution preparation and precise irrigation is currently missing, which limits investigations into standard practices for soilless culture. Starting from an analysis of the state of the art, this paper proposes a small-scale system that is potentially able to test different fertigation strategies concurrently. The system will be designed and implemented throughout a three-year project started in August 2018. However, due to the importance of the topic for current challenges such as food security and climate change, this work is shared in the hope that it may inspire other universities and organizations.

Keywords: soilless culture, fertigation, test bench, small-scale, automation

Procedia PDF Downloads 148
27700 Microarray Gene Expression Data Dimensionality Reduction Using PCA

Authors: Fuad M. Alkoot

Abstract:

Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension, reaching thousands, which hinders all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with were generated for the identification of autism. They include 142 samples, which is small compared to the large dimension of the data. The classifier systems trained on these data yield very low classification rates that are almost equivalent to a guess. We aim to reduce the data dimension and make the data more suitable for classification. Here, we experiment with applying a multistage PCA to the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates, which increases the possibility of building an automated system for autism detection.
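
A single PCA stage can be sketched without forming the full covariance matrix, which matters when there are thousands of features but only 142 samples. The sketch below finds the first principal component by power iteration and projects the samples onto it. It is not the multistage procedure of the paper, whose details the abstract does not give; the function names, the start vector and the tiny data set are assumptions.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (feature means, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    direction = [1.0 / math.sqrt(d)] * d  # arbitrary start vector (an assumption)
    for _ in range(iterations):
        # Multiply by X^T X in two steps so the d-by-d covariance matrix,
        # which is huge when d is in the thousands, is never built.
        scores = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
        update = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(u * u for u in update))
        if norm == 0.0:  # start vector orthogonal to the data, or no variance
            break
        direction = [u / norm for u in update]
    return means, direction

def project(rows, means, direction):
    """Reduce each sample to its coordinate along the given direction."""
    return [sum((row[j] - means[j]) * direction[j] for j in range(len(means)))
            for row in rows]

# Hypothetical 2-feature samples lying on a straight line.
samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
means, pc1 = first_principal_component(samples)
reduced = project(samples, means, pc1)
```

Further components could be obtained by subtracting each sample's projection and repeating, and the reduced coordinates would then be passed to the classifier.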

Keywords: PCA, gene expression, dimensionality reduction, classification, autism

Procedia PDF Downloads 530
27699 Anisotropic Total Fractional Order Variation Model in Seismic Data Denoising

Authors: Jianwei Ma, Diriba Gemechu

Abstract:

In seismic data processing, attenuation of random noise is a basic step for improving data quality before seismic data are applied to exploration and development in the oil and gas industry. The signal-to-noise ratio also strongly determines the quality of seismic data. It affects both the reliability and the accuracy of the seismic signal during interpretation for different purposes. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ratio while attenuating random noise effectively. To improve the signal-to-noise ratio and attenuate seismic random noise while preserving important features and information about seismic signals, we introduce an anisotropic total fractional order denoising algorithm. The anisotropic total fractional order variation model, defined in the space of fractional order bounded variation, is proposed as a regularization in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional order variation model, and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of the proposed method on synthetic and real seismic data sets, and the denoised results are compared with those of F-X deconvolution and the non-local means denoising algorithm.

Keywords: anisotropic total fractional order variation, fractional order bounded variation, seismic random noise attenuation, split Bregman algorithm

Procedia PDF Downloads 185
27698 EDM for Prediction of Academic Trends and Patterns

Authors: Trupti Diwan

Abstract:

Predicting student failure at school has become a difficult challenge due to both the large number of factors that can reduce student performance and the imbalanced nature of these kinds of data sets. This paper surveys the two elements needed to predict students' academic performance: parameters and methods. It also proposes a framework for predicting the performance of engineering students. Genetic programming can be used to predict student failure or success, and a ranking algorithm is used to rank students according to their credit points. The framework can serve as a basis for system implementation and for predicting students' academic performance in higher learning institutes.

Keywords: classification, educational data mining, student failure, grammar-based genetic programming

Procedia PDF Downloads 399
27697 A Study on the Small Biped Soft Robot with Two Insect-Like Nails

Authors: Mami Nishida

Abstract:

This paper presents a study on the development and control of a small biped soft robot using shape memory alloys (SMAs). The author proposes a flexible flat plate (FFP) actuator consisting of a thin polyethylene plate and SMAs. The actuator has a nail like that of an insect. The robot moves forward and backward and from left to right using two nails. The walking robot has two degrees of freedom and is controlled by switching the ON-OFF current signals to the SMA-based FFPs. The resulting small biped soft robot weighs a mere 4.7 g (with a height of 67 mm). The robot realizes biped walking by converting the elastic potential energy generated by deflections of the SMA-based FFPs into kinetic energy. Experimental results demonstrate the viability and utility of the small biped soft robot, the proposed SMA-based FFPs, and the control strategy in achieving walking behavior.

Keywords: biped soft robot with nails, flexible flat plate (FFP) actuators, ON-OFF control strategy, shape memory alloys (SMA)

Procedia PDF Downloads 478
27696 Parameter Estimation for Contact Tracing in Graph-Based Models

Authors: Augustine Okolie, Johannes Müller, Mirjam Kretzchmar

Abstract:

We adopt a maximum-likelihood framework to estimate parameters of a stochastic susceptible-infected-recovered (SIR) model with contact tracing on a rooted random tree. Given the number of detectees per index case, our estimator allows us to determine the degree distribution of the random tree as well as the tracing probability. Since we do not discover all infectees via contact tracing, this estimation is non-trivial. To keep things simple and stable, we develop an approximation suited to realistic situations (small contact tracing probability, or small probability of detecting index cases). In this approximation, the only epidemiological parameter entering the estimator is the basic reproduction number R0. The estimator is tested in a simulation study and applied to COVID-19 contact tracing data from India. The simulation study underlines the efficiency of the method. For the empirical COVID-19 data, we are able to compare different degree distributions and perform a sensitivity analysis. We find that a power-law and a negative binomial degree distribution in particular fit the data well and that the tracing probability is rather large. The sensitivity analysis shows no strong dependency on the reproduction number.

Keywords: stochastic SIR model on graph, contact tracing, branching process, parameter inference

Procedia PDF Downloads 47
27695 A Bivariate Inverse Generalized Exponential Distribution and Its Applications in Dependent Competing Risks Model

Authors: Fatemah A. Alqallaf, Debasis Kundu

Abstract:

The aim of this paper is to introduce a bivariate inverse generalized exponential distribution which has a singular component. The proposed bivariate distribution can be used when the marginals have heavy-tailed distributions and non-monotone hazard functions. Due to the presence of the singular component, it can be used quite effectively when there are ties in the data. Since it has four parameters, it is a very flexible bivariate distribution, and it can be used quite effectively for analyzing various bivariate data sets. Several dependency properties and dependency measures have been obtained. The maximum likelihood estimators cannot be obtained in closed form, and computing them involves solving a four-dimensional optimization problem. To avoid that, we propose an EM algorithm that involves solving only one non-linear equation at each E-step. Hence, the implementation of the proposed EM algorithm is very straightforward in practice. Extensive simulation experiments and the analysis of one data set have been performed. We have observed that the proposed bivariate inverse generalized exponential distribution can be used for modeling dependent competing risks data, and the data analysis shows the effectiveness of the proposed model.

Keywords: Block and Basu bivariate distributions, competing risks, EM algorithm, Marshall-Olkin bivariate exponential distribution, maximum likelihood estimators

Procedia PDF Downloads 110
27694 Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

Authors: Seyoung Kim, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

We consider a database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning models from this data that can predict future traffic speed would benefit applications such as car navigation systems, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as the predictive model is that it does not require any explicit model building. However, k-NN takes a long time to make a prediction because it needs to search for the k nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched without a significant sacrifice of prediction accuracy. The rationale is that it is better to look only at recent data, because traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features, namely the current and past traffic speeds of the target link and of the neighboring links upstream and downstream. The performance of these models is compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.
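The trade-off described above can be sketched in a few lines: a k-NN regressor that optionally searches only the most recent records. This is a minimal sketch under assumed conventions, not the authors' implementation. The record layout, the `window` parameter, the function name `knn_predict`, and the toy speeds are all hypothetical.

```python
import math


def knn_predict(history, query, k=3, window=None):
    """Predict the next speed as the mean over the k nearest records.

    history: list of (features, next_speed) pairs in chronological order.
    query:   feature vector for the current moment.
    window:  if given, only the most recent `window` records are searched,
             which shrinks the search cost roughly in proportion.
    """
    if window is not None:
        if window <= 0:
            raise ValueError("window must be positive")
        candidates = history[-window:]
    else:
        candidates = history
    if not candidates:
        raise ValueError("no records to search")
    # Sort by Euclidean distance to the query and keep the k closest.
    nearest = sorted(candidates,
                     key=lambda rec: math.dist(rec[0], query))[:k]
    return sum(speed for _, speed in nearest) / len(nearest)


# Hypothetical records: features are (current speed, speed 5 min ago).
# The two oldest records reflect an outdated traffic pattern.
history = [
    ((50.0, 48.0), 30.0),
    ((50.0, 49.0), 32.0),
    ((50.0, 48.5), 45.0),
    ((50.0, 49.5), 47.0),
]
query = (50.0, 49.0)
full_search = knn_predict(history, query, k=2)
recent_only = knn_predict(history, query, k=2, window=2)
```

In this toy case the full search mixes an outdated record into the estimate, while the windowed search uses only the newer pattern. A linear scan is used here for clarity; a spatial index could speed up either variant.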

Keywords: big data, k-NN, machine learning, traffic speed prediction

Procedia PDF Downloads 329
27693 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing domain presents interesting challenges, as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable, as it can be used to mark progress while training and can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The available dataset was composed of dynamic force data recorded during climbing; however, it came with challenges such as data scarcity, imbalance, and temporal heterogeneity. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross-validation strategies. The investigated solutions to the classification problem included the lightweight machine learning classifiers KNN and SVM as well as deep learning with a CNN. The best-performing model had an accuracy of 80%. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.
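One common way to handle recordings of different durations is to resample every series to a fixed number of points. The sketch below does this by linear interpolation. The abstract does not specify which temporal normalization the authors used, so this is only an assumed example, and the function name `resample` is hypothetical.

```python
def resample(series, target_len):
    """Linearly interpolate `series` onto `target_len` evenly spaced points.

    The first and last samples are preserved, so recordings of different
    durations end up as vectors of the same length.
    """
    n = len(series)
    if n < 2 or target_len < 2:
        raise ValueError("need at least two input and two output points")
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index in the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(series[lo] * (1.0 - frac) + series[hi] * frac)
    return out


# Hypothetical force traces of different lengths, brought to 5 points each.
short_trace = resample([0.0, 10.0], 5)
long_trace = resample([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 5)
```

After this step, fixed-length vectors can be fed to classifiers such as KNN or SVM, or passed on to a spectral transform.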

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 116
27692 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behaviour research. Modern advances in digital equipment have made it possible to continuously record hours of speech in naturalistic environments and to build rich sets of sound files. Speech analysis can then extract multiple features from these files for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extracting relevant features from them are often inaccessible to researchers who are not familiar with programming languages. Manual analysis is a common alternative, with a high cost in time and efficiency. In the analysis of long sound files, the first step is voice segmentation, i.e. detecting and labelling segments containing speech. We present a comprehensive methodology that aims to support researchers in voice segmentation, as the first step in the data analysis of a big set of sound files. Praat, an open-source software package, is suggested as a tool to run a voice detection algorithm, label segments and files, and extract other quantitative features over a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants aged over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster than manual analysis performed by two independent coders. Furthermore, the methodology presented allows manual adjustment of voiced segments with visualisation of the sound signal, as well as the automatic extraction of quantitative information on speech.
In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers who have to work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 258
27691 Establishing Forecasts Pointing Towards the Hungarian Energy Change Based on the Results of Local Municipal Renewable Energy Production and Energy Export

Authors: Balazs Kulcsar

Abstract:

Professional energy organizations perform analyses mainly on the global and national levels about the expected development of the share of renewables in electric power generation, heating, and cooling, as well as in the transport sector. There are just a few publications, research institutions, non-profit organizations, and national initiatives with a focus on studies of individual towns and settlements. Issues concerning the self-supply of energy at the settlement level have not received wide attention. The goal of our energy geographic studies is to determine the share of local renewable energy sources in the settlement-based electricity supply across Hungary. The Hungarian energy supply system defines four categories based on the installed capacities of electric power generating units. From these categories, the theoretical annual electricity production of small-sized household power plants (SSHPP) with installed capacities under 50 kW and of small power plants with capacities under 0.5 MW has been taken into consideration. In these power plant categories, the Hungarian Electricity Act has allowed the establishment of power plants primarily for the utilization of renewable energy sources since 2008. Though subject to certain restrictions, these small power plants utilizing renewable energies have the closest links to individual settlements and can be regarded as the achievements of the host settlements in the shift of energy use. Based on the 2017 data, we have ranked settlements to reflect the level of self-sufficiency in electricity production from renewable energy sources. The results show that supplying all the energy demanded by settlements from local renewables is already within reach in small settlements, for example through the small power plant categories discussed in the study, and is by no means impossible even in small towns and cities.
In Hungary, 30 settlements produce more renewable electricity than their own annual electricity consumption. If these overproductive settlements export their excess electricity to neighboring settlements, then full electricity supply from renewable sources by local small power plants can be realized in a further 29 settlements. These results provide an opportunity for governmental planning of the energy shift (legislative background, support system, environmental education), as well as for framing developmental forecasts and scenarios until 2030.
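The ranking idea can be illustrated with a simple ratio of local renewable production to annual consumption, where a ratio above 1 marks a settlement that produces a surplus. The abstract does not give the exact indicator used, so the sketch below is an assumption. The settlement names, the figures, and the function names are hypothetical and are not data from the study.

```python
def self_sufficiency(production_mwh, consumption_mwh):
    """Ratio of local renewable production to annual consumption."""
    return production_mwh / consumption_mwh


def rank_settlements(settlements):
    """Return (name, ratio) pairs sorted from most to least self-sufficient.

    settlements: dict mapping name -> (production_mwh, consumption_mwh).
    """
    ratios = [(name, self_sufficiency(prod, cons))
              for name, (prod, cons) in settlements.items()]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


def exportable_surplus(settlements):
    """Total excess (MWh) from settlements producing more than they use."""
    return sum(prod - cons
               for prod, cons in settlements.values() if prod > cons)


# Hypothetical illustrative values only (MWh per year).
example = {
    "Settlement A": (1200.0, 800.0),
    "Settlement B": (300.0, 600.0),
    "Settlement C": (450.0, 500.0),
}
ranking = rank_settlements(example)
surplus = exportable_surplus(example)
```

Comparing the total surplus with the shortfall of neighboring settlements gives a first estimate of how many additional settlements could be fully supplied through local exports.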

Keywords: energy geography, Hungary, local small power plants, renewable energy sources, self-sufficiency settlements

Procedia PDF Downloads 124
27690 Exploring the Implementation of Building Information Modelling Level 2 in the UK Construction Industry: The Case of Small and Medium-Sized Enterprises

Authors: Khaled Abu Awwad, Abdussalam Shibani, Michel Ghostin

Abstract:

In the last few years, building information modelling (BIM) has been acknowledged as a new technology capable of transforming the construction sector into a collaborative industry. The implementation of BIM in the United Kingdom (UK) construction sector has increased significantly in the last decade, particularly after the UK government mandated the use of BIM in all public projects by 2016. Despite this, there are many indications that BIM implementation is mainly the concern of large companies, while small and medium-sized enterprises (SMEs) are lagging behind in adopting and implementing this new technology. This slow adoption of BIM puts SMEs at a competitive disadvantage in public projects and possibly in private projects. Moreover, there is limited research focusing on the implementation of BIM Level 2 within SMEs. Therefore, the aim of this study is to bridge this gap and provide a conceptual framework to aid SMEs in implementing BIM Level 2. This framework is the result of interpreting qualitative data obtained by conducting semi-structured interviews with BIM experts in the UK construction industry.

Keywords: building information modelling, critical success factors, small and medium-sized enterprises, United Kingdom

Procedia PDF Downloads 178
27689 The Mechanical Properties of a Small-Size Seismic Isolation Rubber Bearing for Bridges

Authors: Yi F. Wu, Ai Q. Li, Hao Wang

Abstract:

Taking a novel type of bridge bearing with a diameter of 100 mm as an example, theoretical analysis, experimental research, and numerical simulation of the bearing were conducted. Since normal compression-shear machines cannot be applied to the small-size bearing, an improved device for testing the properties of the bearing was proposed and fabricated. In addition, the bearing was simulated with the explicit finite element software ANSYS/LS-DYNA, and some parameters of the bearing were modified in the finite element model to effectively reduce the computational cost. Results show that all the research methods are capable of revealing the fundamental properties of small-size bearings, and that a combined use of these methods can better capture both the overall properties and the detailed internal mechanical behavior of the bearing.

Keywords: ANSYS/LS-DYNA, compression shear, contact analysis, explicit algorithm, small-size

Procedia PDF Downloads 154