Search results for: binary tree
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1508

1088 The Role of Specificity in Mastering the English Article System

Authors: Sugene Kim

Abstract:

The English articles are taught as a binary system based on nominal countability and definiteness. Despite the detailed rules of prescriptive grammar, it has been consistently reported in the literature that their correct usage is extremely difficult to master even for advanced learners of English as a second language (ESL) or a foreign language (EFL). Given that an English sentence (except for an imperative) cannot be constructed without a noun, which is always paired with one of the indefinite, definite, and zero articles, it is essential to understand specifically what causes ESL/EFL learners to misuse them. To that end, this study examined EFL learners’ article use employing a one-group pre–post-test design. Forty-three Korean college students received instruction on correct English article usage for two 75-minute classes employing the binary schema set up for the study. They also practiced in class how to apply the rules as instructed. Then, the participants were assigned a forced-choice elicitation task, which was also used as a pre-test administered three months prior to the instruction. Unlike the pre-test, on which they only chose the correct article for each of the 40 items, the post-instruction task additionally asked them to give written accounts of the decision-making process behind each article choice. The participants’ performance was scored manually by checking whether the answer given was correct or incorrect, and their written comments were first categorized using thematic analysis and then ranked by frequency. The analyses of the performance on the two tasks and the written think-aloud data suggested that EFL learners exhibit fluctuation between specificity and definiteness, overgeneralizing the use of the definite article for almost all cataphoric references.
It was apparent that they have trouble distinguishing between the two concepts, possibly because the former is almost never introduced in the grammar books or classes designed for ESL/EFL learners. In particular, most participants were found to be unaware of the possibility of using nouns as [+specific, –definite]. Not surprisingly, the correct answer rates for such nouns averaged 33% and 46% on the pre- and post-tests, respectively, which only slightly exceed half the overall mean correct answer rates of 65% on the pre-test and 81% on the post-test. In addition, correct article use for specific indefinites was the most resistant to instruction when compared with nouns used as [–specific, –definite] or [± specific, +definite]. Such findings underline the necessity of expanding the binary schema to a ternary form that incorporates the specificity feature, even though it is not morphologically marked in the English language.

Keywords: countability, definiteness, English articles, specificity, ternary system

Procedia PDF Downloads 106
1087 Measures of Phylogenetic Support for Phylogenomic and the Whole Genomes of Two Lungfish Restate Lungfish and Origin of Land Vertebrates

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to reassess the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high gene support confidence with confidence intervals exceeding 95%, high internode certainty, and high gene concordance factor. The evidence stems from two datasets containing recently deciphered whole genomes of two lungfish species, as well as five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa diminishes the number of orthologues and leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. 
Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction (LBA) and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: gene support confidence (GSC), origin of land vertebrates, coelacanth, two whole genomes of lungfishes, confidence intervals

Procedia PDF Downloads 54
1086 Application of Groundwater Level Data Mining in Aquifer Identification

Authors: Liang Cheng Chang, Wei Ju Huang, You Cheng Chen

Abstract:

Investigation and research are key to the conjunctive use of surface water and groundwater resources. The hydrogeological structure is an important basis for groundwater analysis and simulation. Traditionally, the hydrogeological structure is determined manually from geological drill logs, well structure, groundwater levels, and so on. In Taiwan, a groundwater observation network has been built, and a large amount of groundwater-level observation data is available. The groundwater level is the state variable of the groundwater system; it reflects the system's response to the combination of hydrogeological structure, groundwater injection, and extraction. This study applies analytical tools to the observation database to develop a methodology for identifying confined and unconfined aquifers. These tools include frequency analysis, cross-correlation analysis between rainfall and groundwater level, groundwater regression curve analysis, and a decision tree. The developed methodology is then applied to groundwater layer identification in two groundwater systems: the Zhuoshui River alluvial fan and the Pingtung Plain. The frequency analysis applies the Fourier transform to time-series groundwater level observations and analyzes the daily-frequency amplitude of the groundwater level caused by artificial groundwater extraction. The cross-correlation analysis between rainfall and groundwater level is used to obtain the groundwater replenishment time between infiltration and the peak groundwater level during wet seasons. The groundwater regression curve, i.e., the average rate of groundwater regression, is used to analyze the internal flux in the groundwater system and the flux caused by artificial behaviors. The decision tree uses the information obtained from the above-mentioned analytical tools to produce the best estimate of the hydrogeological structure.
The developed method reaches a training accuracy of 92.31% and a verification accuracy of 93.75% on the Zhuoshui River alluvial fan, and a training accuracy of 95.55% and a verification accuracy of 100% on the Pingtung Plain. This high accuracy indicates that the developed methodology is an effective tool for identifying hydrogeological structures.
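
The cross-correlation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `lagged_correlation` and `best_lag`, the daily sampling, and the toy series are all assumptions made for illustration. The sketch estimates the delay between rainfall and groundwater response as the lag at which the Pearson correlation peaks.

```python
from statistics import mean

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    xs = x[:len(x) - lag]   # rainfall values that still have a partner
    ys = y[lag:]            # groundwater levels shifted back by `lag`
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0          # correlation is undefined for a constant series
    return cov / (var_x * var_y) ** 0.5

def best_lag(rain, level, max_lag):
    """Lag (in samples) at which rainfall best explains the level series."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(rain, level, k))

# Toy data (hypothetical): the level series repeats the rainfall two days later.
rain = [0, 5, 0, 0, 8, 0, 0, 0, 3, 0, 0, 0]
level = [0, 0, 0, 5, 0, 0, 8, 0, 0, 0, 3, 0]
print(best_lag(rain, level, max_lag=4))
```

In a real study the lag would be searched over a physically plausible window, and the series would typically be detrended first; both choices are outside what the abstract specifies.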

Keywords: aquifer identification, decision tree, groundwater, Fourier transform

Procedia PDF Downloads 134
1085 Ab Initio Approach to Generate a Binary Bulk Metallic Glass Foam

Authors: Jonathan Galvan-Colin, Ariel Valladares, Renela Valladares, Alexander Valladares

Abstract:

Both porous materials and bulk metallic glasses have been studied due to their potential applications and their exceptional physical and chemical properties. However, each material presents certain drawbacks, which are thought to be overcome by generating bulk metallic glass foams (BMGF). Although some experimental work has been reported on multicomponent BMGF, no ab initio studies have been published, as far as we know. We present an approach based on the expanding lattice (EL) method to generate binary amorphous nanoporous Cu64Zr36. Starting from two different configurations, a 108-atom crystalline cubic supercell (cCu64Zr36) and a 108-atom amorphous supercell (aCu64Zr36), both with an initial density of 8.06 g/cm3, we applied the EL method to halve the density and obtain 50% porosity. After the lattice expansion, the supercells were subjected to ab initio molecular dynamics for 500 steps at constant room temperature. Then, the samples were geometry-optimized and characterized with the pair and radial distribution functions, bond-angle distributions and a coordination number analysis. We found that pores appeared along specific spatial directions that differed from one sample to the other, and that they also differed in size and form, which we think is related to the initial structure. Due to the lack of experimental counterparts, our results should be considered predictive, and further studies are needed to handle a larger number of atoms and assess its implications for pore topology.

Keywords: ab initio molecular dynamics, bulk metallic glass, porous alloy

Procedia PDF Downloads 243
1084 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors

Authors: Katawut Kaewbanjong

Abstract:

We analyzed a body of project data and identified the software project factors that are significant for user satisfaction. A statistical significance analysis (logistic regression) and a collinearity analysis determined the significant factors from a group of 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used for testing the prediction potential of these factors were neural network, k-NN, Naïve Bayes, random forest, decision tree, gradient boosted tree, linear regression and logistic regression. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how methodology was acquired, development techniques, decision making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.

Keywords: prediction model, statistical analysis, software project, user satisfaction factor

Procedia PDF Downloads 91
1083 Tree-Based Inference for Regionalization: A Comparative Study of Global Topological Perturbation Methods

Authors: Orhun Aydin, Mark V. Janikas, Rodrigo Alves, Renato Assuncao

Abstract:

In this paper, a tree-based perturbation methodology for regionalization inference is presented. Regionalization is a constrained optimization problem that aims to create groups with similar attributes while satisfying spatial contiguity constraints. As in any constrained optimization problem, the spatial constraint may hinder convergence to some global minima, resulting in spatially contiguous members of a group with dissimilar attributes. This paper presents a general methodology for rigorously perturbing spatial constraints through the use of random spanning trees. The general framework presented can be used to quantify the effect of the spatial constraints on the overall regionalization result. We compare several types of stochastic spanning trees used in inference problems such as fuzzy regionalization and determining the number of regions. The performance of stochastic spanning trees is compared with the traditional permutation-based hypothesis testing frequently used in spatial statistics. Inference results for fuzzy regionalization and determining the number of regions are presented on the Local Area Personal Incomes for Texas Counties provided by the Bureau of Economic Analysis.

Keywords: regionalization, constrained clustering, probabilistic inference, fuzzy clustering

Procedia PDF Downloads 198
1082 Perceived Seriousness of Cybercrime Types: A Comparison across Gender

Authors: Suleman Ibrahim

Abstract:

Purpose: This research seeks people's perceptions of cybercrime issues rather than their knowledge of the facts. Unlike the Tripartite Cybercrime Framework (TCF), the binary models are ill-equipped to differentiate between cyber fraud (a socioeconomic crime) and cyber bullying or cyber stalking (psychosocial cybercrimes). Whilst the binary categories suggest that digital crimes are dichotomized (i.e., cyber-enabled and cyber-dependent), the recently proposed TCF argues that cybercrimes can be conceptualized into three groups: socioeconomic, psychosocial and geopolitical. Concomitantly, as regards the experience/perceptions of cybercrime, the TCF’s claim requires substantiation beyond its theoretical realm. Approach/Methodology: This scholarly endeavor, framed by the TCF, deploys a survey method to explore the experience of cybercrime across gender. Drawing on over 400 participants in the UK, this study aimed to contrast the differential perceptions/experiences of socioeconomic cybercrime (e.g. cyber fraud) and psychosocial cybercrime (e.g. cyber bullying and cyber stalking) across gender. Findings: The results revealed that cyber stalking was rated as the least serious of the different digital crime categories. They further revealed that female participants judged all types of cybercrime as more serious than male participants did, with the exception of socioeconomic cybercrime (cyber fraud). This distinction helps to emphasize not only that gender cultures and nuances apply both online and offline but also the utilitarian value of the TCF. Originality: Unlike existing data, this study has contrasted the differential perceptions and experience of socioeconomic and psychosocial cybercrimes with more refined variables.

Keywords: gender variations, psychosocial cybercrime, socioeconomic cybercrime, tripartite cybercrime framework

Procedia PDF Downloads 361
1081 Intrusion Detection in Computer Networks Using a Hybrid Model of Firefly and Differential Evolution Algorithms

Authors: Mohammad Besharatloo

Abstract:

Intrusion detection is an important research topic in network security because of the growing use of computer network services. Intrusion detection aims to detect unauthorized use or abuse of networks and systems by intruders. Therefore, the intrusion detection system is an efficient tool to control the user's access through predefined regulations. Since the data used in intrusion detection systems is high-dimensional, a proper representation is required to show the basic structure of this data. Therefore, it is necessary to eliminate redundant features to create the best representation subset. In the proposed method, a hybrid model of differential evolution and firefly algorithms was employed to choose the best subset of features. In addition, a decision tree and a support vector machine (SVM) were adopted to determine the quality of the selected features. First, the sorted population is divided into two sub-populations. The two optimization algorithms are then applied to these sub-populations, respectively. Then, the sub-populations are merged to create the population for the next iteration. The performance of the proposed method is evaluated on KDD Cup99. The simulation results show that the proposed method performs better than the other methods in this context.

Keywords: intrusion detection system, differential evolution, firefly algorithm, support vector machine, decision tree

Procedia PDF Downloads 62
1080 A Decision Support System to Detect the Lumbar Disc Disease on the Basis of Clinical MRI

Authors: Yavuz Unal, Kemal Polat, H. Erdinc Kocer

Abstract:

In this study, a decision support system comprising three stages is proposed to detect disc abnormalities of the lumbar region. In the first stage, feature extraction, T2-weighted sagittal and axial Magnetic Resonance Images (MRI) were taken from 55 people, and then 27 appearance and shape features were acquired from both sagittal and transverse images. In the second stage, feature weighting, the k-means clustering based feature weighting (KMCBFW) method proposed by Gunes et al. was applied. Finally, in the third stage, classification, classifier algorithms including the multi-layer perceptron (MLP neural network), support vector machine (SVM), Naïve Bayes, and decision tree were used to classify whether or not the subject has lumbar disc disease. To test the performance of the proposed method, the classification accuracy (%), sensitivity, specificity, precision, recall, f-measure, kappa value, and computation times were used. The best hybrid model for detecting lumbar disc disease based on both sagittal and axial MR images is the combination of k-means clustering based feature weighting and decision tree.

Keywords: lumbar disc abnormality, lumbar MRI, lumbar spine, hybrid models, hybrid features, k-means clustering based feature weighting

Procedia PDF Downloads 499
1079 Detecting Music Enjoyment Level Using Electroencephalogram Signals and Machine Learning Techniques

Authors: Raymond Feng, Shadi Ghiasi

Abstract:

An electroencephalogram (EEG) is a non-invasive technique that records electrical activity in the brain using scalp electrodes. Researchers have studied the use of EEG to detect emotions and moods by collecting signals from participants and analyzing how those signals correlate with their activities. In this study, researchers investigated the relationship between EEG signals and music enjoyment. Participants listened to music while data was collected. During the signal-processing phase, power spectral densities (PSDs) were computed from the signals, and dominant brainwave frequencies were extracted from the PSDs to form a comprehensive feature matrix. A machine learning approach was then taken to find correlations between the processed data and the music enjoyment level indicated by the participants. To improve on previous research, multiple machine learning models were employed, including K-Nearest Neighbors Classifier, Support Vector Classifier, and Decision Tree Classifier. Hyperparameters were used to fine-tune each model to further increase its performance. The experiments showed that a strong correlation exists, with the Decision Tree Classifier with hyperparameters yielding 85% accuracy. This study proves that EEG is a reliable means to detect music enjoyment and has future applications, including personalized music recommendation, mood adjustment, and mental health therapy.
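
The feature-extraction idea in this abstract (compute a power spectrum, then take the dominant frequency) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it uses a naive discrete Fourier transform and a raw periodogram rather than whatever PSD estimator the authors used, and the names `power_spectrum` and `dominant_frequency`, the 64 Hz sampling rate, and the synthetic signal are hypothetical.

```python
import cmath
import math

def power_spectrum(signal, fs):
    """Raw periodogram of a real signal via a naive O(n^2) DFT.

    Returns (frequencies in Hz, power per frequency bin) for bins 0..n/2.
    """
    n = len(signal)
    avg = sum(signal) / n
    centered = [s - avg for s in signal]   # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the bin with the largest power."""
    freqs, power = power_spectrum(signal, fs)
    return freqs[max(range(len(power)), key=power.__getitem__)]

# Synthetic example (hypothetical): a strong 10 Hz rhythm plus a weaker 20 Hz one.
fs = 64
samples = [math.sin(2 * math.pi * 10 * t / fs) + 0.3 * math.sin(2 * math.pi * 20 * t / fs)
           for t in range(128)]
print(dominant_frequency(samples, fs))
```

One such value per channel and epoch could form a row of a feature matrix for the classifiers listed above; the naive DFT is only practical for short windows.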

Keywords: EEG, electroencephalogram, machine learning, mood, music enjoyment, physiological signals

Procedia PDF Downloads 26
1078 Fraud Detection in Credit Cards with Machine Learning

Authors: Anjali Chouksey, Riya Nimje, Jahanvi Saraf

Abstract:

Online transactions have increased dramatically in this new ‘social-distancing’ era. With them, fraud in online payments has also increased significantly. Fraud is a significant problem in various industries such as insurance, banking, etc. These frauds include the leaking of sensitive credit card information, which can easily be misused. With the government also pushing online transactions, e-commerce is booming. But due to increasing fraud in online payments, e-commerce companies are suffering a great loss of trust from their customers. These companies are finding credit card fraud to be a big problem. People have started using online payment options and are thus becoming easy targets of credit card fraud. In this research paper, we discuss machine learning algorithms. We used a decision tree, XGBOOST, k-nearest neighbour, logistic regression, random forest, and SVM on a dataset of online credit card transactions. We test all these algorithms for detecting fraud cases using the confusion matrix, the F1 score, and the accuracy score of each model to identify which algorithm can be used in detecting fraud.
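
The evaluation measures named in this abstract (confusion matrix, F1 score, accuracy) have standard definitions, sketched below. The helper names `confusion_counts` and `scores` and the toy labels are assumptions for illustration; they are not taken from the paper, which does not report its data here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem; `positive` marks fraud."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels (hypothetical): 1 = fraudulent transaction, 0 = legitimate.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))
print(scores(y_true, y_pred))
```

Because fraud datasets are usually highly imbalanced, accuracy alone can look good for a model that never flags fraud, which is one reason to report F1 alongside it.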

Keywords: machine learning, fraud detection, artificial intelligence, decision tree, k nearest neighbour, random forest, XGBOOST, logistic regression, support vector machine

Procedia PDF Downloads 122
1077 Automatic Furrow Detection for Precision Agriculture

Authors: Manpreet Kaur, Cheol-Hong Min

Abstract:

The increasing advancement of robotics equipped with machine vision sensors and applied to precision agriculture offers solutions to various problems on agricultural farms. An important issue for machine vision systems is crop row and weed detection. This paper proposes an automatic furrow detection system based on real-time processing for identifying crop rows in maize fields in the presence of weeds. The vision system is designed to be installed on farming vehicles, that is, it is subject to gyros, vibration and other undesired movements. The images are captured under perspective projection and are affected by the above undesired effects. The goal is to identify crop rows for vehicle navigation, which includes weed removal, where weeds are identified as plants outside the crop rows. Image quality is affected by different lighting conditions and by gaps along the crop rows due to lack of germination and incorrect planting. The proposed image processing method consists of four different processes. First, image segmentation is performed based on an HSV (Hue, Saturation, Value) decision tree. The proposed algorithm uses the HSV color space to discriminate crops, weeds and soil. The region of interest is defined by filtering each of the HSV channels between maximum and minimum threshold values. Then the noise in the images is eliminated by means of a hybrid median filter. Further, mathematical morphological processes are applied, i.e., erosion to remove smaller objects followed by dilation to gradually enlarge the boundaries of regions of foreground pixels. This enhances the image contrast. To accurately detect the position of crop rows, the region of interest is defined by creating a binary mask. Edge detection and the Hough transform are applied to detect lines represented in polar coordinates and furrow directions as accumulations on the angle axis in the Hough space. The experimental results show that the method is effective.
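
The HSV thresholding step described in this abstract can be sketched in a few lines. The thresholds, the function name `vegetation_mask`, and the tiny test image below are assumptions chosen for illustration; the paper's actual threshold values and decision tree are not given in the abstract. The sketch marks a pixel as vegetation when its hue falls in a green band and its saturation and value exceed minimum levels, producing a binary mask.

```python
import colorsys

def vegetation_mask(image, h_range=(0.20, 0.45), s_min=0.25, v_min=0.20):
    """Binary mask (1 = plant, 0 = background) from per-channel HSV thresholds.

    `image` is a list of rows of (R, G, B) tuples with 0..255 components.
    The default thresholds are illustrative guesses, not values from the paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # colorsys works on 0..1 floats and returns h, s, v in 0..1
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            is_green = h_range[0] <= h <= h_range[1]
            mask_row.append(1 if is_green and s >= s_min and v >= v_min else 0)
        mask.append(mask_row)
    return mask

# 2x2 toy image (hypothetical): leaf green, brown soil, grey stone, bright green.
image = [[(30, 160, 40), (120, 90, 60)],
         [(200, 200, 200), (60, 200, 60)]]
print(vegetation_mask(image))
```

In the pipeline the abstract describes, a mask like this would then be denoised, cleaned with erosion and dilation, and passed to edge detection and the Hough transform.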

Keywords: furrow detection, morphological, HSV, Hough transform

Procedia PDF Downloads 208
1076 Optimization of Hate Speech and Abusive Language Detection on Indonesian-language Twitter using Genetic Algorithms

Authors: Rikson Gultom

Abstract:

Hate speech and abusive language on social media are difficult to detect. Usually, they are detected only after they have gone viral in cyberspace, by which time it is too late for prevention. An early detection system with fairly good accuracy is needed so that it can reduce conflicts in society caused by social media postings that attack individuals, groups, and governments in Indonesia. The purpose of this study is to find, among several machine learning methods, an early detection model for Twitter with high accuracy. In this study, the support vector machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) methods were compared with the support vector machine with genetic algorithm (SVM-GA), Naïve Bayes with genetic algorithm (NB-GA), and Random Forest Decision Tree with genetic algorithm (RFDT-GA). The study produced a comparison table of the accuracy of the hate speech and abusive language detection models, presented it as a graph of the accuracy of the six algorithms developed on the Indonesian-language Twitter dataset, and identified the model with the highest accuracy.

Keywords: abusive language, hate speech, machine learning, optimization, social media

Procedia PDF Downloads 106
1075 Impact of Land-Use and Climate Change on the Population Structure and Distribution Range of the Rare and Endangered Dracaena ombet and Dobera glabra in Northern Ethiopia

Authors: Emiru Birhane, Tesfay Gidey, Haftu Abrha, Abrha Brhan, Amanuel Zenebe, Girmay Gebresamuel, Florent Noulèkoun

Abstract:

Dracaena ombet and Dobera glabra are two of the most rare and endangered tree species in dryland areas. Unfortunately, their sustainability is being compromised by different anthropogenic and natural factors. However, the impacts of ongoing land use and climate change on the population structure and distribution of the species are less explored. This study was carried out in the grazing lands and hillside areas of the Desa'a dry Afromontane forest, northern Ethiopia, to characterize the population structure of the species and predict the impact of climate change on their potential distributions. In each land-use type, abundance, diameter at breast height, and height of the trees were collected using 70 sampling plots distributed over seven transects spaced one km apart. The geographic coordinates of each individual tree were also recorded. The results showed that the species populations were characterized by low abundance and unstable population structure. The latter was evinced by a lack of seedlings and mature trees. The study also revealed that the total abundance and dendrometric traits of the trees were significantly different between the two land uses. The hillside areas had a denser abundance of bigger and taller trees than the grazing lands. Climate change predictions using the MaxEnt model highlighted that future temperature increases coupled with reduced precipitation would lead to significant reductions in the suitable habitats of the species in northern Ethiopia. The species' suitable habitats were predicted to decline by 48–83% for D. ombet and 35–87% for D. glabra. Hence, to sustain the species populations, different strategies should be adopted, namely the introduction of alternative livelihoods (e.g., gathering NTFP) to reduce the overexploitation of the species for subsistence income and the protection of the current habitats that will remain suitable in the future using community-based exclosures. 
Additionally, the preservation of the species' seeds in gene banks is crucial to ensure their long-term conservation.

Keywords: grazing lands, hillside areas, land-use change, MaxEnt, range limitation, rare and endangered tree species

Procedia PDF Downloads 58
1074 Constraints and Opportunities of Wood Production Value Chain: Evidence from Southwest Ethiopia

Authors: Abduselam Faris, Rijalu Negash, Zera Kedir

Abstract:

This study was initiated to identify constraints and opportunities of the wood production value chain in Southwest Ethiopia. About 385 tree-growing farmers were randomly selected and interviewed. Similarly, about 30 small-scale wood processors, 30 retailers, 15 local collectors and 5 wholesalers were purposively included in the study. The results indicated that 98.96% of the smallholder farmers engaged in growing trees for wood were from male-headed households, with an average age of 46.88 years. The main household activity was agriculture (crop and livestock), reported by about 61.56% of the sample respondents. Through value chain mapping of actors, the major value chain participants and supporting actors were identified. On average, the tree-growing farmers generated a gross income of 9385.926 Ethiopian birr during the survey year. The critical constraints identified along the wood production value chain were a limited supply of credit, poor market information dissemination, high interference of brokers, a shortage of machines, and inadequate working area and electricity. The availability of forest resources is the leading opportunity in the wood production value chain. Reinforcing the linkages among wood production value chain actors, providing skill training for small-scale processors, and developing a suitable policy for the wise use of wood trees are the key recommendations.

Keywords: value chain analysis, wood production, southwest Ethiopia, constraints and opportunities

Procedia PDF Downloads 65
1073 A Two-Stage Bayesian Variable Selection Method with the Extension of Lasso for Geo-Referenced Data

Authors: Georgiana Onicescu, Yuqian Shen

Abstract:

Due to the complex nature of geo-referenced data, multicollinearity of the risk factors in public health spatial studies is a commonly encountered issue, which leads to low parameter estimation accuracy because it inflates the variance in the regression analysis. To address this issue, we proposed a two-stage variable selection method by extending the least absolute shrinkage and selection operator (Lasso) to the Bayesian spatial setting, investigating the impact of risk factors on health outcomes. Specifically, in stage I, we performed the variable selection using Bayesian Lasso and several other variable selection approaches. Then, in stage II, we performed the model selection with only the selected variables from stage I and compared the methods again. To evaluate the performance of the two-stage variable selection methods, we conducted a simulation study with different distributions for the risk factors, using geo-referenced count data as the outcome and Michigan as the research region. We considered the cases when all candidate risk factors are independently normally distributed or follow a multivariate normal distribution with different correlation levels. Two other Bayesian variable selection methods, the Binary indicator and the combination of Binary indicator and Lasso, were considered and compared as alternative methods. The simulation results indicated that the proposed two-stage Bayesian Lasso variable selection method has the best performance for both the independent and dependent cases considered. When compared with the one-stage approach and the other two alternative methods, the two-stage Bayesian Lasso approach provides the highest estimation accuracy in all scenarios considered.

Keywords: Lasso, Bayesian analysis, spatial analysis, variable selection

Procedia PDF Downloads 111
1072 An Improved Parallel Algorithm of Decision Tree

Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng

Abstract:

Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.
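
For readers unfamiliar with the quantities this abstract parallelizes, the following is a minimal serial sketch of the textbook Gini impurity and the exhaustive search for the best split point on one numeric feature, as used in CART. It does not reproduce the paper's S-SP, OGini, or PCCP methods, whose details are not given in the abstract; the function names and toy data are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Threshold on one numeric feature minimizing the weighted Gini impurity.

    Returns (threshold, weighted_impurity); threshold is None if every value
    is identical and no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Toy feature column and class labels (hypothetical).
values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(values, labels))
```

Each candidate threshold is scored independently of the others, which is the property that makes this search a natural target for the kind of parallelization the paper proposes.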

Keywords: classification, Gini index, parallel data mining, pruning ahead

Procedia PDF Downloads 102
1071 BeamGA Median: A Hybrid Heuristic Search Approach

Authors: Ghada Badr, Manar Hosny, Nuha Bintayyash, Eman Albilali, Souad Larabi Marie-Sainte

Abstract:

The median problem is widely used to derive the most reasonable rearrangement phylogenetic tree for many species. More specifically, the problem is concerned with finding a permutation that minimizes the sum of distances between itself and a set of three signed permutations. Genomes with an equal number of genes but a different gene order can be represented as permutations. In this paper, an algorithm, namely BeamGA median, is proposed that uses a heuristic search approach (local beam) as an initialization step to generate a number of solutions and then applies a Genetic Algorithm (GA) to refine the solutions, aiming to achieve a better median with the smallest possible reversal distance from the three original permutations. In this approach, any genome rearrangement distance can be applied; in this paper, we use the reversal distance. To the best of our knowledge, the proposed approach has not been applied before to solving the median problem. Our approach considers a true biological evolution scenario by applying the concept of common intervals during the GA optimization process. This allows us to imitate true biological behavior and improve the convergence time of the genetic approach. We were able to handle permutations with a large number of genes within an acceptable time and with the same or better accuracy compared to existing algorithms.

Keywords: median problem, phylogenetic tree, permutation, genetic algorithm, beam search, genome rearrangement distance

Procedia PDF Downloads 244
1070 An Image Processing Scheme for Skin Fungal Disease Identification

Authors: A. A. M. A. S. S. Perera, L. A. Ranasinghe, T. K. H. Nimeshika, D. M. Dhanushka Dissanayake, Namalie Walgampaya

Abstract:

Nowadays, skin fungal diseases are mostly found in people of tropical countries like Sri Lanka. A skin fungal disease is a particular kind of illness caused by a fungus. These diseases have various dangerous effects on the skin and keep spreading over time. It is therefore important to identify these diseases at their initial stage to prevent them from spreading. This paper presents an automated skin fungal disease identification system implemented to speed up the diagnosis process by identifying skin fungal infections in digital images. An image of the diseased skin lesion is acquired, and a comprehensive computer vision and image processing scheme is used to process the image for disease identification. This includes colour analysis using RGB and HSV colour models, texture classification using Grey Level Run Length Matrix, Grey Level Co-Occurrence Matrix and Local Binary Pattern, Object detection, Shape Identification and many more. This paper presents the approach and its outcome for the identification of the four most common skin fungal infections, namely, Tinea Corporis, Sporotrichosis, Malassezia and Onychomycosis. The main intention of this research is to provide an automated skin fungal disease identification system that increases diagnostic quality, shortens the time-to-diagnosis and improves the efficiency of detection and successful treatment of skin fungal diseases.

Keywords: Circularity Index, Grey Level Run Length Matrix, Grey Level Co-Occurrence Matrix, Local Binary Pattern, Object detection, Ring Detection, Shape Identification

Procedia PDF Downloads 209
1069 Response of Six Organic Soil Media on the Germination, Seedling Vigor Performance of Jack Fruit Seeds in Chitwan Nepal

Authors: Birendra Kumar Bhattachan

Abstract:

Organic soil media play an important role in seed germination, growth, and the production of organic jack fruits as a source of nutrients such as vitamins A and C for human health. An experiment was conducted to find the appropriate organic soil media to induce germination and seedling vigor of jack fruit seeds at the farm of Agriculture and Forestry University (AFU), Chitwan, Nepal, from June 2022 to October 2022. The organic soil media used as treatments were: 1. soil collected under the Moringa tree; 2. soil, farm yard manure (FYM) and rice husk (RH) (2:1:1); 3. soil and FYM (1:1); 4. sand, FYM and RH (2:1:1); 5. sand, soil, FYM and RH (1:1:1:1); and 6. sand, soil and RH (1:2:1), under a Completely Randomized Design (CRD) with four replications. The significantly highest germination of 88% was induced by the soil media, followed by the media of soil and FYM (1:1), i.e. 63%, and the media of soil, FYM and RH (2:1:1); the lowest germination, 28%, was induced by the media of sand, soil, FYM and RH (1:1:1:1). The significantly highest seedling length of 73 cm was produced by the soil media, followed by the media of soil, sand, and RH (1:2:1), i.e. 72 cm, and the media of soil, sand, FYM, and RH (1:1:1:1); the lowest seedling length, 62 cm, was produced by the media of soil, FYM and RH (2:1:1). Similarly, the significantly highest seedling vigor of 6257 was produced by the soil media, followed by the media of soil and FYM (1:1), i.e. 4253, and the lowest seedling vigor, 1916, was produced by the media of sand, soil, FYM and RH (1:1:1:1). Based on this experiment, it was concluded that soil media collected under the Moringa tree induced the highest germination capacity of jack fruit seeds, followed by the highest seedling vigor.

Keywords: jack fruit seed, soil media, farm yard manure, sand media, rice husk

Procedia PDF Downloads 166
1068 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, as well as in medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease mainly affects senior citizens in Bangladesh. Its symptoms are often progressive and get worse with time. Affected people have trouble walking and communicating as the condition advances. Patients can also have psychological and social changes, sleep problems, hopelessness, memory loss, and weariness. Parkinson's disease can occur in both men and women, although the rate at which men are affected differs from that of women. In this research, we seek the most accurate ML algorithm for detecting the disease from a predictive dataset, modeling it with the following machine learning classifiers. Nine ML classifiers are used in this study: Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 106
1067 Investigating the Impacts on Cyclist Casualty Severity at Roundabouts: A UK Case Study

Authors: Nurten Akgun, Dilum Dissanayake, Neil Thorpe, Margaret C. Bell

Abstract:

Cycling has gained great attention owing to its comparable speeds, low cost, health benefits and reduced impact on the environment. The main challenge associated with cycling is the provision of safety for the people choosing to cycle as their main means of transport. From the road safety point of view, cyclists are considered vulnerable road users because they are at higher risk of serious casualty in the urban network, and more specifically at roundabouts. This research addresses the development of an enhanced mathematical model by including a broad spectrum of casualty related variables. These variables were geometric design measures (approach number of lanes and entry path radius), speed limit, meteorological condition variables (light, weather, road surface) and socio-demographic characteristics (age and gender), as well as contributory factors. Contributory factors included driver’s behavior related variables such as failed to look properly, sudden braking, a vehicle passing too close to a cyclist, junction overshot, failed to judge other person’s path, restart moving off at the junction, poor turn or manoeuvre and disobeyed give-way. Tyne and Wear in the UK was selected as the case study area. The cyclist casualty data was obtained from the UK STATS19 National dataset. The reference categories for the regression model were set to slight and serious cyclist casualties. Therefore, binary logistic regression was applied. Binary logistic regression analysis showed that approach number of lanes was statistically significant at the 95% level of confidence. A higher number of approach lanes increased the probability of severity of cyclist casualty occurrence. In addition, sudden braking statistically significantly increased the cyclist casualty severity at the 95% level of confidence. The results indicate that cyclist casualty severity was highly related to approach number of lanes and sudden braking.
Further research should carry out an in-depth analysis to explore the connection between sudden braking and approach number of lanes in order to investigate the driver’s behavior at approach locations. The output of this research will inform investment in measures to improve the safety of cyclists at roundabouts.

Keywords: binary logistic regression, casualty severity, cyclist safety, roundabout

Procedia PDF Downloads 160
1066 Automatic Detection of Traffic Stop Locations Using GPS Data

Authors: Areej Salaymeh, Loren Schwiebert, Stephen Remias, Jonathan Waddell

Abstract:

Extracting information from new data sources has emerged as a crucial task in many traffic planning processes, such as identifying traffic patterns, route planning, traffic forecasting, and locating infrastructure improvements. Given the advanced technologies used to collect Global Positioning System (GPS) data from dedicated GPS devices, GPS equipped phones, and navigation tools, intelligent data analysis methodologies are necessary to mine this raw data. In this research, an automatic detection framework is proposed to help identify and classify the locations of stopped GPS waypoints into two main categories: signalized intersections or highway congestion. The Delaunay triangulation is used to perform this assessment in the clustering phase. While most of the existing clustering algorithms need assumptions about the data distribution, the effectiveness of the Delaunay triangulation relies on triangulating geographical data points without such assumptions. Our proposed method starts by cleaning noise from the data and normalizing it. Next, the framework identifies stoppage points by calculating the traveled distance. The last step is to use clustering to form groups of waypoints for signalized traffic and highway congestion. A binary classifier is then applied to distinguish highway congestion from signalized stop points. The binary classifier uses the length of the cluster to find congestion. The proposed framework shows high accuracy for identifying the stop positions and congestion points in around 99.2% of trials. We show that it is possible, using limited GPS data, to distinguish the two categories with high accuracy.
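
The stoppage-point step described above ("calculating the traveled distance") could be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the haversine great-circle formula stands in for whatever distance measure the authors used, the 5 m threshold `max_move_m` is a hypothetical value, and the Delaunay clustering and binary classification stages are not covered.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stopped_points(track, max_move_m=5.0):
    """Return waypoints that moved less than max_move_m since the previous one.

    track is a list of (lat, lon) tuples in time order; max_move_m is an
    assumed threshold for this sketch, not a value from the abstract.
    """
    stops = []
    for prev, cur in zip(track, track[1:]):
        if haversine_m(prev[0], prev[1], cur[0], cur[1]) < max_move_m:
            stops.append(cur)
    return stops
```

The resulting list of stopped waypoints would then be the input to the clustering phase; in practice the threshold would need tuning to the GPS sampling rate and noise level.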

Keywords: Delaunay triangulation, clustering, intelligent transportation systems, GPS data

Procedia PDF Downloads 252
1065 A Dynamic Solution Approach for Heart Disease Prediction

Authors: Walid Moudani

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor. There is a lack of effective analysis tools to discover hidden relationships and trends in data. In fact, valuable knowledge can be discovered by applying data mining techniques in the healthcare system. In this study, a proficient methodology is presented for the extraction of significant patterns from coronary heart disease warehouses for the prediction of heart attack, which unfortunately continues to be a leading cause of mortality worldwide. For this purpose, we propose to enumerate dynamically the optimal subsets of the reduced features of high interest by using the rough sets technique associated with dynamic programming. We then propose to validate the classification using a Random Forest (RF) decision tree to identify the risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profiles of patients. Moreover, the experts’ knowledge in this field has been taken into consideration in order to define the disease and its risk factors, and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated against a set of benchmark techniques applied to this classification problem.

Keywords: multi-classifier decisions tree, features reduction, dynamic programming, rough sets

Procedia PDF Downloads 385
1064 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly, which leads to the growth of various industries. Nowadays, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful to the growth of industry and business. Hence, there is a need to develop techniques that can be used for early prediction of software defects. Due to the complexities of manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on learning patterns from previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying bugs in software. Several studies have been carried out on this basis, but achieving desirable defect prediction performance is still a challenging task. To address this issue, we present a machine learning based hybrid technique for software defect prediction. First, a Genetic Algorithm (GA) is presented in which an improved fitness function is used for better optimization of features in data sets. Next, these features are processed through a Decision Tree (DT) classification model. Finally, an experimental study is presented in which results from the proposed GA-DT based hybrid approach are compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: decision tree, genetic algorithm, machine learning, software defect prediction

Procedia PDF Downloads 307
1063 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, there has been a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data is clear, and real-time stream data management has become a hot topic. The sliding window model and frequent itemset mining over dynamic data are among the most important problems in the context of data mining. The sliding window model for frequent itemset mining is widely used for data stream mining due to its emphasis on recent data and its bounded memory requirement. Existing methods use the traditional transaction-based sliding window model, where the window size is based on a fixed number of transactions. This model assumes that all transactions arrive at a constant rate, which is not suited to real-time applications, and its use in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes the use of a timestamp-based sliding window model. In our proposed frequent itemset mining algorithm, support conditions are used to differentiate frequent and infrequent patterns. Thereafter, a tree is developed to incrementally maintain the essential information. We evaluate our contribution. The preliminary results are quite promising.
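
To make the contrast with a transaction-based window concrete, here is a minimal Python sketch of a timestamp-based sliding window. It is a simplification under stated assumptions: it tracks the support of single items only (not full itemsets), it uses plain counters instead of the tree structure the abstract mentions, and the class name `TimestampWindow` and its methods are hypothetical. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class TimestampWindow:
    """Keep only the transactions whose timestamp lies within the last `span` time units."""

    def __init__(self, span):
        self.span = span
        self.transactions = deque()  # (timestamp, frozenset of items), oldest first
        self.counts = Counter()      # item -> number of window transactions containing it

    def add(self, timestamp, items):
        """Insert a transaction, then drop everything that has aged out."""
        items = frozenset(items)
        self.transactions.append((timestamp, items))
        self.counts.update(items)
        self._expire(timestamp)

    def _expire(self, now):
        # A transaction expires once it is at least `span` time units old,
        # regardless of how many transactions arrived in the meantime.
        while self.transactions and self.transactions[0][0] <= now - self.span:
            _, old_items = self.transactions.popleft()
            self.counts.subtract(old_items)

    def frequent_items(self, min_support):
        """Items whose relative support in the current window is >= min_support."""
        n = len(self.transactions)
        if n == 0:
            return set()
        return {item for item, count in self.counts.items()
                if count > 0 and count / n >= min_support}
```

Because expiry depends on elapsed time, the number of transactions in the window varies with the arrival rate, which is the property the abstract argues a fixed-count window lacks.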

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 133
1062 Atom Probe Study of Early Stage of Precipitation on Binary Al-Li, Al-Cu Alloys and Ternary Al-Li-Cu Alloys

Authors: Muna Khushaim

Abstract:

Aluminum-based alloys play a key role in modern engineering, especially in the aerospace industry. Introduction of solute atoms such as Li and Cu is the main approach to improving the strength of age-hardenable Al alloys via the precipitation hardening phenomenon. Knowledge of the decomposition process of the microstructure during the precipitation reaction is particularly important for future technical developments. The objective of this study is to investigate the nano-scale chemical composition in the Al-Cu, Al-Li and Al-Li-Cu alloys during the early stage of the precipitation sequence and to describe whether this compositional difference correlates with variations in the observed precipitation kinetics. The random binomial frequency distribution was compared with the experimental frequency distribution of concentrations in atom probe tomography data to investigate the early stage of decomposition in the different binary and ternary alloys, which underwent different heat treatments. The results show that an Al-1.7 at.% Cu alloy requires a long ageing time of approximately 8 h at 160 °C to allow the diffusion of Cu atoms into the Al matrix. For the Al-8.2 at.% Li alloy, a combination of the natural ageing condition (48 h at room temperature) and a short artificial ageing condition (5 min at 160 °C) increases the number density of the Li clusters and hence the number of precipitated δ' particles. Applying this combination of natural ageing and short artificial ageing conditions to the ternary Al-4 at.% Li-1.7 at.% Cu alloy induces the formation of a Cu-rich phase. Increasing the Li content in the ternary alloy up to 8 at.% and increasing the ageing time to 30 min resulted in the precipitation processes ending with δ' particles. Thus, the results contribute to the understanding of Al-alloy design.

Keywords: aluminum alloy, atom probe tomography, early stage, decomposition

Procedia PDF Downloads 321
1061 The Impact of Digital Inclusive Finance on the High-Quality Development of China's Export Trade

Authors: Yao Wu

Abstract:

In the context of financial globalization, China has put forward the policy goal of high-quality development, and the digital economy, with its advantage of information resources, is driving China's export trade to achieve high-quality development. Due to the long-standing financing constraints of small and medium-sized export enterprises, how to expand the export scale of small and medium-sized enterprises has become a major threshold for the development of China's export trade. This paper firstly adopts the hierarchical analysis method to establish the evaluation system of high-quality development of China's export trade; secondly, the panel data of 30 provinces in China from 2011 to 2018 are selected for empirical analysis to establish the impact model of digital inclusive finance on the high-quality development of China's export trade; based on the analysis of heterogeneous enterprise trade model, a mediating effect model is established to verify the mediating role of credit constraint in the development of high-quality export trade in China. Based on the above analysis, this paper concludes that inclusive digital finance, with its unique digital and inclusive nature, alleviates the credit constraint problem among SMEs, enhances the binary marginal effect of SMEs' exports, optimizes their export scale and structure, and promotes the high-quality development of regional and even national export trade. Finally, based on the findings of this paper, we propose insights and suggestions for inclusive digital finance to promote the high-quality development of export trade.

Keywords: digital inclusive finance, high-quality development of export trade, fixed effects, binary marginal effects

Procedia PDF Downloads 66
1060 Production and Characterization of Biochars from Torrefaction of Biomass

Authors: Serdar Yaman, Hanzade Haykiri-Acma

Abstract:

Biomass is a CO₂-neutral fuel that is renewable and sustainable and has huge global potential. Efficient use of biomass in power generation and production of biomass-based biofuels can mitigate greenhouse gases (GHG) and reduce dependency on fossil fuels. There are also other beneficial effects of biomass energy use, such as employment creation and pollutant reduction. However, most biomass materials are not capable of competing with fossil fuels in terms of energy content. The high moisture content and high volatile matter yields of biomass make it a low calorific fuel, which is a significant disadvantage relative to fossil fuels. Besides, the density of biomass is generally low, which makes transportation and storage difficult. These negative aspects of biomass can be overcome by thermal pretreatments that upgrade the fuel properties of biomass. Torrefaction is one such thermal process, in which biomass is heated up to 300ºC under non-oxidizing conditions to avoid burning of the material. The treated biomass is called biochar and has considerably lower contents of moisture, volatile matter, and oxygen compared to the parent biomass. Accordingly, the carbon content and the calorific value of biochar increase to a level comparable with that of coal. Moreover, the hydrophilic nature of untreated biomass, which leads to decay in the structure, is mostly eliminated, and the surface of biochar becomes hydrophobic upon torrefaction. In order to investigate the effectiveness of the torrefaction process on biomass properties, several biomass species such as olive milling residue (OMR), Rhododendron (small shrubby tree with bell-shaped flowers), and ash tree (timber tree) were chosen. The fuel properties of these biomasses were analyzed through proximate and ultimate analyses as well as higher heating value (HHV) determination. For this, samples were first chopped and ground to a particle size lower than 250 µm.
Then, samples were subjected to torrefaction in a horizontal tube furnace by heating from ambient up to temperatures of 200, 250, and 300ºC at a heating rate of 10ºC/min. The biochars obtained from this process were also tested by the methods applied to the parent biomass species, and the improvement in the fuel properties was interpreted. Increasing the torrefaction temperature led to regular increases in the HHV of OMR, and the highest HHV (6065 kcal/kg) was obtained at 300ºC. In contrast, torrefaction at 250ºC was found to be optimal for Rhododendron and ash tree, since torrefaction at 300ºC had a detrimental effect on HHV. In addition, an increase in carbon content and a reduction in oxygen content were determined. Burning characteristics of the biochars were also studied using the thermal analysis technique. For this purpose, a TA Instruments SDT Q600 model thermal analyzer was used, and the thermogravimetric analysis (TGA), derivative thermogravimetry (DTG), differential scanning calorimetry (DSC), and differential thermal analysis (DTA) curves were compared and interpreted. It was concluded that torrefaction is an efficient method to upgrade the fuel properties of biomass and that the resulting biochars have superior characteristics compared to the parent biomasses.

Keywords: biochar, biomass, fuel upgrade, torrefaction

Procedia PDF Downloads 346
1059 Reconstruction of Age-Related Generations of Siberian Larch to Quantify the Climatogenic Dynamics of Woody Vegetation Close the Upper Limit of Its Growth

Authors: A. P. Mikhailovich, V. V. Fomin, E. M. Agapitov, V. E. Rogachev, E. A. Kostousova, E. S. Perekhodova

Abstract:

Woody vegetation near the upper limit of its habitat is a sensitive indicator of biota reaction to regional climate changes. Quantitative assessment of temporal and spatial changes in the distribution of trees and plant biocenoses calls for the development of new modeling approaches based upon selected data from measurements on the ground level and ultra-resolution aerial photography. Statistical models were developed for the study area located in the Polar Urals. These models allow obtaining probabilistic estimates for placing Siberian Larch trees into one of three age intervals, namely 1-10, 11-40 and over 40 years, based on the Weibull distribution of the maximum horizontal crown projection. The authors developed a distribution map for larch trees with crown diameters exceeding twenty centimeters by deciphering aerial photographs made by a UAV from an altitude of fifty meters. The total number of larches was 88608, distributed across the abovementioned intervals as follows: 16980, 51740, and 19889 trees. The results demonstrate that two processes can be observed over recent decades: first, the intensive forestation of previously barren or lightly wooded fragments of the study area located within the patches of wood, woodlands, and sparse stand, and second, expansion into mountain tundra. The current expansion of the Siberian Larch in the region replaced the depopulation process that occurred in the course of the Little Ice Age from the late 13ᵗʰ to the end of the 20ᵗʰ century. Using data from field measurements of Siberian larch specimen biometric parameters (including height, diameter at root collar and at 1.3 meters, and maximum projection of the crown in two orthogonal directions) and data on tree ages obtained at nine circular test sites, the authors developed an artificial neural network model with two layers of three and two neurons, respectively.
The model allows quantitative assessment of a specimen's age based on height and maximum crown projection values. Tree height and crown diameters can be quantitatively assessed using data from aerial photographs and lidar scans. The resulting model can be used to assess the age of all Siberian larch trees. The proposed approach, after validation, can be applied to assessing the age of other tree species growing near the upper tree boundaries in other mountainous regions. This research was collaboratively funded by the Russian Ministry for Science and Education (project No. FEUG-2023-0002) and the Russian Science Foundation (project No. 24-24-00235) in the field of data modeling on the basis of artificial neural networks.

Keywords: treeline, dynamic, climate, modeling

Procedia PDF Downloads 36