Search results for: Euclidean

6 An Adaptive Oversampling Technique for Imbalanced Datasets

Authors: Shaukat Ali Shahee, Usha Ananthakumar

Abstract:

A data set exhibits class imbalance problem when one class has very few examples compared to the other class, and this is also referred to as between class imbalance. The traditional classifiers fail to classify the minority class examples correctly due to its bias towards the majority class. Apart from between-class imbalance, imbalance within classes where classes are composed of a different number of sub-clusters with these sub-clusters containing different number of examples also deteriorates the performance of the classifier. Previously, many methods have been proposed for handling imbalanced dataset problem. These methods can be classified into four categories: data preprocessing, algorithmic based, cost-based methods and ensemble of classifier. Data preprocessing techniques have shown great potential as they attempt to improve data distribution rather than the classifier. Data preprocessing technique handles class imbalance either by increasing the minority class examples or by decreasing the majority class examples. Decreasing the majority class examples lead to loss of information and also when minority class has an absolute rarity, removing the majority class examples is generally not recommended. Existing methods available for handling class imbalance do not address both between-class imbalance and within-class imbalance simultaneously. In this paper, we propose a method that handles between class imbalance and within class imbalance simultaneously for binary classification problem. Removing between class imbalance and within class imbalance simultaneously eliminates the biases of the classifier towards bigger sub-clusters by minimizing the error domination of bigger sub-clusters in total error. The proposed method uses model-based clustering to find the presence of sub-clusters or sub-concepts in the dataset. The number of examples oversampled among the sub-clusters is determined based on the complexity of sub-clusters. The method also takes into consideration the scatter of the data in the feature space and also adaptively copes up with unseen test data using Lowner-John ellipsoid for increasing the accuracy of the classifier. In this study, neural network is being used as this is one such classifier where the total error is minimized and removing the between-class imbalance and within class imbalance simultaneously help the classifier in giving equal weight to all the sub-clusters irrespective of the classes. The proposed method is validated on 9 publicly available data sets and compared with three existing oversampling techniques that rely on the spatial location of minority class examples in the euclidean feature space. The experimental results show the proposed method to be statistically significantly superior to other methods in terms of various accuracy measures. Thus the proposed method can serve as a good alternative to handle various problem domains like credit scoring, customer churn prediction, financial distress, etc., that typically involve imbalanced data sets.

Keywords: classification, imbalanced dataset, Lowner-John ellipsoid, model based clustering, oversampling

Procedia PDF Downloads 389

5 Phylogenetic Inferences based on Morphoanatomical Characters in Plectranthus esculentus N. E. Br. (Lamiaceae) from Nigeria

Authors: Otuwose E. Agyeno, Adeniyi A. Jayeola, Bashir A. Ajala

Abstract:

P. esculentus is indigenous to Nigeria yet no wild relation has been encountered or reported. This has made it difficult to establish proper lineages between the varieties and landraces under cultivation. The present work is the first to determine the apormophy of 135 morphoanatomical characters in organs of 46 accessions drawn from 23 populations of this species based on dicta. The character states were coded in accession x character-state matrices and only 83 were informative and utilised for neighbour joining clustering based on euclidean values, and heuristic search in parsimony analysis using PAST ver. 3.15 software. Compatibility and evolutionary trends between accessions were then explored from values and diagrams produced. The low consistency indices (CI) recorded support monophyly and low homoplasy in this taxon. Agglomerative schedules based on character type and source data sets divided the accessions into mainly 3 clades, each of complexes of accessions. Solenostemon rotundifolius (Poir) J.K Morton was the outgroup (OG) used, and it occurred within the largest clades except when the characters were combined in a data set. The OG showed better compatibility with accessions of populations of landrace Isci, and varieties Riyum and Long’at. Otherwise, its aerial parts are more consistent with those of accessions of variety Bebot. The highly polytomous clades produced due to anatomical data set may be an indication of how stable such characters are in this species. Strict consensus trees with more than 60 nodes outputted showed that the basal nodes were strongly supported by 3 to 17 characters across the data sets, suggesting that populations of this species are more alike. The OG was clearly the first diverging lineage and closely related to accessions of landrace Gwe and variety Bebot morphologically, but different from them anatomically. It was also distantly related to landrace Fina and variety Long’at in terms of root, stem and leaf structural attributes. There were at least 5 other clades with each comprising of complexes of accessions from different localities and terrains within the study area. Spherical stem in cross section, size of vascular bundles at the stem corners as well as the alternate and whorl phyllotaxy are attributes which may have facilitated each other’s evolution in all accessions of the landrace Gwe, and they may be innovative since such states are not characteristic of the larger Lamiaceae, and Plectranthus L’Her in particular. In conclusion, this study has provided valuable information about infraspecific diversity in this taxon. It supports recognition of the varietal statuses accorded to populations of P. esculentus, as well as the hypothesis that the wild gene might have been distributed on the Jos Plateau. However, molecular characterisation of accessions of populations of this species would resolve this problem better.

Keywords: clustering, lineage, morphoanatomical characters, Nigeria, phylogenetics, Plectranthus esculentus, population

Procedia PDF Downloads 108

4 A Formal Microlectic Framework for Biological Circularchy

Authors: Ellis D. Cooper

Abstract:

“Circularchy” is supposed to be an adjustable formal framework with enough expressive power to articulate biological theory about Earthly Life in the sense of multi-scale biological autonomy constrained by non-equilibrium thermodynamics. “Formal framework” means specifically a multi-sorted first-order-theorywithequality (for each sort). Philosophically, such a theory is one kind of “microlect,” which means a “way of speaking” (or, more generally, a “way of behaving”) for overtly expressing a “mental model” of some “referent.” Other kinds of microlect include “natural microlect,” “diagrammatic microlect,” and “behavioral microlect,” with examples such as “political theory,” “Euclidean geometry,” and “dance choreography,” respectively. These are all describable in terms of a vocabulary conforming to grammar. As aspects of human culture, they are possibly reminiscent of Ernst Cassirer’s idea of “symbolic form;” as vocabularies, they are akin to Richard Rorty’s idea of “final vocabulary” for expressing a mental model of one’s life. A formal microlect is presented by stipulating sorts, variables, calculations, predicates, and postulates. Calculations (a.k.a., “terms”) may be composed to form more complicated calculations; predicates (a.k.a., “relations”) may be logically combined to form more complicated predicates; and statements (a.k.a., “sentences”) are grammatically correct expressions which are true or false. Conclusions are statements derived using logical rules of deduction from postulates, other assumed statements, or previously derived conclusions. A circularchy is a formal microlect constituted by two or more sub-microlects, each with its distinct stipulations of sorts, variables, calculations, predicates, and postulates. Within a sub-microlect some postulates or conclusions are equations which are statements that declare equality of specified calculations. An equational bond between an equation in one sub-microlect and an equation in either the same sub-microlect or in another sub-microlect is a predicate that declares equality of symbols occurring in a side of one equation with symbols occurring in a side of the other equation. Briefly, a circularchy is a network of equational bonds between sub-microlects. A circularchy is solvable if there exist solutions for all equations that satisfy all equational bonds. If a circularchy is not solvable, then a challenge would be to discover the obstruction to solvability and then conjecture what adjustments might remove the obstruction. Adjustment means changes in stipulated ingredients (sorts, etc.) of sub-microlects, or changes in equational bonds between sub-microlects, or introduction of new sub-microlects and new equational bonds. A circularchy is modular insofar as each sub-microlect is a node in a network of equation bonds. Solvability of a circularchy may be conjectured. Efforts to prove solvability may be thwarted by a counter-example or may lead to the construction of a solution. An automated theorem-proof assistant would likely be necessary for investigating a substantial circularchy, such as one purported to represent Earthly Life. Such investigations (chains of statements) would be concurrent with and no substitute for simulations (chains of numbers).

Keywords: autonomy, first-order theory, mathematics, thermodynamics

Procedia PDF Downloads 192

3 Modeling Visual Memorability Assessment with Autoencoders Reveals Characteristics of Memorable Images

Authors: Elham Bagheri, Yalda Mohsenzadeh

Abstract:

Image memorability refers to the phenomenon where certain images are more likely to be remembered by humans than others. It is a quantifiable and intrinsic attribute of an image. Understanding how visual perception and memory interact is important in both cognitive science and artificial intelligence. It reveals the complex processes that support human cognition and helps to improve machine learning algorithms by mimicking the brain's efficient data processing and storage mechanisms. To explore the computational underpinnings of image memorability, this study examines the relationship between an image's reconstruction error, distinctiveness in latent space, and its memorability score. A trained autoencoder is used to replicate human-like memorability assessment inspired by the visual memory game employed in memorability estimations. This study leverages a VGG-based autoencoder that is pre-trained on the vast ImageNet dataset, enabling it to recognize patterns and features that are common to a wide and diverse range of images. An empirical analysis is conducted using the MemCat dataset, which includes 10,000 images from five broad categories: animals, sports, food, landscapes, and vehicles, along with their corresponding memorability scores. The memorability score assigned to each image represents the probability of that image being remembered by participants after a single exposure. The autoencoder is finetuned for one epoch with a batch size of one, attempting to create a scenario similar to human memorability experiments where memorability is quantified by the likelihood of an image being remembered after being seen only once. The reconstruction error, which is quantified as the difference between the original and reconstructed images, serves as a measure of how well the autoencoder has learned to represent the data. The reconstruction error of each image, the error reduction, and its distinctiveness in latent space are calculated and correlated with the memorability score. Distinctiveness is measured as the Euclidean distance between each image's latent representation and its nearest neighbor within the autoencoder's latent space. Different structural and perceptual loss functions are considered to quantify the reconstruction error. The results indicate that there is a strong correlation between the reconstruction error and the distinctiveness of images and their memorability scores. This suggests that images with more unique distinct features that challenge the autoencoder's compressive capacities are inherently more memorable. There is also a negative correlation between the reduction in reconstruction error compared to the autoencoder pre-trained on ImageNet, which suggests that highly memorable images are harder to reconstruct, probably due to having features that are more difficult to learn by the autoencoder. These insights suggest a new pathway for evaluating image memorability, which could potentially impact industries reliant on visual content and mark a step forward in merging the fields of artificial intelligence and cognitive science. The current research opens avenues for utilizing neural representations as instruments for understanding and predicting visual memory.

Keywords: autoencoder, computational vision, image memorability, image reconstruction, memory retention, reconstruction error, visual perception

Procedia PDF Downloads 41

2 Planning for Location and Distribution of Regional Facilities Using Central Place Theory and Location-Allocation Model

Authors: Danjuma Bawa

Abstract:

This paper aimed at exploring the capabilities of Location-Allocation model in complementing the strides of the existing physical planning models in the location and distribution of facilities for regional consumption. The paper was designed to provide a blueprint to the Nigerian government and other donor agencies especially the Fertilizer Distribution Initiative (FDI) by the federal government for the revitalization of the terrorism ravaged regions. Theoretical underpinnings of central place theory related to spatial distribution, interrelationships, and threshold prerequisites were reviewed. The study showcased how Location-Allocation Model (L-AM) alongside Central Place Theory (CPT) was applied in Geographic Information System (GIS) environment to; map and analyze the spatial distribution of settlements; exploit their physical and economic interrelationships, and to explore their hierarchical and opportunistic influences. The study was purely spatial qualitative research which largely used secondary data such as; spatial location and distribution of settlements, population figures of settlements, network of roads linking them and other landform features. These were sourced from government ministries and open source consortium. GIS was used as a tool for processing and analyzing such spatial features within the dictum of CPT and L-AM to produce a comprehensive spatial digital plan for equitable and judicious location and distribution of fertilizer deports in the study area in an optimal way. Population threshold was used as yardstick for selecting suitable settlements that could stand as service centers to other hinterlands; this was accomplished using the query syntax in ArcMapTM. ArcGISTM’ network analyst was used in conducting location-allocation analysis for apportioning of groups of settlements around such service centers within a given threshold distance. Most of the techniques and models ever used by utility planners have been centered on straight distance to settlements using Euclidean distances. Such models neglect impedance cutoffs and the routing capabilities of networks. CPT and L-AM take into consideration both the influential characteristics of settlements and their routing connectivity. The study was undertaken in two terrorism ravaged Local Government Areas of Adamawa state. Four (4) existing depots in the study area were identified. 20 more depots in 20 villages were proposed using suitability analysis. Out of the 300 settlements mapped in the study area about 280 of such settlements where optimally grouped and allocated to the selected service centers respectfully within 2km impedance cutoff. This study complements the giant strides by the federal government of Nigeria by providing a blueprint for ensuring proper distribution of these public goods in the spirit of bringing succor to these terrorism ravaged populace. This will ardently at the same time help in boosting agricultural activities thereby lowering food shortage and raising per capita income as espoused by the government.

Keywords: central place theory, GIS, location-allocation, network analysis, urban and regional planning, welfare economics

Procedia PDF Downloads 120

1 A Geoprocessing Tool for Early Civil Work Notification to Optimize Fiber Optic Cable Installation Cost

Authors: Hussain Adnan Alsalman, Khalid Alhajri, Humoud Alrashidi, Abdulkareem Almakrami, Badie Alguwaisem, Said Alshahrani, Abdullah Alrowaished

Abstract:

Most of the cost of installing a new fiber optic cable is attributed to civil work-trenching-cost. In many cases, information technology departments receive project proposals in their eReview system, but not all projects are visible to everyone. Additionally, if there was no IT scope in the proposed project, it is not likely to be visible to IT. Sometimes it is too late to add IT scope after project budgets have been finalized. Finally, the eReview system is a repository of PDF files for each project, which commits the reviewer to manual work and limits automation potential. This paper details a solution to address the late notification of the eReview system by integrating IT Sites GIS data-sites locations-with land use permit (LUP) data-civil work activity, which is the first step before securing the required land usage authorizations and means no detailed designs for any relevant project before an approved LUP request. To address the manual nature of eReview system, both the LUP System and IT data are using ArcGIS Desktop, which enables the creation of a geoprocessing tool with either Python or Model Builder to automate finding and evaluating potentially usable LUP requests to reduce trenching between two sites in need of a new FOC. To achieve this, a weekly dump was taken from LUP system production data and loaded manually onto ArcMap Desktop. Then a custom tool was developed in model builder, which consisted of a table of two columns containing all the pairs of sites in need of new fiber connectivity. The tool then iterates all rows of this table, taking the sites’ pair one at a time and finding potential LUPs between them, which satisfies the provided search radius. If a group of LUPs was found, an iterator would go through each LUP to find the required civil work between the two sites and the LUP Polyline feature and the distance through the line, which would be counted as cost avoidance if an IT scope had been added. Finally, the tool will export an Excel file named with sites pair, and it will contain as many rows as the number of LUPs, which met the search radius containing trenching and pulling information and cost. As a result, multiple projects have been identified – historical, missed opportunity, and proposed projects. For the proposed project, the savings were about 75% ($750,000) to install a new fiber with the Euclidean distance between Abqaiq GOSP2 and GOSP3 DCOs. In conclusion, the current tool setup identifies opportunities to bundle civil work on single projects at a time and between two sites. More work is needed to allow the bundling of multiple projects between two sites to achieve even more cost avoidance in both capital cost and carbon footprint.

Keywords: GIS, fiber optic cable installation optimization, eliminate redundant civil work, reduce carbon footprint for fiber optic cable installation

Procedia PDF Downloads 195