Search results for: curse of dimensionality
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 138

Search results for: curse of dimensionality

48 A Fourier Method for Risk Quantification and Allocation of Credit Portfolios

Authors: Xiaoyu Shen, Fang Fang, Chujun Qiu

Abstract:

Herewith we present a Fourier method for credit risk quantification and allocation in the factor-copula model framework. The key insight is that, compared to directly computing the cumulative distribution function of the portfolio loss via Monte Carlo simulation, it is, in fact, more efficient to calculate the transformation of the distribution function in the Fourier domain instead and inverting back to the real domain can be done in just one step and semi-analytically, thanks to the popular COS method (with some adjustments). We also show that the Euler risk allocation problem can be solved in the same way since it can be transformed into the problem of evaluating a conditional cumulative distribution function. Once the conditional or unconditional cumulative distribution function is known, one can easily calculate various risk metrics. The proposed method not only fills the niche in literature, to the best of our knowledge, of accurate numerical methods for risk allocation but may also serve as a much faster alternative to the Monte Carlo simulation method for risk quantification in general. It can cope with various factor-copula model choices, which we demonstrate via examples of a two-factor Gaussian copula and a two-factor Gaussian-t hybrid copula. The fast error convergence is proved mathematically and then verified by numerical experiments, in which Value-at-Risk, Expected Shortfall, and conditional Expected Shortfall are taken as examples of commonly used risk metrics. The calculation speed and accuracy are tested to be significantly superior to the MC simulation for real-sized portfolios. The computational complexity is, by design, primarily driven by the number of factors instead of the number of obligors, as in the case of Monte Carlo simulation. The limitation of this method lies in the "curse of dimension" that is intrinsic to multi-dimensional numerical integration, which, however, can be relaxed with the help of dimension reduction techniques and/or parallel computing, as we will demonstrate in a separate paper. The potential application of this method has a wide range: from credit derivatives pricing to economic capital calculation of the banking book, default risk charge and incremental risk charge computation of the trading book, and even to other risk types than credit risk.

Keywords: credit portfolio, risk allocation, factor copula model, the COS method, Fourier method

Procedia PDF Downloads 125
47 Performance Evaluation and Comparison between the Empirical Mode Decomposition, Wavelet Analysis, and Singular Spectrum Analysis Applied to the Time Series Analysis in Atmospheric Science

Authors: Olivier Delage, Hassan Bencherif, Alain Bourdier

Abstract:

Signal decomposition approaches represent an important step in time series analysis, providing useful knowledge and insight into the data and underlying dynamics characteristics while also facilitating tasks such as noise removal and feature extraction. As most of observational time series are nonlinear and nonstationary, resulting of several physical processes interaction at different time scales, experimental time series have fluctuations at all time scales and requires the development of specific signal decomposition techniques. Most commonly used techniques are data driven, enabling to obtain well-behaved signal components without making any prior-assumptions on input data. Among the most popular time series decomposition techniques, most cited in the literature, are the empirical mode decomposition and its variants, the empirical wavelet transform and singular spectrum analysis. With increasing popularity and utility of these methods in wide ranging applications, it is imperative to gain a good understanding and insight into the operation of these algorithms. In this work, we describe all of the techniques mentioned above as well as their ability to denoise signals, to capture trends, to identify components corresponding to the physical processes involved in the evolution of the observed system and deduce the dimensionality of the underlying dynamics. Results obtained with all of these methods on experimental total ozone columns and rainfall time series will be discussed and compared

Keywords: denoising, empirical mode decomposition, singular spectrum analysis, time series, underlying dynamics, wavelet analysis

Procedia PDF Downloads 82
46 Effects of Different Meteorological Variables on Reference Evapotranspiration Modeling: Application of Principal Component Analysis

Authors: Akinola Ikudayisi, Josiah Adeyemo

Abstract:

The correct estimation of reference evapotranspiration (ETₒ) is required for effective irrigation water resources planning and management. However, there are some variables that must be considered while estimating and modeling ETₒ. This study therefore determines the multivariate analysis of correlated variables involved in the estimation and modeling of ETₒ at Vaalharts irrigation scheme (VIS) in South Africa using Principal Component Analysis (PCA) technique. Weather and meteorological data between 1994 and 2014 were obtained both from South African Weather Service (SAWS) and Agricultural Research Council (ARC) in South Africa for this study. Average monthly data of minimum and maximum temperature (°C), rainfall (mm), relative humidity (%), and wind speed (m/s) were the inputs to the PCA-based model, while ETₒ is the output. PCA technique was adopted to extract the most important information from the dataset and also to analyze the relationship between the five variables and ETₒ. This is to determine the most significant variables affecting ETₒ estimation at VIS. From the model performances, two principal components with a variance of 82.7% were retained after the eigenvector extraction. The results of the two principal components were compared and the model output shows that minimum temperature, maximum temperature and windspeed are the most important variables in ETₒ estimation and modeling at VIS. In order words, ETₒ increases with temperature and windspeed. Other variables such as rainfall and relative humidity are less important and cannot be used to provide enough information about ETₒ estimation at VIS. The outcome of this study has helped to reduce input variable dimensionality from five to the three most significant variables in ETₒ modelling at VIS, South Africa.

Keywords: irrigation, principal component analysis, reference evapotranspiration, Vaalharts

Procedia PDF Downloads 227
45 A Qualitative Study to Explore the Social Perception and Stigma around Disability, and Its Impact on the Caring Experiences of Mothers of Children with Physical Disability in Bangladesh

Authors: Farjina Malek, Julie King, Niki Edwards

Abstract:

Across the globe more than a billion people live with a disability and a further billion people, mostly carers, are indirectly impacted. While prevalence data is problematic, it is estimated that more than 15% of the population in Bangladesh live with a disability. Disability service infrastructure in Bangladesh is under-developed; and consequently, the onus of care falls on family, especially on mothers. Within the caring role, mothers encounter many challenging experiences which are not only due to the lack of support delivered through the Bangladeshi health care system but also related to the existence of stigma and perception around disability in the Bangladeshi society. Within this perception, the causes of disability are mostly associated with 'God’s will'; 'possession of ghosts on the disabled person'; and 'karma or the result of past sins of the family members especially the mothers'. These beliefs are likely to have a significant impact on the well-being of mothers and their caring experience of children with disability. This is an ongoing qualitative study which is conducting in-depth interviews with 30 mothers from five districts (Dhaka, Mymensingh, Manikganj, Tangail, and Gazipur) of Bangladesh with the aim to explore the impact of social perception and stigma around physical disability on the caring role of the mothers of children with physical disability. The major findings of this study show that the social perception around disability and the social expectation from a mother regarding her caring role have a huge impact on the well-being of mothers. Mothers are mostly expected to take their child on their lap to prove that they are ‘good mother’. These practices of lifting their children with physical disability and keeping them on the lap for a long time often cause chronic back pain of the mothers. Existing social beliefs consider disability as a ‘curse’ and punishment for the ‘sins’ of the family members, most often by the mother. Mothers are blamed if they give birth to ‘abnormal’ children. This social construction creates stigma, and thus, the caring responsibility of mothers become more challenging. It also encourages the family and mothers to hide their children from the society and to avoid seeking accessible disability services. The mothers also compromise their careers and social interaction as they have to stay with their children at home, and that has a significant impact on personal wellbeing, income, and empowerment of the mothers. The research is informed by intersectional theory and employed an interpretive phenomenological methodology to explore mothers’ experience of caring their children with physical disability, and the contribution and impact of key relationships within the family and the intersection with community and services.

Keywords: mother, family carer, physical disability, children, social stigma, key relationship

Procedia PDF Downloads 210
44 Combining Diffusion Maps and Diffusion Models for Enhanced Data Analysis

Authors: Meng Su

Abstract:

High-dimensional data analysis often presents challenges in capturing the complex, nonlinear relationships and manifold structures inherent to the data. This article presents a novel approach that leverages the strengths of two powerful techniques, Diffusion Maps and Diffusion Probabilistic Models (DPMs), to address these challenges. By integrating the dimensionality reduction capability of Diffusion Maps with the data modeling ability of DPMs, the proposed method aims to provide a comprehensive solution for analyzing and generating high-dimensional data. The Diffusion Map technique preserves the nonlinear relationships and manifold structure of the data by mapping it to a lower-dimensional space using the eigenvectors of the graph Laplacian matrix. Meanwhile, DPMs capture the dependencies within the data, enabling effective modeling and generation of new data points in the low-dimensional space. The generated data points can then be mapped back to the original high-dimensional space, ensuring consistency with the underlying manifold structure. Through a detailed example implementation, the article demonstrates the potential of the proposed hybrid approach to achieve more accurate and effective modeling and generation of complex, high-dimensional data. Furthermore, it discusses possible applications in various domains, such as image synthesis, time-series forecasting, and anomaly detection, and outlines future research directions for enhancing the scalability, performance, and integration with other machine learning techniques. By combining the strengths of Diffusion Maps and DPMs, this work paves the way for more advanced and robust data analysis methods.

Keywords: diffusion maps, diffusion probabilistic models (DPMs), manifold learning, high-dimensional data analysis

Procedia PDF Downloads 71
43 The Representation of the Medieval Idea of Ugliness in Messiaen's Saint François d’Assise

Authors: Nana Katsia

Abstract:

This paper explores the ways both medieval and medievalist conceptions of ugliness might be linked to the physical and spiritual transformation of the protagonists and how it is realised through specific musical rhythm, such as the dochmiac rhythm in the opera. As Eco and Henderson note, only one kind of ugliness could be represented in conformity with nature in the Middle Ages without destroying all aesthetic pleasure and, in turn, artistic beauty: namely, a form of ugliness which arouses disgust. Moreover, Eco explores the fact that the enemies of Christ who condemn, martyr, and crucify him are represented as wicked inside. In turn, the representation of inner wickedness and hostility toward God brings with it outward ugliness, coarseness, barbarity, and rage. Ultimately these result in the deformation of the figure. In all these regards, the non-beautiful is represented here as a necessary phase, which is not the case with classical (the ancient Greek) concepts of Beauty. As we can see, the understanding of disfigurement and ugliness in the Middle Ages was both varied and complex. In the Middle Ages, the disfigurement caused by leprosy (and other skin and bodily conditions) was interpreted, in a somewhat contradictory manner, as both a curse and a gift from God. Some saints’ lives even have the saint appealing to be inflicted with the disease as part of their mission toward true humility. We shall explore that this ‘different concept’ of ugliness (non-classical beauty) might be represented in Messiaen’s opera. According to Messiaen, the Leper and Saint François are the principal characters of the third scene, as both of them will be transformed, and a double miracle will take place in the process. Messiaen mirrors the idea of the true humility of Saint’s life and positions Le Baiser au Lépreux as the culmination of the first act. The Leper’s character represents his physical and spiritual disfigurement, which are healed after the miracle. So, the scene can be viewed as an encounter between beauty and ugliness, and that much of it is spent in a study of ugliness. Dochmiac rhythm is one of the most important compositional elements in the opera. It plays a crucial role in the process of creating a dramatic musical narrative and structure in the composition. As such, we shall explore how Messiaen represents the medieval idea of ugliness in the opera through particular musical elements linked to the main protagonists’ spiritual or physical ugliness; why Messiaen makes reference to dochmiac rhythm, and how they create the musical and dramatic context in the opera for the medieval aesthetic category of ugliness.

Keywords: ugliness in music, medieval time, saint françois d’assise, messiaen

Procedia PDF Downloads 123
42 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day to manually classifying whether tumors are benign or malignant. This tedious task contributes to mislabeling metastasis being over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives and can also improve the speed and efficiency of the process, thereby taking fewer resources and time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms: principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms: logistic regression, decision tree classifier, and k-nearest neighbors to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising of PCA, the genetic algorithm, and the k-nearest neighbor algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 56
41 Understanding the Excited State Dynamics of a Phase Transformable Photo-Active Metal-Organic Framework MIP 177 through Time-Resolved Infrared Spectroscopy

Authors: Aneek Kuila, Yaron Paz

Abstract:

MIP 177 LT and HT are two-phase transformable metal organic frameworks consisting of a Ti12O15 oxocluster and a tetracarboxylate ligand that exhibits robust chemical stability and improved photoactivity. LT to HT only shows the changes in dimensionality from 0D to 1D without any change in the overall chemical structure. In terms of chemical and photoactivity MIP 177 LT is found to perform better than the MIP 177HT. Step-scan Fourier transform absorption difference time-resolved spectroscopy has been used to collect mid-IR time-resolved infrared spectra of the transient electronic excited states of a nano-porous metal–organic framework MIP 177-LT and HT with 2.5 ns time resolution. Analyzing the time-resolved vibrational data after 355nm LASER excitation reveals the presence of the temporal changes of ν (O-Ti-O) of Ti-O metal cluster and ν (-COO) of the ligand concluding the fact that these moieties are the ultimate acceptors of the excited charges which are localized over those regions on the nanosecond timescale. A direct negative correlation between the differential absorbance (Δ Absorbance) reveals the charge transfer relation among these two moieties. A longer-lived transient signal up to 180ns for MIP 177 LT compared to the 100 ns of MIP 177 HT shows the extended lifetime of the reactive charges over the surface that exerts in their effectivity. An ultrafast change of bidentate to monodentate bridging in the -COO-Ti-O ligand-metal coordination environment was observed after the photoexcitation of MIP 177 LT which remains and lives with for seconds after photoexcitation is halted. This phenomenon is very unique to MIP 177 LT but not observed with HT. This in-situ change in the coordination denticity during the photoexcitation was not observed previously which can rationalize the reason behind the ability of MIP 177 LT to accumulate electrons during continuous photoexcitation leading to a superior photocatalytic activity.

Keywords: time resolved FTIR, metal organic framework, denticity, photoacatalysis

Procedia PDF Downloads 32
40 Machine Learning Techniques for COVID-19 Detection: A Comparative Analysis

Authors: Abeer A. Aljohani

Abstract:

COVID-19 virus spread has been one of the extreme pandemics across the globe. It is also referred to as coronavirus, which is a contagious disease that continuously mutates into numerous variants. Currently, the B.1.1.529 variant labeled as omicron is detected in South Africa. The huge spread of COVID-19 disease has affected several lives and has surged exceptional pressure on the healthcare systems worldwide. Also, everyday life and the global economy have been at stake. This research aims to predict COVID-19 disease in its initial stage to reduce the death count. Machine learning (ML) is nowadays used in almost every area. Numerous COVID-19 cases have produced a huge burden on the hospitals as well as health workers. To reduce this burden, this paper predicts COVID-19 disease is based on the symptoms and medical history of the patient. This research presents a unique architecture for COVID-19 detection using ML techniques integrated with feature dimensionality reduction. This paper uses a standard UCI dataset for predicting COVID-19 disease. This dataset comprises symptoms of 5434 patients. This paper also compares several supervised ML techniques to the presented architecture. The architecture has also utilized 10-fold cross validation process for generalization and the principal component analysis (PCA) technique for feature reduction. Standard parameters are used to evaluate the proposed architecture including F1-Score, precision, accuracy, recall, receiver operating characteristic (ROC), and area under curve (AUC). The results depict that decision tree, random forest, and neural networks outperform all other state-of-the-art ML techniques. This achieved result can help effectively in identifying COVID-19 infection cases.

Keywords: supervised machine learning, COVID-19 prediction, healthcare analytics, random forest, neural network

Procedia PDF Downloads 66
39 A Hybrid-Evolutionary Optimizer for Modeling the Process of Obtaining Bricks

Authors: Marius Gavrilescu, Sabina-Adriana Floria, Florin Leon, Silvia Curteanu, Costel Anton

Abstract:

Natural sciences provide a wide range of experimental data whose related problems require study and modeling beyond the capabilities of conventional methodologies. Such problems have solution spaces whose complexity and high dimensionality require correspondingly complex regression methods for proper characterization. In this context, we propose an optimization method which consists in a hybrid dual optimizer setup: a global optimizer based on a modified variant of the popular Imperialist Competitive Algorithm (ICA), and a local optimizer based on a gradient descent approach. The ICA is modified such that intermediate solution populations are more quickly and efficiently pruned of low-fitness individuals by appropriately altering the assimilation, revolution and competition phases, which, combined with an initialization strategy based on low-discrepancy sampling, allows for a more effective exploration of the corresponding solution space. Subsequently, gradient-based optimization is used locally to seek the optimal solution in the neighborhoods of the solutions found through the modified ICA. We use this combined approach to find the optimal configuration and weights of a fully-connected neural network, resulting in regression models used to characterize the process of obtained bricks using silicon-based materials. Installations in the raw ceramics industry, i.e., bricks, are characterized by significant energy consumption and large quantities of emissions. Thus, the purpose of our approach is to determine by simulation the working conditions, including the manufacturing mix recipe with the addition of different materials, to minimize the emissions represented by CO and CH4. Our approach determines regression models which perform significantly better than those found using the traditional ICA for the aforementioned problem, resulting in better convergence and a substantially lower error.

Keywords: optimization, biologically inspired algorithm, regression models, bricks, emissions

Procedia PDF Downloads 57
38 2D Ferromagnetism in Van der Waals Bonded Fe₃GeTe₂

Authors: Ankita Tiwari, Jyoti Saini, Subhasis Ghosh

Abstract:

For many years, researchers have been fascinated by the subject of how properties evolve as dimensionality is lowered. Early on, it was shown that the presence of a significant magnetic anisotropy might compensate for the lack of long-range (LR) magnetic order in a low-dimensional system (d < 3) with continuous symmetry, as proposed by Hohenberg-Mermin and Wagner (HMW). Strong magnetic anisotropy allows an LR magnetic order to stabilize in two dimensions (2D) even in the presence of stronger thermal fluctuations which is responsible for the absence of Heisenberg ferromagnetism in 2D. Van der Waals (vdW) ferromagnets, including CrI₃, CrTe₂, Cr₂X₂Te₆ (X = Si and Ge) and Fe₃GeTe₂, offer a nearly ideal platform for studying ferromagnetism in 2D. Fe₃GeTe₂ is the subject of extensive investigation due to its tunable magnetic properties, high Curie temperature (Tc ~ 220K), and perpendicular magnetic anisotropy. Many applications in the field of spintronics device development have been quite active due to these appealing features of Fe₃GeTe₂. Although it is known that LR-driven ferromagnetism is necessary to get around the HMW theorem in 2D experimental realization, Heisenberg 2D ferromagnetism remains elusive in condensed matter systems. Here, we show that Fe₃GeTe₂ hosts both localized and delocalized spins, resulting in itinerant and local-moment ferromagnetism. The presence of LR itinerant interaction facilitates to stabilize Heisenberg ferromagnet in 2D. With the help of Rhodes-Wohlfarth (RW) and generalized RW-based analysis, Fe₃GeTe₂ has been shown to be a 2D ferromagnet with itinerant magnetism that can be modulated by an external magnetic field. Hence, the presence of both local moment and itinerant magnetism has made this system interesting in terms of research in low dimensions. We have also rigorously performed critical analysis using an improvised method. We show that the variable critical exponents are typical signatures of 2D ferromagnetism in Fe₃GeTe₂. The spontaneous magnetization exponent β changes the universality class from mean-field to 2D Heisenberg with field. We have also confirmed the range of interaction via the renormalization group (RG) theory. According to RG theory, Fe₃GeTe₂ is a 2D ferromagnet with LR interactions.

Keywords: Van der Waal ferromagnet, 2D ferromagnetism, phase transition, itinerant ferromagnetism, long range order

Procedia PDF Downloads 41
37 Mapping the Digital Landscape: An Analysis of Party Differences between Conventional and Digital Policy Positions

Authors: Daniel Schwarz, Jan Fivaz, Alessia Neuroni

Abstract:

Although digitization is a buzzword in almost every election campaign, the political parties leave voters largely in the dark about their specific positions on digital issues. In the run-up to the 2019 elections in Switzerland, the ‘Digitization Monitor’ project (DMP) was launched in order to change this situation. Within the framework of the DMP, all 4,736 candidates were surveyed about their digital policy positions and values. The DMP is designed as a digital policy supplement to the existing ‘smartvote’ voting advice application. This enabled a direct comparison of the digital policy attitudes according to the DMP with the topics of the ‘smartvote’ questionnaire which are comprehensive in content but mainly related to conventional policy areas. This paper’s main research goal is to analyze and visualize possible differences between conventional and digital policy areas in terms of response patterns between and within political parties. The analysis is based on dimensionality reduction methods (multidimensional scaling and principal component analysis) for the visualization of inter-party differences, and on standard deviation as a measure of variation for the evaluation of intra-party unity. The results reveal that digital issues show a lower degree of inter-party polarization compared to conventional policy areas. Thus, the parties have more common ground in issues on digitization than in conventional policy areas. In contrast, the study reveals a mixed picture regarding intra-party unity. Homogeneous parties show a lower degree of unity in digitization issues whereas parties with heterogeneous positions in conventional areas have more united positions in digital areas. All things considered, the findings are encouraging as less polarized conditions apply to the debate on digital development compared to conventional politics. For the future, it would be desirable if in further countries similar projects to the DMP could emerge to broaden the basis for conclusions.

Keywords: comparison of political issue dimensions, digital awareness of candidates, digital policy space, party positions on digital issues

Procedia PDF Downloads 161
36 Multi-Elemental Analysis Using Inductively Coupled Plasma Mass Spectrometry for the Geographical Origin Discrimination of Greek Giant Beans “Gigantes Elefantes”

Authors: Eleni C. Mazarakioti, Anastasios Zotos, Anna-Akrivi Thomatou, Efthimios Kokkotos, Achilleas Kontogeorgos, Athanasios Ladavos, Angelos Patakas

Abstract:

“Gigantes Elefantes” is a particularly dynamic crop of giant beans cultivated in western Macedonia (Greece). This variety of large beans growing in this area and specifically in the regions of Prespes and Kastoria is a protected designation of origin (PDO) species with high nutritional quality. Mislabeling of geographical origin and blending with unidentified samples are common fraudulent practices in Greek food market with financial and possible health consequences. In the last decades, multi-elemental composition analysis has been used in identifying the geographical origin of foods and agricultural products. In an attempt to discriminate the authenticity of Greek beans, multi-elemental analysis (Ag, Al, As, B, Ba, Be, Ca, Cd, Co, Cr, Cs, Cu, Fe, Ga, Ge, K, Li, Mg, Mn, Mo, Na, Nb, Ni, P, Pb, Rb, Re, Se, Sr, Ta, Ti, Tl, U, V, W, Zn, Zr) was performed by inductively coupled plasma mass spectrometry (ICP-MS) on 320 samples of beans, originated from Greece (Prespes and Kastoria), China and Poland. All samples were collected during the autumn of 2021. The obtained data were analysed by principal component analysis (PCA), an unsupervised statistical method, which allows for to reduce of the dimensionality of the enormous datasets. Statistical analysis revealed a clear separation of beans that had been cultivated in Greece compared with those from China and Poland. An adequate discrimination of geographical origin between bean samples originating from the two Greece regions, Prespes and Kastoria, was also evident. Our results suggest that multi-elemental analysis combined with the appropriate multivariate statistical method could be a useful tool for bean’s geographical authentication. Acknowledgment: This research has been financed by the Public Investment Programme/General Secretariat for Research and Innovation, under the call “YPOERGO 3, code 2018SE01300000: project title: ‘Elaboration and implementation of methodology for authenticity and geographical origin assessment of agricultural products.

Keywords: geographical origin, authenticity, multi-elemental analysis, beans, ICP-MS, PCA

Procedia PDF Downloads 49
35 Neural Networks Models for Measuring Hotel Users Satisfaction

Authors: Asma Ameur, Dhafer Malouche

Abstract:

Nowadays, user comments on the Internet have an important impact on hotel bookings. This confirms that the e-reputation issue can influence the likelihood of customer loyalty to a hotel. In this way, e-reputation has become a real differentiator between hotels. For this reason, we have a unique opportunity in the opinion mining field to analyze the comments. In fact, this field provides the possibility of extracting information related to the polarity of user reviews. This sentimental study (Opinion Mining) represents a new line of research for analyzing the unstructured textual data. Knowing the score of e-reputation helps the hotelier to better manage his marketing strategy. The score we then obtain is translated into the image of hotels to differentiate between them. Therefore, this present research highlights the importance of hotel satisfaction ‘scoring. To calculate the satisfaction score, the sentimental analysis can be manipulated by several techniques of machine learning. In fact, this study treats the extracted textual data by using the Artificial Neural Networks Approach (ANNs). In this context, we adopt the aforementioned technique to extract information from the comments available in the ‘Trip Advisor’ website. This actual paper details the description and the modeling of the ANNs approach for the scoring of online hotel reviews. In summary, the validation of this used method provides a significant model for hotel sentiment analysis. So, it provides the possibility to determine precisely the polarity of the hotel users reviews. The empirical results show that the ANNs are an accurate approach for sentiment analysis. The obtained results show also that this proposed approach serves to the dimensionality reduction for textual data’ clustering. Thus, this study provides researchers with a useful exploration of this technique. Finally, we outline guidelines for future research in the hotel e-reputation field as comparing the ANNs with other technique.

Keywords: clustering, consumer behavior, data mining, e-reputation, machine learning, neural network, online hotel ‘reviews, opinion mining, scoring

Procedia PDF Downloads 110
34 Modeling Biomass and Biodiversity across Environmental and Management Gradients in Temperate Grasslands with Deep Learning and Sentinel-1 and -2

Authors: Javier Muro, Anja Linstadter, Florian Manner, Lisa Schwarz, Stephan Wollauer, Paul Magdon, Gohar Ghazaryan, Olena Dubovyk

Abstract:

Monitoring the trade-off between biomass production and biodiversity in grasslands is critical to evaluate the effects of management practices across environmental gradients. New generations of remote sensing sensors and machine learning approaches can model grasslands’ characteristics with varying accuracies. However, studies often fail to cover a sufficiently broad range of environmental conditions, and evidence suggests that prediction models might be case specific. In this study, biomass production and biodiversity indices (species richness and Fishers’ α) are modeled in 150 grassland plots for three sites across Germany. These sites represent a North-South gradient and are characterized by distinct soil types, topographic properties, climatic conditions, and management intensities. Predictors used are derived from Sentinel-1 & 2 and a set of topoedaphic variables. The transferability of the models is tested by training and validating at different sites. The performance of feed-forward deep neural networks (DNN) is compared to a random forest algorithm. While biomass predictions across gradients and sites were acceptable (r2 0.5), predictions of biodiversity indices were poor (r2 0.14). DNN showed higher generalization capacity than random forest when predicting biomass across gradients and sites (relative root mean squared error of 0.5 for DNN vs. 0.85 for random forest). DNN also achieved high performance when using the Sentinel-2 surface reflectance data rather than different combinations of spectral indices, Sentinel-1 data, or topoedaphic variables, simplifying dimensionality. This study demonstrates the necessity of training biomass and biodiversity models using a broad range of environmental conditions and ensuring spatial independence to have realistic and transferable models where plot level information can be upscaled to landscape scale.

Keywords: ecosystem services, grassland management, machine learning, remote sensing

Procedia PDF Downloads 190
33 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or Intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Database Process Model. The Knowledge Discovery in Database Process Model starts from selection of the datasets. The dataset used in this study has been taken from Massachusetts Institute of Technology Lincoln Laboratory. After taking the data, it has been pre-processed. The major pre-processing activities include fill in missed values, remove outliers; resolve inconsistencies, integration of data that contains both labelled and unlabelled datasets, dimensionality reduction, size reduction and data transformation activity like discretization tasks were done for this study. A total of 21,533 intrusion records are used for training the models. For validating the performance of the selected model a separate 3,397 records are used as a testing set. For building a predictive model for intrusion detection J48 decision tree and the Naïve Bayes algorithms have been tested as a classification approach for both with and without feature selection approaches. The model that was created using 10-fold cross validation using the J48 decision tree algorithm with the default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset to classify the new instances as normal, DOS, U2R, R2L and probe classes. The findings of this study have shown that the data mining methods generates interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are forwarded to come up an applicable system in the area of the study.

Keywords: intrusion detection, data mining, computer science, data mining

Procedia PDF Downloads 270
32 A Risk Assessment Tool for the Contamination of Aflatoxins on Dried Figs Based on Machine Learning Algorithms

Authors: Kottaridi Klimentia, Demopoulos Vasilis, Sidiropoulos Anastasios, Ihara Diego, Nikolaidis Vasileios, Antonopoulos Dimitrios

Abstract:

Aflatoxins are highly poisonous and carcinogenic compounds produced by species of the genus Aspergillus spp. that can infect a variety of agricultural foods, including dried figs. Biological and environmental factors, such as population, pathogenicity, and aflatoxinogenic capacity of the strains, topography, soil, and climate parameters of the fig orchards, are believed to have a strong effect on aflatoxin levels. Existing methods for aflatoxin detection and measurement, such as high performance liquid chromatography (HPLC), and enzyme-linked immunosorbent assay (ELISA), can provide accurate results, but the procedures are usually time-consuming, sample-destructive, and expensive. Predicting aflatoxin levels prior to crop harvest is useful for minimizing the health and financial impact of a contaminated crop. Consequently, there is interest in developing a tool that predicts aflatoxin levels based on topography and soil analysis data of fig orchards. This paper describes the development of a risk assessment tool for the contamination of aflatoxin on dried figs, based on the location and altitude of the fig orchards, the population of the fungus Aspergillus spp. in the soil, and soil parameters such as pH, saturation percentage (SP), electrical conductivity (EC), organic matter, particle size analysis (sand, silt, clay), the concentration of the exchangeable cations (Ca, Mg, K, Na), extractable P, and trace of elements (B, Fe, Mn, Zn and Cu), by employing machine learning methods. In particular, our proposed method integrates three machine learning techniques, i.e., dimensionality reduction on the original dataset (principal component analysis), metric learning (Mahalanobis metric for clustering), and k-nearest neighbors learning algorithm (KNN), into an enhanced model, with mean performance equal to 85% by terms of the Pearson correlation coefficient (PCC) between observed and predicted values.

Keywords: aflatoxins, Aspergillus spp., dried figs, k-nearest neighbors, machine learning, prediction

Procedia PDF Downloads 151
31 Feature Selection Approach for the Classification of Hydraulic Leakages in Hydraulic Final Inspection using Machine Learning

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Manufacturing companies are facing global competition and enormous cost pressure. The use of machine learning applications can help reduce production costs and create added value. Predictive quality enables the securing of product quality through data-supported predictions using machine learning models as a basis for decisions on test results. Furthermore, machine learning methods are able to process large amounts of data, deal with unfavourable row-column ratios and detect dependencies between the covariates and the given target as well as assess the multidimensional influence of all input variables on the target. Real production data are often subject to highly fluctuating boundary conditions and unbalanced data sets. Changes in production data manifest themselves in trends, systematic shifts, and seasonal effects. Thus, Machine learning applications require intensive pre-processing and feature selection. Data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets. Within the used real data set of Bosch hydraulic valves, the comparability of the same production conditions in the production of hydraulic valves within certain time periods can be identified by applying the concept drift method. Furthermore, a classification model is developed to evaluate the feature importance in different subsets within the identified time periods. By selecting comparable and stable features, the number of features used can be significantly reduced without a strong decrease in predictive power. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. In this research, the ada boosting classifier is used to predict the leakage of hydraulic valves based on geometric gauge blocks from machining, mating data from the assembly, and hydraulic measurement data from end-of-line testing. In addition, the most suitable methods are selected and accurate quality predictions are achieved.

Keywords: classification, achine learning, predictive quality, feature selection

Procedia PDF Downloads 141
30 The Analyzer: Clustering Based System for Improving Business Productivity by Analyzing User Profiles to Enhance Human Computer Interaction

Authors: Dona Shaini Abhilasha Nanayakkara, Kurugamage Jude Pravinda Gregory Perera

Abstract:

E-commerce platforms have revolutionized the shopping experience, offering convenient ways for consumers to make purchases. To improve interactions with customers and optimize marketing strategies, it is essential for businesses to understand user behavior, preferences, and needs on these platforms. This paper focuses on recommending businesses to customize interactions with users based on their behavioral patterns, leveraging data-driven analysis and machine learning techniques. Businesses can improve engagement and boost the adoption of e-commerce platforms by aligning behavioral patterns with user goals of usability and satisfaction. We propose TheAnalyzer, a clustering-based system designed to enhance business productivity by analyzing user-profiles and improving human-computer interaction. The Analyzer seamlessly integrates with business applications, collecting relevant data points based on users' natural interactions without additional burdens such as questionnaires or surveys. It defines five key user analytics as features for its dataset, which are easily captured through users' interactions with e-commerce platforms. This research presents a study demonstrating the successful distinction of users into specific groups based on the five key analytics considered by TheAnalyzer. With the assistance of domain experts, customized business rules can be attached to each group, enabling The Analyzer to influence business applications and provide an enhanced personalized user experience. The outcomes are evaluated quantitatively and qualitatively, demonstrating that utilizing TheAnalyzer’s capabilities can optimize business outcomes, enhance customer satisfaction, and drive sustainable growth. The findings of this research contribute to the advancement of personalized interactions in e-commerce platforms. By leveraging user behavioral patterns and analyzing both new and existing users, businesses can effectively tailor their interactions to improve customer satisfaction, loyalty and ultimately drive sales.

Keywords: data clustering, data standardization, dimensionality reduction, human computer interaction, user profiling

Procedia PDF Downloads 44
29 Measuring Fluctuating Asymmetry in Human Faces Using High-Density 3D Surface Scans

Authors: O. Ekrami, P. Claes, S. Van Dongen

Abstract:

Fluctuating asymmetry (FA) has been studied for many years as an indicator of developmental stability or ‘genetic quality’ based on the assumption that perfect symmetry is ideally the expected outcome for a bilateral organism. Further studies have also investigated the possible link between FA and attractiveness or levels of masculinity or femininity. These hypotheses have been mostly examined using 2D images, and the structure of interest is usually presented using a limited number of landmarks. Such methods have the downside of simplifying and reducing the dimensionality of the structure, which will in return increase the error of the analysis. In an attempt to reach more conclusive and accurate results, in this study we have used high-resolution 3D scans of human faces and have developed an algorithm to measure and localize FA, taking a spatially-dense approach. A symmetric spatially dense anthropometric mask with paired vertices is non-rigidly mapped on target faces using an Iterative Closest Point (ICP) registration algorithm. A set of 19 manually indicated landmarks were used to examine the precision of our mapping step. The protocol’s accuracy in measurement and localizing FA is assessed using simulated faces with known amounts of asymmetry added to them. The results of validation of our approach show that the algorithm is perfectly capable of locating and measuring FA in 3D simulated faces. With the use of such algorithm, the additional captured information on asymmetry can be used to improve the studies of FA as an indicator of fitness or attractiveness. This algorithm can especially be of great benefit in studies of high number of subjects due to its automated and time-efficient nature. Additionally, taking a spatially dense approach provides us with information about the locality of FA, which is impossible to obtain using conventional methods. It also enables us to analyze the asymmetry of a morphological structures in a multivariate manner; This can be achieved by using methods such as Principal Components Analysis (PCA) or Factor Analysis, which can be a step towards understanding the underlying processes of asymmetry. This method can also be used in combination with genome wide association studies to help unravel the genetic bases of FA. To conclude, we introduced an algorithm to study and analyze asymmetry in human faces, with the possibility of extending the application to other morphological structures, in an automated, accurate and multi-variate framework.

Keywords: developmental stability, fluctuating asymmetry, morphometrics, 3D image processing

Procedia PDF Downloads 116
28 The Application of Video Segmentation Methods for the Purpose of Action Detection in Videos

Authors: Nassima Noufail, Sara Bouhali

Abstract:

In this work, we develop a semi-supervised solution for the purpose of action detection in videos and propose an efficient algorithm for video segmentation. The approach is divided into video segmentation, feature extraction, and classification. In the first part, a video is segmented into clips, and we used the K-means algorithm for this segmentation; our goal is to find groups based on similarity in the video. The application of k-means clustering into all the frames is time-consuming; therefore, we started by the identification of transition frames where the scene in the video changes significantly, and then we applied K-means clustering into these transition frames. We used two image filters, the gaussian filter and the Laplacian of Gaussian. Each filter extracts a set of features from the frames. The Gaussian filter blurs the image and omits the higher frequencies, and the Laplacian of gaussian detects regions of rapid intensity changes; we then used this vector of filter responses as an input to our k-means algorithm. The output is a set of cluster centers. Each video frame pixel is then mapped to the nearest cluster center and painted with a corresponding color to form a visual map. The resulting visual map had similar pixels grouped. We then computed a cluster score indicating how clusters are near each other and plotted a signal representing frame number vs. clustering score. Our hypothesis was that the evolution of the signal would not change if semantically related events were happening in the scene. We marked the breakpoints at which the root mean square level of the signal changes significantly, and each breakpoint is an indication of the beginning of a new video segment. In the second part, for each segment from part 1, we randomly selected a 16-frame clip, then we extracted spatiotemporal features using convolutional 3D network C3D for every 16 frames using a pre-trained model. The C3D final output is a 512-feature vector dimension; hence we used principal component analysis (PCA) for dimensionality reduction. The final part is the classification. The C3D feature vectors are used as input to a multi-class linear support vector machine (SVM) for the training model, and we used a multi-classifier to detect the action. We evaluated our experiment on the UCF101 dataset, which consists of 101 human action categories, and we achieved an accuracy that outperforms the state of art by 1.2%.

Keywords: video segmentation, action detection, classification, Kmeans, C3D

Procedia PDF Downloads 51
27 Resonant Fluorescence in a Two-Level Atom and the Terahertz Gap

Authors: Nikolai N. Bogolubov, Andrey V. Soldatov

Abstract:

Terahertz radiation occupies a range of frequencies somewhere from 100 GHz to approximately 10 THz, just between microwaves and infrared waves. This range of frequencies holds promise for many useful applications in experimental applied physics and technology. At the same time, reliable, simple techniques for generation, amplification, and modulation of electromagnetic radiation in this range are far from been developed enough to meet the requirements of its practical usage, especially in comparison to the level of technological abilities already achieved for other domains of the electromagnetic spectrum. This situation of relative underdevelopment of this potentially very important range of electromagnetic spectrum is known under the name of the 'terahertz gap.' Among other things, technological progress in the terahertz area has been impeded by the lack of compact, low energy consumption, easily controlled and continuously radiating terahertz radiation sources. Therefore, development of new techniques serving this purpose as well as various devices based on them is of obvious necessity. No doubt, it would be highly advantageous to employ the simplest of suitable physical systems as major critical components in these techniques and devices. The purpose of the present research was to show by means of conventional methods of non-equilibrium statistical mechanics and the theory of open quantum systems, that a thoroughly studied two-level quantum system, also known as an one-electron two-level 'atom', being driven by external classical monochromatic high-frequency (e.g. laser) field, can radiate continuously at much lower (e.g. terahertz) frequency in the fluorescent regime if the transition dipole moment operator of this 'atom' possesses permanent non-equal diagonal matrix elements. This assumption contradicts conventional assumption routinely made in quantum optics that only the non-diagonal matrix elements persist. The conventional assumption is pertinent to natural atoms and molecules and stems from the property of spatial inversion symmetry of their eigenstates. At the same time, such an assumption is justified no more in regard to artificially manufactured quantum systems of reduced dimensionality, such as, for example, quantum dots, which are often nicknamed 'artificial atoms' due to striking similarity of their optical properties to those ones of the real atoms. Possible ways to experimental observation and practical implementation of the predicted effect are discussed too.

Keywords: terahertz gap, two-level atom, resonant fluorescence, quantum dot, resonant fluorescence, two-level atom

Procedia PDF Downloads 243
26 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.

Keywords: classification, CRISP-DM, machine learning, predictive quality, regression

Procedia PDF Downloads 118
25 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 99
24 Dimensionality Control of Li Transport by MOFs Based Quasi-Solid to Solid Electrolyte

Authors: Manuel Salado, Mikel Rincón, Arkaitz Fidalgo, Roberto Fernandez, Senentxu Lanceros-Méndez

Abstract:

Lithium-ion batteries (LIBs) are a promising technology for energy storage, but they suffer from safety concerns due to the use of flammable organic solvents in their liquid electrolytes. Solid-state electrolytes (SSEs) offer a potential solution to this problem, but they have their own limitations, such as poor ionic conductivity and high interfacial resistance. The aim of this research was to develop a new type of SSE based on metal-organic frameworks (MOFs) and ionic liquids (ILs). MOFs are porous materials with high surface area and tunable electronic properties, making them ideal for use in SSEs. ILs are liquid electrolytes that are non-flammable and have high ionic conductivity. A series of MOFs were synthesized, and their electrochemical properties were evaluated. The MOFs were then infiltrated with ILs to form a quasi-solid gel and solid xerogel SSEs. The ionic conductivity, interfacial resistance, and electrochemical performance of the SSEs were characterized. The results showed that the MOF-IL SSEs had significantly higher ionic conductivity and lower interfacial resistance than conventional SSEs. The SSEs also exhibited excellent electrochemical performance, with high discharge capacity and long cycle life. The development of MOF-IL SSEs represents a significant advance in the field of solid-state electrolytes. The high ionic conductivity and low interfacial resistance of the SSEs make them promising candidates for use in next-generation LIBs. The data for this research was collected using a variety of methods, including X-ray diffraction, scanning electron microscopy, and electrochemical impedance spectroscopy. The data was analyzed using a variety of statistical and computational methods, including principal component analysis, density functional theory, and molecular dynamics simulations. The main question addressed by this research was whether MOF-IL SSEs could be developed that have high ionic conductivity, low interfacial resistance, and excellent electrochemical performance. The results of this research demonstrate that MOF-IL SSEs are a promising new type of solid-state electrolyte for use in LIBs. The SSEs have high ionic conductivity, low interfacial resistance, and excellent electrochemical performance. These properties make them promising candidates for use in next-generation LIBs that are safer and have higher energy densities.

Keywords: energy storage, solid-electrolyte, ionic liquid, metal-organic-framework, electrochemistry, organic inorganic plastic crystal

Procedia PDF Downloads 51
23 Data Clustering Algorithm Based on Multi-Objective Periodic Bacterial Foraging Optimization with Two Learning Archives

Authors: Chen Guo, Heng Tang, Ben Niu

Abstract:

Clustering splits objects into different groups based on similarity, making the objects have higher similarity in the same group and lower similarity in different groups. Thus, clustering can be treated as an optimization problem to maximize the intra-cluster similarity or inter-cluster dissimilarity. In real-world applications, the datasets often have some complex characteristics: sparse, overlap, high dimensionality, etc. When facing these datasets, simultaneously optimizing two or more objectives can obtain better clustering results than optimizing one objective. However, except for the objectives weighting methods, traditional clustering approaches have difficulty in solving multi-objective data clustering problems. Due to this, evolutionary multi-objective optimization algorithms are investigated by researchers to optimize multiple clustering objectives. In this paper, the Data Clustering algorithm based on Multi-objective Periodic Bacterial Foraging Optimization with two Learning Archives (DC-MPBFOLA) is proposed. Specifically, first, to reduce the high computing complexity of the original BFO, periodic BFO is employed as the basic algorithmic framework. Then transfer the periodic BFO into a multi-objective type. Second, two learning strategies are proposed based on the two learning archives to guide the bacterial swarm to move in a better direction. On the one hand, the global best is selected from the global learning archive according to the convergence index and diversity index. On the other hand, the personal best is selected from the personal learning archive according to the sum of weighted objectives. According to the aforementioned learning strategies, a chemotaxis operation is designed. Third, an elite learning strategy is designed to provide fresh power to the objects in two learning archives. When the objects in these two archives do not change for two consecutive times, randomly initializing one dimension of objects can prevent the proposed algorithm from falling into local optima. Fourth, to validate the performance of the proposed algorithm, DC-MPBFOLA is compared with four state-of-art evolutionary multi-objective optimization algorithms and one classical clustering algorithm on evaluation indexes of datasets. To further verify the effectiveness and feasibility of designed strategies in DC-MPBFOLA, variants of DC-MPBFOLA are also proposed. Experimental results demonstrate that DC-MPBFOLA outperforms its competitors regarding all evaluation indexes and clustering partitions. These results also indicate that the designed strategies positively influence the performance improvement of the original BFO.

Keywords: data clustering, multi-objective optimization, bacterial foraging optimization, learning archives

Procedia PDF Downloads 113
22 Profiling Risky Code Using Machine Learning

Authors: Zunaira Zaman, David Bohannon

Abstract:

This study explores the application of machine learning (ML) for detecting security vulnerabilities in source code. The research aims to assist organizations with large application portfolios and limited security testing capabilities in prioritizing security activities. ML-based approaches offer benefits such as increased confidence scores, false positives and negatives tuning, and automated feedback. The initial approach using natural language processing techniques to extract features achieved 86% accuracy during the training phase but suffered from overfitting and performed poorly on unseen datasets during testing. To address these issues, the study proposes using the abstract syntax tree (AST) for Java and C++ codebases to capture code semantics and structure and generate path-context representations for each function. The Code2Vec model architecture is used to learn distributed representations of source code snippets for training a machine-learning classifier for vulnerability prediction. The study evaluates the performance of the proposed methodology using two datasets and compares the results with existing approaches. The Devign dataset yielded 60% accuracy in predicting vulnerable code snippets and helped resist overfitting, while the Juliet Test Suite predicted specific vulnerabilities such as OS-Command Injection, Cryptographic, and Cross-Site Scripting vulnerabilities. The Code2Vec model achieved 75% accuracy and a 98% recall rate in predicting OS-Command Injection vulnerabilities. The study concludes that even partial AST representations of source code can be useful for vulnerability prediction. The approach has the potential for automated intelligent analysis of source code, including vulnerability prediction on unseen source code. State-of-the-art models using natural language processing techniques and CNN models with ensemble modelling techniques did not generalize well on unseen data and faced overfitting issues. However, predicting vulnerabilities in source code using machine learning poses challenges such as high dimensionality and complexity of source code, imbalanced datasets, and identifying specific types of vulnerabilities. Future work will address these challenges and expand the scope of the research.

Keywords: code embeddings, neural networks, natural language processing, OS command injection, software security, code properties

Procedia PDF Downloads 79
21 An Adjoint-Based Method to Compute Derivatives with Respect to Bed Boundary Positions in Resistivity Measurements

Authors: Mostafa Shahriari, Theophile Chaumont-Frelet, David Pardo

Abstract:

Resistivity measurements are used to characterize the Earth’s subsurface. They are categorized into two different groups: (a) those acquired on the Earth’s surface, for instance, controlled source electromagnetic (CSEM) and Magnetotellurics (MT), and (b) those recorded with borehole logging instruments such as Logging-While-Drilling (LWD) devices. LWD instruments are mostly used for geo-steering purposes, i.e., to adjust dip and azimuthal angles of a well trajectory to drill along a particular geological target. Modern LWD tools measure all nine components of the magnetic field corresponding to three orthogonal transmitter and receiver orientations. In order to map the Earth’s subsurface and perform geo-steering, we invert measurements using a gradient-based method that utilizes the derivatives of the recorded measurements with respect to the inversion variables. For resistivity measurements, these inversion variables are usually the constant resistivity value of each layer and the bed boundary positions. It is well-known how to compute derivatives with respect to the constant resistivity value of each layer using semi-analytic or numerical methods. However, similar formulas for computing the derivatives with respect to bed boundary positions are unavailable. The main contribution of this work is to provide an adjoint-based formulation for computing derivatives with respect to the bed boundary positions. The key idea to obtain the aforementioned adjoint state formulations for the derivatives is to separate the tangential and normal components of the field and treat them differently. This formulation allows us to compute the derivatives faster and more accurately than with traditional finite differences approximations. In the presentation, we shall first derive a formula for computing the derivatives with respect to the bed boundary positions for the potential equation. Then, we shall extend our formulation to 3D Maxwell’s equations. Finally, by considering a 1D domain and reducing the dimensionality of the problem, which is a common practice in the inversion of resistivity measurements, we shall derive a formulation to compute the derivatives of the measurements with respect to the bed boundary positions using a 1.5D variational formulation. Then, we shall illustrate the accuracy and convergence properties of our formulations by comparing numerical results with the analytical derivatives for the potential equation. For the 1.5D Maxwell’s system, we shall compare our numerical results based on the proposed adjoint-based formulation vs those obtained with a traditional finite difference approach. Numerical results shall show that our proposed adjoint-based technique produces enhanced accuracy solutions while its cost is negligible, as opposed to the finite difference approach that requires the solution of one additional problem per derivative.

Keywords: inverse problem, bed boundary positions, electromagnetism, potential equation

Procedia PDF Downloads 158
20 Quantifying the Aspect of ‘Imagining’ in the Map of Dialogical inquiry

Authors: Chua Si Wen Alicia, Marcus Goh Tian Xi, Eunice Gan Ghee Wu, Helen Bound, Lee Liang Ying, Albert Lee

Abstract:

In a world full of rapid changes, people often need a set of skills to help them navigate an ever-changing workscape. These skills, often known as “future-oriented skills,” include learning to learn, critical thinking, understanding multiple perspectives, and knowledge creation. Future-oriented skills are typically assumed to be domain-general, applicable to multiple domains, and can be cultivated through a learning approach called Dialogical Inquiry. Dialogical Inquiry is known for its benefits of making sense of multiple perspectives, encouraging critical thinking, and developing learner’s capability to learn. However, it currently exists as a quantitative tool, which makes it hard to track and compare learning processes over time. With these concerns, the present research aimed to develop and validate a quantitative tool for the Map of Dialogical Inquiry, focusing Imagining aspect of learning. The Imagining aspect four dimensions: 1) speculative/ look for alternatives, 2) risk taking/ break rules, 3) create/ design, and 4) vision/ imagine. To do so, an exploratory literature review was conducted to better understand the dimensions of Imagining. This included deep-diving into the history of the creation of the Map of Dialogical Inquiry and a review on how “Imagining” has been conceptually defined in the field of social psychology, education, and beyond. Then, we synthesised and validated scales. These scales measured the dimension of Imagination and related concepts like creativity, divergent thinking regulatory focus, and instrumental risk. Thereafter, items were adapted from the aforementioned procured scales to form items that would contribute to the preliminary version of the Imagining Scale. For scale validation, 250 participants were recruited. A Confirmatory Factor Analysis (CFA) sought to establish dimensionality of the Imagining Scale with an iterative procedure in item removal. Reliability and validity of the scale’s dimensions were sought through measurements of Cronbach’s alpha, convergent validity, and discriminant validity. While CFA found that the distinction of Imagining’s four dimensions could not be validated, the scale was able to establish high reliability with a Cronbach alpha of .96. In addition, the convergent validity of the Imagining scale was established. A lack of strong discriminant validity may point to overlaps with other components of the Dialogical Map as a measure of learning. Thus, a holistic approach to forming the tool – encompassing all eight different components may be preferable.

Keywords: learning, education, imagining, pedagogy, dialogical teaching

Procedia PDF Downloads 67
19 Quantifying Processes of Relating Skills in Learning: The Map of Dialogical Inquiry

Authors: Eunice Gan Ghee Wu, Marcus Goh Tian Xi, Alicia Chua Si Wen, Helen Bound, Lee Liang Ying, Albert Lee

Abstract:

The Map of Dialogical Inquiry provides a conceptual basis of learning processes. According to the Map, dialogical inquiry motivates complex thinking, dialogue, reflection, and learner agency. For instance, classrooms that incorporated dialogical inquiry enabled learners to construct more meaning in their learning, to engage in self-reflection, and to challenge their ideas with different perspectives. While the Map contributes to the psychology of learning, its qualitative approach makes it hard to track and compare learning processes over time for both teachers and learners. Qualitative approach typically relies on open-ended responses, which can be time-consuming and resource-intensive. With these concerns, the present research aimed to develop and validate a quantifiable measure for the Map. Specifically, the Map of Dialogical Inquiry reflects the eight different learning processes and perspectives employed during a learner’s experience. With a focus on interpersonal and emotional learning processes, the purpose of the present study is to construct and validate a scale to measure the “Relating” aspect of learning. According to the Map, the Relating aspect of learning contains four conceptual components: using intuition and empathy, seeking personal meaning, building relationships and meaning with others, and likes stories and metaphors. All components have been shown to benefit learning in past research. This research began with a literature review with the goal of identifying relevant scales in the literature. These scales were used as a basis for item development, guided by the four conceptual dimensions in the “Relating” aspect of learning, resulting in a pool of 47 preliminary items. Then, all items were administered to 200 American participants via an online survey along with other scales of learning. Dimensionality, reliability, and validity of the “Relating” scale was assessed. Data were submitted to a confirmatory factor analysis (CFA), revealing four distinct components and items. Items with lower factor loadings were removed in an iterative manner, resulting in 34 items in the final scale. CFA also revealed that the “Relating” scale was a four-factor model, following its four distinct components as described in the Map of Dialogical Inquiry. In sum, this research was able to develop a quantitative scale for the “Relating” aspect of the Map of Dialogical Inquiry. By representing learning as numbers, users, such as educators and learners, can better track, evaluate, and compare learning processes over time in an efficient manner. More broadly, this scale may also be used as a learning tool in lifelong learning.

Keywords: lifelong learning, scale development, dialogical inquiry, relating, social and emotional learning, socio-affective intuition, empathy, narrative identity, perspective taking, self-disclosure

Procedia PDF Downloads 112