Search results for: Clustering Validation
588 TheAnalyzer: Clustering-Based System for Improving Business Productivity by Analyzing User Profiles to Enhance Human-Computer Interaction
Authors: D. S. A. Nanayakkara, K. J. P. G. Perera
Abstract:
E-commerce platforms have revolutionized the shopping experience, offering convenient ways for consumers to make purchases. To improve interactions with customers and optimize marketing strategies, it is essential for businesses to understand user behavior, preferences, and needs on these platforms. This paper focuses on recommending that businesses customize interactions with users based on their behavioral patterns, leveraging data-driven analysis and machine learning techniques. Businesses can improve engagement and boost the adoption of e-commerce platforms by aligning behavioral patterns with user goals of usability and satisfaction. We propose TheAnalyzer, a clustering-based system designed to enhance business productivity by analyzing user profiles and improving human-computer interaction. TheAnalyzer seamlessly integrates with business applications, collecting relevant data points based on users' natural interactions without additional burdens such as questionnaires or surveys. It defines five key user analytics as features for its dataset, which are easily captured through users' interactions with e-commerce platforms. This research presents a study demonstrating the successful distinction of users into specific groups based on the five key analytics considered by TheAnalyzer. With the assistance of domain experts, customized business rules can be attached to each group, enabling TheAnalyzer to influence business applications and provide an enhanced personalized user experience. The outcomes are evaluated quantitatively and qualitatively, demonstrating that utilizing TheAnalyzer’s capabilities can optimize business outcomes, enhance customer satisfaction, and drive sustainable growth. The findings of this research contribute to the advancement of personalized interactions in e-commerce platforms.
By leveraging user behavioral patterns and analyzing both new and existing users, businesses can effectively tailor their interactions to improve customer satisfaction, loyalty and ultimately drive sales.
Keywords: Data clustering, data standardization, dimensionality reduction, human-computer interaction, user profiling.
Downloads 230
587 Novel Hybrid Method for Gene Selection and Cancer Prediction
Authors: Liping Jing, Michael K. Ng, Tieyong Zeng
Abstract:
Microarray data profile gene expression on a whole-genome scale and therefore provide a good way to study associations between gene expression and the occurrence or progression of cancer. More and more researchers have realized that microarray data are helpful for predicting cancer samples. However, the dimensionality of gene expression data is much larger than the sample size, which makes this task very difficult. Therefore, identifying the significant genes that cause cancer has become an urgent, popular, and difficult research topic. Many feature selection algorithms proposed in the past focus on improving cancer predictive accuracy at the expense of ignoring the correlations between features. In this work, a novel framework (named SGS) is presented for stable gene selection and efficient cancer prediction. The proposed framework first applies a clustering algorithm to find gene groups in which the genes have higher correlation coefficients, then selects the significant genes in each group with Bayesian Lasso and the important gene groups with group Lasso, and finally builds a prediction model on the shrinkage gene space with an efficient classification algorithm (such as SVM, 1NN, or regression). Experimental results on real-world data show that the proposed framework often outperforms existing feature selection and prediction methods, such as SAM, IG, and Lasso-type prediction models.
Keywords: Gene Selection, Cancer Prediction, Lasso, Clustering, Classification.
Downloads 2044
586 Tree Based Data Fusion Clustering Routing Algorithm for Illimitable Network Administration in Wireless Sensor Network
Authors: Y. Harold Robinson, M. Rajaram, E. Golden Julie, S. Balaji
Abstract:
In wireless sensor networks, locality and positioning information can be captured using the Global Positioning System (GPS). This message can be congregated initially from spot to identify the system. Users can retrieve information of interest from a wireless sensor network (WSN) by injecting queries and gathering results from the mobile sink nodes. Routing is the process of choosing an optimal path in a mobile network. Intermediate nodes group the sensor nodes into clusters and generate cluster heads that gather the data from each cluster's nodes and forward the collected data to the base station. WSNs are widely used for gathering data. Since sensors are power-constrained devices, it is vital for them to reduce their power consumption. A tree-based data fusion clustering routing algorithm (TBDFC) is used to reduce energy consumption in wireless sensor networks. Here, the nodes in a tree use the cluster formation, whereas the height of the tree is decided based on the distance of the member nodes to the cluster head. Network simulation shows that this scheme improves the power utilization of the nodes and thus considerably improves the network lifetime.
Keywords: WSN, TBDFC, LEACH, PEGASIS, TREEPSI.
Downloads 1116
585 Evaluating the Validity of Computational Fluid Dynamics Model of Dispersion in a Complex Urban Geometry Using Two Sets of Experimental Measurements
Authors: Mohammad R. Kavian Nezhad, Carlos F. Lange, Brian A. Fleck
Abstract:
This research presents the validation study of a computational fluid dynamics (CFD) model developed to simulate the scalar dispersion emitted from rooftop sources around the buildings at the University of Alberta North Campus. The ANSYS CFX code was used to perform the numerical simulation of the wind regime and pollutant dispersion by solving the 3D steady Reynolds-averaged Navier-Stokes (RANS) equations on a building-scale high-resolution grid. The validation study was performed in two steps. First, the CFD model performance in 24 cases (eight wind directions and three wind speeds) was evaluated by comparing the predicted flow fields with the available data from the previous measurement campaign designed at the North Campus, using the standard deviation method (SDM), while the estimated results of the numerical model showed maximum average percent errors of approximately 53% and 37% for wind incidents from the North and Northwest, respectively. Good agreement with the measurements was observed for the other six directions, with an average error of less than 30%. In the second step, the reliability of the implemented turbulence model, numerical algorithm, modeling techniques, and the grid generation scheme was further evaluated using the Mock Urban Setting Test (MUST) dispersion dataset. Different statistical measures, including the fractional bias (FB), the mean geometric bias (MG), and the normalized mean square error (NMSE), were used to assess the accuracy of the predicted dispersion field. Our CFD results are in very good agreement with the field measurements.
Keywords: CFD, plume dispersion, complex urban geometry, validation study, wind flow.
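The three statistical measures named in this abstract can be sketched as follows. This is a minimal illustration using the definitions commonly applied in dispersion model evaluation; the abstract does not give its formulas, so the exact forms, the function names, and the sample concentrations are assumptions rather than the authors' implementation.

```python
import math


def fractional_bias(observed, predicted):
    """FB = (mean(Co) - mean(Cp)) / (0.5 * (mean(Co) + mean(Cp)))."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    return (mean_obs - mean_pred) / (0.5 * (mean_obs + mean_pred))


def geometric_mean_bias(observed, predicted):
    """MG = exp(mean(ln Co) - mean(ln Cp)); all values must be positive."""
    log_obs = sum(math.log(c) for c in observed) / len(observed)
    log_pred = sum(math.log(c) for c in predicted) / len(predicted)
    return math.exp(log_obs - log_pred)


def normalized_mean_square_error(observed, predicted):
    """NMSE = mean((Co - Cp)^2) / (mean(Co) * mean(Cp))."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return mse / (mean_obs * mean_pred)


# Hypothetical concentrations for illustration only (not data from the paper).
observed = [2.0, 4.0]
predicted = [1.0, 2.0]
print(fractional_bias(observed, predicted))
print(geometric_mean_bias(observed, predicted))
print(normalized_mean_square_error(observed, predicted))
```

Under these definitions a model that reproduces the measurements exactly gives FB = 0, MG = 1, and NMSE = 0.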
Downloads 379
584 Development and Validation of an Instrument Measuring the Coping Strategies in Situations of Stress
Authors: Lucie Côté, Martin Lauzier, Guy Beauchamp, France Guertin
Abstract:
Stress causes deleterious effects at the physical, psychological and organizational levels, which highlights the need to use effective coping strategies to deal with it. Several coping models exist, but they do not integrate the different strategies in a coherent way, nor do they take into account the new research on emotional coping and acceptance of the stressful situation. To fill these gaps, an integrative model incorporating the main coping strategies was developed. This model arises from the review of the scientific literature on coping and from a qualitative study carried out among workers with low or high levels of stress, as well as from an analysis of clinical cases. The model allows one to understand under what circumstances the strategies are effective or ineffective and to learn how one might use them more wisely. It includes Specific Strategies in controllable situations (the Modification of the Situation and the Resignation-Disempowerment), Specific Strategies in non-controllable situations (Acceptance and Stubborn Relentlessness) as well as so-called General Strategies (Wellbeing and Avoidance). This study is intended to undertake and present the process of development and validation of an instrument to measure coping strategies based on this model. An initial pool of items has been generated from the conceptual definitions and three expert judges have validated the content. Of these, 18 items have been selected for a short form questionnaire. A sample of 300 students and employees from a Quebec university was used for the validation of the questionnaire. Concerning the reliability of the instrument, the indices observed following the inter-rater agreement (Krippendorff’s alpha) and the calculation of the coefficients for internal consistency (Cronbach's alpha) are satisfactory. To evaluate the construct validity, a confirmatory factor analysis using MPlus supports the existence of a model with six factors.
The results of this analysis also suggest that this configuration is superior to other alternative models. The correlations show that the factors are only loosely related to each other. Overall, the analyses carried out suggest that the instrument has good psychometric qualities and demonstrate the relevance of further work to establish predictive validity and reconfirm its structure. This instrument will help researchers and clinicians better understand and assess strategies for coping with stress and thus prevent mental health issues.
Keywords: Acceptance, coping strategies, measurement instrument, questionnaire, stress, validation process.
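The internal-consistency coefficient mentioned in this abstract, Cronbach's alpha, can be computed from item scores as sketched below. The formula is the standard one, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and the response matrix are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: three respondents, two items.
responses = [[1, 2], [2, 1], [3, 3]]
print(cronbach_alpha(responses))
```

When every item carries identical scores the coefficient reaches its maximum of 1, while weakly related items push it toward 0.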
Downloads 923
583 Validation of the Linear Trend Estimation Technique for Prediction of Average Water and Sewerage Charge Rate Prices in the Czech Republic
Authors: Aneta Oblouková, Eva Vítková
Abstract:
The article deals with the issue of water and sewerage charge rate prices in the Czech Republic. The research is specifically focused on the analysis of the development of the average prices of water and sewerage charge rate in the Czech Republic in 1994-2021 and on the validation of the chosen methodology relevant for the prediction of the development of the average prices of water and sewerage charge rate in the Czech Republic. The research is based on data collection. The data for this research were obtained from the Czech Statistical Office. The aim of the paper is to validate the relevance of the mathematical linear trend estimate technique for the calculation of the predicted average prices of water and sewerage charge rates. The real values of the average prices of water and sewerage charge rates in the Czech Republic in 1994-2018 were obtained from the Czech Statistical Office and were converted into a mathematical equation. The same type of real data was obtained from the Czech Statistical Office for 2019-2021. Prediction of the average prices of water and sewerage charge rates in the Czech Republic in 2019-2021 was also calculated using a chosen method – a linear trend estimation technique. The values obtained from the Czech Statistical Office and the values calculated using the chosen methodology were subsequently compared. The research result is a validation of the chosen mathematical technique to be a suitable technique for this research.
Keywords: Czech Republic, linear trend estimation, price prediction, water and sewerage charge rate.
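The validation procedure described in this abstract (fit a linear trend on 1994-2018, predict 2019-2021, compare with the observed values) can be sketched as follows. The prices are synthetic placeholders, not Czech Statistical Office data, and the use of ordinary least squares and of the mean absolute percentage error as the comparison measure are assumptions; the abstract does not state how the comparison was scored.

```python
def fit_linear_trend(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Synthetic training series (hypothetical prices on an exact linear trend).
train_years = list(range(1994, 2019))
train_prices = [10.0 + 2.5 * (year - 1994) for year in train_years]

# Hypothetical "observed" prices for the hold-out years.
test_years = [2019, 2020, 2021]
test_prices = [73.0, 75.5, 77.0]

slope, intercept = fit_linear_trend(train_years, train_prices)
forecast = [slope * year + intercept for year in test_years]
print(forecast)
print(mean_absolute_percentage_error(test_prices, forecast))
```

A small hold-out error would support the trend model for short-horizon prediction; a large one would suggest the series is not well described by a straight line.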
Downloads 209
582 Study of the Late Phase of Core Degradation during Reflooding by Safety Injection System for VVER1000 with ASTECv2 Computer Code
Authors: Antoaneta Stefanova, Rositsa Gencheva, Pavlin Groudev
Abstract:
This paper presents the modeling approach in SBO sequence for VVER 1000 reactors and describes the reactor core behavior at late in-vessel phase in case of late reflooding by HPIS and gives preliminary results for the ASTECv2 validation. The work is focused on investigation of plant behavior during total loss of power and the operator actions. The main goal of these analyses is to assess the phenomena arising during the Station blackout (SBO) followed by primary side high pressure injection system (HPIS) reflooding of already damaged reactor core at very late “in-vessel” phase. The purpose of the analyses is to define how the later HPIS switching on can delay the time of vessel failure or possibly avoid vessel failure. The times for HPP injection were chosen based on previously performed investigations.
Keywords: VVER, operator action validation, reflooding of overheated reactor core, ASTEC computer code.
Downloads 1442
581 Validation of Reverse Engineered Web Application Models
Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini
Abstract:
Web applications have become complex and crucial for many firms, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering). The scientific community has focused attention on Web application design, development, analysis, and testing by studying and proposing methodologies and tools. Static and dynamic techniques may be used to analyze existing Web applications. Traditional static source code analysis may be very difficult because of the presence of dynamically generated code and the multi-language nature of the Web. Dynamic analysis may be useful, but it has an intrinsic limitation: the low number of program executions used to extract information. Our reverse engineering analysis, used in our WAAT (Web Applications Analysis and Testing) project, applies mutational techniques in order to exploit server-side execution engines to accomplish part of the dynamic analysis. This paper studies the effects of mutation source code analysis applied to Web software to build application models. Mutation-based generated models may contain more information than necessary, so we need a pruning mechanism.
Keywords: Validation, Dynamic Analysis, Mutation Analysis, Reverse Engineering, Web Applications
Downloads 1625
580 Validation of the Career Motivation Scale among Chinese University and Vocational College Teachers
Authors: Wei Zhang, Lifen Zhao
Abstract:
The present study aims to translate and validate the Career Motivation Scale among Chinese university and vocational college teachers. Exploratory factor analysis supported a three-factor structure that was consistent with the original structure of career motivation: career insight, career identity, and career resilience. Confirmatory factor analysis showed that a second-order three-factor model with correlated measurement errors best fit the data. Configural, metric, and scalar invariance models were tested, demonstrating that the Chinese version of the Career Motivation Scale did not differ across groups of school type, educational level, and working years in current institutions. The concurrent validity of the Chinese Career Motivation Scale was confirmed by its significant correlations with work engagement, career adaptability, career satisfaction, job crafting, and intention to quit. The results of the study indicated that the Chinese Career Motivation Scale was a valid and reliable measure of career motivation among university and vocational college teachers in China.
Keywords: Career motivation scale, Chinese university and vocational college teachers, measurement invariance, validation.
Downloads 371
579 Optical Flow Technique for Supersonic Jet Measurements
Authors: H. D. Lim, Jie Wu, T. H. New, Shengxian Shi
Abstract:
This paper outlines the development of an experimental technique for quantifying supersonic jet flows, in an attempt to avoid the seeding particle problems frequently associated with particle-image velocimetry (PIV) techniques at high Mach numbers. Based on optical flow algorithms, the idea behind the technique involves using high-speed cameras to capture Schlieren images of the supersonic jet shear layers, before they are subjected to an adapted optical flow algorithm based on the Horn-Schunck method to determine the associated flow fields. The proposed method is capable of offering full-field unsteady flow information with potentially higher accuracy and resolution than existing point-measurement or PIV techniques. A preliminary study via numerical simulations of a circular de Laval jet nozzle successfully reveals flow and shock structures typically associated with supersonic jet flows, which serve as useful data for subsequent validation of the optical flow based experimental results. For the experimental technique, a Z-type Schlieren setup is proposed with the supersonic jet operated in cold mode, at a stagnation pressure of 4 bar and an exit Mach number of 1.5. High-speed single-frame or double-frame cameras are used to capture successive Schlieren images. As implementation of the optical flow technique for supersonic flows remains rare, the current focus revolves around methodology validation through synthetic images. The results of the validation tests offer valuable insight into how the optical flow algorithm can be further refined to improve robustness and accuracy. Despite these challenges, this supersonic flow measurement technique may offer a simpler way to identify and quantify the fine spatial structures within the shock shear layer.
Keywords: Schlieren, optical flow, supersonic jets, shock shear layer.
Downloads 1904
578 Ontology Population via NLP Techniques in Risk Management
Authors: Jawad Makki, Anne-Marie Alquier, Violaine Prince
Abstract:
In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi-automatically instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and relies on domain expert intervention for validation. The proposed approach relies on a set of Instances Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent since it heavily relies on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA project (supported by the European community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from the Environmental Protection Agency. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.
Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic analysis.
Downloads 1536
577 Visualization and Indexing of Spectral Databases
Authors: Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, Janos Abonyi
Abstract:
On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from a spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra using nearest-neighbor algorithms and distance-based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem; the computational complexity increases with the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time, some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two-dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of the spectral database not only supports the application of distance- and similarity-based techniques but also enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction does not have to use the high-dimensional space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance-based learning algorithms.
Keywords: indexing high dimensional databases, dimensional reduction, clustering, similarity, k-nn algorithm.
Downloads 1771
576 Specific Emitter Identification Based on Refined Composite Multiscale Dispersion Entropy
Authors: Shaoying Guo, Yanyun Xu, Meng Zhang, Weiqing Huang
Abstract:
The wireless communication network is developing rapidly, so wireless security is becoming more and more important. Specific emitter identification (SEI) is a vital part of wireless communication security as a technique to identify unique transmitters. In this paper, an SEI method based on multiscale dispersion entropy (MDE) and refined composite multiscale dispersion entropy (RCMDE) is proposed. The MDE and RCMDE algorithms are used to extract features for the identification of five wireless devices, and a cross-validation support vector machine (CV-SVM) is used as the classifier. The experimental results show that the total identification accuracy is 99.3%, even at a low signal-to-noise ratio (SNR) of 5 dB, which proves that MDE and RCMDE can describe the communication signal series well. In addition, compared with other methods, the proposed method is effective and provides better accuracy and stability for SEI.
Keywords: Cross-validation support vector machine, refined composite multiscale dispersion entropy, specific emitter identification, transient signal, wireless communication device.
Downloads 859
575 Research Design for Developing and Validating Ice-Hockey Team Diagnostics Scale
Authors: Gergely Géczi
Abstract:
In the modern world, ice-hockey (and in a broader sense, team sports) is becoming an increasingly popular field of entertainment. Although the main element is most likely perceived as the show itself, winning is an inevitable part of the successful operation of any sports team. In this paper, the author creates a research design for developing and validating an ice-hockey team-focused diagnostics scale, which enables researchers and practitioners to identify the problems associated with underperforming teams. The construction of the scale starts with personal interviews with experts in the field, carefully chosen from the Hungarian ice-hockey sector. Based on the interviews, the author is in a position to create the categories and the relevant items for the scale. Once the scale is constructed, the next step is the validation process on a Hungarian sample. Data for validation are acquired through access to the licensed database of the Hungarian Ice-Hockey Federation, involving Hungarian ice-hockey coaches and players. The Ice-Hockey Team Diagnostics Scale is intended to orient practitioners in understanding both effective and underperforming team work.
Keywords: Diagnostics Scale, effective versus underperforming team work, ice-hockey, research design.
Downloads 553
574 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children
Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco
Abstract:
Attribute or feature selection is one of the basic strategies to improve the performance of data classification tasks and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed of a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to an a posteriori process in a multi-objective context), and testing. We compare the performance of the classifiers obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists who collected the data.
Keywords: Feature selection, multi-objective evolutionary computation, unsupervised classification, behavior assessment system for children.
Downloads 1448
573 An Examination and Validation of the Theoretical Resistivity-Temperature Relationship for Conductors
Authors: Fred Lacy
Abstract:
Electrical resistivity is a fundamental parameter of metals or electrical conductors. Since resistivity is a function of temperature, in order to completely understand the behavior of metals, a temperature dependent theoretical model is needed. A model based on physics principles has recently been developed to obtain an equation that relates electrical resistivity to temperature. This equation is dependent upon a parameter associated with the electron travel time before being scattered, and a parameter that relates the energy of the atoms and their separation distance. Analysis of the energy parameter reveals that the equation is optimized if the proportionality term in the equation is not constant but varies over the temperature range. Additional analysis reveals that the theoretical equation can be used to determine the mean free path of conduction electrons, the number of defects in the atomic lattice, and the ‘equivalent’ charge associated with the metallic bonding of the atoms. All of this analysis provides validation for the theoretical model and provides insight into the behavior of metals where performance is affected by temperatures (e.g., integrated circuits and temperature sensors).
Keywords: Callendar–van Dusen, conductivity, mean free path, resistance temperature detector, temperature sensor.
Downloads 2186
572 Five-Phase Induction Motor Drive System Driven by Five-Phase Packed U Cell Inverter: Its Modeling and Performance Evaluation
Authors: Mohd Tariq
Abstract:
Three-phase drive systems suffer from greater torque pulsations and harmonics. This issue prevents the smooth operation of the drives and also increases the amount of heat generated, thus resulting in an increase in power loss. Higher-phase systems offer smooth operation of the machines with greater power capacity. Five-phase variable-speed induction motor drives are commonly used in various industrial and commercial applications such as traction, electric vehicles, ship propulsion, and conveyor belt drive systems. In this work, a comparative analysis of the different modulation schemes applied to the five-level five-phase Packed U Cell (PUC) inverter fed induction motor drives is presented. The performance of the inverter is greatly affected by the modulation schemes applied. The system is modeled, designed, and implemented in the MATLAB®/Simulink environment. Experimental validation is done for a single-phase prototype, whereas five-phase experimental validation is proposed as future work.
Keywords: Packed U-Cell inverter, pulse width modulation, five-phase system, induction motor.
Downloads 734
571 Feature Based Unsupervised Intrusion Detection
Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein
Abstract:
The goal of a network-based intrusion detection system is to classify network traffic activities into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning play an important role in many sciences, including intrusion detection systems (IDS), using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection, which helps improve the efficiency, performance, and prediction rate of the proposed approach. This paper applies the unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified version of the KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a two-class classification has been implemented (Normal, Attack). The Weka framework, a Java-based open-source software package consisting of a collection of machine learning algorithms for data mining tasks, has been used in the testing process. The experimental results show that the proposed approach is very accurate, with a low false positive rate and a high true positive rate, and that it takes less learning time than using the full feature set of the dataset with the same algorithm.
Keywords: Information Gain (IG), Intrusion Detection System (IDS), K-means Clustering, Weka.
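The information gain criterion used here for feature selection can be sketched for discrete features as below. This is a plain-Python illustration of the standard entropy-based definition, not the Weka implementation the authors used; the feature names and the toy records are hypothetical and are not drawn from NSL-KDD.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum over feature values v of P(v) * H(labels | v)."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records: two discrete features and a two-class label.
labels = ["normal", "attack", "normal", "attack"]
features = {
    "protocol": ["tcp", "udp", "tcp", "udp"],  # separates the classes perfectly
    "flag": ["x", "x", "y", "y"],              # carries no class information
}

# Rank features from most to least informative.
ranked = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
print(ranked)
```

Features at the top of such a ranking would be kept and the rest dropped before clustering, which is the reduction step the abstract credits with the shorter learning time.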
Downloads 2778
570 Web Proxy Detection via Bipartite Graphs and One-Mode Projections
Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo
Abstract:
With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonating phishing pages to steal sensitive data, or redirecting victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become an increasingly challenging task to detect proxy services due to their dynamic updates and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of bipartite graphs for discovering social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent behavior clusters of web proxy users. The web proxy URL may vary from time to time, but the inherent interest would not. Based on this intuition, and using our private tools implemented with WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experimental results based on real datasets show that the behavior clusters not only reduce the number of URLs to analyze but also provide an effective way to detect web proxies, especially unknown ones.
Keywords: Bipartite graph, clustering, one-mode projection, web proxy detection.
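A one-mode projection of a host-destination bipartite graph can be sketched as follows. This is an illustration only: the function name, the host and destination names, and the choice of "number of shared destinations" as the edge weight are assumptions, since the abstract does not specify the weighting.

```python
from itertools import combinations


def one_mode_projection(host_to_destinations):
    """Project a host-destination bipartite graph onto the host side.

    Two hosts are linked with a weight equal to the number of
    destinations they share (an assumed weighting for illustration).
    """
    weights = {}
    for a, b in combinations(sorted(host_to_destinations), 2):
        shared = len(host_to_destinations[a] & host_to_destinations[b])
        if shared:
            weights[(a, b)] = shared
    return weights


# Hypothetical traffic: each host maps to the set of destinations it contacted.
traffic = {
    "host_a": {"proxy1.example", "news.example"},
    "host_b": {"proxy1.example", "news.example", "mail.example"},
    "host_c": {"shop.example"},
}

similarity = one_mode_projection(traffic)
print(similarity)
```

The resulting pairwise weights form the kind of similarity matrix that a spectral clustering step would take as input; that step is not shown here.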
Downloads 747
569 Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists
Authors: Alexander Kaplunovsky, Vladimir Khailenko, Alexander Bolshoy, Shara Atambayeva, Anatoliy Ivashchenko
Abstract:
Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species are quite different from each other, and the evolution of such structures raises many questions. We try to address some of these questions using statistical analysis of whole genomes. We go through all the protein-coding genes in a genome and study correlations between the net length of all the exons in a gene, the number of the exons, and the average length of an exon. We also take average values of these features for each chromosome and study correlations between those averages on the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified linear correlation between the number of exons in a gene and the length of a protein coded by the gene, while the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and parameters of linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.
Keywords: Comparative genomics, exon-intron structure, eukaryotic clustering, linear regression.
Downloads 2575
568 The Survey Research and Evaluation of Green Residential Building Based on the Improved Group Analytical Hierarchy Process Method in Yinchuan
Abstract:
Due to the economic downturn and the deterioration of the living environment, residential construction in China, a high-energy-consuming building sector, is gradually shifting from “extensive” development to green building. The evaluation system for green building is therefore being continuously improved, but current evaluation work has the following problems: (1) There are differences between the actual investment cost and the purchasing power of residents, and the construction target of green residential buildings is single and lacks multi-objective performance development. (2) Green building evaluation lacks regional characteristics and cannot reflect the demands of residents in different regions. (3) In determining the criteria weights, the experts’ judgment matrices rarely meet the consistency requirement. To address these problems, questionnaires about green residential buildings in the Ningxia area were first distributed; the results reflect residents’ purchasing power and their acceptance of green building costs. Second, an evaluation criteria system for green residential buildings was constructed, taking into account the geographical features of the Ningxia minority areas. Finally, the criteria weights were determined using the improved group AHP method and the grey clustering method, and a real case located in Xing Qing district, Ningxia, was evaluated. The conclusion is that the professional evaluation of this project is basically consistent with its good social recognition.
Keywords: Evaluation, green residential building, grey clustering method, group AHP.
Downloads 828
567 Modelling, Simulation and Validation of Plastic Zone Size during Deformation of Mild Steel
Authors: S. O. Adeosun, E. I. Akpan, S. A. Balogun, O. O. Taiwo
Abstract:
A model to predict the plastic zone size for material under plane stress condition has been developed and verified experimentally. The developed model is a function of crack size, crack angle and material property (dislocation density). Simulation and validation results show that the model is in good agreement with experimental results. Samples of low carbon steel (0.035% C) with included surface crack angles of 45°, 50°, 60°, 70° and 90° and crack depths of 2 mm and 4 mm were subjected to low strain rates between 0.48 × 10⁻³ s⁻¹ and 2.38 × 10⁻³ s⁻¹. The mechanical properties studied were ductility, tensile strength, modulus of elasticity, yield strength, yield strain, stress at fracture and fracture toughness. The experimental study shows that strain rate has no appreciable effect on the size of the plastic zone, while crack depth and crack angle play an important role in determining the plastic zone size of mild steel materials.
Keywords: Applied stress, crack angle, crack size, material property, plastic zone size, strain rate.
Downloads 1609
566 Effect of Mass on Bus Superstructure Strength Having Rollover Crash
Authors: Mustafa Bin Yusof, Mohammad Amirul Affiz Bin Afripin
Abstract:
Safety of bus journeys is a fundamental concern. The risk of injuries and fatalities is severe when the bus superstructure fails during a rollover accident. Adequate design and sufficient strength of the bus superstructure can reduce the number of injuries and fatalities. This paper deals with the structural analysis of a bus superstructure undergoing a rollover event, with the mass varied across multiple simulations. The purpose of this work is to analyze the structural response of the bus superstructure in terms of deformation, stress and strain under several loading and constraint conditions. A complete bus superstructure with a capacity of forty-four passengers was modeled using finite element analysis software. Simulations were conducted to observe the effect of the total mass of the bus on the strength of the superstructure. These simulations follow United Nations Economic Commission for Europe (UNECE) Regulation 66, which focuses on the strength of large vehicle superstructures. Validation was performed using a simple box model experiment, and the results obtained were compared with simulation results. Input data from the validation process were then used in the full-scale simulation. The analyses suggest that the failure of the bus superstructure during a rollover is basically dependent on the total mass of the bus and on the strength of the superstructure.
Keywords: Bus, rollover, superstructure strength, UNECE regulation 66.
Downloads 2558
565 Microscopic Emission and Fuel Consumption Modeling for Light-duty Vehicles Using Portable Emission Measurement System Data
Authors: Wei Lei, Hui Chen, Lin Lu
Abstract:
Microscopic emission and fuel consumption models have been widely recognized as an effective method to quantify real traffic emission and energy consumption when they are applied with microscopic traffic simulation models. This paper presents a framework for developing the Microscopic Emission (HC, CO, NOx, and CO2) and Fuel consumption (MEF) models for light-duty vehicles. The variable of composite acceleration is introduced into the MEF model to capture the effects of historical accelerations interacting with current speed on emission and fuel consumption. The MEF model is calibrated by the multivariate least-squares method for two types of light-duty vehicle using on-board data collected in Beijing, China, by a Portable Emission Measurement System (PEMS). The instantaneous validation results show that the MEF model performs better, with a lower Mean Absolute Percentage Error (MAPE), than the other two models. Moreover, the aggregate validation results indicate that the MEF model produces reasonable estimations compared to actual measurements, with prediction errors within 12%, 10%, 19%, and 9% for HC, CO, NOx emissions and fuel consumption, respectively.
Keywords: Emission, Fuel consumption, Light-duty vehicle, Microscopic, Modeling.
Downloads 2009
564 Comparative Study of Tensile Properties of Cast and Hot Forged Alumina Nanoparticle Reinforced Composites
Authors: S. Ghanaraja, Subrata Ray, S. K. Nath
Abstract:
Particle reinforced Metal Matrix Composite (MMC) succeeds in synergizing the metallic matrix with ceramic particle reinforcements to improve strength, particularly at elevated temperatures, but it adversely affects the ductility of the matrix because of agglomeration and porosity. The present study investigates the tensile properties of a cast and hot forged composite reinforced simultaneously with coarse and fine particles. Nano-sized alumina particles were generated by milling a mixture of aluminum and manganese dioxide powders. After drying, the milled particles are added to molten metal and the resulting slurry is cast. The microstructure of the composites shows a good distribution of both size categories of particles without significant clustering. The presence of nanoparticles along with coarser particles in a composite improves both strength and ductility considerably. The delay in debonding of coarser particles to higher stress is due to the reduced mismatch in extension caused by increased strain hardening in the presence of the nanoparticles. However, adding powder mix beyond a limit results in deterioration of mechanical properties, possibly due to clustering of nanoparticles. The porosity in the cast composite generally increases with increasing addition of powder mix, and it is reduced by forging. The base alloy and nanocomposites show improvement in flow stress, which could be attributed to lowering of porosity and grain refinement as a consequence of forging.
Keywords: Aluminum, alumina, nanoparticle reinforced composites, porosity.
Downloads 1477
563 Educational Data Mining: The Case of Department of Mathematics and Computing in the Period 2009-2018
Authors: M. Sitoe, O. Zacarias
Abstract:
University education is influenced by several factors that range from the adoption of strategies to strengthen the whole process to the academic performance improvement of the students themselves. This work uses data mining techniques to develop a predictive model to identify students with a tendency to evasion and retention. To this end, a database of real students’ data from the Department of University Admission (DAU) and the Department of Mathematics and Informatics (DMI) was used. The data comprised 388 undergraduate students admitted in the years 2009 to 2014. The Weka tool was used for model building, using three different techniques, namely: K-nearest neighbor, random forest, and logistic regression. To allow for training on multiple train-test splits, a cross-validation approach was employed with a varying number of folds. To reduce bias variance and improve the performance of the models, ensemble methods of Bagging and Stacking were used. After comparing the results obtained by the three classifiers, Logistic Regression using Bagging with seven folds obtained the best performance, showing results above 90% in all evaluated metrics: accuracy, rate of true positives, and precision. Retention is the most common tendency.
Keywords: Evasion and retention, cross validation, bagging, stacking.
Downloads 126
562 PoPCoRN: A Power-Aware Periodic Surveillance Scheme in Convex Region using Wireless Mobile Sensor Networks
Authors: A. K. Prajapati
Abstract:
In this paper, a periodic surveillance scheme is proposed for any convex region using mobile wireless sensor nodes. A sensor network typically consists of a fixed number of sensor nodes which report measurements of sensed data such as temperature, pressure, humidity, etc., from their immediate proximity (the area within the sensing range). To sense an area of interest with fixed sensor nodes, an adequate number of nodes is required to cover the entire region. This implies that the number of fixed sensor nodes required to cover a given area depends on the sensing range of the sensor as well as on the deployment strategy employed. We assume that the sensors are mobile within the region of surveillance and can be mounted on moving bodies such as robots or vehicles. Therefore, in our scheme, the surveillance time period determines the number of sensor nodes that must be deployed in the region of interest. The proposed scheme comprises three algorithms: Hexagonalization, Clustering, and Scheduling. The first algorithm partitions the coverage area into fixed-size hexagons that approximate the sensing range (cell) of an individual sensor node. The clustering algorithm groups the cells into clusters, each of which is covered by a single sensor node. The last algorithm determines a schedule for each sensor to serve its respective cluster. Each sensor node traverses all the cells belonging to its assigned cluster by oscillating between the first and the last cell for the duration of its lifetime. Simulation results show that our scheme provides full coverage within a given period of time using few sensors with minimum movement, less power consumption, and relatively low infrastructure cost.
Keywords: Sensor Network, Graph Theory, MSN, Communication.
Downloads 1466
561 Accelerating GLA with an M-Tree
Authors: Olli Luoma, Johannes Tuikkala, Olli Nevalainen
Abstract:
In this paper, we propose a novel improvement for the generalized Lloyd Algorithm (GLA). Our algorithm makes use of an M-tree index built on the codebook which makes it possible to reduce the number of distance computations when the nearest code words are searched. Our method does not impose the use of any specific distance function, but works with any metric distance, making it more general than many other fast GLA variants. Finally, we present the positive results of our performance experiments.
Keywords: Clustering, GLA, M-Tree, Vector Quantization.
Downloads 1527
560 Predictive Analytics of Student Performance Determinants in Education
Authors: Mahtab Davari, Charles Edward Okon, Somayeh Aghanavesi
Abstract:
Every institute of learning is usually interested in the performance of enrolled students. The level of these performances determines the approach an institute of study may adopt in rendering academic services. The focus of this paper is to evaluate students' academic performance in given courses of study using machine learning methods. This study evaluated various supervised machine learning classification algorithms such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest, Decision Tree, K-Nearest Neighbors, Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis, using selected features to predict study performance. The accuracy, precision, recall, and F1 score obtained from a 5-Fold Cross-Validation were used to determine the best classification algorithm to predict students’ performances. SVM (using a linear kernel), LDA, and LR were identified as the best-performing machine learning methods. Also, using the LR model, this study identified students' educational habits such as reading and paying attention in class as strong determinants for a student to have an above-average performance. Other important features include the academic history of the student and work. Demographic factors such as age, gender, high school graduation, etc., had no significant effect on a student's performance.
Keywords: Student performance, supervised machine learning, prediction, classification, cross-validation.
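As a rough illustration of the 5-fold cross-validation mentioned in this abstract, the sketch below splits sample indices into train and test folds. It is a simplification and not the authors' procedure: the function name is hypothetical, and no shuffling or stratification is applied.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Ten hypothetical samples split into five folds.
folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so metrics such as accuracy, precision, recall, and F1 can be averaged over the five held-out folds.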
Downloads 552
559 Development and Validation of a HPLC Method for 6-Gingerol and 6-Shogaol in Joint Pain Relief Gel Containing Ginger (Zingiber officinale)
Authors: Tanwarat Kajsongkram, Saowalux Rotamporn, Sirinat Limbunruang, Sirinan Thubthimthed
Abstract:
A High Performance Liquid Chromatography (HPLC) method was developed and validated for simultaneous estimation of 6-Gingerol (6G) and 6-Shogaol (6S) in joint pain relief gel containing ginger extract. The chromatographic separation was achieved using a C18 column, 150 x 4.6 mm i.d., 5μ Luna, with a mobile phase containing acetonitrile and water (gradient elution). The flow rate was 1.0 ml/min and the absorbance was monitored at 282 nm. The proposed method was validated in terms of analytical parameters such as specificity, accuracy, precision, linearity, range, limit of detection (LOD), and limit of quantification (LOQ), determined based on the International Conference on Harmonization (ICH) guidelines. The linearity ranges of 6G and 6S were 20 to 60 and 6 to 18 μg/ml, respectively. Good linearity was observed over these ranges, with linear regression equations Y = 11016x − 23778 for 6G and Y = 19276x − 19604 for 6S (x is the concentration of the analyte in μg/ml and Y is the peak area). The correlation coefficient was found to be 0.9994 for both markers. The LOD and LOQ for 6G were 0.8567 and 2.8555 μg/ml, and for 6S were 0.3672 and 1.2238 μg/ml, respectively. The recovery ranges for 6G and 6S were 91.57 to 102.36% and 84.73 to 92.85%, respectively, for all three spiked levels. The RSD values from repeated extractions for 6G and 6S were 3.43 and 3.09%, respectively. Validation of the developed method for precision, accuracy, specificity, linearity, and range was also performed, with well-accepted results.
Keywords: Ginger, 6-gingerol, HPLC, 6-shogaol.
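The regression equations reported in this abstract can be inverted to estimate a concentration from a measured peak area. The sketch below uses only the slopes, intercepts, and linearity ranges quoted in the abstract; the function names and the range check are illustrative assumptions, not part of the published method.

```python
# Calibration data from the abstract: Y = slope * x + intercept,
# where x is concentration in ug/ml and Y is peak area.
CALIBRATION = {
    "6G": {"slope": 11016.0, "intercept": -23778.0, "range": (20.0, 60.0)},
    "6S": {"slope": 19276.0, "intercept": -19604.0, "range": (6.0, 18.0)},
}


def peak_area(analyte, concentration):
    """Predicted peak area for a given concentration (ug/ml)."""
    cal = CALIBRATION[analyte]
    return cal["slope"] * concentration + cal["intercept"]


def concentration_from_peak_area(analyte, area):
    """Invert the calibration line; also report whether the result
    falls inside the validated linearity range."""
    cal = CALIBRATION[analyte]
    concentration = (area - cal["intercept"]) / cal["slope"]
    low, high = cal["range"]
    return concentration, low <= concentration <= high


# A 6G standard at 40 ug/ml, mapped to a peak area and back again.
area_6g = peak_area("6G", 40.0)
estimate, in_range = concentration_from_peak_area("6G", area_6g)
print(area_6g, estimate, in_range)
```

A result flagged as out of range would normally call for dilution or re-measurement rather than extrapolation, since linearity was only validated within the stated limits.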
Downloads 3424