Search results for: statistical classifiers
3958 Machine Learning Methods for Network Intrusion Detection
Authors: Mouhammad Alkasassbeh, Mohammad Almseidin
Abstract:
Network security engineers work to keep services available at all times by handling intruder attacks. An Intrusion Detection System (IDS) is one of the available mechanisms used to sense and classify abnormal actions. The IDS must therefore always be up to date with the latest intruder attack signatures to preserve the confidentiality, integrity, and availability of services. The speed of the IDS is a very important issue, as is learning new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (KDD, also called Knowledge Discovery in Databases) dataset is very handy for testing and evaluating different machine learning techniques. It mainly focuses on the KDD preprocessing step in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers were chosen for this study. The results show that the J48 classifier achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.
Procedia PDF Downloads 233
3957 Exploratory Study of the Influencing Factors for Hotels' Competitors
Authors: Asma Ameur, Dhafer Malouche
Abstract:
Hotel competitiveness research is an essential phase of the marketing strategy for any hotel. Knowing a hotel's competitors helps the hotelier grasp its position in the market and helps the customer choose the right hotel. Competitiveness is thus an important indicator that can be influenced by various factors. However, competitiveness, the ability to cope with competition, remains a difficult and complex concept to define and to exploit. The purpose of this article is therefore to carry out an exploratory study to calculate a competitiveness indicator for hotels. The paper also makes it possible to determine the criteria that directly or indirectly affect the image and perception of a hotel. The research looks for the right model for hotel competitiveness. To this end, we draw on different theoretical contributions in the field of machine learning. We use statistical techniques such as Principal Component Analysis (PCA) to reduce the dimensions, as well as other statistical modeling techniques. This paper presents a survey of the techniques and methods used in hotel competitiveness research. Furthermore, this study allows us to deduce the significant variables that influence the determination of a hotel's competitors. Lastly, the experiments discussed in this article found that hotel competitors are influenced by several factors to different degrees.
Keywords: competitiveness, e-reputation, hotels' competitors, online hotel review, principal component analysis, statistical modeling
Procedia PDF Downloads 117
3956 Electricity Generation from Renewables and Targets: An Application of Multivariate Statistical Techniques
Authors: Filiz Ersoz, Taner Ersoz, Tugrul Bayraktar
Abstract:
Renewable energy is referred to as "clean energy", and popular support for the use of renewable energy (RE) rests on providing electricity with zero carbon dioxide emissions. This study provides useful insight into RE in the European Union (EU), especially into electricity generation from renewables and the associated targets. The objective of this study is to identify groups of European countries using multivariate statistical analysis and selected indicators. The hierarchical clustering method is used to decide the number of clusters for EU countries. The hierarchical cluster analysis is based on Ward's clustering method and squared Euclidean distances, and it identified eight distinct clusters of European countries. Then, the non-hierarchical clustering (k-means) method was applied. Discriminant analysis was used to determine the validity of the results, with data normalized by Z-score transformation. To explore the relationship between the selected indicators, correlation coefficients were computed. The results of the study reveal the current situation of RE in European Union Member States.
Keywords: share of electricity generation, k-means clustering, discriminant, CO2 emission
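As a rough illustration of the ingredients named in this abstract, the sketch below applies Z-score normalization, computes squared Euclidean distances, and evaluates the Ward merge cost (the increase in within-cluster sum of squares when two clusters are joined). The indicator values and function names are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Normalize each indicator (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def sq_euclidean(a, b):
    """Squared Euclidean distance between two indicator vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    ca = [mean(c) for c in zip(*cluster_a)]
    cb = [mean(c) for c in zip(*cluster_b)]
    na, nb = len(cluster_a), len(cluster_b)
    return na * nb / (na + nb) * sq_euclidean(ca, cb)

# Hypothetical indicators for four countries (two indicators each).
indicators = [[10.0, 200.0], [12.0, 220.0], [30.0, 90.0], [33.0, 80.0]]
z = zscore_columns(indicators)

# Ward's method merges the pair of clusters with the smallest cost first.
cost_similar = ward_cost([z[0]], [z[1]])
cost_dissimilar = ward_cost([z[0]], [z[2]])
```

In this toy data the first two countries are merged before the first and third, because their normalized indicator vectors are much closer.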
Procedia PDF Downloads 414
3955 Monte Carlo Methods and Statistical Inference of Multitype Branching Processes
Authors: Ana Staneva, Vessela Stoimenova
Abstract:
This paper considers parametric estimation for the multitype branching process (MBP) with a Power Series offspring distribution family. The MLE for the parameters is obtained in the case when the observable data are incomplete and consist only of the generation sizes of the family tree of the MBP. The parameter estimates are calculated using the Monte Carlo EM algorithm. Estimates of the posterior distribution and of the offspring distribution parameters are calculated using the Bayesian approach and the Gibbs sampler. The article presents various examples with bivariate branching processes, together with computational results, simulation, and an implementation in R.
Keywords: Bayesian, branching processes, EM algorithm, Gibbs sampler, Monte Carlo methods, statistical estimation
Procedia PDF Downloads 416
3954 TDApplied: An R Package for Machine Learning and Inference with Persistence Diagrams
Authors: Shael Brown, Reza Farivar
Abstract:
Persistence diagrams capture valuable topological features of datasets that other methods cannot uncover. Still, their adoption in data pipelines has been limited due to the lack of publicly available tools in R (and Python) for analyzing groups of them with machine learning and statistical inference. In an easy-to-use and scalable R package called TDApplied, we implement several applied analysis methods tailored to groups of persistence diagrams. The two main contributions of our package are comprehensiveness (most functions do not have implementations elsewhere) and speed (shown through benchmarking against other R packages). We demonstrate applications of the tools on simulated data to illustrate how easily practical analyses of any dataset can be enhanced with topological information.
Keywords: machine learning, persistence diagrams, R, statistical inference
Procedia PDF Downloads 84
3953 Predicting the Relationship Between the Corona Virus Anxiety and Psychological Hardiness in Staff Working at Hospital in Shiraz Iran
Authors: Gholam Reza Mirzaei, Mehran Roost
Abstract:
This research was conducted with the aim of predicting the relationship between coronavirus anxiety and psychological hardiness in employees working at Shahid Beheshti Hospital in Shiraz. The research design was descriptive and correlational. The statistical population consisted of all the employees of Shahid Beheshti Hospital in Shiraz in 2021. From this population, 220 individuals were selected and studied using convenience sampling. To collect data, Kobasa's psychological hardiness questionnaire and a coronavirus anxiety questionnaire were used. After data collection, the participants' scores were analyzed using Pearson's correlation coefficient and multiple regression analysis in SPSS-24 statistical software. The results of Pearson's correlation coefficient showed a significant negative correlation between psychological hardiness and its components (challenge, commitment, and control) and coronavirus anxiety; in addition, psychological hardiness, with a beta coefficient of 0.20, could predict coronavirus anxiety in hospital employees. Based on the results, plans can be made to enhance psychological hardiness through educational workshops to relieve the anxiety of healthcare staff.
Keywords: the corona virus, commitment, hospital employees, psychological hardiness
Procedia PDF Downloads 61
3952 Use of Multivariate Statistical Techniques for Water Quality Monitoring Network Assessment, Case of Study: Jequetepeque River Basin
Authors: Jose Flores, Nadia Gamboa
Abstract:
Proper water quality management requires the establishment of a monitoring network. Therefore, the efficiency of water quality monitoring networks must be evaluated to ensure high-quality data collection for critical chemical quality parameters. Unfortunately, in some Latin American countries water quality monitoring programs are not sustainable in terms of recording historical data or covering environmentally representative sites, wasting time, money, and valuable information. In this study, multivariate statistical techniques, such as principal components analysis (PCA) and hierarchical cluster analysis (HCA), are applied to identify the most significant monitoring sites as well as critical water quality parameters in the monitoring network of the Jequetepeque River basin, in northern Peru. The Jequetepeque River basin, like others in Peru, shows socio-environmental conflicts due to the economic activities developed in this area. Water pollution by trace elements in the upper part of the basin is mainly related to mining activity, and the loss of agricultural land to salinization is caused by the extensive use of groundwater in the lower part of the basin. Since the 1980s, water quality in the basin has been assessed non-continuously by public and private organizations, and recently the National Water Authority established permanent water quality networks in 45 basins in Peru. Although many countries use multivariate statistical techniques to assess water quality monitoring networks, these instruments have never been applied for that purpose in Peru. For this reason, the main contribution of this study is to demonstrate that multivariate statistical techniques can serve as an instrument for optimizing monitoring networks using the smallest number of monitoring sites and the most significant water quality parameters, which would reduce costs and improve water quality management in Peru.
The main socio-economic activities and the principal stakeholders related to water management in the basin are also identified. Finally, water quality management programs are discussed in terms of their efficiency and sustainability.
Keywords: PCA, HCA, Jequetepeque, multivariate statistical
Procedia PDF Downloads 352
3951 A Comparative Study of Wellness Among Sportsmen and Non Sportsmen
Authors: Jaskaran Singh Sidhu
Abstract:
Aim: The purpose of this study is to compare wellness among sportsmen and non-sportsmen. Methodology: The present study is an experimental study of 80 senior secondary volleyball players aged 16-19 years from Ludhiana District of Punjab (India) and 80 non-sportspersons from senior secondary school in Ludhiana district. The sample was drawn using a random sampling technique. Tools: A five-point scale with 50 items was used to assess wellness. Statistical Analysis: To determine whether a relationship exists among the variables, a t-test was used to test the significance of the difference between the means. The mean, standard deviation, and standard error of the mean were calculated for each characteristic. Data were analyzed using SPSS (Statistical Package for the Social Sciences). Statistical significance was set at p < 0.05. Results: Substantial differences were noted at p < 0.05 in overall wellness. Sportsmen show significant differences at p < 0.05 in three parameters of wellness, i.e., physical wellness, mental wellness, and social wellness. In the spiritual and emotional wellness attributes, non-sportsmen show significant differences at p < 0.05. Conclusion: The data suggest that overall wellness can be improved by participation in sports. The study further notes that participation in sports promotes the attributes of wellness, i.e., physical wellness, mental wellness, emotional wellness, and social wellness.
Keywords: physical, mental, social, emotional, wellness, spiritual
Procedia PDF Downloads 88
3950 Improvement of Water Distillation Plant by Using Statistical Process Control System
Authors: Qasim Kriri, Harsh B. Desai
Abstract:
Water supply and sanitation in Saudi Arabia is characterized by both challenges and achievements. One of the main challenges is water scarcity. To overcome water scarcity, substantial investments have been made in seawater desalination, water distribution, sewerage, and wastewater treatment. The purpose of Statistical Process Control (SPC) is to determine whether the performance of a process is maintaining an acceptable quality level (AQL). SPC is an analytical decision-making method. A fundamental tool in SPC is the control chart, which tracks the variability in the measurements of the product quality characteristics. By using the appropriate chart, management can determine whether changes need to be made to keep the process in control. The two most important quality factors considered for the distilled water were pH (potential of hydrogen) and TDS (total dissolved solids). Quality checks were done at three stages: (1) water at the source, (2) water after chemical treatment, and (3) water sent for packing. The upper specification limit, central limit, and lower specification limit are taken as per Saudi water standards. The process capability to meet the specifications set for the quality characteristics of the Berain Water Factory was chosen as the focus of the proposed SPC system.
Keywords: acceptable quality level, statistical quality control, control charts, process charts
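The process capability idea mentioned in this abstract can be sketched with the standard Cp and Cpk indices plus a simple three-sigma check around a center line. The pH readings and the specification limits below are hypothetical placeholders, not values from the study or from Saudi water standards.

```python
from statistics import mean, stdev

def capability(samples, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower/upper specification limits."""
    mu = mean(samples)
    sigma = stdev(samples)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk

def out_of_control(samples, center, sigma):
    """Indices of points more than three sigma away from the center line."""
    return [i for i, x in enumerate(samples) if abs(x - center) > 3 * sigma]

# Hypothetical pH readings taken at one quality-check stage.
ph = [7.0, 7.1, 6.9, 7.2, 7.0, 7.1, 6.8, 7.0]

# Hypothetical specification limits, used only for illustration.
cp, cpk = capability(ph, lsl=6.5, usl=8.5)

# Flag a hypothetical outlier appended to the series.
flagged = out_of_control(ph + [9.0], center=7.0, sigma=0.1)
```

Cpk is never larger than Cp; the gap between them grows as the process mean drifts away from the midpoint of the specification range.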
Procedia PDF Downloads 183
3949 Terrain Classification for Ground Robots Based on Acoustic Features
Authors: Bernd Kiefer, Abraham Gebru Tesfay, Dietrich Klakow
Abstract:
The motivation of our work is to detect different terrain types traversed by a robot based on acoustic data from the robot-terrain interaction. Different acoustic features and classifiers were investigated, such as Mel-frequency cepstral coefficients and Gamma-tone frequency cepstral coefficients for feature extraction, and Gaussian mixture models and feed-forward neural networks for classification. We analyze the system's performance by comparing our proposed techniques with other features surveyed from distinct related works. We achieve precision and recall values between 87% and 100% per class and an average accuracy of 95.2%. We also study the effect of varying the audio chunk size in the application phase of the models and find only a mild impact on performance.
Keywords: acoustic features, autonomous robots, feature extraction, terrain classification
Procedia PDF Downloads 366
3948 Machine Learning for Disease Prediction Using Symptoms and X-Ray Images
Authors: Ravija Gunawardana, Banuka Athuraliya
Abstract:
Machine learning has emerged as a powerful tool for disease diagnosis and prediction. The use of machine learning algorithms has the potential to improve the accuracy of disease prediction, thereby enabling medical professionals to provide more effective and personalized treatments. This study focuses on developing a machine-learning model for disease prediction using symptoms and X-ray images. The importance of this study lies in its potential to assist medical professionals in accurately diagnosing diseases, thereby improving patient outcomes. Respiratory diseases are a significant cause of morbidity and mortality worldwide, and chest X-rays are commonly used in the diagnosis of these diseases. However, accurately interpreting X-ray images requires significant expertise and can be time-consuming, making it difficult to diagnose respiratory diseases in a timely manner. By incorporating machine learning algorithms, we can significantly enhance disease prediction accuracy, ultimately leading to better patient care. The study utilized the Mask R-CNN algorithm, which is a state-of-the-art method for object detection and segmentation in images, to process chest X-ray images. The model was trained and tested on a large dataset of patient information, which included both symptom data and X-ray images. The performance of the model was evaluated using a range of metrics, including accuracy, precision, recall, and F1-score. The results showed that the model achieved an accuracy rate of over 90%, indicating that it was able to accurately detect and segment regions of interest in the X-ray images. In addition to X-ray images, the study also incorporated symptoms as input data for disease prediction. The study used three different classifiers, namely Random Forest, K-Nearest Neighbor and Support Vector Machine, to predict diseases based on symptoms. These classifiers were trained and tested using the same dataset of patient information as the X-ray model. 
The results showed promising accuracy rates for predicting diseases using symptoms, with the ensemble learning techniques significantly improving the accuracy of disease prediction. The study's findings indicate that the use of machine learning algorithms can significantly enhance disease prediction accuracy, ultimately leading to better patient care. The model developed in this study has the potential to assist medical professionals in diagnosing respiratory diseases more accurately and efficiently. However, it is important to note that the accuracy of the model can be affected by several factors, including the quality of the X-ray images, the size of the dataset used for training, and the complexity of the disease being diagnosed. In conclusion, the study demonstrated the potential of machine learning algorithms for disease prediction using symptoms and X-ray images. The use of these algorithms can improve the accuracy of disease diagnosis, ultimately leading to better patient care. Further research is needed to validate the model's accuracy and effectiveness in a clinical setting and to expand its application to other diseases.
Keywords: K-nearest neighbor, mask R-CNN, random forest, support vector machine
Procedia PDF Downloads 151
3947 The Value of Dynamic Priorities in Motor Learning between Some Basic Skills in Beginner's Basketball, U14 Years
Authors: Guebli Abdelkader, Regiueg Madani, Sbaa Bouabdellah
Abstract:
The goal of this study is to determine the value of dynamic priorities in motor learning between some basic skills in beginners' basketball (U14), based on the skills of shooting and defense against the shooter. We present the statistical results of comparison and correlation between the study samples in skill tests for shooting and defense against the shooter. To achieve this objective, we selected 40 middle-school boys divided into four groups: two control groups (CS1, CS2) and two experimental groups (ES1: training on skill of shooting, skill of defense against the shooter; ES2: training on skill of defense against the shooter, skill of shooting). For the statistical analysis, we used F and t tests for statistical differences and the correlation coefficient (R) for the correlation analysis. Based on the statistical analyses, we confirm the importance of prioritizing basic basketball skills during the motor learning process. The benefit of the experimental group training is a saving in the time needed to acquire new motor skills in basketball, with ES2 having priority as a successful dynamic motor learning method for enhancing basic skills among basketball beginners.
Keywords: basic skills, basketball, motor learning, children
Procedia PDF Downloads 168
3946 Comparative Evaluation of Equity Indicators in the Matikiw Community-Based Forest Management Project in Pakil, Laguna and the Minayutan and Bacong Sigsigan Community-Based Forest Management Project in Famy, Laguna
Authors: Katherine Arquio
Abstract:
Community-based Forest Management (CBFM) is one of the integrative programs that slowly turned the course of forest management from traditional corporate practice to community-based practice, resulting in people empowerment. One of its goals is to promote socio-economic welfare among the people in the community, which includes social equity. This study looks at the equity aspect of the program, particularly whether there are equity differences between two CBFM sites: Matikiw in Pakil, Laguna, and Minayutan and Bacong Sigsigan in Famy, Laguna. Equity indicators were identified first, since they form the basis of the survey questions; the survey proper was then conducted, followed by the analysis. A two-tailed t-test was used as the statistical tool, since the difference between the two sites is the focus of the study. Statistical analysis was done using STATA, a statistical software package. Thirty-two indicators were identified, and results showed that only 13 of them differed significantly between the two sites. These 13 indicators were significantly observed only in Matikiw; the other 19 indicators were commonly observed in both areas and are suitable as equity indicators for the CBFM program.
Keywords: social equity, CBFM, social forestry, equity indicators
Procedia PDF Downloads 382
3945 Evaluation of Gesture-Based Password: User Behavioral Features Using Machine Learning Algorithms
Authors: Lakshmidevi Sreeramareddy, Komalpreet Kaur, Nane Pothier
Abstract:
Graphical-based passwords have existed for decades. Their major advantage is that they are easier to remember than an alphanumeric password. However, their disadvantage (especially for recognition-based passwords) is the smaller password space, making them more vulnerable to brute force attacks. Graphical passwords are also highly susceptible to the shoulder-surfing effect. The gesture-based password method that we developed is a grid-free, template-free method. In this study, we evaluated the gesture-based passwords for usability and vulnerability. The results of the study are significant. We developed a gesture-based password application for data collection. Two modes of data collection were used: creation mode and replication mode. In creation mode (Session 1), users were asked to create six different passwords and reenter each password five times. In replication mode, users saw a password image created by some other user for a fixed duration of time. Three different timer durations, 5 seconds (Session 2), 10 seconds (Session 3), and 15 seconds (Session 4), were used to mimic the shoulder-surfing attack. After the timer expired, the password image was removed, and users were asked to replicate the password. A total of 74, 57, 50, and 44 users participated in Session 1, Session 2, Session 3, and Session 4, respectively. In this study, machine learning algorithms were applied to determine whether the person is a genuine user or an imposter based on the password entered. Five different machine learning algorithms were deployed to compare the performance in user authentication: Decision Trees, Linear Discriminant Analysis, Naive Bayes Classifier, Support Vector Machines (SVMs) with Gaussian Radial Basis Kernel function, and K-Nearest Neighbor. Gesture-based password features vary from one entry to the next, which makes it difficult to distinguish between a creator and an intruder for authentication.
For each password entered by the user, four features were extracted: password score, password length, password speed, and password size. All four features were normalized before being fed to a classifier. Three different classifiers were trained using data from all four sessions. Classifiers A, B, and C were trained and tested using data from the password creation session and the password replication with a timer of 5 seconds, 10 seconds, and 15 seconds, respectively. The classification accuracies for Classifier A using five ML algorithms are 72.5%, 71.3%, 71.9%, 74.4%, and 72.9%, respectively. The classification accuracies for Classifier B using five ML algorithms are 69.7%, 67.9%, 70.2%, 73.8%, and 71.2%, respectively. The classification accuracies for Classifier C using five ML algorithms are 68.1%, 64.9%, 68.4%, 71.5%, and 69.8%, respectively. SVMs with Gaussian Radial Basis Kernel outperform other ML algorithms for gesture-based password authentication. Results confirm that the shorter the duration of the shoulder-surfing attack, the higher the authentication accuracy. In conclusion, behavioral features extracted from the gesture-based passwords lead to less vulnerable user authentication.
Keywords: authentication, gesture-based passwords, machine learning algorithms, shoulder-surfing attacks, usability
Procedia PDF Downloads 102
3944 Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
Abstract:
DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We use gene data that was initially collected for autism detection to find whether, and how accurately, this data can be used for identification applications. Our main goal is to find whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1. The classification rate remains close to optimal at higher noise standard deviations, up to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
Keywords: biometrics, genetic data, identity verification, k nearest neighbor
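The evaluation protocol described in this abstract (identify subjects with a nearest neighbor classifier after corrupting the test vectors with zero-mean Gaussian noise) can be sketched as follows. The enrolled vectors are synthetic random numbers, not genetic data, and the function names and vector length are assumptions made only for illustration.

```python
import random

def nearest_neighbor(enrolled, query):
    """Return the label of the enrolled vector closest to the query (1-NN)."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], query))
    return min(enrolled, key=sq_dist)[1]

def accuracy_under_noise(enrolled, sigma, seed=0):
    """Fraction of subjects still identified after adding N(0, sigma) noise."""
    rng = random.Random(seed)
    hits = 0
    for vector, label in enrolled:
        noisy = [x + rng.gauss(0.0, sigma) for x in vector]
        hits += nearest_neighbor(enrolled, noisy) == label
    return hits / len(enrolled)

# Synthetic stand-in for preprocessed feature vectors: 10 subjects, 20 features.
data_rng = random.Random(42)
enrolled = [([data_rng.uniform(0, 100) for _ in range(20)], f"subject-{i}")
            for i in range(10)]

# Identification rate at the noise levels mentioned in the abstract.
rates = {sigma: accuracy_under_noise(enrolled, sigma) for sigma in (0.0, 1.0, 3.0)}
```

On real data the rate would depend on how well separated the subjects are after preprocessing; the synthetic vectors here are far apart by construction.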
Procedia PDF Downloads 253
3943 The Effect of Damping Treatment for Noise Control on Offshore Platforms Using Statistical Energy Analysis
Authors: Ji Xi, Cheng Song Chin, Ehsan Mesbahi
Abstract:
Structure-borne noise is an important aspect of the offshore platform sound field. It can be generated directly by mechanical forces induced by vibrating machinery, indirectly by the excitation of the structure, or by excitation from incident airborne noise. Therefore, limiting the transmission of vibration energy throughout the offshore platform is the key to controlling structure-borne noise. This is usually done by applying damping treatment to the steel structures. Two types of damping treatment used on board are presented. By conducting a statistical energy analysis (SEA) simulation on a jack-up rig, the noise levels in the source room, the neighboring rooms, and the remote living quarter cabins are compared before and after the damping treatments are applied. The results demonstrate that, in the room neighboring the source and in the living quarter area, there is a significant noise reduction with the damping treatment applied, whereas in the source room, where airborne sound predominates over structure-borne sound, the impact is not obvious. The damping treatment on the offshore platform can subsequently be optimized, enabling acoustic professionals to implement noise control during the design stage for offshore crews' hearing protection and living comfort.
Keywords: statistical energy analysis, damping treatment, noise control, offshore platform
Procedia PDF Downloads 551
3942 Using AI for Analysing Political Leaders
Authors: Shuai Zhao, Shalendra D. Sharma, Jin Xu
Abstract:
This research uses advanced machine learning models to examine a number of hypotheses regarding political executives. Specifically, it analyses the impact these powerful leaders have on economic growth, using leaders' data from the Archigos database from 1835 to the end of 2015. The data is processed with AutoGluon, which was developed by Amazon. Automated Machine Learning (AutoML) and AutoGluon can automatically extract features from the data and then train multiple classifiers on it. We use a linear regression model and a classification model to establish the relationship between leaders and economic growth (GDP per capita growth) and to clarify the relationship between their characteristics and economic growth from a machine learning perspective. Our work may serve as a model or signal for collaboration between the fields of statistics and artificial intelligence (AI) that can light the way for political researchers and economists.
Keywords: comparative politics, political executives, leaders’ characteristics, artificial intelligence
Procedia PDF Downloads 85
3941 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors
Authors: Katawut Kaewbanjong
Abstract:
We analyzed a volume of data and found software project factors that are significant for user satisfaction. A statistical significance analysis (logistic regression) and a collinearity analysis determined the significant factors from a group of 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used for testing the prediction potential of these factors were neural network, k-NN, Naïve Bayes, random forest, decision tree, gradient boosted tree, linear regression, and logistic regression. Fifteen pre-defined factors were truly significant in predicting user satisfaction, and they provided 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how methodology was acquired, development techniques, decision making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.
Keywords: prediction model, statistical analysis, software project, user satisfaction factor
Procedia PDF Downloads 122
3940 Investigation of the Impact of Family Status and Blood Group on Individuals’ Addiction
Authors: Masoud Abbasalipour
Abstract:
This study investigated the impact of family status, involving factors such as parents' literacy level and family size, and of individuals' blood group on susceptibility to addiction. Statistical tests were employed to scrutinize the relationships among these factors. The statistical population of the study consisted of 338 samples divided into two groups: individuals with addiction and those without addiction in the city of Amol. The addicted group was selected from individuals visiting the substance abuse treatment center in Amol, and the non-addicted group was randomly selected from individuals in urban and rural areas. The Chi-square test was used to examine the presence or absence of relationships among the variables, and Cramér's V was employed to determine the strength of the relationship between them. Excel software facilitated the initial entry of data, and SPSS software was used for the statistical tests. The results indicated a significant relationship between parents' education level and individuals' addiction: the education level of the parents of addicted individuals was significantly lower than that of the parents of non-addicted individuals. However, the number of family members and blood group did not significantly affect individuals' susceptibility to addiction.
Keywords: addiction, blood group, parents' literacy level, family status
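The two statistics named in this abstract can be computed directly from a contingency table. The sketch below implements Pearson's chi-square statistic and Cramér's V; the counts are hypothetical toy tables, not the study's data, and no p-value is computed.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, rt in zip(table, row_totals):
        for observed, ct in zip(row, col_totals):
            expected = rt * ct / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramér's V: association strength between 0 (none) and 1 (perfect)."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi_square(table) / (n * k))

# Hypothetical tables: rows = parents' education level, columns = addicted / not.
no_association = [[10, 20], [20, 40]]
perfect_association = [[50, 0], [0, 50]]

v_none = cramers_v(no_association)
v_perfect = cramers_v(perfect_association)
```

The chi-square test answers whether a relationship exists at all, while Cramér's V rescales the statistic so that tables of different sizes can be compared for strength of association.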
Procedia PDF Downloads 68
3939 Statistical Comparison of Machine and Manual Translation: A Corpus-Based Study of Gone with the Wind
Authors: Yanmeng Liu
Abstract:
This article analyzes and compares the linguistic differences between machine translation and manual translation through a case study of the book Gone with the Wind. As an important carrier of human feeling and thinking, literature poses a huge difficulty for machine translation, which is therefore expected to show translation features distinct from those of manual translation. In order to display linguistic features objectively, the study makes tentative use of computerized and statistical evidence, applying quantitative methods to the systematic investigation of large-scale translation corpora. It compiles a bilingual corpus with four Chinese translations of Gone with the Wind, namely Piao by Chunhai Fan, Piao by Huairen Huang, and the translations by Google Translation and Baidu Translation. After processing the corpus with software including Stanford Segmenter, Stanford Postagger, and AntConc, the study analyzes the linguistic data and answers the following questions: 1. How does machine translation differ from manual translation linguistically? 2. Why do these deviances happen? This paper combines translation studies with corpus linguistics and makes the divergent linguistic dimensions concrete in translated text analysis, in order to present the linguistic deviances between manual and machine translation. Consequently, this study provides a more accurate and more fine-grained understanding of machine translation products, and it also proposes several suggestions for the future development of machine translation.
Keywords: corpus-based analysis, linguistic deviances, machine translation, statistical evidence
Procedia PDF Downloads 141
3938 A Nonlocal Means Algorithm for Poisson Denoising Based on Information Geometry
Authors: Dongxu Chen, Yipeng Li
Abstract:
This paper presents an information geometry Nonlocal Means (NLM) algorithm for Poisson denoising. NLM estimates a noise-free pixel as a weighted average of image pixels, where each pixel is weighted according to the similarity between image patches in Euclidean space. In this work, every pixel is modeled as a Poisson distribution locally estimated by Maximum Likelihood (ML), and all these distributions together constitute a statistical manifold. The NLM denoising algorithm is carried out on this statistical manifold, where the Fisher information matrix can be used to compute geodesics between distributions, which serve as the similarity measure between patches. This approach was demonstrated to be competitive with related state-of-the-art methods.
Keywords: image denoising, Poisson noise, information geometry, nonlocal-means
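The idea in this abstract can be sketched on a 1-D signal. This is a minimal illustration under stated assumptions, not the authors' algorithm: the window sizes, the exponential kernel, the bandwidth `h`, and all function names are hypothetical. It relies on two standard facts: the ML estimate of a Poisson rate is the sample mean, and the Fisher information of Poisson(λ) is 1/λ, which gives the geodesic distance 2|√λ₁ − √λ₂| between two Poisson distributions.

```python
import math

def local_rate(signal, i, radius):
    """ML estimate of the Poisson rate at index i: mean of the counts in a local window."""
    lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
    return sum(signal[lo:hi]) / (hi - lo)

def fisher_rao_poisson(l1, l2):
    """Geodesic (Fisher-Rao) distance between Poisson(l1) and Poisson(l2)."""
    return 2.0 * abs(math.sqrt(l1) - math.sqrt(l2))

def patch_distance(rates, i, j, half):
    """Mean squared geodesic distance between the patches centred at i and j."""
    n = len(rates)
    total = 0.0
    for k in range(-half, half + 1):
        a = rates[min(max(i + k, 0), n - 1)]  # clamp indices at the borders
        b = rates[min(max(j + k, 0), n - 1)]
        total += fisher_rao_poisson(a, b) ** 2
    return total / (2 * half + 1)

def nlm_poisson_1d(signal, radius=1, half=1, search=5, h=1.0):
    """Nonlocal means where patch similarity is measured on the Poisson manifold."""
    rates = [local_rate(signal, i, radius) for i in range(len(signal))]
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - search), min(len(signal), i + search + 1)
        idx = range(lo, hi)
        # Assumed kernel: exponential decay in the squared patch distance.
        weights = [math.exp(-patch_distance(rates, i, j, half) / (h * h)) for j in idx]
        out.append(sum(w * signal[j] for w, j in zip(weights, idx)) / sum(weights))
    return out

# Hypothetical photon counts with a step edge (illustration only).
noisy = [3, 5, 4, 6, 4, 5, 20, 22, 19, 21, 18, 20]
print([round(x, 2) for x in nlm_poisson_1d(noisy)])
```

Because the distance is computed between square roots of the rates, a given difference in counts matters less in bright regions than in dark ones, which matches the signal-dependent variance of Poisson noise.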
Procedia PDF Downloads 284
3937 Off-Topic Text Detection System Using a Hybrid Model
Authors: Usama Shahid
Abstract:
Be it written documents, news columns, or students' essays, verifying the content can be a time-consuming task. Apart from checking for spelling and grammar mistakes, the proofreader is also supposed to verify whether the content included in the essay or document is relevant. Irrelevant content in a document or essay is referred to as off-topic text, and in this paper we address the problem of detecting off-topic text in a document using machine learning techniques. Our study aims to identify off-topic content in a document using an echo state network model, and we also compare its results with those of other models. A previous study used Convolutional Neural Networks and TF-IDF to detect off-topic text. We will rearrange the existing datasets and apply new classifiers along with new word embeddings to both existing and new datasets in order to compare the results with the existing CNN model.
Keywords: off topic, text detection, echo state network, machine learning
Procedia PDF Downloads 85
3936 Statistical Modeling of Mandarin Tone Sandhi: Neutralization of Underlying Pitch Targets
Authors: Si Chen, Caroline Wiltshire, Bin Li
Abstract:
This study statistically models the surface f0 contour and the underlying pitch target of the well-studied third tone sandhi of Mandarin Chinese. Although the growth curve analysis of the surface f0 contours indicates non-neutralization of this sandhi tone (T3) and the base T2, their underlying pitch targets do show neutralization. These results in Mandarin are also consistent with the perception of native speakers, who cannot distinguish the sandhi T3 from the base T2 and compensate for contextual variation. The proposed statistical procedure for testing underlying pitch targets can be used to verify tone sandhi processes in other tonal languages.
Keywords: growth curve analysis, Mandarin Chinese, tone sandhi, underlying pitch target
Procedia PDF Downloads 335
3935 A Comparative Study on Automatic Feature Classification Methods of Remote Sensing Images
Authors: Lee Jeong Min, Lee Mi Hee, Eo Yang Dam
Abstract:
Geospatial feature extraction is a very important issue in remote sensing research. Image classification has traditionally been based on statistical techniques, but in recent years data mining and machine learning techniques for automated image processing have been applied to remote sensing, with a focus on the possibility of improved results. In this study, artificial neural network and decision tree techniques are applied to classify high-resolution satellite images. The results are compared with those of maximum likelihood classification (MLC), a statistical technique, and the pros and cons of each technique are analyzed.
Keywords: remote sensing, artificial neural network, decision tree, maximum likelihood classification
Procedia PDF Downloads 345
3934 Georgia Case: Tourism Expenses of International Visitors on the Basis of Growing Attractiveness
Authors: Nino Abesadze, Marine Mindorashvili, Nino Paresashvili
Abstract:
At present, actual tourism indicators cannot be calculated in Georgia, which makes their quantitative analysis impossible. Therefore, the study we conducted is highly important from both a theoretical and a practical standpoint. The main purpose of the article is to carry out a comprehensive statistical analysis of the tourist expenses of foreign visitors and to calculate statistical attractiveness indices of the tourism potential of Georgia. During the research, a method involving random and proportional selection was applied. SPSS software was used to process the statistical data for the corresponding analysis. The methodology of tourism statistics was implemented according to international standards. Important information was collected and grouped from major Georgian airports, and a representative population of foreign visitors and a rule for selecting respondents were determined. The results show a trend of growth in tourist numbers, and the share of tourists from post-Soviet countries is constantly increasing. The level of satisfaction with tourist facilities and quality of service has improved, but there is still a problem of disparity between service quality and prices. The structure of tourist expenses of foreign visitors is diverse; the competitiveness of the tourist products of Georgian tourist companies is higher. The attractiveness of popular cities of Georgia has increased by 43%.
Keywords: tourist, expenses, indexes, statistics, analysis
Procedia PDF Downloads 331
3933 A Fully Automated New-Fangled VESTAL to Label Vertebrae and Intervertebral Discs
Authors: R. Srinivas, K. V. Ramana
Abstract:
This paper presents a novel method called VESTAL to label vertebrae and intervertebral discs. Each vertebra has certain statistical properties. To label vertebrae and discs, a new equation modeling the path of the spinal cord is derived using statistical properties of the spinal canal. VESTAL uses this equation for labeling vertebrae and discs. For each vertebra and intervertebral disc, both the posterior and interior width and height are measured. The calculated values were compared with real values measured using Vernier calipers, and the comparison showed 95% efficiency and accurate results. VESTAL was applied to 350 MR images from 50 patients and obtained 100% accuracy in labeling.
Keywords: spine, vertebrae, intervertebral disc, labeling, statistics, texture, disc
Procedia PDF Downloads 361
3932 Risks in Forestry Operations, Analysis of Fatal Accidents
Authors: Rino Gubiani, Gianfranco Pergher
Abstract:
The work focused on the statistical analysis of accidents in the forestry sector (2000-2020) in the Friuli-Venezia Giulia region, located in the North-East of Italy. The aim of the work was to analyse the evolution of casualties over time and to evaluate possible improvements in the sector. It was shown that even nowadays the rate of accidents in forestry work is higher than in all other sectors, including agriculture; moreover, some types of accident persisted throughout the whole analysed period, such as slipping on the soil, being hit by trees, and falling from plants. The results showed that an increase in forestry exploitation could further increase the total number of accidents if advanced technological machines, such as cable cranes, are not implemented, given that a significant number of older workers (above 50 years old) are employed in the sector.
Keywords: safety, forestry work, accidents, risk analysis, casualties, statistical analysis
Procedia PDF Downloads 130
3931 Foot Recognition Using Deep Learning for Knee Rehabilitation
Authors: Rakkrit Duangsoithong, Jermphiphut Jaruenpunyasak, Alba Garcia
Abstract:
Foot recognition can be applied in many medical fields, such as gait pattern analysis and the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system is intended to capture a patient image in a controlled room and background and to recognize the foot in limited views. However, such a system can be inconvenient for monitoring knee exercises at home. To overcome these problems, this paper proposes a deep learning method using Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with the traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, the deep learning method recognizes foot images from online databases with better accuracy than the traditional classification method, but with higher complexity.
Keywords: foot recognition, deep learning, knee rehabilitation, convolutional neural network
Procedia PDF Downloads 160
3930 Non-Destructive Visual-Statistical Approach to Detect Leaks in Water Mains
Authors: Alaa Al Hawari, Mohammad Khader, Tarek Zayed, Osama Moselhi
Abstract:
In this paper, an effective non-destructive, non-invasive approach for leak detection is proposed. The process relies on analyzing thermal images collected by an IR viewer device that captures thermograms. In this study, a statistical analysis of the thermal images collected from the ground surface along the expected leak location, followed by a visual inspection of the thermograms, was performed in order to locate the leak. To verify the applicability of the proposed approach, the predicted leak location was compared with the real leak location. The results showed that the expected leak location was successfully identified with an accuracy of more than 95%.
Keywords: thermography, leakage, water pipelines, thermograms
Procedia PDF Downloads 353
3929 Drying Kinetics of Soybean Seeds
Authors: Amanda Rithieli Pereira Dos Santos, Rute Quelvia De Faria, Álvaro De Oliveira Cardoso, Anderson Rodrigo Da Silva, Érica Leão Fernandes Araújo
Abstract:
The study of drying kinetics is of great importance for mathematical modeling, since it provides knowledge of the heat and mass transfer processes between the products and makes it possible to adjust dryers and manage new technologies for these processes. The present work aimed to study the drying kinetics of soybean seeds and to fit different statistical models to the experimental data, varying cultivar and temperature. Soybean seeds were pre-dried in a natural environment in order to reduce and homogenize the water content to the level of 14% (d.b.). Drying was then carried out in a forced air circulation oven at controlled temperatures of 38, 43, 48, 53 and 58 ± 1 °C, using two soybean cultivars, BRS 8780 and Sambaíba, until hygroscopic equilibrium was reached. The experimental design was completely randomized in a 5 x 2 factorial (temperature x cultivar) with 3 replicates. Eleven statistical models used to explain the drying process of agricultural products were fitted to the experimental data. Regression analysis was performed using the least squares Gauss-Newton algorithm to estimate the parameters. The goodness of fit was evaluated from the coefficient of determination (R²), the adjusted coefficient of determination (adjusted R²), and the standard error (S.E.). The models that best represent the drying kinetics of soybean seeds are the Midilli and Logarithmic models.
Keywords: curve of drying seeds, Glycine max L., moisture ratio, statistical models
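The fitting procedure described in this abstract can be sketched as follows. In the drying literature the Midilli model is commonly written MR = a·exp(−k·tⁿ) + b·t and the Logarithmic model MR = a·exp(−k·t) + c, where MR is the moisture ratio and t the drying time. To keep the example short, the sketch below fits the simpler one-parameter Newton (Lewis) model MR = exp(−k·t) with Gauss-Newton and then computes R², adjusted R², and the standard error. The function names and the data are hypothetical, not taken from the paper.

```python
import math

def fit_newton_model(times, mr, k0=0.1, iters=50, tol=1e-12):
    """Gauss-Newton least-squares fit of MR = exp(-k * t) for the rate constant k."""
    k = k0
    for _ in range(iters):
        pred = [math.exp(-k * t) for t in times]
        resid = [y - p for y, p in zip(mr, pred)]
        jac = [-t * p for t, p in zip(times, pred)]  # d(pred)/dk
        denom = sum(j * j for j in jac)
        if denom == 0.0:
            break
        # One-parameter Gauss-Newton step: (J^T J)^-1 J^T r.
        step = sum(j * r for j, r in zip(jac, resid)) / denom
        k += step
        if abs(step) < tol:
            break
    return k

def fit_quality(mr, pred, n_params):
    """Return (R2, adjusted R2, standard error of the estimate)."""
    n = len(mr)
    mean = sum(mr) / n
    sse = sum((y - p) ** 2 for y, p in zip(mr, pred))
    sst = sum((y - mean) ** 2 for y in mr)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    se = math.sqrt(sse / (n - n_params))
    return r2, adj_r2, se

# Hypothetical moisture-ratio readings over drying time in hours (illustration only).
times = [0, 1, 2, 3, 4, 5, 6]
mr = [1.00, 0.75, 0.54, 0.41, 0.31, 0.22, 0.16]

k = fit_newton_model(times, mr)
pred = [math.exp(-k * t) for t in times]
r2, adj_r2, se = fit_quality(mr, pred, n_params=1)
print(f"k={k:.4f}, R2={r2:.4f}, adjusted R2={adj_r2:.4f}, S.E.={se:.4f}")
```

Multi-parameter models such as Midilli follow the same pattern, except that each Gauss-Newton step solves a small linear system instead of a single division; comparing adjusted R² and the standard error across the candidate models is how the best-fitting one would be chosen.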
Procedia PDF Downloads 626