Search results for: naive q-learning
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 156

96 Diversity of Voices: Audio Visual Continuous Speech Recognition with Traditional Approach

Authors: Partha Protim Majumder, Sajeeb Das, Sharun Akter Khushbu

Abstract:

Bengali is widely spoken around the world, but Bengali speech recognition has received little attention. In this study, we take on a particularly difficult task because recognition must be performed in noisy environments. Another challenge we address is collecting and handling speech data from third-gender speakers; our approach is to recognize the speaker's gender from speech. All of the Bangla speech samples used in this study were short and were taken from real-life situations, covering male, female, and third-gender speech. We derive features from the spoken words: MFCC (1-20), ZCR, roll-off, spec_cen, RMSE, and chroma_stft. We evaluated recognition performance with Gboost, Random Forest, K-Nearest Neighbors (KNN), Decision Tree, Naive Bayes, and Logistic Regression (LR), and Random Forest achieved the highest performance in recognizing the gender of the speakers.
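The feature list above includes ZCR and RMSE. The sketch below shows how these two frame-level features can be computed from raw samples in plain Python, assuming that RMSE here denotes per-frame root-mean-square energy (as in common audio toolkits). The frame length, hop size, and the treatment of zero-valued samples are illustrative choices, not the authors' settings; audio libraries differ in these details.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero-valued samples are treated as non-negative (an illustrative choice).
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square energy of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


if __name__ == "__main__":
    # A 100 Hz sine sampled at 8 kHz, split into 25 ms frames with 10 ms hops.
    sr = 8000
    signal = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]
    for frame in frame_signal(signal, 200, 80)[:3]:
        print(round(zero_crossing_rate(frame), 3), round(rms_energy(frame), 3))
```

Per-frame values like these are usually summarized (for example, by their mean over an utterance) before being passed to a classifier.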

Keywords: MFCC, ZCR, Bengali, LR, RMSE, roll-off, Gboost

Procedia PDF Downloads 32
95 Optimizing Network Latency with Fast Path Assignment for Incoming Flows

Authors: Qing Lyu, Hang Zhu

Abstract:

Various flows in the network must traverse different types of middleboxes. Improper placement of middleboxes and improper path assignment for flows can greatly increase network latency and degrade network performance. Minimizing the total end-to-end latency of all flows requires assigning paths to the incoming flows. In this paper, the flow path assignment problem is studied with respect to the placement of various kinds of middleboxes. The flow path assignment problem is formulated as a linear programming problem, which is very time-consuming to solve. A naive greedy algorithm is also studied, which is very fast but causes much more latency than the linear programming algorithm. Finally, the paper presents a heuristic algorithm named FPA, which takes bottleneck link information and estimated bandwidth occupancy into consideration and achieves near-optimal latency in much less time. Evaluation results validate the effectiveness of the proposed algorithm.
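To make the contrast with the linear programming formulation concrete, here is a minimal sketch of one possible naive greedy assignment. It is not the paper's greedy baseline or FPA. It assumes, hypothetically, that each flow comes with a precomputed list of candidate paths that already traverse its required middleboxes, and it uses a made-up latency model (a fixed delay per link plus a load/capacity term). Greedy is fast because each flow is placed exactly once, in arrival order, without revisiting earlier decisions.

```python
def path_latency(path, load, capacity, base_delay):
    """Hypothetical latency model: fixed delay per link plus a load/capacity term."""
    return sum(base_delay[link] + load[link] / capacity[link] for link in path)


def greedy_assign(flows, candidates, capacity, base_delay):
    """Assign each flow, in arrival order, to its cheapest feasible candidate path.

    flows: list of (flow_id, bandwidth_demand)
    candidates: flow_id -> list of paths; each path is a list of links (u, v)
    capacity, base_delay: link -> number
    Returns flow_id -> chosen path, or None if no candidate path has room.
    """
    load = {link: 0.0 for link in capacity}
    assignment = {}
    for flow_id, demand in flows:
        best_path, best_cost = None, float("inf")
        for path in candidates[flow_id]:
            if any(load[link] + demand > capacity[link] for link in path):
                continue  # this path cannot carry the flow
            cost = path_latency(path, load, capacity, base_delay)
            if cost < best_cost:
                best_path, best_cost = path, cost
        assignment[flow_id] = best_path
        if best_path is not None:
            for link in best_path:
                load[link] += demand  # earlier choices are never revisited
    return assignment
```

Because earlier flows are never moved, a large flow arriving late may be pushed onto a long detour. That kind of myopic choice is what a heuristic that accounts for bottleneck links and estimated bandwidth occupancy, as FPA is described to do, is positioned to improve on.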

Keywords: flow path, latency, middlebox, network

Procedia PDF Downloads 179
94 Effects of Exposure to a Language on Perception of Non-Native Phonologically Contrastive Duration

Authors: Chuyu Huang, Itsuki Minemi, Kuanlin Chen, Yuki Hirose

Abstract:

It remains unclear how speakers are able to perceive phonological contrasts that do not exist in their own language. This experiment uses the vowel-length distinction in Japanese, which is phonologically contrastive and co-occurs with tonal change in some cases. Speakers whose first language does not distinguish vowel length, such as Mandarin speakers, usually misperceive contrastive duration. Two alternative hypotheses about how Mandarin speakers perceive a phonological contrast that does not exist in their language make different predictions. The stress parameter model does not make a clear prediction about the impact of tonal type: Mandarin speakers will likely not perceive vowel length as well as native Japanese speakers do, but their performance might not correlate with tonal type, because the prosody of their language is distinctive, which requires users to encode lexical prosody and notice subtle differences in word prosody. By contrast, cue-based phonetic models predict that Mandarin speakers may rely on pitch differences, a secondary cue, to perceive vowel length. Two groups of Mandarin speakers, naive listeners who do not speak Japanese and beginner learners of Japanese, were recruited to participate in an AX discrimination task involving pairs of Japanese sound stimuli that contain a phonologically contrastive environment. Participants were asked to indicate whether the two stimuli in a pair, which contained a vowel-length contrast (e.g., maapero vs. mapero), sounded the same. The experiment had a two-factor design. The first factor contrasted three syllable positions (initial/medial/final), since position is likely to affect perceptual difficulty, as seen in previous studies. The second factor contrasted two pitch types (accent type): one with an accentual change that could be distinguished with the lexical tones of Mandarin (the different condition), and the other with no tonal distinction, differing only in vowel length (the same condition).
A linear mixed-effects model showed a significant main effect of accent type (β = 1.48, SE = 0.35, p < 0.05), which implies that Mandarin speakers tend to recognize vowel-length differences more successfully when the long-vowel counterpart takes on a tone that exists in Mandarin. The interaction between accent type and syllabic position was also significant (β = 2.30, SE = 0.91, p < 0.05), showing that in the different condition, vowel length is more difficult to recognize in word-final position than in word-initial position. A second statistical model, a logistic regression comparing naive speakers with beginners, tested the effect of participant group and found a significant difference between the two groups (β = 1.06, 95% CI = [0.36, 2.03], p < 0.05). This study shows that: (1) Mandarin speakers are likely to use pitch cues to perceive vowel length in a non-native language, which is consistent with cue-based approaches; and (2) there is an exposure effect: the beginner group achieved higher accuracy in long-vowel perception despite their short period of language learning experience.
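For readers who prefer odds ratios, the logistic-regression coefficient for participant group can be exponentiated. This assumes the model codes beginners as 1 and naive listeners as 0 and that the outcome is a correct discrimination response, neither of which the abstract states; under the opposite coding, the ratio would invert.

```latex
\mathrm{OR} = e^{\hat\beta} = e^{1.06} \approx 2.89,
\qquad
95\%\ \mathrm{CI} = \left[e^{0.36},\ e^{2.03}\right] \approx [1.43,\ 7.61]
```

Under those assumptions, beginners' odds of a correct response would be roughly 2.9 times those of naive listeners.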

Keywords: cue-based perception, exposure effect, prosodic perception, vowel duration

Procedia PDF Downloads 195
93 Sentiment Analysis on the East Timor Accession Process to the ASEAN

Authors: Marcelino Caetano Noronha, Vosco Pereira, Jose Soares Pinto, Ferdinando Da C. Saores

Abstract:

One particularly popular social media platform is YouTube, a video-sharing platform where users can upload videos and other users can like, dislike, or comment on them. In this study, we conduct a binary classification task on YouTube video comments and reviews from users regarding the accession process of Timor Leste to become the eleventh member of the Association of Southeast Asian Nations (ASEAN). We scrape the data directly from public YouTube videos and apply several preprocessing and weighting techniques. Before classification, we categorized the data into two classes: positive and negative. For classification, we apply the Support Vector Machine (SVM) algorithm. In a comparison with the Naïve Bayes algorithm, the experiment showed that SVM achieved 84.1% accuracy, 94.5% precision, and 73.8% recall.
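The abstract reports precision and recall but not their harmonic mean. The helper below computes the standard metrics from confusion-matrix counts; the F1 value for the reported figures is derived here, not taken from the paper, and assumes that precision and recall refer to the same (presumably positive) class.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


if __name__ == "__main__":
    # F1 implied by the reported precision (94.5%) and recall (73.8%).
    print(round(f1_from(0.945, 0.738), 3))  # 0.829
```

The large gap between precision and recall pulls the implied F1 well below the precision figure.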

Keywords: classification, YouTube, sentiment analysis, support vector machine

Procedia PDF Downloads 67
92 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression (regex) is a sequence of characters that describes a text pattern. In clinical research, regular expressions are usually created manually by programmers together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm, named REX, was designed and then implemented as a simplified method for automatically creating regexes to classify Spanish text. To classify ambiguous cases, such as when multiple labels are assigned to a test example, REX includes an information gain method. Two datasets were used to evaluate the algorithm's effectiveness in clinical text classification tasks. The results indicate that the regular-expression-based classifier proposed in this work performs statistically better in terms of accuracy and F-measure than Support Vector Machine and Naïve Bayes on both datasets.
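The following is a hypothetical sketch, not the REX algorithm itself. It scores each regex by the information gain of the binary split "matches / does not match" on labelled training text, and when several regexes with different labels fire on a new document, it keeps the label of the highest-gain pattern. The Spanish example patterns and labels are invented for illustration.

```python
import math
import re
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(pattern, texts, labels):
    """Entropy reduction from splitting the training set on whether `pattern` matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    hit = [y for t, y in zip(texts, labels) if regex.search(t)]
    miss = [y for t, y in zip(texts, labels) if not regex.search(t)]
    n = len(labels)
    remainder = len(hit) / n * entropy(hit) + len(miss) / n * entropy(miss)
    return entropy(labels) - remainder


def classify(text, rules, gains, default=None):
    """rules: list of (pattern, label). Conflicts go to the highest-gain matching pattern."""
    matches = [(gains[p], label) for p, label in rules if re.search(p, text, re.IGNORECASE)]
    if not matches:
        return default
    return max(matches)[1]


if __name__ == "__main__":
    texts = ["dolor de pecho", "dolor abdominal", "fiebre alta", "fiebre y tos"]
    labels = ["cardio", "digestivo", "infeccion", "infeccion"]
    rules = [(r"fiebre", "infeccion"), (r"pecho", "cardio")]
    gains = {p: information_gain(p, texts, labels) for p, _ in rules}
    print(classify("fiebre y dolor de pecho", rules, gains))  # infeccion
```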

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

Procedia PDF Downloads 290
91 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment analysis is the process of detecting the contextual polarity of text; in other words, it determines whether a piece of writing is positive, negative, or neutral. Sentiment analysis of documents is of great importance today, when vast amounts of information are stored in databases and on the World Wide Web. An efficient algorithm to elicit such information would be beneficial for social, economic, and medical purposes. In this project, we have developed an algorithm to classify a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. Notably, the classification does not use the independence assumption made by many procedures such as Naive Bayes, which makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use the empirical distribution for estimation, but instead developed a method based on the degree of close clustering of the data points. We applied our algorithm to a movie review dataset obtained from IMDb and obtained satisfactory results.
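For contrast with the approach described above, this minimal multinomial Naive Bayes classifier makes the independence assumption explicit: each word contributes its own class-conditional probability (summed here as logs), regardless of which other words appear in the document. Laplace smoothing and whitespace tokenization are choices made for this sketch, and the training sentences are toy examples.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Count classes and per-class word frequencies; alpha is the Laplace smoothing constant."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, y in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[y].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, doc):
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    scores = {}
    for y in class_counts:
        total = sum(word_counts[y].values())
        score = math.log(class_counts[y] / n_docs)  # log prior
        for w in doc.lower().split():
            if w not in vocab:
                continue  # unseen words are ignored in this sketch
            # Independence assumption: each word adds its own log-likelihood term.
            score += math.log((word_counts[y][w] + alpha) / (total + alpha * len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = ["great movie loved it", "wonderful acting great plot",
            "terrible movie hated it", "boring plot awful acting"]
    model = train_nb(docs, ["pos", "pos", "neg", "neg"])
    print(predict_nb(model, "loved the great acting"))  # pos
```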

Keywords: sentiment, Runs Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 371
90 Air Cargo Overbooking Model under Stochastic Weight and Volume Cancellation

Authors: Naragain Phumchusri, Krisada Roekdethawesab, Manoj Lohatepanont

Abstract:

Overbooking is the practice of selling more goods or services than the available capacity because sellers anticipate that some buyers will not show up or will cancel their bookings. At present, many airlines deploy overbooking strategies to deal with customer uncertainty. In particular, some airlines sell more cargo capacity to freight forwarders than they have available, expecting that some forwarders will cancel later. In this paper, we propose methods to find the optimal overbooking levels of volume and weight for air cargo that minimize the total cost, consisting of the cost of spoilage and the cost of offloading. Cancellations of volume and weight are jointly distributed random variables with a known joint distribution. Heuristic approaches that treat weight and volume as independent are considered to find an appropriate solution to the full problem. Computational experiments are used to explore the performance of the approaches presented in this paper compared with a naïve method under different scenarios.
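The cost trade-off can be illustrated with a one-dimensional Monte Carlo sketch. This is a simplification, not the paper's model: it considers a single capacity dimension (for example, weight alone), assumes a hypothetical cancellation fraction drawn uniformly from 0 to 30%, and searches integer booking limits for the lowest average of spoilage cost (unused capacity) plus offloading cost (cargo that shows up beyond capacity). Handling one dimension at a time echoes the independence-based heuristics, but the full problem needs the joint cancellation distribution.

```python
import random


def expected_cost(booking_limit, capacity, cancel_samples, spoilage_cost, offload_cost):
    """Average spoilage + offloading cost over sampled cancellation fractions."""
    total = 0.0
    for f in cancel_samples:
        shown = booking_limit * (1.0 - f)  # cargo that actually shows up
        if shown < capacity:
            total += spoilage_cost * (capacity - shown)  # capacity left empty
        else:
            total += offload_cost * (shown - capacity)  # cargo that must be offloaded
    return total / len(cancel_samples)


def best_booking_limit(capacity, max_limit, spoilage_cost, offload_cost,
                       n_samples=10000, seed=0):
    rng = random.Random(seed)
    # Hypothetical cancellation model: fraction cancelled ~ Uniform(0, 0.3).
    samples = [rng.uniform(0.0, 0.3) for _ in range(n_samples)]
    return min(
        range(capacity, max_limit + 1),
        key=lambda b: expected_cost(b, capacity, samples, spoilage_cost, offload_cost),
    )


if __name__ == "__main__":
    print(best_booking_limit(capacity=100, max_limit=150, spoilage_cost=1.0, offload_cost=3.0))
```

With these toy inputs, the optimum lands a few units above capacity (around 107 for a capacity of 100), because offloading is penalized more heavily than leaving space empty.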

Keywords: air cargo overbooking, offloading capacity, optimal overbooking level, revenue management, spoilage capacity

Procedia PDF Downloads 296
89 Evaluation of Robust Feature Descriptors for Texture Classification

Authors: Jia-Hong Lee, Mei-Yi Wu, Hsien-Tsung Kuo

Abstract:

Texture is an important characteristic of real and synthetic scenes. Texture analysis plays a critical role in inspecting surfaces and provides important techniques for a variety of applications. Although several descriptors have been proposed to extract texture features, object recognition remains a difficult task due to the complex nature of texture. Recently, many robust and scale-invariant image features such as SIFT, SURF, and ORB have been successfully used in image retrieval and object recognition. In this paper, we compare the texture classification performance of these feature descriptors combined with k-means clustering. Different classifiers, including K-NN, Naive Bayes, Back Propagation Neural Network, Decision Tree, and Kstar, were applied to three texture image sets: UIUCTex, KTH-TIPS, and Brodatz. Experimental results show that SIFT achieves the best average accuracy on UIUCTex and KTH-TIPS, while SURF performs best on the Brodatz texture set. The BP neural network works best on the test sets among all the classifiers used.
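A common way to pair local descriptors such as SIFT, SURF, or ORB with k-means is a bag-of-visual-words encoding: cluster training descriptors into a codebook, then describe each image by a normalized histogram of its nearest codewords. The abstract does not spell out its pipeline, so this is an assumed construction. The 2-D points stand in for real descriptor vectors (SIFT descriptors are 128-dimensional), and binary ORB descriptors are usually compared with Hamming distance rather than the Euclidean distance used here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm with random initial centroids drawn from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bovw_histogram(descriptors, centroids):
    """Normalized histogram of nearest-codeword assignments for one image."""
    counts = [0] * len(centroids)
    for d in descriptors:
        j = min(range(len(centroids)), key=lambda c: sq_dist(d, centroids[c]))
        counts[j] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

The resulting fixed-length histograms are what a downstream classifier (k-NN, Naive Bayes, a neural network, and so on) would consume.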

Keywords: texture classification, texture descriptor, SIFT, SURF, ORB

Procedia PDF Downloads 333
88 Machine Learning Automatic Detection on Twitter Cyberbullying

Authors: Raghad A. Altowairgi

Abstract:

With the widespread use of social media platforms, young people tend to use them extensively as their primary means of communication because of their ease and modernity. However, these platforms often create fertile ground for bullies to practice aggressive behavior against their victims. Platform usage cannot be reduced, but intelligent mechanisms can be implemented to reduce the abuse. This is where machine learning comes in: understanding and classifying text can help minimize cyberbullying. Artificial intelligence techniques have expanded to provide applied tools to address the phenomenon of cyberbullying. In this research, machine learning models are built to classify text into two classes: cyberbullying and non-cyberbullying. The data are first preprocessed in four stages: removing characters that do not provide meaningful information to the models, tokenization, stop-word removal, and lowercasing. BoW and TF-IDF are used as the main features for five classifiers: logistic regression, Naïve Bayes, Random Forest, XGBoost, and CatBoost, which score 92%, 90%, 92%, 91%, and 86%, respectively.
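As a rough illustration of the TF-IDF features mentioned above, this sketch uses one common variant: term frequency normalized by document length, multiplied by the unsmoothed inverse document frequency log(N/df). Toolkits differ (scikit-learn, for example, smooths the IDF by default), so the exact weights will not match any particular library or the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (whitespace tokenization, lowercased)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per document
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = len(toks)
        vectors.append({w: (c / length) * math.log(n / df[w]) for w, c in tf.items()})
    return vectors
```

Terms that occur in every document (like "you" in both "you are stupid" and "you are kind") receive zero weight, while terms specific to one document are emphasized.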

Keywords: cyberbullying, machine learning, Bag-of-Words, term frequency-inverse document frequency, natural language processing, Catboost

Procedia PDF Downloads 96
87 Rollet vs Rocket: A New in-Space Propulsion Concept

Authors: Arthur Baraov

Abstract:

Nearly all rocket and spacecraft propulsion concepts in existence today can be linked in one way or another to one of two ancient warfare devices: the gun and the sling. Chemical, thermoelectric, ion, nuclear thermal, and electromagnetic rocket engines all fall into the first group, which, for obvious reasons, can be categorized as “hot” space propulsion concepts. Space elevator, orbital tower, rolling satellite, orbital skyhook, tether propulsion, and gravitational assist are examples of the second category, which lends itself to the title “cold” space propulsion concepts. The “hot” space propulsion concepts skyrocketed, literally and figuratively, from the naïve ideas of Jules Verne to the manned missions to the Moon. On the other hand, with the notable exception of gravitational assist, hardly any of the “cold” space propulsion concepts has made progress in terms of practical application. Why is that? This article aims to show that the right answer to this question has potential implications and practical consequences comparable to those of the transition from Jules Verne’s stillborn and impractical conceptions of space flight to the cogent and highly fertile ideas of Konstantin Tsiolkovsky and Yuri Kondratyuk.

Keywords: propulsion, rocket, rollet, spacecraft

Procedia PDF Downloads 507
86 Experimental Evaluation of Succinct Ternary Tree

Authors: Dmitriy Kuptsov

Abstract:

Tree data structures, such as binary or, more generally, k-ary trees, are essential in computer science. Their applications range from data search and retrieval to sorting and ranking algorithms. Naive implementations of these data structures can consume prohibitively large amounts of random access memory, limiting their applicability in certain solutions. In such cases, more advanced representations of these data structures are essential. In this paper, we present the design of a compact version of the ternary tree data structure and report the results of an experimental evaluation using the static dictionary problem. We compare these results with those for binary and regular ternary trees. The evaluation shows that our design, in the best case, consumes up to 12 times less memory (for the dictionary used in our experimental evaluation) than a regular ternary tree and, in certain configurations, shows performance comparable to regular ternary trees. We evaluated the performance of the algorithms on both 32-bit and 64-bit operating systems.
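For reference, this is a minimal pointer-based (regular) ternary search tree of the kind the compact design is compared against, used here for a static dictionary lookup. Each node stores one character, an end-of-word flag, and three child references (less, equal, greater). Succinct representations generally aim to replace such explicit per-node pointers with compact bit-level encodings; this sketch does not attempt that.

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None  # less-than, equal, greater-than children
        self.end = False  # True if a word ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.end = True
        return node

    def contains(self, word):
        if not word:
            return False
        node, i = self.root, 0
        while node is not None:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.end
        return False
```

Every node here carries three references plus a character and a flag, which is where the memory overhead of naive implementations comes from on 64-bit systems in particular.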

Keywords: algorithms, data structures, succinct ternary tree, performance evaluation

Procedia PDF Downloads 136
85 An Application to Predict the Best Study Path for Information Technology Students in Learning Institutes

Authors: L. S. Chathurika

Abstract:

Early prediction of student performance is an important factor in achieving academic excellence. Whatever their study stream in secondary education, students in Sri Lanka lay the foundation for higher studies during the first year of their degree or diploma program. In the information technology (IT) field, education programs offer specialization areas in which students can demonstrate their talents and skills. These specializations include software engineering, network administration, database administration, multimedia design, and others. After completing the first year, students try to select the best path by considering numerous factors. The purpose of this experiment is to predict the best study path using machine learning algorithms. Five classification algorithms (decision tree, support vector machine, artificial neural network, Naïve Bayes, and logistic regression) were selected and tested. The support vector machine obtained the highest accuracy, 82.4%. The features affecting the selection of the best study path were then identified.

Keywords: algorithm, classification, evaluation, features, testing, training

Procedia PDF Downloads 96
84 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article analyzes insurance data containing information on customers' decisions when purchasing a life insurance pay package. The data were analyzed in order to offer new customers the Life Insurance Perfect Pay package that meets their needs as closely as possible. Basic data on the insurance pay package were collected for data mining, thus reducing the scattering of information. The data were then classified to obtain a decision model, or decision tree, using Algorithm C4.5 (J-48). For the classification, WEKA tools were used to build the model, and testing datasets were used to test the accuracy of the decision tree. Validation showed that the model's correct prediction rate was 68.43%, while 31.25% were errors. The same dataset was then tested with other models, namely Naive Bayes and ZeroR. The results showed that the J-48 method predicted more accurately. The researchers therefore applied the decision tree in a program that introduces the product to new customers and supports their decision to purchase the insurance package that best meets their needs.
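J48 is Weka's implementation of C4.5, which chooses split attributes by gain ratio: information gain divided by the split information of the attribute. The function below computes that criterion for one categorical attribute; the row format and the attribute names in the usage example are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """C4.5 gain ratio of categorical attribute `attr` for predicting `target`.

    rows: list of dicts mapping attribute names to values.
    """
    labels = [r[target] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    rows = [
        {"age_band": "young", "region": "north", "buys": "yes"},
        {"age_band": "young", "region": "north", "buys": "yes"},
        {"age_band": "older", "region": "north", "buys": "no"},
        {"age_band": "older", "region": "north", "buys": "no"},
    ]
    print(gain_ratio(rows, "age_band", "buys"))  # 1.0
    print(gain_ratio(rows, "region", "buys"))  # 0.0
```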

Keywords: decision tree, data mining, customers, life insurance pay package

Procedia PDF Downloads 401
83 A Predictive Machine Learning Model of the Survival of Female-led and Co-Led Small and Medium Enterprises in the UK

Authors: Mais Khader, Xingjie Wei

Abstract:

This research sheds light on female entrepreneurs by providing new insights into predicting the survival of companies led by women in the UK. The study aims to build a predictive machine learning model of the survival of female-led and co-led small and medium enterprises (SMEs) in the UK over the period 2000-2020. The model uses a combination of financial and non-financial features related to both the companies and their directors to predict SME survival, and the contribution of each feature to the resulting model was studied. Five machine learning models were used: Decision tree, AdaBoost, Naïve Bayes, Logistic regression, and SVM. The AdaBoost model had the highest performance of the five, with an accuracy of 73% and an AUC of 80%. The results show that company size, management experience, financial performance, industry, region, and the percentage of women in management have high feature importance in predicting company survival.
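The AUC reported above equals the probability that a randomly chosen surviving company receives a higher predicted score than a randomly chosen non-surviving one, with ties counted as one half. This pairwise (Mann-Whitney) formulation is easy to verify in a few lines; it is quadratic in the number of examples, so it suits small checks rather than large datasets. Coding survival as 1 is an assumption of this sketch.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```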

Keywords: company survival, entrepreneurship, females, machine learning, SMEs

Procedia PDF Downloads 54
82 A Scalable Model of Fair Socioeconomic Relations Based on Blockchain and Machine Learning Algorithms-1: On Hyperinteraction and Intuition

Authors: Merey M. Sarsengeldin, Alexandr S. Kolokhmatov, Galiya Seidaliyeva, Alexandr Ozerov, Sanim T. Imatayeva

Abstract:

This series of interdisciplinary studies is an attempt to investigate and develop a scalable model of fair socioeconomic relations based on blockchain, using positive psychology techniques and machine learning algorithms for data analytics. In this particular study, we use a hyperinteraction approach and intuition and investigate their influence on the 'wisdom of crowds' via a mobile application created for the purpose of this research. Along with a public blockchain and a private Decentralized Autonomous Organization (DAO), which we built on the Ethereum blockchain, we developed a model of fair financial relations among DAO members. We developed a smart contract, the so-called Fair Price Protocol, and used it to implement the model. The data obtained from the mobile application were analyzed with ML algorithms. The model was tested on football matches.

Keywords: blockchain, Naïve Bayes algorithm, hyperinteraction, intuition, wisdom of crowd, decentralized autonomous organization

Procedia PDF Downloads 138
81 The Regulation of the Cancer Epigenetic Landscape Lies in the Realm of the Long Non-coding RNAs

Authors: Ricardo Alberto Chiong Zevallos, Eduardo Moraes Rego Reis

Abstract:

Patients with pancreatic adenocarcinoma (PDAC) have a 5-year survival rate of less than 10%. PDAC has no defined diagnostic or prognostic biomarkers. Gemcitabine is the first-line drug for PDAC and several other cancers. Long non-coding RNAs (lncRNAs) contribute to tumorigenesis and are potential biomarkers for PDAC. Although lncRNAs are not translated into proteins, they have important functions: they can decoy or recruit proteins of the epigenetic machinery, act as microRNA sponges, participate in protein translocation between cellular compartments, and even promote chemoresistance. The chromatin remodeling enzyme EZH2 is a histone methyltransferase that catalyzes the methylation of histone 3 at lysine 27, silencing local expression. EZH2 is ambivalent: it can also activate gene expression independently of its histone methyltransferase activity. EZH2 is overexpressed in several cancers and interacts with lncRNAs, which can recruit it to specific loci, where it may activate an oncogene or silence a tumor suppressor. Misregulation of lncRNAs in cancer can result in differential recruitment of EZH2 and a distinct epigenetic landscape, promoting chemoresistance. The relevance of the EZH2-lncRNA interaction to chemoresistant PDAC was assessed by real-time quantitative PCR (RT-qPCR) and RNA immunoprecipitation (RIP) experiments with naïve and gemcitabine-resistant PDAC cells. The expression of several lncRNAs and EZH2 target genes was compared between naïve and resistant cells. Candidate genes were selected by bioinformatic analysis and literature curation. Indeed, the resistant cell line showed higher expression of chemoresistance-associated lncRNAs and protein-coding genes. RIP detected lncRNAs interacting with EZH2 at varying intensity levels in the cell lines. During RIP, the nuclear fraction of the cells was incubated with an antibody against EZH2 and with magnetic beads.
The RNA precipitated with the bead-antibody-EZH2 complex was isolated and reverse transcribed. The presence of candidate lncRNAs was detected by RT-qPCR, and enrichment was calculated relative to INPUT (a total lysate control sample collected before RIP). Enrichment levels varied across the lncRNAs and cell lines. The EZH2-lncRNA interaction might be responsible for the regulation of chemoresistance-associated genes in multiple cancers. The relevance of the lncRNA-EZH2 interaction to PDAC was assessed by siRNA knockdown of a lncRNA, followed by analysis of EZH2 target expression by RT-qPCR. Chromatin immunoprecipitation (ChIP) of EZH2 and H3K27me3, followed by RT-qPCR with primers for EZH2 targets, was also used to assess the specificity of EZH2 recruitment by the lncRNA. This is the first report of an interaction between EZH2 and the lncRNAs HOTTIP and PVT1 in chemoresistant PDAC. HOTTIP and PVT1 have been described as promoting chemoresistance in several cancers, but the role of EZH2 has not been clarified. For the first time, the lncRNA LINC01133 was detected in a chemoresistant cancer. The interactions of EZH2 with LINC02577, LINC00920, LINC00941, and LINC01559 have never been reported in any context. These novel lncRNA-EZH2 interactions regulate chemoresistance-associated genes in PDAC and might be relevant to other cancers. Therapies targeting EZH2 alone have not been successful, and a combinatorial approach that also targets the lncRNAs interacting with it might be key to overcoming chemoresistance in several cancers.
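A widely used way to express RIP-qPCR enrichment relative to INPUT is the percent-input method, sketched below. It assumes roughly 100% amplification efficiency (one doubling per cycle) and adjusts the INPUT Ct for the fraction of lysate saved as INPUT. The abstract does not say which normalization the authors used, so the function and its parameters are illustrative.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent-input enrichment for one target.

    ct_ip: Ct of the immunoprecipitated sample
    ct_input: Ct of the INPUT sample
    input_fraction: share of the lysate saved as INPUT (e.g. 0.01 for 1%)
    Assumes perfect doubling per PCR cycle.
    """
    # Scale the INPUT Ct up to what 100% of the lysate would give.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2 ** (adjusted_input_ct - ct_ip)
```

For example, with a 1% INPUT, an IP Ct three cycles later than the INPUT Ct corresponds to 0.125% of input.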

Keywords: epigenetics, chemoresistance, long non-coding RNAs, pancreatic cancer, histone modification

Procedia PDF Downloads 61
80 Cross Project Software Fault Prediction at Design Phase

Authors: Pradeep Singh, Shrish Verma

Abstract:

Software fault prediction models are created using the source code, processed metrics from the same or a previous version of the code, and related fault data. Some companies do not store or keep track of all the artifacts required for software fault prediction. To construct fault prediction models for such companies, training data from other projects can be one potential solution. The earlier a fault is predicted, the less it costs to correct. The training data consist of metrics data and related fault data at the function/module level. This paper investigates fault prediction at an early stage using cross-project data, focusing on design metrics. In this study, an empirical analysis is carried out to validate design metrics for cross-project fault prediction. The machine learning technique used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as an initial guideline for projects where no previous fault data are available. We analyze seven datasets from the NASA Metrics Data Program, which offer design as well as code metrics. Overall, the cross-project results are comparable to those of within-company learning.

Keywords: software metrics, fault prediction, cross project, within project

Procedia PDF Downloads 309
79 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance for diabetic retinopathy (DR). To classify DR, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific, and anatomical components, were fed into the classifier algorithms. The dataset used in this study was taken from the University of California, Irvine (UCI) machine learning repository. Feature weighting methods, including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, were used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms were used: multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes. The hybrid method combining subtractive clustering based feature weighting with the decision tree classifier obtained a classification accuracy of 100% in the screening of DR. These results demonstrate that the proposed hybrid scheme is very promising for medical dataset classification.

Keywords: machine learning, data weighting, classification, data mining

Procedia PDF Downloads 301
78 Homing of B Cells via Afferent Lymphatics

Authors: Sara Pereira-Nogueira, Tim Worbs, Marc Permanyer-Bosser, Reinhold Förster

Abstract:

While the mechanisms by which lymphocytes enter the lymph node from the blood are well described, it is still largely unknown how cells arriving via afferent lymphatics enter lymph nodes. To address this, our group has established a micro-injection technique in mice through which cells are delivered directly into the lymphatic vessel immediately afferent to the popliteal lymph node. Injected cells can then be tracked via multi-colour fluorescence or 2-photon microscopy, and their localization can be analysed within the popliteal or downstream lymph nodes by immunohistology. Since naïve B cells express the chemokine receptor CXCR5, we intra-lymphatically co-injected B cells derived from wildtype and Cxcr5-deficient mice. While CXCR5 does not play a role in guiding B cells out of the subcapsular sinus, it affects their positioning within the lymph node parenchyma, since CXCR5-deficient B cells are impaired in migrating into the B cell follicle. The knowledge obtained by studying B-cell migration may prove beneficial in clinical settings involving tumor metastasis or autoimmune diseases.

Keywords: afferent lymphatics, B cell migration, chemokine, intra-lymphatic injection

Procedia PDF Downloads 234
77 The Best Prediction Data Mining Model for Breast Cancer Probability in Women Residents in Kabul

Authors: Mina Jafari, Kobra Hamraee, Saied Hossein Hosseini

Abstract:

The prediction of breast cancer is one of the challenges in medicine. In this paper, we collected 528 records of women living in Kabul, including demographic, lifestyle, diet, and pregnancy data. There are many classification algorithms for breast cancer prediction, and we tried to find the model with the most accurate results and the lowest error rate. We evaluated some common supervised data mining algorithms to find the best model for predicting breast cancer among Afghan women living in Kabul, using the mammography result as the target variable. To evaluate these algorithms, we used cross-validation, a reliable method for measuring model performance. After comparing the error rate and accuracy of three models (Decision Tree, Naive Bayes, and Rule Induction), the Decision Tree, with an accuracy of 94.06% and an error rate of 15%, was found to be the best model for predicting breast cancer based on the health care records.
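Cross-validation partitions the records into k folds and uses each fold once as the test set while training on the remaining folds. The generator below is a minimal sketch; the abstract does not state the number of folds, so k = 10 in the usage below is an assumption, while n = 528 is the record count given above.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(528, 10):
        print(len(train), len(test))
```

Accuracy and error rate would be computed on each test fold and averaged across the k folds.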

Keywords: decision tree, breast cancer, probability, data mining

Procedia PDF Downloads 108
76 Intrathecal Fentanyl with 0.5% Bupivacaine Heavy in Chronic Opium Abusers

Authors: Suneet Kathuria, Shikha Gupta, Kapil Dev, Sunil Katyal

Abstract:

Chronic use of opioids in opium abusers can cause poor pain control and increased analgaesic requirements. We compared the duration of spinal anaesthesia in chronic opium abusers and non-abusers. This prospective randomised study included 60 American Society of Anesthesiologists (ASA) Grade I or II adults undergoing surgery under spinal anaesthesia with 10 mg bupivacaine plus either 25 μg fentanyl (non-opium abusers, Group A; chronic opium abusers, Group B) or 40 μg fentanyl (chronic opium abusers, Group C). Patients were assessed for onset and duration of sensory and motor blockade and duration of effective analgesia. Mean time to onset of adequate analgesia was significantly longer in chronic opium abusers than in opium-naive patients. The durations of sensory and motor block were significantly shorter in chronic opium abusers than in non-opium abusers. Duration of effective analgesia in groups A, B, and C was 255.55 ± 26.84, 217.85 ± 15.15, and 268.20 ± 18.25 minutes, respectively; this difference was statistically significant. In chronic opium abusers, the duration of spinal anaesthesia is significantly shorter than in opium non-abusers. The duration of spinal anaesthesia with bupivacaine and fentanyl in chronic opium abusers can be improved by increasing the intrathecal fentanyl dose from 25 μg to 40 μg.

Keywords: bupivacaine, chronic opium abusers, fentanyl, intrathecal

Procedia PDF Downloads 265
75 Polarity Classification of Social Media Comments in Turkish

Authors: Migena Ceyhan, Zeynep Orhan, Dimitrios Karras

Abstract:

People in modern societies continuously share their experiences, emotions, and thoughts on many areas of life. This information reaches almost everyone in real time and can have an important impact on shaping people’s way of living. This phenomenon is well recognized and used to advantage by market representatives, who try to make the most of it. Given the abundance of information, people and organizations are looking for efficient tools that filter this vast amount of data into important information ready for analysis. This paper is a modest contribution in this field, describing the process of automatically classifying social media comments in the Turkish language as positive or negative. Once the data are gathered and preprocessed, feature sets of selected single words or groups of words are built according to the characteristics of the language used in the texts. These features are then used to train and test a system with different machine learning algorithms (Naïve Bayes, Sequential Minimal Optimization, J48, and Bayesian Linear Regression). The resulting high accuracies can provide important feedback for decision-makers to improve business strategies accordingly.
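Naïve Bayes is the first of the listed classifiers. As a rough illustration only, the sketch below implements a multinomial Naive Bayes classifier with Laplace (add-`alpha`) smoothing over token lists; the tokenization, the smoothing parameter, and the toy Turkish examples are assumptions, not the paper's feature sets or implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# (log priors per class, log likelihoods per class and word)
Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]]]


def train_nb(docs: Sequence[List[str]], labels: Sequence[str], alpha: float = 1.0) -> Model:
    """Multinomial Naive Bayes with add-alpha smoothing, stored in log space."""
    class_counts = Counter(labels)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_priors, log_likelihoods


def predict_nb(model: Model, tokens: List[str]) -> str:
    """Return the class with the highest posterior score; unseen words are ignored."""
    log_priors, log_likelihoods = model
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][w] for w in tokens if w in log_likelihoods[c])
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Toy example with hypothetical comments.
docs = [["güzel", "harika"], ["kötü", "berbat"], ["harika", "ürün"], ["berbat", "hizmet"]]
labels = ["pos", "neg", "pos", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, ["harika"]))  # pos
```

Working in log space avoids numerical underflow when a comment contains many words.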

Keywords: feature selection, machine learning, natural language processing, sentiment analysis, social media reviews

Procedia PDF Downloads 120
74 On Estimating the Low Income Proportion with Several Auxiliary Variables

Authors: Juan F. Muñoz-Rosas, Rosa M. García-Fernández, Encarnación Álvarez-Verdejo, Pablo J. Moya-Fernández

Abstract:

Poverty measurement is a very important topic in many social science studies. One of the most important indicators for measuring poverty is the low income proportion, which gives the proportion of people in a population classified as poor. This indicator is generally unknown, and for this reason it is estimated using survey data obtained from official surveys carried out by statistical agencies such as Eurostat. A key feature of such survey data is that they contain several variables. The variable used to estimate the low income proportion is called the variable of interest. The survey data may contain several additional variables, called auxiliary variables, that are related to the variable of interest; when this is the case, they can be used to improve the estimation of the low income proportion. In this paper, we use Monte Carlo simulation studies to analyze numerically the performance of estimators based on several auxiliary variables. The simulation study uses real data sets obtained from the 2011 European Union Survey on Income and Living Conditions. The results indicate that the estimators based on auxiliary variables are more accurate than the naive estimator.
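For reference, the naive estimator uses only the variable of interest. A minimal sketch, assuming design weights `weights` and a poverty line set at 60% of the median income; the 60% fraction follows the usual Eurostat at-risk-of-poverty convention, but the unweighted median here is a simplification, and the auxiliary-variable estimators studied in the paper are not shown.

```python
import statistics
from typing import Sequence


def poverty_line(incomes: Sequence[float], fraction: float = 0.6) -> float:
    """Poverty line as a fraction of median income (unweighted median: a simplification)."""
    return fraction * statistics.median(incomes)


def naive_low_income_proportion(incomes: Sequence[float],
                                weights: Sequence[float],
                                line: float) -> float:
    """Weighted share of sampled people whose income lies below the poverty line.

    Only the variable of interest (income) and the design weights are used;
    no auxiliary variables are involved.
    """
    below = sum(w for y, w in zip(incomes, weights) if y < line)
    return below / sum(weights)
```

With equal weights this reduces to the sample share of poor people; unequal design weights let each sampled unit represent a different number of people in the population.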

Keywords: inclusion probability, poverty, poverty line, survey sampling

Procedia PDF Downloads 421
73 A Stochastic Volatility Model for Optimal Market-Making

Authors: Zubier Arfan, Paul Johnson

Abstract:

The electronification of financial markets and the rise of algorithmic trading have sparked considerable interest from the mathematical community, in particular in the market-making problem. The research presented in this short paper solves the classic stochastic control problem to derive the strategy for a market-maker. It also shows how to calibrate and simulate the strategy with real limit order book (LOB) data for back-testing. The ambiguity of limit-order priority in back-testing is handled by considering optimistic and pessimistic priority scenarios. Although this model outperforms a naive strategy, it assumes constant volatility and is therefore not well suited to the LOB data. The Heston model is then introduced to describe the price and variance processes of the asset. The trader's constant absolute risk aversion utility function is optimised by numerically solving a 3-dimensional Hamilton-Jacobi-Bellman partial differential equation to find the optimal limit order quotes. The results show that the stochastic volatility market-making model is more suitable for a risk-averse trader and is also less sensitive to calibration error than the constant volatility model.
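The abstract does not state the constant-volatility strategy explicitly. A widely used closed-form baseline for the classic problem with exponential (constant absolute risk aversion) utility is the Avellaneda-Stoikov approximation; the sketch below assumes that model, with mid-price `s`, inventory `q`, risk aversion `gamma`, volatility `sigma`, time to horizon `tau`, and order-arrival decay `k`. It is not the authors' Heston-based solution, which requires solving the 3-dimensional HJB equation numerically.

```python
import math
from typing import Tuple


def as_quotes(s: float, q: float, gamma: float, sigma: float,
              tau: float, k: float) -> Tuple[float, float]:
    """Bid/ask quotes under the Avellaneda-Stoikov constant-volatility approximation.

    Reservation price: r = s - q * gamma * sigma^2 * tau
    Total spread:      gamma * sigma^2 * tau + (2 / gamma) * ln(1 + gamma / k)
    """
    reservation = s - q * gamma * sigma ** 2 * tau
    spread = gamma * sigma ** 2 * tau + (2.0 / gamma) * math.log1p(gamma / k)
    return reservation - spread / 2.0, reservation + spread / 2.0


bid, ask = as_quotes(s=100.0, q=0.0, gamma=0.1, sigma=2.0, tau=1.0, k=1.5)
```

With zero inventory the quotes are symmetric around the mid-price; with positive inventory both quotes shift down, which makes the market-maker more likely to sell and reduce the position.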

Keywords: market-making, market microstructure, stochastic volatility, quantitative trading

Procedia PDF Downloads 115
72 Hawking Radiation of Grumiller Black Hole

Authors: Sherwan Kher Alden Yakub Alsofy

Abstract:

In this paper, we consider the relativistic Hamilton-Jacobi (HJ) equation and study the Hawking radiation (HR) of scalar particles from the uncharged Grumiller black hole (GBH), which is amenable to testing in astrophysics. The GBH is also known as the Rindler-modified Schwarzschild black hole. Our aim is not only to investigate the effect of the Rindler parameter A on the Hawking temperature (T_H), but also to examine whether there is any discrepancy between the computed horizon temperature and the standard T_H. For this purpose, in addition to its naive coordinate system, we study three regular coordinate systems: Painlevé-Gullstrand (PG), ingoing Eddington-Finkelstein (IEF), and Kruskal-Szekeres (KS) coordinates. In all coordinate systems, we calculate the tunneling probabilities of incoming and outgoing scalar particles from the event horizon by using the HJ equation. We show in detail that the HJ method yields the conventional T_H in all these coordinate systems without giving rise to the well-known factor-2 problem. Furthermore, in the PG coordinates, the Parikh-Wilczek tunneling (PWT) method is employed to show how quantum gravity (QG) corrections can be incorporated into the semiclassical tunneling rate by including the effects of self-gravitation and back reaction. We then show how these corrections modify T_H.
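For orientation, the block below recalls the standard semiclassical result that the HJ tunneling method should reproduce, in units G = c = ħ = k_B = 1. It assumes a static metric ds² = -f dt² + f⁻¹ dr² + r² dΩ² with the commonly quoted uncharged Grumiller metric function f(r) = 1 - 2M/r + 2Ar (cosmological term dropped); the paper's own conventions may differ.

```latex
\begin{aligned}
f(r) &= 1 - \frac{2M}{r} + 2Ar, \qquad f(r_h) = 0 \;\Rightarrow\; 2A r_h^2 + r_h - 2M = 0, \\
r_h &= \frac{-1 + \sqrt{1 + 16AM}}{4A} \;\xrightarrow{A \to 0}\; 2M, \\
\Gamma &\propto \exp\!\left(-\frac{4\pi E}{f'(r_h)}\right) = \exp\!\left(-\frac{E}{T_H}\right), \\
T_H &= \frac{f'(r_h)}{4\pi} = \frac{1}{4\pi}\left(\frac{2M}{r_h^2} + 2A\right) \;\xrightarrow{A \to 0}\; \frac{1}{8\pi M}.
\end{aligned}
```

Keeping only the outgoing contribution in the naive coordinates gives an exponent of 2πE/f'(r_h) instead, i.e. a temperature twice T_H; this is the factor-2 problem mentioned in the abstract.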

Keywords: ingoing Eddington-Finkelstein coordinates, Parikh-Wilczek tunneling, Hamilton-Jacobi equation

Procedia PDF Downloads 586
71 Detection and Classification of Myocardial Infarction Using New Extracted Features from Standard 12-Lead ECG Signals

Authors: Naser Safdarian, Nader Jafarnia Dabanloo

Abstract:

In this paper, we used four features, namely the Q-wave integral, QRS complex integral, T-wave integral, and total integral, extracted from normal and patient ECG signals to detect and localize myocardial infarction (MI) in the left ventricle of the heart. Our research focused on detecting and localizing MI in the standard ECG. We used the Q-wave and T-wave integrals because these features are important indicators of MI. We used pattern recognition methods such as Artificial Neural Networks (ANN) to detect and localize MI, because these methods classify normal and abnormal signals with good accuracy. We used a type of Radial Basis Function (RBF) network called the Probabilistic Neural Network (PNN) because of its nonlinearity, along with other classifiers such as k-Nearest Neighbors (KNN), the Multilayer Perceptron (MLP), and Naive Bayes. We used the PhysioNet database for training and test data. We achieved over 80% accuracy on the test data for localization and over 95% for detection of MI. The main advantages of our method are its simplicity and good accuracy, and classification accuracy could be further improved by adding more features. In summary, we present a simple method based on only four features extracted from the standard ECG that localizes MI with good accuracy.
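The four integral features can be illustrated with the trapezoidal rule. In this sketch the wave boundaries (sample indices for the Q wave, QRS complex, and T wave) and the sampling rate `fs` are assumed inputs from a separate delineation step that the abstract does not describe; whether the authors integrate the raw or the absolute signal is also unknown, so the raw signal is used here.

```python
from typing import Dict, Sequence, Tuple


def segment_integral(signal: Sequence[float], start: int, end: int, fs: float) -> float:
    """Trapezoidal integral of signal[start..end] (inclusive); fs is the sampling rate in Hz."""
    if not 0 <= start < end < len(signal):
        raise ValueError("invalid segment bounds")
    dt = 1.0 / fs
    return sum((signal[i] + signal[i + 1]) * dt / 2.0 for i in range(start, end))


def integral_features(signal: Sequence[float],
                      bounds: Dict[str, Tuple[int, int]],
                      fs: float) -> Dict[str, float]:
    """Hypothetical feature vector: one integral per named segment plus the whole beat."""
    feats = {name: segment_integral(signal, s, e, fs) for name, (s, e) in bounds.items()}
    feats["total"] = segment_integral(signal, 0, len(signal) - 1, fs)
    return feats
```

A call such as `integral_features(beat, {"q_wave": (..), "qrs": (..), "t_wave": (..)}, fs)` would then produce the four-element input for a classifier like PNN or KNN, with the index pairs supplied by the delineation step.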

Keywords: ECG signal processing, myocardial infarction, features extraction, pattern recognition

Procedia PDF Downloads 427
70 Stackelberg Security Game for Optimizing Security of Federated Internet of Things Platform Instances

Authors: Violeta Damjanovic-Behrendt

Abstract:

This paper presents an approach for making optimal cyber security decisions to protect instances of a federated Internet of Things (IoT) platform in the cloud. The presented solution implements the repeated Stackelberg Security Game (SSG) and a model called Stochastic Human behaviour model with AttRactiveness and Probability weighting (SHARP). SHARP employs Subjective Utility Quantal Response (SUQR) to formulate a subjective utility function, which is based on the evaluation of alternative solutions during decision-making. We augment the repeated SSG (including SHARP and SUQR) with a reinforcement learning algorithm called Naïve Q-Learning. Naïve Q-Learning belongs to the category of active, model-free Machine Learning (ML) techniques in which the agent (either the defender or the attacker) attempts to find an optimal security solution. In this way, we combine game theory (GT) and ML algorithms to discover optimal cyber security policies. The proposed security optimization components will be validated in a collaborative cloud platform based on the Industrial Internet Reference Architecture (IIRA) and its recently published security model.
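The abstract does not define what makes its Q-learning "naïve". The sketch below shows plain tabular Q-learning with epsilon-greedy action selection, using the update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. It assumes a defender that picks one action per state and then observes a reward; the `patch`/`monitor` actions are illustrative only, and the SSG, SHARP, and SUQR components are not modelled.

```python
import random
from collections import defaultdict
from typing import Hashable, List


class QLearner:
    """Tabular, model-free Q-learning agent (e.g. a defender in a repeated game)."""

    def __init__(self, actions: List[Hashable], alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount factor
        self.epsilon = epsilon   # exploration probability
        self.q = defaultdict(float)  # (state, action) -> estimated value, default 0
        self.rng = random.Random(seed)

    def choose(self, state: Hashable) -> Hashable:
        """Epsilon-greedy: explore at random, otherwise take the best-known action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state: Hashable, action: Hashable,
               reward: float, next_state: Hashable) -> None:
        """One temporal-difference step toward r + gamma * max_a' Q(s', a')."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```

No transition model of the attacker is needed: the agent learns action values purely from observed rewards, which is what "model-free" refers to.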

Keywords: security, internet of things, cloud computing, Stackelberg game, machine learning, naive q-learning

Procedia PDF Downloads 326
69 Statistical Analysis with Prediction Models of User Satisfaction in Software Project Factors

Authors: Katawut Kaewbanjong

Abstract:

We analyzed project data to identify software project factors that significantly affect user satisfaction. A statistical significance analysis (logistic regression) and a collinearity analysis identified the significant factors among 71 pre-defined factors from 191 software projects in ISBSG Release 12. The eight prediction models used to test the predictive potential of these factors were neural network, k-NN, Naïve Bayes, random forest, decision tree, gradient boosted tree, linear regression, and logistic regression. Fifteen of the pre-defined factors were significant in predicting user satisfaction, and they yielded 82.71% prediction accuracy when used with a neural network prediction model. These factors were client-server, personnel changes, total defects delivered, project inactive time, industry sector, application type, development type, how the methodology was acquired, development techniques, decision-making process, intended market, size estimate approach, size estimate method, cost recording method, and effort estimate method. These findings may benefit software development managers considerably.

Keywords: prediction model, statistical analysis, software project, user satisfaction factor

Procedia PDF Downloads 89
68 Predicting Relative Performance of Sector Exchange Traded Funds Using Machine Learning

Authors: Jun Wang, Ge Zhang

Abstract:

Machine learning is used in many areas today. It excels at reviewing large volumes of data and identifying patterns and trends that might not be apparent to a human. Given the large potential benefit and the amount of data available in financial markets, it is not surprising to see machine learning applied to various financial products. While future prices of financial securities are extremely difficult to forecast, we study the problem from a different angle: instead of forecasting future prices, we apply machine learning algorithms to predict the direction of future price movement, in particular whether a sector Exchange Traded Fund (ETF) will outperform or underperform the market over the next week or the next month. The algorithms we apply are Linear Discriminant Analysis (LDA), k-Nearest Neighbors (KNN), Decision Tree (DT), Gaussian Naive Bayes (GNB), and Neural Networks (NN). We show that these algorithms, most notably GNB and NN, have some out-of-sample predictive power in forecasting out-performance and under-performance. We also explore whether the predictions from these algorithms can be used to outperform a buy-and-hold strategy on the S&P 500 index. The trading strategy based on out-performance predictions does not perform very well, but the strategy based on under-performance predictions can earn higher out-of-sample returns than simply holding the S&P 500 index.
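The prediction target described above can be framed as a binary label. The sketch below assumes aligned price series for a sector ETF and a market benchmark (for example an S&P 500 tracker) and labels each date 1 if the ETF's forward return over `horizon` periods exceeds the benchmark's, else 0. The horizon in trading days and the tie rule are assumptions, and the input features the classifiers use are not shown.

```python
from typing import List, Sequence


def forward_return(prices: Sequence[float], t: int, horizon: int) -> float:
    """Simple return from period t to period t + horizon."""
    return prices[t + horizon] / prices[t] - 1.0


def relative_labels(etf_prices: Sequence[float],
                    market_prices: Sequence[float],
                    horizon: int) -> List[int]:
    """1 = ETF outperforms the market over the next `horizon` periods, 0 otherwise (ties -> 0)."""
    if len(etf_prices) != len(market_prices):
        raise ValueError("price series must be aligned")
    labels = []
    for t in range(len(etf_prices) - horizon):
        excess = forward_return(etf_prices, t, horizon) - forward_return(market_prices, t, horizon)
        labels.append(1 if excess > 0 else 0)
    return labels
```

With daily data, `horizon=5` would roughly correspond to the next week and `horizon=21` to the next month; the last `horizon` dates have no label because their future prices are not yet known.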

Keywords: machine learning, ETF prediction, dynamic trading, asset allocation

Procedia PDF Downloads 57
67 Exploring Data Leakage in EEG Based Brain-Computer Interfaces: Overfitting Challenges

Authors: Khalida Douibi, Rodrigo Balp, Solène Le Bars

Abstract:

In the medical field, applications involving human experiments frequently have small sample sizes, which makes the training of machine learning models quite sensitive and therefore neither robust nor generalizable. This is notably the case in Brain-Computer Interface (BCI) studies, where the sample size rarely exceeds 20 subjects or a small number of trials. To address this problem, several resampling approaches are often used during the data preparation phase, which is a critical step in a data science analysis process. One naive approach commonly applied by data scientists consists of transforming the entire database before the resampling phase. However, this can cause the model’s performance to be incorrectly estimated when making predictions on unseen data. In this paper, we explored the effect of data leakage observed during our BCI experiments for device control through the real-time classification of SSVEPs (Steady State Visually Evoked Potentials). We also studied potential ways to ensure optimal validation of the classifiers during the calibration phase to avoid overfitting. The results show that the scaling step is crucial for some algorithms, and it should be applied after the resampling phase to avoid data leakage and improve results.
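The leakage described in this abstract arises when scaling statistics are computed on all the data before splitting. A minimal single-feature sketch of the leak-free order, assuming z-score standardization and given train/test indices (the function names are hypothetical): the mean and standard deviation are fitted on the training indices only and then reused, unchanged, on the held-out data.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_scaler(train: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the training fold only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    return mean, (std if std > 0 else 1.0)  # avoid division by zero on constant data


def transform(values: Sequence[float], scaler: Tuple[float, float]) -> List[float]:
    mean, std = scaler
    return [(v - mean) / std for v in values]


def leak_free_scale(values: Sequence[float], train_idx: Sequence[int],
                    test_idx: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split first, then scale: the test values never influence the scaler."""
    train = [values[i] for i in train_idx]
    test = [values[i] for i in test_idx]
    scaler = fit_scaler(train)
    return transform(train, scaler), transform(test, scaler)
```

Inside a resampling loop, `leak_free_scale` would be called once per fold, so each held-out fold is scaled only with statistics from its own training data.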

Keywords: data leakage, data science, machine learning, SSVEP, BCI, overfitting

Procedia PDF Downloads 120