Search results for: clustering algorithms
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2408

Search results for: clustering algorithms

428 A Methodology of Using Fuzzy Logics and Data Analytics to Estimate the Life Cycle Indicators of Solar Photovoltaics

Authors: Thor Alexis Sazon, Alexander Guzman-Urbina, Yasuhiro Fukushima

Abstract:

This study outlines the method of how to develop a surrogate life cycle model based on fuzzy logic using three fuzzy inference methods: (1) the conventional Fuzzy Inference System (FIS), (2) the hybrid system of Data Analytics and Fuzzy Inference (DAFIS), which uses data clustering for defining the membership functions, and (3) the Adaptive-Neuro Fuzzy Inference System (ANFIS), a combination of fuzzy inference and artificial neural network. These methods were demonstrated with a case study where the Global Warming Potential (GWP) and the Levelized Cost of Energy (LCOE) of solar photovoltaic (PV) were estimated using Solar Irradiation, Module Efficiency, and Performance Ratio as inputs. The effects of using different fuzzy inference types, either Sugeno- or Mamdani-type, and of changing the number of input membership functions to the error between the calibration data and the model-generated outputs were also illustrated. The solution spaces of the three methods were consequently examined with a sensitivity analysis. ANFIS exhibited the lowest error while DAFIS gave slightly lower errors compared to FIS. Increasing the number of input membership functions helped with error reduction in some cases but, at times, resulted in the opposite. Sugeno-type models gave errors that are slightly lower than those of the Mamdani-type. While ANFIS is superior in terms of error minimization, it could generate solutions that are questionable, i.e. the negative GWP values of the Solar PV system when the inputs were all at the upper end of their range. This shows that the applicability of the ANFIS models highly depends on the range of cases at which it was calibrated. FIS and DAFIS generated more intuitive trends in the sensitivity runs. DAFIS demonstrated an optimal design point wherein increasing the input values does not improve the GWP and LCOE anymore. In the absence of data that could be used for calibration, conventional FIS presents a knowledge-based model that could be used for prediction. In the PV case study, conventional FIS generated errors that are just slightly higher than those of DAFIS. The inherent complexity of a Life Cycle study often hinders its widespread use in the industry and policy-making sectors. While the methodology does not guarantee a more accurate result compared to those generated by the Life Cycle Methodology, it does provide a relatively simpler way of generating knowledge- and data-based estimates that could be used during the initial design of a system.

Keywords: solar photovoltaic, fuzzy logic, inference system, artificial neural networks

Procedia PDF Downloads 129
427 A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

Authors: Wei Zhang

Abstract:

With the rapid development of deep learning, neural network and deep learning algorithms play a significant role in various practical applications. Due to the high accuracy and good performance, Convolutional Neural Networks (CNNs) especially have become a research hot spot in the past few years. However, the size of the networks becomes increasingly large scale due to the demands of the practical applications, which poses a significant challenge to construct a high-performance implementation of deep learning neural networks. Meanwhile, many of these application scenarios also have strict requirements on the performance and low-power consumption of hardware devices. Therefore, it is particularly critical to choose a moderate computing platform for hardware acceleration of CNNs. This article aimed to survey the recent advance in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of the accelerator based on FPGA under different devices and network models are overviewed, and the versions of Graphic Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) are compared to present our own critical analysis and comments. Finally, we give a discussion on different perspectives of these acceleration and optimization methods on FPGA platforms to further explore the opportunities and challenges for future research. More helpfully, we give a prospect for future development of the FPGA-based accelerator.

Keywords: deep learning, field programmable gate array, FPGA, hardware accelerator, convolutional neural networks, CNN

Procedia PDF Downloads 96
426 Optimization of Municipal Solid Waste Management in Peshawar Using Mathematical Modelling and GIS with Focus on Incineration

Authors: Usman Jilani, Ibad Khurram, Irshad Hussain

Abstract:

Environmentally sustainable waste management is a challenging task as it involves multiple and diverse economic, environmental, technical and regulatory issues. Municipal Solid Waste Management (MSWM) is more challenging in developing countries like Pakistan due to lack of awareness, technology and human resources, insufficient funding, inefficient collection and transport mechanism resulting in the lack of a comprehensive waste management system. This work presents an overview of current MSWM practices in Peshawar, the provincial capital of Khyber Pakhtunkhwa, Pakistan and proposes a better and sustainable integrated solid waste management system with incineration (Waste to Energy) option. The diverted waste would otherwise generate revenue; minimize land fill requirement and negative impact on the environment. The proposed optimized solution utilizing scientific techniques (like mathematical modeling, optimization algorithms and GIS) as decision support tools enhances the technical & institutional efficiency leading towards a more sustainable waste management system through incorporating: - Improved collection mechanisms through optimized transportation / routing and, - Resource recovery through incineration and selection of most feasible sites for transfer stations, landfills and incineration plant. These proposed methods shift the linear waste management system towards a cyclic system and can also be used as a decision support tool by the WSSP (Water and Sanitation Services Peshawar), agency responsible for the MSWM in Peshawar.

Keywords: municipal solid waste management, incineration, mathematical modeling, optimization, GIS, Peshawar

Procedia PDF Downloads 346
425 Prediction of Solanum Lycopersicum Genome Encoded microRNAs Targeting Tomato Spotted Wilt Virus

Authors: Muhammad Shahzad Iqbal, Zobia Sarwar, Salah-ud-Din

Abstract:

Tomato spotted wilt virus (TSWV) belongs to the genus Tospoviruses (family Bunyaviridae). It is one of the most devastating pathogens of tomato (Solanum Lycopersicum) and heavily damages the crop yield each year around the globe. In this study, we retrieved 329 mature miRNA sequences from two microRNA databases (miRBase and miRSoldb) and checked the putative target sites in the downloaded-genome sequence of TSWV. A consensus of three miRNA target prediction tools (RNA22, miRanda and psRNATarget) was used to screen the false-positive microRNAs targeting sites in the TSWV genome. These tools calculated different target sites by calculating minimum free energy (mfe), site-complementarity, minimum folding energy and other microRNA-mRNA binding factors. R language was used to plot the predicted target-site data. All the genes having possible target sites for different miRNAs were screened by building a consensus table. Out of these 329 mature miRNAs predicted by three algorithms, only eight miRNAs met all the criteria/threshold specifications. MC-Fold and MC-Sym were used to predict three-dimensional structures of miRNAs and further analyzed in USCF chimera to visualize the structural and conformational changes before and after microRNA-mRNA interactions. The results of the current study show that the predicted eight miRNAs could further be evaluated by in vitro experiments to develop TSWV-resistant transgenic tomato plants in the future.

Keywords: tomato spotted wild virus (TSWV), Solanum lycopersicum, plant virus, miRNAs, microRNA target prediction, mRNA

Procedia PDF Downloads 120
424 An Intelligent Search and Retrieval System for Mining Clinical Data Repositories Based on Computational Imaging Markers and Genomic Expression Signatures for Investigative Research and Decision Support

Authors: David J. Foran, Nhan Do, Samuel Ajjarapu, Wenjin Chen, Tahsin Kurc, Joel H. Saltz

Abstract:

The large-scale data and computational requirements of investigators throughout the clinical and research communities demand an informatics infrastructure that supports both existing and new investigative and translational projects in a robust, secure environment. In some subspecialties of medicine and research, the capacity to generate data has outpaced the methods and technology used to aggregate, organize, access, and reliably retrieve this information. Leading health care centers now recognize the utility of establishing an enterprise-wide, clinical data warehouse. The primary benefits that can be realized through such efforts include cost savings, efficient tracking of outcomes, advanced clinical decision support, improved prognostic accuracy, and more reliable clinical trials matching. The overarching objective of the work presented here is the development and implementation of a flexible Intelligent Retrieval and Interrogation System (IRIS) that exploits the combined use of computational imaging, genomics, and data-mining capabilities to facilitate clinical assessments and translational research in oncology. The proposed System includes a multi-modal, Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide insight into the underlying tumor characteristics that are not be apparent by human inspection alone. A key distinguishing feature of the System is a configurable Extract, Transform and Load (ETL) interface that enables it to adapt to different clinical and research data environments. This project is motivated by the growing emphasis on establishing Learning Health Systems in which cyclical hypothesis generation and evidence evaluation become integral to improving the quality of patient care. To facilitate iterative prototyping and optimization of the algorithms and workflows for the System, the team has already implemented a fully functional Warehouse that can reliably aggregate information originating from multiple data sources including EHR’s, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Radiology PAC systems, Digital Pathology archives, Unstructured Clinical Documents, and Next Generation Sequencing services. The System enables physicians to systematically mine and review the molecular, genomic, image-based, and correlated clinical information about patient tumors individually or as part of large cohorts to identify patterns that may influence treatment decisions and outcomes. The CRDW core system has facilitated peer-reviewed publications and funded projects, including an NIH-sponsored collaboration to enhance the cancer registries in Georgia, Kentucky, New Jersey, and New York, with machine-learning based classifications and quantitative pathomics, feature sets. The CRDW has also resulted in a collaboration with the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC) at the U.S. Department of Veterans Affairs to develop algorithms and workflows to automate the analysis of lung adenocarcinoma. Those studies showed that combining computational nuclear signatures with traditional WHO criteria through the use of deep convolutional neural networks (CNNs) led to improved discrimination among tumor growth patterns. The team has also leveraged the Warehouse to support studies to investigate the potential of utilizing a combination of genomic and computational imaging signatures to characterize prostate cancer. The results of those studies show that integrating image biomarkers with genomic pathway scores is more strongly correlated with disease recurrence than using standard clinical markers.

Keywords: clinical data warehouse, decision support, data-mining, intelligent databases, machine-learning.

Procedia PDF Downloads 88
423 Data Science in Military Decision-Making: A Semi-Systematic Literature Review

Authors: H. W. Meerveld, R. H. A. Lindelauf

Abstract:

In contemporary warfare, data science is crucial for the military in achieving information superiority. Yet, to the authors’ knowledge, no extensive literature survey on data science in military decision-making has been conducted so far. In this study, 156 peer-reviewed articles were analysed through an integrative, semi-systematic literature review to gain an overview of the topic. The study examined to what extent literature is focussed on the opportunities or risks of data science in military decision-making, differentiated per level of war (i.e. strategic, operational, and tactical level). A relatively large focus on the risks of data science was observed in social science literature, implying that political and military policymakers are disproportionally influenced by a pessimistic view on the application of data science in the military domain. The perceived risks of data science are, however, hardly addressed in formal science literature. This means that the concerns on the military application of data science are not addressed to the audience that can actually develop and enhance data science models and algorithms. Cross-disciplinary research on both the opportunities and risks of military data science can address the observed research gaps. Considering the levels of war, relatively low attention for the operational level compared to the other two levels was observed, suggesting a research gap with reference to military operational data science. Opportunities for military data science mostly arise at the tactical level. On the contrary, studies examining strategic issues mostly emphasise the risks of military data science. Consequently, domain-specific requirements for military strategic data science applications are hardly expressed. Lacking such applications may ultimately lead to a suboptimal strategic decision in today’s warfare.

Keywords: data science, decision-making, information superiority, literature review, military

Procedia PDF Downloads 134
422 Efficiency of Robust Heuristic Gradient Based Enumerative and Tunneling Algorithms for Constrained Integer Programming Problems

Authors: Vijaya K. Srivastava, Davide Spinello

Abstract:

This paper presents performance of two robust gradient-based heuristic optimization procedures based on 3n enumeration and tunneling approach to seek global optimum of constrained integer problems. Both these procedures consist of two distinct phases for locating the global optimum of integer problems with a linear or non-linear objective function subject to linear or non-linear constraints. In both procedures, in the first phase, a local minimum of the function is found using the gradient approach coupled with hemstitching moves when a constraint is violated in order to return the search to the feasible region. In the second phase, in one optimization procedure, the second sub-procedure examines 3n integer combinations on the boundary and within hypercube volume encompassing the result neighboring the result from the first phase and in the second optimization procedure a tunneling function is constructed at the local minimum of the first phase so as to find another point on the other side of the barrier where the function value is approximately the same. In the next cycle, the search for the global optimum commences in both optimization procedures again using this new-found point as the starting vector. The search continues and repeated for various step sizes along the function gradient as well as that along the vector normal to the violated constraints until no improvement in optimum value is found. The results from both these proposed optimization methods are presented and compared with one provided by popular MS Excel solver that is provided within MS Office suite and other published results.

Keywords: constrained integer problems, enumerative search algorithm, Heuristic algorithm, Tunneling algorithm

Procedia PDF Downloads 303
421 Thermodynamic Modeling and Exergoeconomic Analysis of an Isobaric Adiabatic Compressed Air Energy Storage System

Authors: Youssef Mazloum, Haytham Sayah, Maroun Nemer

Abstract:

The penetration of renewable energy sources into the electric grid is significantly increasing. However, the intermittence of these sources breaks the balance between supply and demand for electricity. Hence, the importance of the energy storage technologies, they permit restoring the balance and reducing the drawbacks of intermittence of the renewable energies. This paper discusses the modeling and the cost-effectiveness of an isobaric adiabatic compressed air energy storage (IA-CAES) system. The proposed system is a combination among a compressed air energy storage (CAES) system with pumped hydro storage system and thermal energy storage system. The aim of this combination is to overcome the disadvantages of the conventional CAES system such as the losses due to the storage pressure variation, the loss of the compression heat and the use of fossil fuel sources. A steady state model is developed to perform an energy and exergy analyses of the IA-CAES system and calculate the distribution of the exergy losses in the latter system. A sensitivity analysis is also carried out to estimate the effects of some key parameters on the system’s efficiency, such as the pinch of the heat exchangers, the isentropic efficiency of the rotating machinery and the pressure losses. The conducted sensitivity analysis is a local analysis since the sensibility of each parameter changes with the variation of the other parameters. Therefore, an exergoeconomic study is achieved as well as a cost optimization in order to reduce the electricity cost produced during the production phase. The optimizer used is OmOptim which is a genetic algorithms based optimizer.

Keywords: cost-effectiveness, Exergoeconomic analysis, isobaric adiabatic compressed air energy storage (IA-CAES) system, thermodynamic modeling

Procedia PDF Downloads 218
420 Variation among East Wollega Coffee (Coffea arabica L.) Landraces for Quality Attributes

Authors: Getachew Weldemichael, Sentayehu Alamerew, Leta Tulu, Gezahegn Berecha

Abstract:

Coffee quality improvement program is becoming the focus of coffee research, as the world coffee consumption pattern shifted to high-quality coffee. However, there is limited information on the genetic variation of C. Arabica for quality improvement in potential specialty coffee growing areas of Ethiopia. Therefore, this experiment was conducted with the objectives of determining the magnitude of variation among 105 coffee accessions collected from east Wollega coffee growing areas and assessing correlations between the different coffee qualities attributes. It was conducted in RCRD with three replications. Data on green bean physical characters (shape and make, bean color and odor) and organoleptic cup quality traits (aromatic intensity, aromatic quality, acidity, astringency, bitterness, body, flavor, and overall standard of the liquor) were recorded. Analysis of variance, clustering, genetic divergence, principal component and correlation analysis was performed using SAS software. The result revealed that there were highly significant differences (P<0.01) among the accessions for all quality attributes except for odor and bitterness. Among the tested accessions, EW104 /09, EW101 /09, EW58/09, EW77/09, EW35/09, EW71/09, EW68/09, EW96 /09, EW83/09 and EW72/09 had the highest total coffee quality values (the sum of bean physical and cup quality attributes). These genotypes could serve as a source of genes for green bean physical characters and cup quality improvement in Arabica coffee. Furthermore, cluster analysis grouped the coffee accessions into five clusters with significant inter-cluster distances implying that there is moderate diversity among the accessions and crossing accessions from these divergent inter-clusters would result in hetrosis and recombinants in segregating generations. The principal component analysis revealed that the first three principal components with eigenvalues greater than unity accounted for 83.1% of the total variability due to the variation of nine quality attributes considered for PC analysis, indicating that all quality attributes equally contribute to a grouping of the accessions in different clusters. Organoleptic cup quality attributes showed positive and significant correlations both at the genotypic and phenotypic levels, demonstrating the possibility of simultaneous improvement of the traits. Path coefficient analysis revealed that acidity, flavor, and body had a high positive direct effect on overall cup quality, implying that these traits can be used as indirect criteria to improve overall coffee quality. Therefore, it was concluded that there is considerable variation among the accessions, which need to be properly conserved for future improvement of the coffee quality. However, the variability observed for quality attributes must be further verified using biochemical and molecular analysis.

Keywords: accessions, Coffea arabica, cluster analysis, correlation, principal component

Procedia PDF Downloads 134
419 BiLex-Kids: A Bilingual Word Database for Children 5-13 Years Old

Authors: Aris R. Terzopoulos, Georgia Z. Niolaki, Lynne G. Duncan, Mark A. J. Wilson, Antonios Kyparissiadis, Jackie Masterson

Abstract:

As word databases for bilingual children are not available, researchers, educators and textbook writers must rely on monolingual databases. The aim of this study is thus to develop a bilingual word database, BiLex-kids, an online open access developmental word database for 5-13 year old bilingual children who learn Greek as a second language and have English as their dominant one. BiLex-kids is compiled from 120 Greek textbooks used in Greek-English bilingual education in the UK, USA and Australia, and provides word translations in the two languages, pronunciations in Greek, and psycholinguistic variables (e.g. Zipf, Frequency per million, Dispersion, Contextual Diversity, Neighbourhood size). After clearing the textbooks of non-relevant items (e.g. punctuation), algorithms were applied to extract the psycholinguistic indices for all words. As well as one total lexicon, the database produces values for all ages (one lexicon for each age) and for three age bands (one lexicon per age band: 5-8, 9-11, 12-13 years). BiLex-kids provides researchers with accurate figures for a wide range of psycholinguistic variables, making it a useful and reliable research tool for selecting stimuli to examine lexical processing among bilingual children. In addition, it offers children the opportunity to study word spelling, learn translations and listen to pronunciations in their second language. It further benefits educators in selecting age-appropriate words for teaching reading and spelling, while special educational needs teachers will have a resource to control the content of word lists when designing interventions for bilinguals with literacy difficulties.

Keywords: bilingual children, psycholinguistics, vocabulary development, word databases

Procedia PDF Downloads 286
418 Fake Accounts Detection in Twitter Based on Minimum Weighted Feature Set

Authors: Ahmed ElAzab, Amira M. Idrees, Mahmoud A. Mahmoud, Hesham Hefny

Abstract:

Social networking sites such as Twitter and Facebook attracts over 500 million users across the world, for those users, their social life, even their practical life, has become interrelated. Their interaction with social networking has affected their life forever. Accordingly, social networking sites have become among the main channels that are responsible for vast dissemination of different kinds of information during real time events. This popularity in Social networking has led to different problems including the possibility of exposing incorrect information to their users through fake accounts which results to the spread of malicious content during life events. This situation can result to a huge damage in the real world to the society in general including citizens, business entities, and others. In this paper, we present a classification method for detecting fake accounts on Twitter. The study determines the minimized set of the main factors that influence the detection of the fake accounts on Twitter, then the determined factors have been applied using different classification techniques, a comparison of the results for these techniques has been performed and the most accurate algorithm is selected according to the accuracy of the results. The study has been compared with different recent research in the same area, this comparison has proved the accuracy of the proposed study. We claim that this study can be continuously applied on Twitter social network to automatically detect the fake accounts, moreover, the study can be applied on different Social network sites such as Facebook with minor changes according to the nature of the social network which are discussed in this paper.

Keywords: fake accounts detection, classification algorithms, twitter accounts analysis, features based techniques

Procedia PDF Downloads 368
417 Rapid Classification of Soft Rot Enterobacteriaceae Phyto-Pathogens Pectobacterium and Dickeya Spp. Using Infrared Spectroscopy and Machine Learning

Authors: George Abu-Aqil, Leah Tsror, Elad Shufan, Shaul Mordechai, Mahmoud Huleihel, Ahmad Salman

Abstract:

Pectobacterium and Dickeya spp which negatively affect a wide range of crops are the main causes of the aggressive diseases of agricultural crops. These aggressive diseases are responsible for a huge economic loss in agriculture including a severe decrease in the quality of the stored vegetables and fruits. Therefore, it is important to detect these pathogenic bacteria at their early stages of infection to control their spread and consequently reduce the economic losses. In addition, early detection is vital for producing non-infected propagative material for future generations. The currently used molecular techniques for the identification of these bacteria at the strain level are expensive and laborious. Other techniques require a long time of ~48 h for detection. Thus, there is a clear need for rapid, non-expensive, accurate and reliable techniques for early detection of these bacteria. In this study, infrared spectroscopy, which is a well-known technique with all its features, was used for rapid detection of Pectobacterium and Dickeya spp. at the strain level. The bacteria were isolated from potato plants and tubers with soft rot symptoms and measured by infrared spectroscopy. The obtained spectra were analyzed using different machine learning algorithms. The performances of our approach for taxonomic classification among the bacterial samples were evaluated in terms of success rates. The success rates for the correct classification of the genus, species and strain levels were ~100%, 95.2% and 92.6% respectively.

Keywords: soft rot enterobacteriaceae (SRE), pectobacterium, dickeya, plant infections, potato, solanum tuberosum, infrared spectroscopy, machine learning

Procedia PDF Downloads 72
416 Scheduling in a Single-Stage, Multi-Item Compatible Process Using Multiple Arc Network Model

Authors: Bokkasam Sasidhar, Ibrahim Aljasser

Abstract:

The problem of finding optimal schedules for each equipment in a production process is considered, which consists of a single stage of manufacturing and which can handle different types of products, where changeover for handling one type of product to the other type incurs certain costs. The machine capacity is determined by the upper limit for the quantity that can be processed for each of the products in a set up. The changeover costs increase with the number of set ups and hence to minimize the costs associated with the product changeover, the planning should be such that similar types of products should be processed successively so that the total number of changeovers and in turn the associated set up costs are minimized. The problem of cost minimization is equivalent to the problem of minimizing the number of set ups or equivalently maximizing the capacity utilization in between every set up or maximizing the total capacity utilization. Further, the production is usually planned against customers’ orders, and generally different customers’ orders are assigned one of the two priorities – “normal” or “priority” order. The problem of production planning in such a situation can be formulated into a Multiple Arc Network (MAN) model and can be solved sequentially using the algorithm for maximizing flow along a MAN and the algorithm for maximizing flow along a MAN with priority arcs. The model aims to provide optimal production schedule with an objective of maximizing capacity utilization, so that the customer-wise delivery schedules are fulfilled, keeping in view the customer priorities. Algorithms have been presented for solving the MAN formulation of the production planning with customer priorities. The application of the model is demonstrated through numerical examples.

Keywords: scheduling, maximal flow problem, multiple arc network model, optimization

Procedia PDF Downloads 379
415 Design, Synthesis and Evaluation of 4-(Phenylsulfonamido)Benzamide Derivatives as Selective Butyrylcholinesterase Inhibitors

Authors: Sushil Kumar Singh, Ashok Kumar, Ankit Ganeshpurkar, Ravi Singh, Devendra Kumar

Abstract:

In spectrum of neurodegenerative diseases, Alzheimer’s disease (AD) is characterized by the presence of amyloid β plaques and neurofibrillary tangles in the brain. It results in cognitive and memory impairment due to loss of cholinergic neurons, which is considered to be one of the contributing factors. Donepezil, an acetylcholinesterase (AChE) inhibitor which also inhibits butyrylcholinesterase (BuChE) and improves the memory and brain’s cognitive functions, is the most successful and prescribed drug to treat the symptoms of AD. The present work is based on designing of the selective BuChE inhibitors using computational techniques. In this work, machine learning models were trained using classification algorithms followed by screening of diverse chemical library of compounds. The various molecular modelling and simulation techniques were used to obtain the virtual hits. The amide derivatives of 4-(phenylsulfonamido) benzoic acid were synthesized and characterized using 1H & 13C NMR, FTIR and mass spectrometry. The enzyme inhibition assays were performed on equine plasma BuChE and electric eel’s AChE by method developed by Ellman et al. Compounds 31, 34, 37, 42, 49, 52 and 54 were found to be active against equine BuChE. N-(2-chlorophenyl)-4-(phenylsulfonamido)benzamide and N-(2-bromophenyl)-4-(phenylsulfonamido)benzamide (compounds 34 and 37) displayed IC50 of 61.32 ± 7.21 and 42.64 ± 2.17 nM against equine plasma BuChE. Ortho-substituted derivatives were more active against BuChE. Further, the ortho-halogen and ortho-alkyl substituted derivatives were found to be most active among all with minimal AChE inhibition. The compounds were selective toward BuChE.

Keywords: Alzheimer disease, butyrylcholinesterase, machine learning, sulfonamides

Procedia PDF Downloads 112
414 Artificial Intelligence in Bioscience: The Next Frontier

Authors: Parthiban Srinivasan

Abstract:

With recent advances in computational power and access to enough data in biosciences, artificial intelligence methods are increasingly being used in drug discovery research. These methods are essentially a series of advanced statistics based exercises that review the past to indicate the likely future. Our goal is to develop a model that accurately predicts biological activity and toxicity parameters for novel compounds. We have compiled a robust library of over 150,000 chemical compounds with different pharmacological properties from literature and public domain databases. The compounds are stored in simplified molecular-input line-entry system (SMILES), a commonly used text encoding for organic molecules. We utilize an automated process to generate an array of numerical descriptors (features) for each molecule. Redundant and irrelevant descriptors are eliminated iteratively. Our prediction engine is based on a portfolio of machine learning algorithms. We found Random Forest algorithm to be a better choice for this analysis. We captured non-linear relationship in the data and formed a prediction model with reasonable accuracy by averaging across a large number of randomized decision trees. Our next step is to apply deep neural network (DNN) algorithm to predict the biological activity and toxicity properties. We expect the DNN algorithm to give better results and improve the accuracy of the prediction. This presentation will review all these prominent machine learning and deep learning methods, our implementation protocols and discuss these techniques for their usefulness in biomedical and health informatics.

Keywords: deep learning, drug discovery, health informatics, machine learning, toxicity prediction

Procedia PDF Downloads 332
413 Laser Registration and Supervisory Control of neuroArm Robotic Surgical System

Authors: Hamidreza Hoshyarmanesh, Hosein Madieh, Sanju Lama, Yaser Maddahi, Garnette R. Sutherland, Kourosh Zareinia

Abstract:

This paper illustrates the concept of an algorithm to register specified markers on the neuroArm surgical manipulators, an image-guided MR-compatible tele-operated robot for microsurgery and stereotaxy. Two range-finding algorithms, namely time-of-flight and phase-shift, are evaluated for registration and supervisory control. The time-of-flight approach is implemented in a semi-field experiment to determine the precise position of a tiny retro-reflective moving object. The moving object simulates a surgical tool tip. The tool is a target that would be connected to the neuroArm end-effector during surgery inside the magnet bore of the MR imaging system. In order to apply flight approach, a 905-nm pulsed laser diode and an avalanche photodiode are utilized as the transmitter and receiver, respectively. For the experiment, a high frequency time to digital converter was designed using a field-programmable gate arrays. In the phase-shift approach, a continuous green laser beam with a wavelength of 530 nm was used as the transmitter. Results showed that a positioning error of 0.1 mm occurred when the scanner-target point distance was set in the range of 2.5 to 3 meters. The effectiveness of this non-contact approach exhibited that the method could be employed as an alternative for conventional mechanical registration arm. Furthermore, the approach is not limited by physical contact and extension of joint angles.

Keywords: 3D laser scanner, intraoperative MR imaging, neuroArm, real time registration, robot-assisted surgery, supervisory control

Procedia PDF Downloads 257
412 Cirrhosis Mortality Prediction as Classification using Frequent Subgraph Mining

Authors: Abdolghani Ebrahimi, Diego Klabjan, Chenxi Ge, Daniela Ladner, Parker Stride

Abstract:

In this work, we use machine learning and novel data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis are collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedure and laboratory tests is being analyzed. A temporal pattern mining technic called Frequent Subgraph Mining (FSM) is being used. Model for End-stage liver disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement of the area under the curve (AUC). The FSM technic itself does not improve the model significantly, but FSM, together with a machine learning technique called an ensemble, further improves the model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. To the best of our knowledge, this is the first work to apply modern machine learning algorithms and data analysis methods on predicting one-year mortality of cirrhotic patients and builds a model that predicts one-year mortality significantly more accurate than the MELD score. We have also tested the potential of FSM and provided a new perspective of the importance of clinical features.

Keywords: machine learning, liver cirrhosis, subgraph mining, supervised learning

Procedia PDF Downloads 109
411 Fully Automated Methods for the Detection and Segmentation of Mitochondria in Microscopy Images

Authors: Blessing Ojeme, Frederick Quinn, Russell Karls, Shannon Quinn

Abstract:

The detection and segmentation of mitochondria from fluorescence microscopy are crucial for understanding the complex structure of the nervous system. However, the constant fission and fusion of mitochondria and image distortion in the background make the task of detection and segmentation challenging. In the literature, a number of open-source software tools and artificial intelligence (AI) methods have been described for analyzing mitochondrial images, achieving remarkable classification and quantitation results. However, the availability of combined expertise in the medical field and AI required to utilize these tools poses a challenge to its full adoption and use in clinical settings. Motivated by the advantages of automated methods in terms of good performance, minimum detection time, ease of implementation, and cross-platform compatibility, this study proposes a fully automated framework for the detection and segmentation of mitochondria using both image shape information and descriptive statistics. Using the low-cost, open-source python and openCV library, the algorithms are implemented in three stages: pre-processing, image binarization, and coarse-to-fine segmentation. The proposed model is validated using the mitochondrial fluorescence dataset. Ground truth labels generated using a Lab kit were also used to evaluate the performance of our detection and segmentation model. The study produces good detection and segmentation results and reports the challenges encountered during the image analysis of mitochondrial morphology from the fluorescence mitochondrial dataset. A discussion on the methods and future perspectives of fully automated frameworks conclude the paper.

Keywords: 2D, binarization, CLAHE, detection, fluorescence microscopy, mitochondria, segmentation

Procedia PDF Downloads 334
410 Simulation, Optimization, and Analysis Approach of Microgrid Systems

Authors: Saqib Ali

Abstract:

Sources are classified into two depending upon the factor of reviving. These sources, which cannot be revived into their original shape once they are consumed, are considered as nonrenewable energy resources, i.e., (coal, fuel) Moreover, those energy resources which are revivable to the original condition even after being consumed are known as renewable energy resources, i.e., (wind, solar, hydel) Renewable energy is a cost-effective way to generate clean and green electrical energy Now a day’s majority of the countries are paying heed to energy generation from RES Pakistan is mostly relying on conventional energy resources which are mostly nonrenewable in nature coal, fuel is one of the major resources, and with the advent of time their prices are increasing on the other hand RES have great potential in the country with the deployment of RES greater reliability and an effective power system can be obtained In this thesis, a similar concept is being used and a hybrid power system is proposed which is composed of intermixing of renewable and nonrenewable sources The Source side is composed of solar, wind, fuel cells which will be used in an optimal manner to serve load The goal is to provide an economical, reliable, uninterruptable power supply. This is achieved by optimal controller (PI, PD, PID, FOPID) Optimization techniques are applied to the controllers to achieve the desired results. Advanced algorithms (Particle swarm optimization, Flower Pollination Algorithm) will be used to extract the desired output from the controller Detailed comparison in the form of tables and results will be provided, which will highlight the efficiency of the proposed system.

Keywords: distributed generation, demand-side management, hybrid power system, micro grid, renewable energy resources, supply-side management

Procedia PDF Downloads 70
409 Design of Microwave Building Block by Using Numerical Search Algorithm

Authors: Haifeng Zhou, Tsungyang Liow, Xiaoguang Tu, Eujin Lim, Chao Li, Junfeng Song, Xianshu Luo, Ying Huang, Lianxi Jia, Lianwee Luo, Qing Fang, Mingbin Yu, Guoqiang Lo

Abstract:

With the development of technology, countries gradually allocated more and more frequency spectrums for civilization and commercial usage, especially those high radio frequency bands indicating high information capacity. The field effect becomes more and more prominent in microwave components as frequency increases, which invalidates the transmission line theory and complicate the design of microwave components. Here a modeling approach based on numerical search algorithm is proposed to design various building blocks for microwave circuits to avoid complicated impedance matching and equivalent electrical circuit approximation. Concretely, a microwave component is discretized to a set of segments along the microwave propagation path. Each of the segment is initialized with random dimensions, which constructs a multiple-dimension parameter space. Then numerical searching algorithms (e.g. Pattern search algorithm) are used to find out the ideal geometrical parameters. The optimal parameter set is achieved by evaluating the fitness of S parameters after a number of iterations. We had adopted this approach in our current projects and designed many microwave components including sharp bends, T-branches, Y-branches, microstrip-to-stripline converters and etc. For example, a stripline 90° bend was designed in 2.54 mm x 2.54 mm space for dual-band operation (Ka band and Ku band) with < 0.18 dB insertion loss and < -55 dB reflection. We expect that this approach can enrich the tool kits for microwave designers.

Keywords: microwave component, microstrip and stripline, bend, power division, the numerical search algorithm.

Procedia PDF Downloads 355
408 Evaluation of Yield and Yield Components of Malaysian Palm Oil Board-Senegal Oil Palm Germplasm Using Multivariate Tools

Authors: Khin Aye Myint, Mohd Rafii Yusop, Mohd Yusoff Abd Samad, Shairul Izan Ramlee, Mohd Din Amiruddin, Zulkifli Yaakub

Abstract:

The narrow base of genetic is the main obstacle of breeding and genetic improvement in oil palm industry. In order to broaden the genetic bases, the Malaysian Palm Oil Board has been extensively collected wild germplasm from its original area of 11 African countries which are Nigeria, Senegal, Gambia, Guinea, Sierra Leone, Ghana, Cameroon, Zaire, Angola, Madagascar, and Tanzania. The germplasm collections were established and maintained as a field gene bank in Malaysian Palm Oil Board (MPOB) Research Station in Kluang, Johor, Malaysia to conserve a wide range of oil palm genetic resources for genetic improvement of Malaysian oil palm industry. Therefore, assessing the performance and genetic diversity of the wild materials is very important for understanding the genetic structure of natural oil palm population and to explore genetic resources. Principal component analysis (PCA) and Cluster analysis are very efficient multivariate tools in the evaluation of genetic variation of germplasm and have been applied in many crops. In this study, eight populations of MPOB-Senegal oil palm germplasm were studied to explore the genetic variation pattern using PCA and cluster analysis. A total of 20 yield and yield component traits were used to analyze PCA and Ward’s clustering using SAS 9.4 version software. The first four principal components which have eigenvalue >1 accounted for 93% of total variation with the value of 44%, 19%, 18% and 12% respectively for each principal component. PC1 showed highest positive correlation with fresh fruit bunch (0.315), bunch number (0.321), oil yield (0.317), kernel yield (0.326), total economic product (0.324), and total oil (0.324) while PC 2 has the largest positive association with oil to wet mesocarp (0.397) and oil to fruit (0.458). The oil palm population were grouped into four distinct clusters based on 20 evaluated traits, this imply that high genetic variation existed in among the germplasm. Cluster 1 contains two populations which are SEN 12 and SEN 10, while cluster 2 has only one population of SEN 3. Cluster 3 consists of three populations which are SEN 4, SEN 6, and SEN 7 while SEN 2 and SEN 5 were grouped in cluster 4. Cluster 4 showed the highest mean value of fresh fruit bunch, bunch number, oil yield, kernel yield, total economic product, and total oil and Cluster 1 was characterized by high oil to wet mesocarp, and oil to fruit. The desired traits that have the largest positive correlation on extracted PCs could be utilized for the improvement of oil palm breeding program. The populations from different clusters with the highest cluster means could be used for hybridization. The information from this study can be utilized for effective conservation and selection of the MPOB-Senegal oil palm germplasm for the future breeding program.

Keywords: cluster analysis, genetic variability, germplasm, oil palm, principal component analysis

Procedia PDF Downloads 140
407 Harmonic Assessment and Mitigation in Medical Diagonesis Equipment

Authors: S. S. Adamu, H. S. Muhammad, D. S. Shuaibu

Abstract:

Poor power quality in electrical power systems can lead to medical equipment at healthcare centres to malfunction and present wrong medical diagnosis. Equipment such as X-rays, computerized axial tomography, etc. can pollute the system due to their high level of harmonics production, which may cause a number of undesirable effects like heating, equipment damages and electromagnetic interferences. The conventional approach of mitigation uses passive inductor/capacitor (LC) filters, which has some drawbacks such as, large sizes, resonance problems and fixed compensation behaviours. The current trends of solutions generally employ active power filters using suitable control algorithms. This work focuses on assessing the level of Total Harmonic Distortion (THD) on medical facilities and various ways of mitigation, using radiology unit of an existing hospital as a case study. The measurement of the harmonics is conducted with a power quality analyzer at the point of common coupling (PCC). The levels of measured THD are found to be higher than the IEEE 519-1992 standard limits. The system is then modelled as a harmonic current source using MATLAB/SIMULINK. To mitigate the unwanted harmonic currents a shunt active filter is developed using synchronous detection algorithm to extract the fundamental component of the source currents. Fuzzy logic controller is then developed to control the filter. The THD without the active power filter are validated using the measured values. The THD with the developed filter show that the harmonics are now within the recommended limits.

Keywords: power quality, total harmonics distortion, shunt active filters, fuzzy logic

Procedia PDF Downloads 451
406 Performance of the New Laboratory-Based Algorithm for HIV Diagnosis in Southwestern China

Authors: Yanhua Zhao, Chenli Rao, Dongdong Li, Chuanmin Tao

Abstract:

The Chinese Centers for Disease Control and Prevention (CCDC) issued a new laboratory-based algorithm for HIV diagnosis on April 2016, which initially screens with a combination HIV-1/HIV-2 antigen/antibody fourth-generation immunoassay (IA) followed, when reactive, an HIV-1/HIV-2 undifferentiated antibody IA in duplicate. Reactive specimens with concordant results undergo supplemental tests with western blots, or HIV-1 nucleic acid tests (NATs) and non-reactive specimens with discordant results receive HIV-1 NATs or p24 antigen tests or 2-4 weeks follow-up tests. However, little data evaluating the application of the new algorithm have been reported to date. The study was to evaluate the performance of new laboratory-based HIV diagnostic algorithm in an inpatient population of Southwest China over the initial 6 months by compared with the old algorithm. Plasma specimens collected from inpatients from May 1, 2016, to October 31, 2016, are submitted to the laboratory for screening HIV infection performed by both the new HIV testing algorithm and the old version. The sensitivity and specificity of the algorithms and the difference of the categorized numbers of plasmas were calculated. Under the new algorithm for HIV diagnosis, 170 of the total 52 749 plasma specimens were confirmed as positively HIV-infected (0.32%). The sensitivity and specificity of the new algorithm were 100% (170/170) and 100% (52 579/52 579), respectively; while 167 HIV-1 positive specimens were identified by the old algorithm with sensitivity 98.24% (167/170) and 100% (52 579/52 579), respectively. Three acute HIV-1 infections (AHIs) and two early HIV-1 infections (EHIs) were identified by the new algorithm; the former was missed by old procedure. Compared with the old version, the new algorithm produced fewer WB-indeterminate results (2 vs. 16, p = 0.001), which led to fewer follow-up tests. Therefore, the new HIV testing algorithm is more sensitive for detecting acute HIV-1 infections with maintaining the ability to verify the established HIV-1 infections and can dramatically decrease the greater number of WB-indeterminate specimens.

Keywords: algorithm, diagnosis, HIV, laboratory

Procedia PDF Downloads 373
405 Data-Driven Simulations Tools for Der and Battery Rich Power Grids

Authors: Ali Moradiamani, Samaneh Sadat Sajjadi, Mahdi Jalili

Abstract:

Power system analysis has been a major research topic in the generation and distribution sections, in both industry and academia, for a long time. Several load flow and fault analysis scenarios have been normally performed to study the performance of different parts of the grid in the context of, for example, voltage and frequency control. Software tools, such as PSCAD, PSSE, and PowerFactory DIgSILENT, have been developed to perform these analyses accurately. Distribution grid had been the passive part of the grid and had been known as the grid of consumers. However, a significant paradigm shift has happened with the emergence of Distributed Energy Resources (DERs) in the distribution level. It means that the concept of power system analysis needs to be extended to the distribution grid, especially considering self sufficient technologies such as microgrids. Compared to the generation and transmission levels, the distribution level includes significantly more generation/consumption nodes thanks to PV rooftop solar generation and battery energy storage systems. In addition, different consumption profile is expected from household residents resulting in a diverse set of scenarios. Emergence of electric vehicles will absolutely make the environment more complicated considering their charging (and possibly discharging) requirements. These complexities, as well as the large size of distribution grids, create challenges for the available power system analysis software. In this paper, we study the requirements of simulation tools in the distribution grid and how data-driven algorithms are required to increase the accuracy of the simulation results.

Keywords: smart grids, distributed energy resources, electric vehicles, battery storage systsms, simulation tools

Procedia PDF Downloads 65
404 Effects of the Age, Education, and Mental Illness Experience on Depressive Disorder Stigmatization

Authors: Soowon Park, Min-Ji Kim, Jun-Young Lee

Abstract:

Motivation: The stigma of mental illness has been studied in many disciplines, including social psychology, counseling psychology, sociology, psychiatry, public health care, and related areas, because individuals labeled as ‘mentally ill’ are often deprived of their rights and their life opportunities. To understand the factors that deepen the stigma of mental illness, it is important to understand the influencing factors of the stigma. Problem statement: Depression is a common disorder in adults, but the incidence of help-seeking is low. Researchers have believed that this poor help-seeking behavior is related to the stigma of mental illness, which results from low mental health literacy. However, it is uncertain that increasing mental health literacy decreases mental health stigmatization. Furthermore, even though decreasing stigmatization is important, the stigma of mental illness is still a stable and long-lasting phenomenon. Thus, factors other than knowledge about mental disorders have the power to maintain the stigma. Investigating the influencing factors that facilitate the stigma of psychiatric disease could help lower the social stigmatization. Approach: Face-to-face interviews were conducted with a multi-clustering sample. A total of 700 Korean participants (38% male), ranging in age from 18 to 78 (M(SD)age= 48.5(15.7)) answered demographical questions, Korean version of Link’s Perceived Devaluation and Discrimination (PDD) scale for the assessment of social stigmatization against depression, and the Korean version of the WHO-Composite International Diagnostic Interview for the assessment of mental disorders. Multiple-regression was conducted to find the predicting factors of social stigmatization against depression. Ages, sex, years of education, income, living location, and experience of mental illness were used as the predictors. Results: Predictors accounted for 14% of the variance in the stigma of depressive disorders (F(6, 693) = 20.27, p < .001). Among those, only age, years of education, and experience of mental illness significantly predicted social stigmatization against depression. The standardized regression coefficient of age had a negative association with stigmatization (β = -.20, p < .001), but years of education (β = .20, p < .001) and experience of mental illness (β = .08, p < .05) positively predicted depression stigmatization. Conclusions: The present study clearly demonstrates the association between personal factors and depressive disorder stigmatization. Younger age, more education, and self-stigma appeared to increase the stigmatization. Young, highly educated, and mentally ill people tend to reject patients with depressive disorder as friends, teachers, or babysitters; they also tend to think that those patients have lower intelligence and abilities. These results suggest the possibility that people from a high social class, or highly educated people, who have the power to make decisions, help maintain the social stigma against mental illness patients. To increase the awareness that people from high social classes have more stigmatization against depressive disorders will help decrease the biased attitudes against mentally ill patients.

Keywords: depressive disorder stigmatization, age, education, self-stigma

Procedia PDF Downloads 372
403 Fight against Money Laundering with Optical Character Recognition

Authors: Saikiran Subbagari, Avinash Malladhi

Abstract:

Anti Money Laundering (AML) regulations are designed to prevent money laundering and terrorist financing activities worldwide. Financial institutions around the world are legally obligated to identify, assess and mitigate the risks associated with money laundering and report any suspicious transactions to governing authorities. With increasing volumes of data to analyze, financial institutions seek to automate their AML processes. In the rise of financial crimes, optical character recognition (OCR), in combination with machine learning (ML) algorithms, serves as a crucial tool for automating AML processes by extracting the data from documents and identifying suspicious transactions. In this paper, we examine the utilization of OCR for AML and delve into various OCR techniques employed in AML processes. These techniques encompass template-based, feature-based, neural network-based, natural language processing (NLP), hidden markov models (HMMs), conditional random fields (CRFs), binarizations, pattern matching and stroke width transform (SWT). We evaluate each technique, discussing their strengths and constraints. Also, we emphasize on how OCR can improve the accuracy of customer identity verification by comparing the extracted text with the office of foreign assets control (OFAC) watchlist. We will also discuss how OCR helps to overcome language barriers in AML compliance. We also address the implementation challenges that OCR-based AML systems may face and offer recommendations for financial institutions based on the data from previous research studies, which illustrate the effectiveness of OCR-based AML.

Keywords: anti-money laundering, compliance, financial crimes, fraud detection, machine learning, optical character recognition

Procedia PDF Downloads 108
402 Numerical Simulations of Acoustic Imaging in Hydrodynamic Tunnel with Model Adaptation and Boundary Layer Noise Reduction

Authors: Sylvain Amailland, Jean-Hugh Thomas, Charles Pézerat, Romuald Boucheron, Jean-Claude Pascal

Abstract:

The noise requirements for naval and research vessels have seen an increasing demand for quieter ships in order to fulfil current regulations and to reduce the effects on marine life. Hence, new methods dedicated to the characterization of propeller noise, which is the main source of noise in the far-field, are needed. The study of cavitating propellers in closed-section is interesting for analyzing hydrodynamic performance but could involve significant difficulties for hydroacoustic study, especially due to reverberation and boundary layer noise in the tunnel. The aim of this paper is to present a numerical methodology for the identification of hydroacoustic sources on marine propellers using hydrophone arrays in a large hydrodynamic tunnel. The main difficulties are linked to the reverberation of the tunnel and the boundary layer noise that strongly reduce the signal-to-noise ratio. In this paper it is proposed to estimate the reflection coefficients using an inverse method and some reference transfer functions measured in the tunnel. This approach allows to reduce the uncertainties of the propagation model used in the inverse problem. In order to reduce the boundary layer noise, a cleaning algorithm taking advantage of the low rank and sparse structure of the cross-spectrum matrices of the acoustic and the boundary layer noise is presented. This approach allows to recover the acoustic signal even well under the boundary layer noise. The improvement brought by this method is visible on acoustic maps resulting from beamforming and DAMAS algorithms.

Keywords: acoustic imaging, boundary layer noise denoising, inverse problems, model adaptation

Procedia PDF Downloads 299
401 Development of a Sequential Multimodal Biometric System for Web-Based Physical Access Control into a Security Safe

Authors: Babatunde Olumide Olawale, Oyebode Olumide Oyediran

Abstract:

The security safe is a place or building where classified document and precious items are kept. To prevent unauthorised persons from gaining access to this safe a lot of technologies had been used. But frequent reports of an unauthorised person gaining access into security safes with the aim of removing document and items from the safes are pointers to the fact that there is still security gap in the recent technologies used as access control for the security safe. In this paper we try to solve this problem by developing a multimodal biometric system for physical access control into a security safe using face and voice recognition. The safe is accessed by the combination of face and speech pattern recognition and also in that sequential order. User authentication is achieved through the use of camera/sensor unit and a microphone unit both attached to the door of the safe. The user face was captured by the camera/sensor while the speech was captured by the use of the microphone unit. The Scale Invariance Feature Transform (SIFT) algorithm was used to train images to form templates for the face recognition system while the Mel-Frequency Cepitral Coefficients (MFCC) algorithm was used to train the speech recognition system to recognise authorise user’s speech. Both algorithms were hosted in two separate web based servers and for automatic analysis of our work; our developed system was simulated in a MATLAB environment. The results obtained shows that the developed system was able to give access to authorise users while declining unauthorised person access to the security safe.

Keywords: access control, multimodal biometrics, pattern recognition, security safe

Procedia PDF Downloads 299
400 StockTwits Sentiment Analysis on Stock Price Prediction

Authors: Min Chen, Rubi Gupta

Abstract:

Understanding and predicting stock market movements is a challenging problem. It is believed stock markets are partially driven by public sentiments, which leads to numerous research efforts to predict stock market trend using public sentiments expressed on social media such as Twitter but with limited success. Recently a microblogging website StockTwits is becoming increasingly popular for users to share their discussions and sentiments about stocks and financial market. In this project, we analyze the text content of StockTwits tweets and extract financial sentiment using text featurization and machine learning algorithms. StockTwits tweets are first pre-processed using techniques including stopword removal, special character removal, and case normalization to remove noise. Features are extracted from these preprocessed tweets through text featurization process using bags of words, N-gram models, TF-IDF (term frequency-inverse document frequency), and latent semantic analysis. Machine learning models are then trained to classify the tweets' sentiment as positive (bullish) or negative (bearish). The correlation between the aggregated daily sentiment and daily stock price movement is then investigated using Pearson’s correlation coefficient. Finally, the sentiment information is applied together with time series stock data to predict stock price movement. The experiments on five companies (Apple, Amazon, General Electric, Microsoft, and Target) in a duration of nine months demonstrate the effectiveness of our study in improving the prediction accuracy.

Keywords: machine learning, sentiment analysis, stock price prediction, tweet processing

Procedia PDF Downloads 126
399 Surface-Enhanced Raman Spectroscopy on Gold Nanoparticles in the Kidney Disease

Authors: Leonardo C. Pacheco-Londoño, Nataly J Galan-Freyle, Lisandro Pacheco-Lugo, Antonio Acosta-Hoyos, Elkin Navarro, Gustavo Aroca-Martinez, Karin Rondón-Payares, Alberto C. Espinosa-Garavito, Samuel P. Hernández-Rivera

Abstract:

At the Life Science Research Center at Simon Bolivar University, a primary focus is the diagnosis of various diseases, and the use of gold nanoparticles (Au-NPs) in diverse biomedical applications is continually expanding. In the present study, Au-NPs were employed as substrates for Surface-Enhanced Raman Spectroscopy (SERS) aimed at diagnosing kidney diseases arising from Lupus Nephritis (LN), preeclampsia (PC), and Hypertension (H). Discrimination models were developed for distinguishing patients with and without kidney diseases based on the SERS signals from urine samples by partial least squares-discriminant analysis (PLS-DA). A comparative study of the Raman signals across the three conditions was conducted, leading to the identification of potential metabolite signals. Model performance was assessed through cross-validation and external validation, determining parameters like sensitivity and specificity. Additionally, a secondary analysis was performed using machine learning (ML) models, wherein different ML algorithms were evaluated for their efficiency. Models’ validation was carried out using cross-validation and external validation, and other parameters were determined, such as sensitivity and specificity; the models showed average values of 0.9 for both parameters. Additionally, it is not possible to highlight this collaborative effort involved two university research centers and two healthcare institutions, ensuring ethical treatment and informed consent of patient samples.

Keywords: SERS, Raman, PLS-DA, kidney diseases

Procedia PDF Downloads 14