M-Tahar Kechadi


1 A Heuristics Approach for Fast Detecting Suspicious Money Laundering Cases in an Investment Bank

Authors: Nhien-An Le-Khac, Sammer Markos, M-Tahar Kechadi


Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism and surely not forgetting personal gain. Most international financial institutions have been implementing anti-money laundering solutions (AML) to fight investment fraud. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered as well-suited techniques for detecting ML activities. Within the scope of a collaboration project for the purpose of developing a new solution for the AML Units in an international investment bank, we proposed a data mining-based solution for AML. In this paper, we present a heuristics approach to improve the performance for this solution. We also show some preliminary results associated with this method on analysing transaction datasets.

Keywords: Data Mining, Clustering, Heuristics, anti money laundering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3204


1 Recurrent Neural Networks for Classifying Outliers in Electronic Health Record Clinical Text

Authors: Duncan Wallace, M-Tahar Kechadi


In recent years, Machine Learning (ML) approaches have been successfully applied to an analysis of patient symptom data in the context of disease diagnosis, at least where such data is well codified. However, much of the data present in Electronic Health Records (EHR) are unlikely to prove suitable for classic ML approaches. Furthermore, as scores of data are widely spread across both hospitals and individuals, a decentralized, computationally scalable methodology is a priority. The focus of this paper is to develop a method to predict outliers in an out-of-hours healthcare provision center (OOHC). In particular, our research is based upon the early identification of patients who have underlying conditions which will cause them to repeatedly require medical attention. OOHC act as an ad-hoc delivery of triage and treatment, where interactions occur without recourse to a full medical history of the patient in question. Medical histories, relating to patients contacting an OOHC, may reside in several distinct EHR systems in multiple hospitals or surgeries, which are unavailable to the OOHC in question. As such, although a local solution is optimal for this problem, it follows that the data under investigation is incomplete, heterogeneous, and comprised mostly of noisy textual notes compiled during routine OOHC activities. Through the use of Deep Learning methodologies, the aim of this paper is to provide the means to identify patient cases, upon initial contact, which are likely to relate to such outliers. To this end, we compare the performance of Long Short-Term Memory, Gated Recurrent Units, and combinations of both with Convolutional Neural Networks. A further aim of this paper is to elucidate the discovery of such outliers by examining the exact terms which provide a strong indication of positive and negative case entries. While free-text is the principal data extracted from EHRs for classification, EHRs also contain normalized features. Although the specific demographical features treated within our corpus are relatively limited in scope, we examine whether it is beneficial to include such features among the inputs to our neural network, or whether these features are more successfully exploited in conjunction with a different form of a classifier. In this section, we compare the performance of randomly generated regression trees and support vector machines and determine the extent to which our classification program can be improved upon by using either of these machine learning approaches in conjunction with the output of our Recurrent Neural Network application. The output of our neural network is also used to help determine the most significant lexemes present within the corpus for determining high-risk patients. By combining the confidence of our classification program in relation to lexemes within true positive and true negative cases, with an inverse document frequency of the lexemes related to these cases, we can determine what features act as the primary indicators of frequent-attender and non-frequent-attender cases, providing a human interpretable appreciation of how our program classifies cases.

Keywords: Artificial Neural Networks, Machine Learning, Medical informatics, data-mining

Procedia PDF Downloads 5