Search results for: allele mining
1179 Frequent Item Set Mining for Big Data Using MapReduce Framework
Authors: Tamanna Jethava, Rahul Joshi
Abstract:
Frequent item sets play an essential role in many data mining tasks that try to find interesting patterns in a database. Typically, a frequent item set is a set of items that frequently appear together in a transaction dataset. Several mining algorithms are used for frequent item set mining, yet most do not scale to the type of data we are presented with today, so-called “Big Data”. Big Data is a collection of large data sets. Our approach is to perform frequent item set mining over large datasets in a scalable and fast way. MapReduce, along with HDFS, is used to find frequent item sets from Big Data on a large cluster. This paper focuses on using a pre-processing and mining algorithm as a hybrid approach for big data on the Hadoop platform.
Keywords: frequent item set mining, big data, Hadoop, MapReduce
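The map-and-reduce idea behind frequent item set counting can be illustrated in a few lines of plain Python. This is only a single-machine sketch, not the authors' Hadoop implementation: the function names, the `max_size` limit, and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Emit (itemset, 1) for every itemset of up to max_size items in one transaction."""
    items = sorted(set(transaction))
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each itemset."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Keep the itemsets that occur in at least min_support transactions."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    counts = reduce_phase(pairs)
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy transactions, not data from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
result = frequent_itemsets(transactions, min_support=2)
```

On a real cluster the map step would run per data split and the reduce step per key; here both run in one process purely to show the data flow.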
Procedia PDF Downloads 435
1178 The Distribution of HLA-B*15:01 and HLA-B*51:01 Alleles in Thai Population: Clinical Implementation and Diagnostic Process of COVID-19 Severity
Authors: Aleena Rena Onozuka, Patompong Satapornpong
Abstract:
Introduction: Human Leukocyte Antigen (HLA) alleles (HLA class I and class II) play a crucial role in the immune response against pathogens. The HLA-B*15:01 allele had a significant association with asymptomatic COVID-19 infection (p-value = 5.67 × 10⁻⁵; OR = 2.40; 95% CI = 1.54-3.64). There was also a notable link between the HLA-B*51:01 allele and critically ill patients with COVID-19 (p-value = 0.007; OR = 3.38). This study describes the distribution of HLA marker alleles in Thais and sub-groups. Objective: To investigate the prevalence of HLA-B*15:01 and HLA-B*51:01 alleles in the Thai population. Materials and Methods: 200 healthy Thai individuals from the College of Pharmacy, Rangsit University, were included in this study. HLA-B alleles were genotyped using the sequence-specific oligonucleotides process (PCR-SSOs). Results: The HLA-B*46:01 (12.00%), HLA-B*15:02 (9.25%), HLA-B*40:01 (6.50%), HLA-B*13:01 (6.25%), and HLA-B*38:02 (5.50%) alleles were more common than other alleles in the Thai population. HLA-B*46:01 and HLA-B*15:02 were the most common alleles found across four regions. Moreover, the HLA-B*15:01 and HLA-B*51:01 alleles were similarly distributed in the Thai population (0.50% and 5.25%, respectively; p-value > 0.05). The frequencies of the HLA-B*15:01 and HLA-B*51:01 alleles in the Thai population were not significantly different from those in other populations. Conclusions: We can screen for HLA-B*15:01 and HLA-B*51:01 alleles to determine the symptoms of COVID-19 since they are universal HLA-B markers. Importantly, the database of HLA markers indicates the association between HLA frequency and populations. However, further research is needed on larger numbers of COVID-19 patients and in different populations.
Keywords: HLA-B*15:01, HLA-B*51:01, COVID-19, HLA-B alleles
Procedia PDF Downloads 120
1177 Polymorphism in Myostatin Gene and Its Association with Growth Traits in Kurdi Sheep of Northern Khorasan
Authors: Masoud Alipanah, Sekineh Akbari, Gholamreza Dashab
Abstract:
Myostatin, also known as growth and differentiation factor 8 (GDF8), acts as an inhibitor in the development of skeletal muscle. If a mutation occurs in the coding region of myostatin, its inhibitory role is altered and muscle growth is increased. In this study, blood samples were collected randomly from 60 Kurdi sheep in northern Khorasan, and DNA extraction was performed using a modified salt method. A 337 bp fragment from exon 3 of the myostatin gene was amplified by polymerase chain reaction (PCR) using specific primers. To detect different forms of an allele at this locus, the HaeIII restriction enzyme and PCR-RFLP analysis were used. Band patterns were resolved using agarose gel electrophoresis. The frequencies of genotypes mm, Mm, and MM were 0, 0.15, and 0.85, respectively. The frequencies of alleles m and M were 0.07 and 0.93, respectively. Statistical analysis indicated that the m allele was significantly associated with body weight. The results of this study suggest that the myostatin gene is possibly a candidate gene that affects growth traits in Kurdi sheep.
Keywords: GDF8 gene, Kurdi Sheep of Northern Khorasan, polymorphism, weight traits
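The allele frequencies quoted above follow from the genotype frequencies by simple allele counting: each heterozygote contributes half of its frequency to each allele. A minimal sketch, assuming a biallelic locus (the function name is illustrative, not from the paper):

```python
def allele_frequencies(f_mm, f_Mm, f_MM):
    """Return (freq_m, freq_M) for a biallelic locus from genotype frequencies."""
    total = f_mm + f_Mm + f_MM
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    freq_m = f_mm + 0.5 * f_Mm  # homozygotes count fully, heterozygotes half
    freq_M = f_MM + 0.5 * f_Mm
    return freq_m, freq_M


# Genotype frequencies reported in the abstract: mm = 0, Mm = 0.15, MM = 0.85.
freq_m, freq_M = allele_frequencies(0.0, 0.15, 0.85)
```

With the reported genotype frequencies this gives about 0.075 for m and 0.925 for M, which agrees with the published 0.07 and 0.93 up to rounding.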
Procedia PDF Downloads 340
1176 Apolipoprotein A1 -75 G to A Substitution and Its Relationship with Serum ApoA1 Levels among Indian Punjabi Population
Authors: Savjot Kaur, Mridula Mahajan, AJS Bhanwer, Santokh Singh, Kawaljit Matharoo
Abstract:
Background: Disorders of lipid metabolism and genetic predisposition are coronary artery disease (CAD) risk factors. ApoA1 is the apolipoprotein component of anti-atherogenic high density lipoprotein (HDL) particles. The protective action of HDL and ApoA1 is attributed to their central role in reverse cholesterol transport (RCT). Aim: This study was aimed at identifying sequence variations in ApoA1 (-75G>A) and their association with serum ApoA1 levels. Methods: A total of 300 CAD patients and 300 normal individuals (controls) were analyzed. The PCR-RFLP method was used to determine the DNA polymorphism in the ApoA1 gene: PCR products were digested with the restriction enzyme MspI, followed by agarose gel electrophoresis. Serum apolipoprotein A1 concentration was estimated with an immunoturbidimetric method. Results: Deviation from Hardy-Weinberg Equilibrium (HWE) was observed for this gene variant. The A allele frequency was higher among CAD patients (53.8) compared to controls (45.5) [p = 0.004, OR = 1.38 (1.11-1.75)]. Under recessive model analysis (AA vs. GG+GA), the AA genotype of the ApoA1 G>A substitution conferred ~1 fold increased risk towards CAD susceptibility [p = 0.002, OR = 1.72 (1.2-2.43)]. With serum ApoA1 levels < 107, the A allele frequency was higher among CAD cases (50) as compared to controls (43.4) [p = 0.23, OR = 1.2 (0.84-2)], and there was zero occurrence of the A allele in individuals with ApoA1 levels > 177. Conclusion: Serum ApoA1 levels were associated with ApoA1 promoter region variation and influence CAD risk. Individuals with the APOA1 -75 A allele have an excess hazard of developing CAD as a result of its effect on low serum concentrations of ApoA1.
Keywords: apolipoprotein A1 (G>A) gene polymorphism, coronary artery disease (CAD), reverse cholesterol transport (RCT)
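As a rough check on how an allelic odds ratio relates to the two allele frequencies, the sketch below computes an odds ratio directly from proportions. It assumes the reported values 53.8 and 45.5 are percentages; because the abstract does not give the underlying allele counts, the result (roughly 1.4) is only expected to be close to, not identical with, the published 1.38.

```python
def odds_ratio(p_cases, p_controls):
    """Odds ratio of carrying an allele, given its proportion in cases and controls."""
    odds_cases = p_cases / (1.0 - p_cases)
    odds_controls = p_controls / (1.0 - p_controls)
    return odds_cases / odds_controls


# Assumption: 53.8 and 45.5 in the abstract are A-allele percentages.
allele_or = odds_ratio(0.538, 0.455)
```

A confidence interval would additionally require the raw counts of each allele in both groups, which are not available here.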
Procedia PDF Downloads 315
1175 Genetics of Atopic Dermatitis: Role of Cytokine Genes Polymorphisms
Authors: Ghaleb Bin Huraib
Abstract:
Atopic dermatitis (AD), also known as atopic eczema, is a chronic inflammatory skin disease characterized by severe itching and recurrent, relapsing eczema-like skin lesions, affecting up to 20% of children and 10% of adults in industrialized countries. AD is a complex multifactorial disease, and its exact etiology and pathogenesis have not been fully elucidated. The aim of this study was to investigate the impact of gene polymorphisms of the T helper cell subtype Th1 and Th2 cytokines interferon-gamma (IFN-γ), interleukin-6 (IL-6) and transforming growth factor (TGF)-β1 on AD susceptibility in a Saudi cohort. One hundred four unrelated patients with AD and 195 healthy controls were genotyped for IFN-γ (874A/T), IL-6 (174G/C) and TGF-β1 (509C/T) polymorphisms using ARMS-PCR and PCR-RFLP techniques. The frequency of genotypes AA and AT of IFN-γ (874A/T) differed significantly between patients and controls (P 0.001). The genotype AT was increased while genotype AA was decreased in AD patients as compared to controls. AD patients also had a higher frequency of T-containing genotypes (AT+TT) than controls (P = 0.001). The frequencies of alleles T and A were statistically different in patients and controls (P = 0.04). The frequencies of genotype GG and allele G of IL-6 (174G/C) were significantly higher, while genotype GC and allele C were lower, in AD patients than in controls. There was no significant difference in the frequencies of alleles and genotypes of the TGF-β1 (509C/T) polymorphism between the patient and control groups. These results showed that susceptibility to AD is influenced by the presence or absence of genotypes of the IFN-γ (874A/T) and IL-6 (174G/C) polymorphisms. It is concluded that the T allele and T-containing genotypes (AT+TT) of IFN-γ (874A/T) and the G allele and GG genotype of IL-6 (174G/C) polymorphisms are associated with susceptibility to AD in Saudis. On the other hand, the TGF-β1 (509C/T) polymorphism may not be associated with AD risk in our population; however, further studies with large sample sizes are required to confirm these results.
Keywords: atopic dermatitis, Polymorphism, Interferon, IL-6
Procedia PDF Downloads 66
1174 Review of Different Machine Learning Algorithms
Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui
Abstract:
Classification is a data mining technique based on Machine Learning (ML) algorithms. It is used to classify individual items in a known set of information into a set of predefined modules or groups. Web mining is also part of this kind of data mining method. The main purpose of this paper is to analyze and compare the performance of the Naïve Bayes algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), and Support Vector Machine (SVM). The paper covers different ML algorithms with their advantages and disadvantages, and also defines research issues.
Keywords: Data Mining, Web Mining, classification, ML Algorithms
Procedia PDF Downloads 303
1173 Object-Centric Process Mining Using Process Cubes
Authors: Anahita Farhang Ghahfarokhi, Alessandro Berti, Wil M.P. van der Aalst
Abstract:
Process mining provides ways to analyze business processes. Common process mining techniques consider the process as a whole. However, in real-life business processes, different behaviors exist that make the overall process too complex to interpret. Process comparison is a branch of process mining that isolates different behaviors of the process from each other by using process cubes. Process cubes organize event data using different dimensions. Each cell contains a set of events that can be used as an input to apply process mining techniques. Existing work on process cubes assumes single case notions. However, in real processes, several case notions (e.g., order, item, package, etc.) are intertwined. Object-centric process mining is a new branch of process mining addressing multiple case notions in a process. To make a bridge between object-centric process mining and process comparison, we propose a process cube framework, which supports process cube operations such as slice and dice on object-centric event logs. To facilitate the comparison, the framework is integrated with several object-centric process discovery approaches.
Keywords: multidimensional process mining, multi-perspective business processes, OLAP, process cubes, process discovery, process mining
Procedia PDF Downloads 255
1172 Association of Lipoprotein Lipase Gene (HindIII rs320) Polymorphisms with Moderate Hypertriglyceridemia Secondary to Metabolic Syndrome
Authors: Meryem Abi-Ayad, Biagio Arcidiacono, Eusebio Chiefari, Daniela Foti, Mohamed Benyoucef, Antonio Brunetti
Abstract:
Lipoprotein lipase (LPL) is a key enzyme for lipid metabolism; its genetic polymorphism can be a candidate for modulating lipid parameters in metabolic syndrome (MetS). The objective of the present study was to determine whether the lipoprotein lipase polymorphism (LPL-HindIII) could be associated with moderate hypertriglyceridemia (secondary to metabolic syndrome). The HindIII (rs320) polymorphism was assessed by PCR-RFLP in 51 MetS patients and 17 healthy controls from the hospital in Tlemcen. The logistic regression analyses showed no significant association of HindIII genotype with hypertriglyceridemia (TG ≥ 1.5 g/l or TG-lowering treatment) (P=0.455), metabolic syndrome (P=0.455), hypertension (P=0.802) or type 2 diabetes (P=0.144). In terms of plasma biomarkers, although not statistically significant, there was a difference in TG levels (P > 0.05), which were lowest among homozygous carriers of the mutant allele (H-). In this study, there was no association between the rare allele (H-) and disease protection, or between the frequent allele (H+) and disease prevalence (hypertriglyceridemia, metabolic syndrome, hypertension, type 2 diabetes).
Keywords: moderate secondary hypertriglyceridemia, metabolic syndrome, lipids, polymorphism lipoprotein lipase, HindIII(rs320)
Procedia PDF Downloads 321
1171 Algorithms used in Spatial Data Mining GIS
Authors: Vahid Bairami Rad
Abstract:
Extracting knowledge from spatial data like GIS data is important to reduce the data and extract information. Therefore, the development of new techniques and tools that support the human in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. Thus, we introduce a set of database primitives or basic operations for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. Similar to the relational standard language SQL, the use of standard primitives will speed-up the development of new data mining algorithms and will also make them more portable. We introduced a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths were defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives by a commercial DBMS were presented.
Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining
Procedia PDF Downloads 460
1170 Impact of HLA-C*03:04 Allele Frequency Screening Test in Preventing Dapsone-induced SCARs in Thais
Authors: Pear-Rarin Leelakunakorn, Patompong Satapornpong
Abstract:
Introduction: Dapsone is an anti-inflammatory and antibiotic drug that has been widely used for the treatment of leprosy, acne fulminans, and dermatitis herpetiformis (DH). However, dapsone is a main trigger of severe cutaneous adverse reactions (SCARs), which occur in 0.4 to 3.6% of patients after initiating treatment. The mortality rate of dapsone-induced SCARs is approximately 9.9%. In previous studies, HLA-B*13:01 was strongly associated with dapsone-induced SCARs in Han Chinese, Thais, and Koreans. Nevertheless, the distribution of the HLA-B*13:01 marker might differ between populations. Moreover, an association was found between HLA-C*03:04 and dapsone hypersensitivity syndrome in Han Chinese leprosy patients (OR = 9.00, p-value = 2.23×10⁻¹⁹). Objective: The aim of this study was to investigate the distribution of HLA-C*03:04 in Thailand's healthy population. Method: HLA-C genotyping of a total of 350 participants was performed using sequence-specific oligonucleotides (PCR-SSOs). This study was approved by the Ethics Committee of Rangsit University. Result: The most frequent HLA-C alleles in Thais were HLA-C*01:02 (17.00%), -C*08:01 (11.00%), -C*07:02 (10.70%), -C*03:04 (9.10%), -C*03:02 (8.00%), -C*07:01 (6.30%), -C*07:04 (4.60%), -C*04:01 (4.40%), -C*12:02 (4.30%), and -C*04:03 (3.90%). Interestingly, the distribution of the HLA-C*03:04 allele in Thais was similar to that in other populations such as Eastern Europe (6.09%), Vietnam (7.42%), East Croatia (2.25%), and Han Chinese (11.70%). Conclusion: Consequently, HLA-C*03:04 might serve as a pharmacogenetic marker for screening prior to initiation of dapsone therapy for the prevention of dapsone-induced SCARs in the Thai population.
Keywords: HLA-C*03:04, SCARs, thai population, allele frequency
Procedia PDF Downloads 129
1169 Data Mining Practices: Practical Studies on the Telecommunication Companies in Jordan
Authors: Dina Ahmad Alkhodary
Abstract:
This study aimed to investigate the practices of Data Mining in the telecommunication companies in Jordan, from the viewpoint of the respondents. In order to achieve the goal of the study and test the validity of the hypotheses, the researcher designed a questionnaire to collect data from managers and staff members of the main departments in the researched companies. The results show stages of improvement of the telecommunication companies toward Data Mining.
Keywords: data, mining, development, business
Procedia PDF Downloads 497
1168 Healthcare Data Mining Innovations
Authors: Eugenia Jilinguirian
Abstract:
In the healthcare industry, data mining is essential since it transforms the field by collecting useful data from large datasets. Data mining is the process of applying advanced analytical methods to large patient records and medical histories in order to identify patterns, correlations, and trends. Healthcare professionals can improve diagnosis accuracy, uncover hidden linkages, and predict disease outcomes by carefully examining these statistics. Additionally, data mining supports personalized medicine by tailoring treatment to the unique attributes of each patient. This proactive strategy helps allocate resources more efficiently, enhances patient care, and streamlines operations. However, to apply data mining effectively and protect private healthcare information, issues like data privacy and security must be carefully considered. Data mining continues to be vital in the search for more effective, efficient, and individualized healthcare solutions as technology evolves.
Keywords: data mining, healthcare, big data, individualised healthcare, healthcare solutions, database
Procedia PDF Downloads 66
1167 Association of Genetic Variants of Apolipoprotein A5 Gene with the Metabolic Syndrome in the Pakistani Population
Authors: Muhammad Fiaz, Muhammad Saqlain, Bernard M. Y. Cheung, S. M. Saqlan Naqvi, Ghazala Kaukab Raja
Abstract:
Background: Association of the C allele of the rs662799 SNP of the APOA5 gene with metabolic syndrome (MetS) has been reported in different populations around the world. A case-control study was conducted to explore the relationship of rs662799 variants (T/C) with MetS and the associated risk phenotypes in a population of Pakistani origin. Methods: MetS was defined according to the IDF criteria. Blood samples were collected from the Pakistan Institute of Medical Sciences, Islamabad, Pakistan, for biochemical profiling and DNA extraction. Genotyping of rs662799 was performed using MassARRAY iPLEX Gold technology. A total of 712 unrelated case and control subjects were genotyped. Data were analyzed using Plink software and SPSS 16.0. Results: The risk allele C of rs662799 showed a highly significant association with MetS (OR=1.5, P=0.002). Among risk phenotypes, dyslipidemia and obesity showed strong association with the SNP (OR=1.49, p=0.03 and OR=1.46, p=0.01, respectively) in models adjusted for age and gender. Conclusion: The rs662799 C allele is a significant risk marker for MetS in the local Pakistani population studied. The effect of the SNP is greater on dyslipidemia than on the other components of MetS.
Keywords: metabolic syndrome, APOA5, rs662799, dyslipidemia, obesity
Procedia PDF Downloads 503
1166 Frequent Itemset Mining Using Rough-Sets
Authors: Usman Qamar, Younus Javed
Abstract:
Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data increase, the amount of time and resources required to mine the data increases at an exponential rate. In this investigation, a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough-sets to carry out record reduction and feature (attribute) selection, respectively. FASTER for frequent itemset mining can produce a speed-up of 3.1 times when compared to the original algorithm while maintaining an accuracy of 71%.
Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining
Procedia PDF Downloads 437
1165 A Modular Framework for Enabling Analysis for Educators with Different Levels of Data Mining Skills
Authors: Kyle De Freitas, Margaret Bernard
Abstract:
Enabling data mining analysis among a wider audience of educators is an active area of research within the educational data mining (EDM) community. The paper proposes a framework for developing an environment that caters for educators who have little technical data mining skills as well as for more advanced users with some data mining expertise. This framework architecture was developed through the review of the strengths and weaknesses of existing models in the literature. The proposed framework provides a modular architecture for future researchers to focus on the development of specific areas within the EDM process. Finally, the paper also highlights a strategy of enabling analysis through either the use of predefined questions or a guided data mining process and highlights how the developed questions and analysis conducted can be reused and extended over time.
Keywords: educational data mining, learning management system, learning analytics, EDM framework
Procedia PDF Downloads 326
1164 Important role of HLA-B*58:01 Allele and Distribution Among Healthy Thais: Avoid Severe Cutaneous Adverse Reactions
Authors: Jaomai Tungsiripat, Patompong Satapornpong
Abstract:
Allopurinol has been used to treat diseases related to the reduction of uric acid and to prevent their severity, including gout, chronic kidney disease, chronic heart failure, and diabetes mellitus (type 2). However, allopurinol metabolites can cause severe cutaneous adverse reactions (SCARs), which consist of Drug Rash with Eosinophilia and Systemic Symptoms (DRESS) and Stevens-Johnson Syndrome (SJS)/Toxic Epidermal Necrolysis (TEN). In previous studies, only the HLA-B*58:01 allele was found to have a strong association with allopurinol-induced SCARs in many populations: Han Chinese [P value = 4.7 × 10⁻²⁴], European [P value < 10⁻⁶], and Thai [P value < 0.001]. However, there has been no update on the frequency of HLA-B alleles and the distribution of pharmacogenetic markers in healthy Thais to support screening before the initiation of treatment. The aim of this study was to investigate the prevalence of the HLA-B*58:01 allele associated with allopurinol-induced SCARs in the healthy Thai population. This was a retrospective study of 260 healthy individuals living in Thailand. HLA-B alleles were genotyped using sequence-specific oligonucleotides (PCR-SSOs). In this study, the prevalent HLA-B alleles were HLA-B*46:01 (12.69%), HLA-B*15:02 (8.85%), HLA-B*13:01 (6.35%), HLA-B*40:01 (6.35%), HLA-B*38:02 (5.00%), HLA-B*51:01 (5.00%), HLA-B*58:01 (4.81%), HLA-B*44:03 (4.62%), HLA-B*18:01 (3.85%) and HLA-B*15:25 (3.08%). Therefore, the distribution of HLA-B*58:01 will support the clinical implementation of screening for allopurinol use in the Thai population.
Keywords: allopurinol, HLA-B*58: 01, Thai population, SCARs
Procedia PDF Downloads 140
1163 SIRT1 Gene Polymorphisms and Its Protein Level in Colorectal Cancer
Authors: Olfat Shaker, Miriam Wadie, Reham Ali, Ayman Yosry
Abstract:
Colorectal cancer (CRC) is a major cause of mortality and morbidity and accounts for over 9% of cancer incidence worldwide. Silent information regulator 2 homolog 1 (SIRT1) is located in the nucleus and exerts its effects via modulation of histone and non-histone targets. It functions in the cell via histone deacetylase (HDAC) and/or adenosine diphosphate ribosyl transferase (ADPRT) enzymatic activity. The aim of this work was to study the relationship between SIRT1 polymorphisms and its protein level in colorectal cancer patients in comparison to control cases. This study included 2 groups: thirty healthy subjects (control group) and one hundred CRC patients. For all subjects, SIRT-1 serum level was measured by ELISA, and the gene polymorphisms rs12778366, rs375891 and rs3740051 were detected by real-time PCR. For CRC patients, clinical data were collected (size and site of tumor as well as its grading, obesity). CRC patients showed a highly significant increase in the mean level of serum SIRT-1 compared to the control group (P<0.001). Mean serum level of SIRT-1 showed a significant increase in patients with tumor size ≥5 cm compared to size <5 cm (P<0.05). In CRC patients, the percentage of the T allele of rs12778366 was significantly lower than in controls, while the CC genotype and C allele of rs375891 were significantly higher than in the control group. In CRC patients, the CC genotype of rs12778366 was found in the rectosigmoid in 75% and in the cecum and ascending colon in 25%. According to tumor size, the percentage of the CC genotype was 87.5% in tumors ≥5 cm. Conclusion: Serum level of SIRT-1, the T allele of rs12778366 and the C allele of rs375891 can be used as diagnostic markers for CRC patients.
Keywords: CRC, SIRT1, polymorphisms, ELISA
Procedia PDF Downloads 218
1162 Assessment of Prevalent Diseases Caused by Mining Activities in the Northern Part of Mindanao Island, Philippines
Authors: Odinah Cuartero-Enteria, Kyla Rita Mercado, Jason Salamanes, Aian Pecasales, Sherwin Sabado
Abstract:
The northern part of Mindanao Island, Philippines, has a sizable reserve of mineral resources. For years, mining activities have been flourishing, resulting in local economic gain but also environmental concerns. This study investigates the prevalent diseases caused by mining activities in these areas. The study used secondary data gathered from the Rural Health Units (RHU) of the selected areas. It further determined the prevalent diseases in the three areas in the years 2005, 2010 and 2015, covering periods before mining activities and while mining activities were present. The results show that areas far from mining activities have fewer cases of patients suffering from air-borne diseases. The top ten most common diseases, such as pneumonia, tuberculosis, influenza, upper respiratory tract infection (URTI) and skin diseases, were air-borne diseases attributed to air pollution. Hence, the presence of mining activities contributes to the prevalent diseases. Thus, addressing the air pollution caused by mining activities is very important.
Keywords: Philippines, Mindanao Island, mining activities, pollution, prevalent diseases
Procedia PDF Downloads 473
1161 Cloud Computing in Data Mining: A Technical Survey
Authors: Ghaemi Reza, Abdollahi Hamid, Dashti Elham
Abstract:
Cloud computing poses a diversity of challenges for data mining operations, arising from the dynamic structure of data distribution as opposed to the typical database scenarios of conventional architectures. Due to the immense number of users seeking data on a daily basis, there are serious security concerns for cloud providers as well as for data providers who put their data in the cloud computing environment. Big data analytics uses compute-intensive data mining algorithms (Hidden Markov, MapReduce parallel programming, the Mahout project, Hadoop distributed file system, K-Means and K-Medoid, Apriori) that require efficient high-performance processors to produce timely results. Data mining algorithms are used to solve or optimize the model parameters. The challenges that the operation has to meet are establishing successful transactions with the existing virtual machine environment and keeping the databases under control. Several factors have led from normal or centralized mining to distributed data mining. The approach is offered as SaaS, which uses multi-agent systems for implementing the different tasks of the system. There are still some problems of data mining based on cloud computing, including the design and selection of data mining algorithms.
Keywords: cloud computing, data mining, computing models, cloud services
Procedia PDF Downloads 479
1160 Mining Diagnostic Investigation Process
Authors: Sohail Imran, Tariq Mahmood
Abstract:
In the complex healthcare diagnostic investigation process, medical practitioners have to focus on ways to standardize their processes to deliver high quality care and optimize time and costs. Process mining techniques can be applied to extract process-related knowledge from data without considering causal and dynamic dependencies in the business domain and processes. The application of process mining is effective in diagnostic investigation. It is very helpful where a treatment gives no dispositive evidence favoring it. In this paper, we applied process mining to discover the important process flow of diagnostic investigation for hepatitis patients. This approach has some benefits which can enhance the quality and efficiency of diagnostic investigation processes.
Keywords: process mining, healthcare, diagnostic investigation process, process flow
Procedia PDF Downloads 523
1159 Analysis of Reliability of Mining Shovel Using Weibull Model
Authors: Anurag Savarnya
Abstract:
The reliability of the various parts of an electric mining shovel has been assessed through the application of a Weibull model. The study was initiated to find the reliability of the components of the electric mining shovel. The paper aims to optimize the reliability of the components and increase their life cycle. A multilevel decomposition of the electric mining shovel was done, maintenance records were used to evaluate the failure data, and appropriate system characterization was done to model the system in terms of a reasonable number of components. The approach used develops a mathematical model to assess the reliability of the electric mining shovel components. The model can be used to predict the reliability of components of the hydraulic mining shovel and system performance. Reliability is an inherent attribute of a system. When the life-cycle costs of a system are being analyzed, reliability plays an important role as a major driver of these costs and has considerable influence on system performance. It is an iterative process that begins with the specification of reliability goals consistent with cost and performance objectives. The data were collected from an Indian open cast coal mine, and the reliability of various components of the electric mining shovel has been assessed by following a Weibull model.
Keywords: reliability, Weibull model, electric mining shovel
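For readers unfamiliar with the Weibull model, a minimal sketch of the standard two-parameter reliability function R(t) = exp(-(t/η)^β) and the corresponding mean time to failure η·Γ(1 + 1/β) is given below. The abstract does not state which Weibull form or parameter values were used, so the two-parameter form and the example numbers here are assumptions for illustration only.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a component survives beyond time t (two-parameter Weibull)."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0 and shape, scale must be > 0")
    return math.exp(-((t / scale) ** shape))


def weibull_mttf(shape, scale):
    """Mean time to failure: scale * Gamma(1 + 1/shape)."""
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be > 0")
    return scale * math.gamma(1.0 + 1.0 / shape)


# Hypothetical parameters (hours), not values from the paper.
r_at_scale = weibull_reliability(1000.0, shape=1.5, scale=1000.0)
mttf_exponential = weibull_mttf(shape=1.0, scale=500.0)
```

A shape parameter below 1 indicates a decreasing failure rate, a value of 1 reduces to the exponential model, and a value above 1 indicates wear-out, which is why the fitted shape is often read as a maintenance indicator.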
Procedia PDF Downloads 513
1158 An Adaptive Distributed Incremental Association Rule Mining System
Authors: Adewale O. Ogunde, Olusegun Folorunso, Adesina S. Sodiya
Abstract:
Most existing Distributed Association Rule Mining (DARM) systems still face several challenges. One such challenge that has not received the attention of many researchers is the inability of existing systems to adapt to constantly changing databases and mining environments. In this work, an Adaptive Incremental Mining Algorithm (AIMA) is therefore proposed to address these problems. AIMA employed multiple mobile agents for the entire mining process. AIMA was designed to adapt to changes in the distributed databases by mining only the incremental database updates and using these to update the existing rules in order to improve the overall response time of the DARM system. In AIMA, global association rules were integrated incrementally from one data site to another through Results Integration Coordinating Agents. The mining agents in AIMA were made adaptive by defining mining goals with reasoning and behavioral capabilities and protocols that enabled them to either maintain or change their goals. AIMA employed Java Agent Development Environment Extension for designing the internal agents’ architecture. Results from experiments conducted on real datasets showed that the adaptive system, AIMA, performed better than the non-adaptive systems, with lower communication costs and higher task completion rates.
Keywords: adaptivity, data mining, distributed association rule mining, incremental mining, mobile agents
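The core idea of incremental mining, processing only the newly arrived transactions and folding them into existing counts, can be shown in a small sketch. This is not AIMA itself (no agents, no distribution, no rule generation); the class name and its simplification of keeping counts for every itemset up to `max_size` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Keeps itemset counts so that only new transactions need to be scanned."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, new_transactions):
        """Fold an incremental batch into the stored counts."""
        for transaction in new_transactions:
            items = sorted(set(transaction))
            for size in range(1, self.max_size + 1):
                self.counts.update(combinations(items, size))
            self.n_transactions += 1

    def frequent(self, min_support):
        """Return itemsets whose relative support is at least min_support."""
        if self.n_transactions == 0:
            return {}
        return {
            itemset: count / self.n_transactions
            for itemset, count in self.counts.items()
            if count / self.n_transactions >= min_support
        }


# Hypothetical data: an initial batch followed by an incremental update.
counter = IncrementalItemsetCounter()
counter.update([["a", "b"], ["a"]])
counter.update([["a", "b"]])
supports = counter.frequent(min_support=0.6)
```

The second `update` call touches only the new transaction, while the supports reflect all three; a production system would also prune infrequent itemsets to bound memory.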
Procedia PDF Downloads 393
1157 CAG Repeat Polymorphism of Androgen Receptor and Female Sexual Functions in Egyptian Female Population
Authors: Azza Gaber Farag, Yasser Atta Shehata, Sara Elsayed Elghazouly, Mustafa Elsayed Elshaib, Nesreen Gamal Elden Elhelbawy
Abstract:
Background: Androgen receptor (AR) polymorphism in the cytosine-adenine-guanine (CAG) repeat affects the functional capacity of AR in males. However, little research is available on its relation to female sexual function. Aim: To investigate the possible link between polymorphism in the CAG repeat of the AR gene and female sexual function in a sample of the Egyptian population. Materials and methods: 500 married Egyptian females completed a questionnaire covering sociodemographic, reproductive, and sexual data. AR CAG repeat length was analyzed using real-time PCR for those with female sexual dysfunction (FSD). Results: The domain most sensitive to AR CAG repeat length was the orgasm domain, which showed significant positive correlations with the short allele (p=.001), the long allele (p=.015), the biallelic mean (p=.000), and the X-weighted biallelic mean (p=.000). The satisfaction domain had significant positive correlations with the biallelic mean (p=.035) and the X-weighted biallelic mean (p=.032). However, the pain domain showed significant negative correlations with the short allele (p=.002), the biallelic mean (p=.013), and the X-weighted biallelic mean (p=.011). Conclusions: AR polymorphism could represent a non-negligible aspect of female sexual function. The lower AR CAG repeat polymorphism had a significant impact on FSD, affecting mainly female orgasm, followed by pain disorders, which were ultimately reflected in sexual satisfaction.
Keywords: female sexual dysfunction, androgen receptor, CAG repeat polymorphism, androgen
Procedia PDF Downloads 182
1156 Data Stream Association Rule Mining with Cloud Computing
Authors: B. Suraj Aravind, M. H. M. Krishna Prasad
Abstract:
Emerging applications of data streams require association rule mining, for example network traffic monitoring, web click stream analysis, sensor data, and satellite data. Data streams typically arrive continuously at high speed, in huge volumes, and with changing data distributions. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes an improved data stream association rule mining algorithm that eliminates resource limitations by using cloud computing. Including cloud computing may lead to additional, as yet unknown problems that need further research.
Keywords: data stream, association rule mining, cloud computing, frequent itemsets
Procedia PDF Downloads 501
1155 Distribution of HLA-DQA1 and HLA-DQB1 Alleles in Thais: Genetics Database Insight for COVID-19 Severity
Authors: Jinu Phonamontham
Abstract:
Coronavirus disease 2019 (COVID-19) is caused by the SARS-CoV-2 virus. The pandemic had caused over 10 million cases and 500,000 deaths worldwide by the end of June 2020. In a previous study, the HLA-DQA1*01:02 allele was associated with COVID-19 disease (p-value = 0.0121). Furthermore, the association between HLA-DQB1*06:02 and COVID-19 in the Italian population was statistically significant after Bonferroni's correction (p-value = 0.0016). Nevertheless, there are no data describing the distribution of HLA alleles as valid markers for predicting COVID-19 in the Thai population. We want to investigate the prevalence of the HLA-DQA1*01:02 and HLA-DQB1*06:02 alleles, which are associated with severe COVID-19, in the Thai population. In this study, we recruited 200 healthy Thai individuals. Genomic DNA samples were isolated from EDTA blood using the Genomic DNA Mini Kit. HLA genotyping was conducted using the Lifecodes HLA SSO typing kits (Immucor, West Avenue, Stamford, USA). The HLA-DQA1 alleles found in the Thai population included HLA-DQA1*01:01 (27.75%), HLA-DQA1*01:02 (24.50%), HLA-DQA1*03:03 (13.00%), HLA-DQA1*06:01 (10.25%), and HLA-DQA1*02:01 (6.75%). The HLA-DQB1 alleles included HLA-DQB1*05:02 (21.50%), HLA-DQB1*03:01 (15.75%), HLA-DQB1*05:01 (14.50%), HLA-DQB1*03:03 (11.00%), and HLA-DQB1*02:02 (8.25%). In particular, the HLA-DQA1*01:02 allele had its highest frequency (29.00%) in the Northeast group, but the difference from the other Thai regions was not significant (p-value = 0.4202). The HLA-DQB1*06:02 allele was similarly distributed in the Thai population, with no significant difference between Thais and China (3.8%), South Korea (6.4%), or Japan (8.2%) (p-value > 0.05). In contrast, the frequency in South Africa (15.7%) differed significantly from that in Thais (p-value = 0.0013). This study supports specific genotyping of the HLA-DQA1*01:02 and HLA-DQB1*06:02 alleles to screen for severe COVID-19 in the Thai population and many other populations.
Keywords: HLA-DQA1*01:02, HLA-DQB1*06:02, Asian, Thai population
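The reported percentages are allele frequencies. A minimal sketch of how such frequencies are usually computed is given below, assuming each of the 200 diploid individuals contributes two allele copies (2N = 400); the abstract does not state the denominator, although the reported values are all multiples of 0.25%, which is consistent with it. The genotype labels in the demo are made up for illustration and are not study data.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Percent frequency of each allele, counting two copies per diploid individual."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_copies = 2 * len(genotypes)
    return {allele: 100.0 * n / total_copies for allele, n in counts.items()}


def expected_copies(percent, n_individuals):
    """Number of allele copies implied by a percent frequency, assuming 2N copies in total."""
    return round(percent / 100.0 * 2 * n_individuals)


if __name__ == "__main__":
    # Toy genotypes with placeholder allele labels (hypothetical, not study data).
    toy = [("x", "x"), ("x", "y"), ("y", "z"), ("z", "z")]
    print(allele_frequencies(toy))
    # Under the 2N = 400 assumption, 24.50% corresponds to this many copies:
    print(expected_copies(24.50, 200))
```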
Procedia PDF Downloads 99
1154 Data Mining As A Tool For Knowledge Management: A Review
Authors: Maram Saleh
Abstract:
Knowledge has become an essential resource in today’s economy and the most important asset for maintaining competitive advantage in organizations. The importance of knowledge has led organizations to manage their knowledge assets and resources through multiple knowledge management stages, such as knowledge creation, knowledge storage, knowledge sharing, and knowledge use. Research on data mining has grown continuously in recent years in both business and educational fields. Data mining is one of the most important steps of the knowledge discovery in databases process, which aims to extract implicit, previously unknown but useful knowledge, and it is considered a significant subfield of knowledge management. Data mining has great potential to help organizations focus on extracting the most important information from their data warehouses. Data mining tools and techniques can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This review paper explores the application of data mining techniques to support the knowledge management process as an effective knowledge discovery technique. In this paper, we identify the relationship between data mining and knowledge management, and then focus on introducing some applications of data mining techniques in knowledge management for some real-life domains.
Keywords: Data Mining, Knowledge management, Knowledge discovery, Knowledge creation
Procedia PDF Downloads 208
1153 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data
Authors: Adarsh Shroff
Abstract:
Big data is a collection of data sets so large and complex that it becomes difficult to process using database management tools. Operations such as search, analysis, and visualization are performed on big data using data mining, which is the process of extracting patterns or knowledge from large data sets. The results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to MapReduce, the most widely used framework for mining big data. i2MapReduce performs key-value pair level incremental processing rather than task-level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, i2MapReduce is evaluated using a one-step algorithm and three iterative algorithms with diverse computation characteristics.
Keywords: big data, map reduce, incremental processing, iterative computation
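Key-value pair level incremental processing can be illustrated with a small word-count sketch. This is not the i2MapReduce API (the abstract does not show one); the class and function names are hypothetical, everything runs in a single process, and the approach shown works only because the reduce step is a sum, which can be updated by adding and subtracting contributions. Only the keys touched by the added or removed records are recomputed, using state preserved from the previous run.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit (word, 1) for every word in one input record."""
    return [(word, 1) for word in record.split()]


class IncrementalWordCount:
    """Preserves per-key reduce state so a delta input updates only affected keys."""

    def __init__(self):
        self.state = defaultdict(int)  # preserved reduce output, one entry per key

    def apply_delta(self, added=(), removed=()):
        """Fold added and removed records into the state; return the keys that changed."""
        touched = set()
        for sign, records in ((1, added), (-1, removed)):
            for record in records:
                for key, value in map_record(record):
                    self.state[key] += sign * value
                    touched.add(key)
        for key in touched:
            if self.state[key] == 0:
                del self.state[key]  # drop keys whose count fell to zero
        return touched


if __name__ == "__main__":
    job = IncrementalWordCount()
    job.apply_delta(added=["a b", "a c"])                   # initial run
    changed = job.apply_delta(added=["b d"], removed=["a c"])  # refresh with a delta
    print(sorted(changed), dict(job.state))
```

For reduce functions that cannot be inverted this way, an incremental system has to keep the intermediate map outputs per key and re-run the reduce for each affected key, which is where the cost of storing and accessing preserved state comes from.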
Procedia PDF Downloads 350
1152 Reviewing Privacy Preserving Distributed Data Mining
Authors: Sajjad Baghernezhad, Saeideh Baghernezhad
Abstract:
Nowadays, given human involvement in the continuing growth of data, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of the data: the databases that create or receive such data usually belong to corporate or non-corporate persons, who do not give their information freely to others. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. Sending data and then gathering them by vertical or horizontal software depends on the type of preservation used and is also carried out to improve data privacy. This study attempts a comprehensive comparison of data-preserving methods; general methods such as random data and coding are also examined, along with the strong and weak points of each.
Keywords: data mining, distributed data mining, privacy protection, privacy preserving
Procedia PDF Downloads 525
1151 Review and Comparison of Associative Classification Data Mining Approaches
Authors: Suzan Wedyan
Abstract:
Data mining is one of the main phases of Knowledge Discovery in Databases (KDD) and is responsible for finding hidden and useful knowledge in databases. Data mining covers many different tasks, including regression, pattern recognition, clustering, classification, and association rule discovery. In recent years, a promising data mining approach called associative classification (AC) has been proposed. AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference to the different procedures used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.
Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction
Procedia PDF Downloads 537
1150 Data Mining Techniques for Anti-Money Laundering
Authors: M. Sai Veerendra
Abstract:
Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism, not forgetting personal gain. Most financial institutions internationally have been implementing anti-money laundering (AML) solutions to fight investment fraud activities. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered well suited to detecting ML activities. Within the scope of a collaboration project on developing a new data mining solution for AML units in an international investment bank in Ireland, we survey recent data mining approaches for AML. In this paper, we not only present these approaches but also give an overview of the important factors in building data mining solutions for AML activities.
Keywords: data mining, clustering, money laundering, anti-money laundering solutions
Procedia PDF Downloads 537