Search results for: mining overburden
340 A Recognition Method for Spatio-Temporal Background in Korean Historical Novels
Authors: Seo-Hee Kim, Kee-Won Kim, Seung-Hoon Kim
Abstract:
The most important elements of a novel are its characters, events, and background. The background represents the time, place, and situation in which the characters appear, and it conveys events and atmosphere more realistically. Proper knowledge of a novel's background can help readers understand its atmosphere and choose a novel they want to read. In this paper, we target Korean historical novels because the spatio-temporal background plays an especially important role in historical novels among the genres of Korean fiction. To the best of our knowledge, no previous study has addressed Korean novels. We build a Korean historical dictionary. The dictionary contains historical places and the temple names of kings over many generations, as well as currently used spatial and temporal words from Korean history. We also present a method for recognizing the spatio-temporal background based on patterns of phrasal words in Korean sentences. Our rules use postpositions for spatial background recognition and temple names for temporal background recognition. Knowledge of the recognized background can help readers understand the flow of events and the atmosphere, and it can be used to visualize the elements of a novel.
Keywords: data mining, Korean historical novels, Korean linguistic feature, spatio-temporal background
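The rule-based idea in this abstract (postpositions for places, temple names for times) can be illustrated with a minimal sketch. The tiny dictionaries, the postposition list, the regnal-year pattern, and the function name `recognize_background` are all assumptions made for illustration; they are not the authors' dictionary or rules.

```python
import re

# Hypothetical mini-dictionaries; the dictionary described in the abstract is far larger.
PLACES = {"한양", "경복궁", "평양"}
TEMPLE_NAMES = {"태조", "세종", "영조", "정조"}

# Locative/directional postpositions that commonly follow a place noun.
# Longer forms come first so that "에서" is tried before "에".
POSTPOSITIONS = ("에서", "으로", "에", "로")


def recognize_background(sentence):
    """Return (places, times) found in a whitespace-tokenised sentence."""
    places, times = [], []
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        # Spatial rule: a dictionary place followed by a postposition.
        for post in POSTPOSITIONS:
            if tok.endswith(post) and tok[: -len(post)] in PLACES:
                places.append(tok[: -len(post)])
                break
        # Temporal rule: a temple name followed by a regnal year such as "12년".
        if tok in TEMPLE_NAMES and i + 1 < len(tokens):
            match = re.match(r"(\d+)년", tokens[i + 1])
            if match:
                times.append((tok, int(match.group(1))))
    return places, times


places, times = recognize_background("세종 12년 한양에서 큰 잔치가 열렸다")
print(places, times)
```

A real system would need morphological analysis rather than whitespace splitting; the sketch only shows how the two rule types divide the work.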
Procedia PDF Downloads 277
339 Mass Production of Endemic Diatoms in Polk County, Florida Concomitant with Biofuel Extraction
Authors: Melba D. Horton
Abstract:
Algae are identified as an alternative source of biofuel because of their ubiquitous distribution in aquatic environments. Diatoms are unique forms of algae characterized by silicified cell walls, which have gained prominence in various technological applications. Polk County is home to a multitude of ponds and lakes but has not been explored for the presence of diatoms. Considering the condition of the waters brought about by the predominant phosphate mining activities in the area, this research was conducted to determine whether endemic diatoms are present and to explore their potential for low-cost mass production. Using custom-built photobioreactors, water samples from various lakes provided by the Polk County Parks and Recreation and from nearby ponds were used as the source of diatoms, together with other algae obtained during collection. The initial culture cycles were successful, but later an overgrowth of other algae crashed the diatom population. Experiments were conducted in the laboratory to tease out some factors possibly contributing to the die-off. Generally, the total biomass declines after two culture cycles, and the causative factors need further investigation. The lipid yield is minimal; however, the high frustule production after die-off adds value to the overall benefit of the harvest.
Keywords: diatoms, algae, biofuel, lipid, photobioreactor, frustule
Procedia PDF Downloads 188
338 DNpro: A Deep Learning Network Approach to Predicting Protein Stability Changes Induced by Single-Site Mutations
Authors: Xiao Zhou, Jianlin Cheng
Abstract:
A single amino acid mutation can have a significant impact on the stability of protein structure. Thus, the prediction of protein stability changes induced by single-site mutations is critical and useful for studying protein function and structure. Here, we present a deep learning network with the dropout technique for predicting protein stability changes upon single amino acid substitution. While using only protein sequence as input, the overall prediction accuracy of the method on a standard benchmark is >85%, which is higher than existing sequence-based methods and comparable to methods that use not only protein sequence but also tertiary structure, pH value, and temperature. The results demonstrate that deep learning is a promising technique for protein stability prediction. The good performance of this sequence-based method makes it a valuable tool for predicting the impact of mutations on the many proteins whose experimental structures are not available. Both the downloadable software package and the user-friendly web server (DNpro) that implement the method are freely available for the community to use.
Keywords: bioinformatics, deep learning, protein stability prediction, biological data mining
Procedia PDF Downloads 467
337 Recommender System Based on Mining Graph Databases for Data-Intensive Applications
Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi
Abstract:
In recent years, many digital documents have been created on the web due to the rapid growth of 'social applications' communities and 'data-intensive applications'. The evolution of online multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper explains the key differences between graph and relational databases, their strengths and weaknesses, and why graph databases are the best technology for building a real-time recommendation system. The paper also discusses several similarity metric algorithms that can be used to compute a similarity score for pairs of nodes based on their neighbourhoods or their properties. Finally, the paper explores how NLP strategies promise to improve the accuracy and coverage of real-time recommendations by extracting information from stored unstructured knowledge, which makes up the bulk of the world's data, to enrich the graph database. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.
Keywords: graph databases, NLP, recommendation systems, similarity metrics
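As an illustration of a neighbourhood-based similarity score, the sketch below computes the Jaccard similarity between the neighbour sets of two nodes. The abstract does not say which metrics the paper covers, so the choice of Jaccard, the toy graph, and the function names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two neighbour sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical user -> liked-items adjacency, standing in for a graph database query result.
graph = {
    "alice": {"item1", "item2", "item3"},
    "bob": {"item2", "item3", "item4"},
    "carol": {"item5"},
}


def similarity_scores(graph, node):
    """Score every other node by how much its neighbourhood overlaps with `node`'s."""
    return {
        other: jaccard(graph[node], neighbours)
        for other, neighbours in graph.items()
        if other != node
    }


scores = similarity_scores(graph, "alice")
print(scores)  # {'bob': 0.5, 'carol': 0.0}
```

Items liked by the highest-scoring neighbour but not yet by the target user (here `item4`) would be natural recommendation candidates.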
Procedia PDF Downloads 104
336 Thai Perception on Litecoin Value
Authors: Toby Gibbs, Suwaree Yordchim
Abstract:
This research analyzes factors affecting the success of Litecoin Value within Thailand and develops a guideline for self-reliance for effective business implementation. The study surveyed 119 people. The results revealed four main factors affecting success: 1) Future career training should be pursued in applied Litecoin development. 2) Respondents did not grasp the concept of a digital currency or see its benefit. 3) There is a great need to educate the next generation of learners on the benefits of Litecoin within the community. 4) A great majority did not know what Litecoin was. The guideline for self-reliance planning consists of four aspects: 1) Development planning: arranging meet-up groups to conduct further education on Litecoin and share solutions for adoption into everyday usage. Local communities need to develop awareness of the usefulness of Litecoin and share the value of Litecoin among friends and family. 2) Computer Science and Business Management staff should develop skills to expand on the benefits of Litecoin within their departments. 3) Further research should be pursued on how Litecoin Value can improve business and tourism within Thailand. 4) Local communities should focus on developing Litecoin awareness by encouraging street vendors to accept Litecoin as another form of payment for services rendered.
Keywords: litecoin, mining, confirmations, payment method
Procedia PDF Downloads 184
335 Explainable Graph Attention Networks
Authors: David Pham, Yongfeng Zhang
Abstract:
Graphs are an important structure for data storage and computation. Recent years have seen the success of deep learning on graphs such as Graph Neural Networks (GNN) on various data mining and machine learning tasks. However, most of the deep learning models on graphs cannot easily explain their predictions and are thus often labelled as “black boxes.” For example, Graph Attention Network (GAT) is a frequently used GNN architecture, which adopts an attention mechanism to carefully select the neighborhood nodes for message passing and aggregation. However, it is difficult to explain why certain neighbors are selected while others are not and how the selected neighbors contribute to the final classification result. In this paper, we present a graph learning model called Explainable Graph Attention Network (XGAT), which integrates graph attention modeling and explainability. We use a single model to target both the accuracy and explainability of problem spaces and show that in the context of graph attention modeling, we can design a unified neighborhood selection strategy that selects appropriate neighbor nodes for both better accuracy and enhanced explainability. To justify this, we conduct extensive experiments to better understand the behavior of our model under different conditions and show an increase in both accuracy and explainability.
Keywords: explainable AI, graph attention network, graph neural network, node classification
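For readers unfamiliar with the attention mechanism mentioned here, the following minimal sketch computes standard single-head GAT attention weights of one node over its neighbours. It illustrates plain GAT only, not the XGAT selection strategy, which the abstract does not specify; the toy features, the attention vectors `a_src` and `a_dst`, and the function names are assumptions.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def attention_weights(h, neighbours, i, a_src, a_dst):
    """Softmax-normalised attention of node i over its neighbours.

    h maps node -> feature vector (assumed already linearly transformed).
    The raw score for neighbour j is LeakyReLU(a_src . h[i] + a_dst . h[j]),
    which equals the usual a^T [h_i || h_j] with a split into two halves.
    """
    scores = {
        j: leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in neighbours[i]
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


# Hypothetical toy graph: node 0 attends over neighbours 1 and 2.
h = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbours = {0: [1, 2]}
weights = attention_weights(h, neighbours, 0, a_src=[0.5, 0.5], a_dst=[1.0, -1.0])
print(weights)
```

The resulting weights sum to one and can be read as how strongly each neighbour contributes to the aggregated message, which is the quantity an explanation method would need to account for.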
Procedia PDF Downloads 198
334 Aligning the Sustainability Policy Areas for Decarbonisation and Value Addition at an Organisational Level
Authors: Bishal Baniya
Abstract:
This paper proposes sustainability-related policy areas for decarbonisation and value addition at an organisational level. General and public sector organisations around the world are usually significant consumers of resources and producers of waste, driven by their massive procurement capacity. However, these organisations also have huge potential to cut resource use and emissions, as many of them control the supply chain of goods and services. They can therefore be trend setters and can readily lead other major economic sectors, such as manufacturing, construction and mining, and transportation, in the pursuit of a paradigm shift towards sustainability. Whilst environmental and social awareness has improved in recent years and organisations have identified policy areas to improve their environmental performance, value addition to the core business of the organisation has not been understood and interpreted correctly. This paper therefore investigates ways to align sustainability policy measures so that they create a better value proposition relative to a benchmark by accounting for both eco-efficiency and social efficiency. Preliminary analysis shows that co-benefits other than resource and cost savings strengthen the business case for organisations, and that this can be achieved by better aligning the policy measures and engaging stakeholders.
Keywords: policy measures, environmental performance, value proposition, organisational level
Procedia PDF Downloads 150
333 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures
Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat
Abstract:
For terrorist attacks, where early detection, distinction, and prediction are effective diagnostic techniques, many data mining and statistical approaches exist to support accurate and precise analysis of terrorism data. The computational extraction of derived patterns is a non-trivial task that involves domain-specific discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction: useful attributes are obtained from the available datasets of terrorist attacks, a feature selection technique based on statistical impurity measures is applied, and clustering techniques based on similarity measures follow. The associative dependencies between the attacks are analyzed on the basis of the degree of participation of attributes in the rules. To compute the similarity among the discovered rules, we apply a weighted similarity measure. Finally, the rules are grouped using hierarchical clustering. We have applied the approach to an open source dataset to determine the usability and efficiency of our technique, and a literature search was also carried out to support the efficiency and accuracy of our results.
Keywords: association rules, clustering, similarity measure, statistical approaches
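The last two steps (a weighted similarity between rules, then hierarchical grouping) might be sketched as follows. The abstract does not define the measure or the linkage, so the weighted Jaccard formula, the average-linkage merge, the stopping threshold, and the placeholder attribute names are all assumptions.

```python
from itertools import combinations


def weighted_similarity(rule_a, rule_b, weights):
    """Weighted Jaccard: weight of shared attributes over weight of all attributes."""
    shared = sum(weights[attr] for attr in rule_a & rule_b)
    total = sum(weights[attr] for attr in rule_a | rule_b)
    return shared / total if total else 0.0


def cluster_rules(rules, weights, threshold):
    """Agglomerative clustering with average linkage; stop below `threshold`."""
    clusters = [[i] for i in range(len(rules))]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(
            weighted_similarity(rules[i], rules[j], weights) for i, j in pairs
        ) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (x < y by construction).
        (x, y), best = max(
            (
                ((x, y), linkage(clusters[x], clusters[y]))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Placeholder rules as attribute-value sets; weights stand in for the
# "degree of participation" of each attribute in the rules.
rules = [{"A=1", "B=2"}, {"A=1", "B=3"}, {"C=5", "D=7"}, {"C=5", "D=8"}]
weights = {"A=1": 2.0, "B=2": 1.0, "B=3": 1.0, "C=5": 2.0, "D=7": 1.0, "D=8": 1.0}
print(cluster_rules(rules, weights, threshold=0.4))  # [[0, 1], [2, 3]]
```

Raising the weight of an attribute makes rules that share it merge earlier, which is the practical effect of weighting by participation.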
Procedia PDF Downloads 320
332 A New DIDS Design Based on a Combination Feature Selection Approach
Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman
Abstract:
Feature selection has been used in many fields, such as classification, data mining, and object recognition, and has proven effective for removing irrelevant and redundant features from the original data set. In this paper, we present a new design of a distributed intrusion detection system that uses a combination feature selection model based on the bees algorithm and a decision tree. The bees algorithm is used as the search strategy to find the optimal subset of features, whereas the decision tree is used to judge the selected features. Both the produced features and the generated rules are used by the Decision Making Mobile Agent to decide whether there is an attack in the network. The Decision Making Mobile Agent migrates through the network, moving from one node to another; if it finds an attack on one of the nodes, it alerts the user through the User Interface Agent or takes some action through the Action Mobile Agent. The KDD Cup 99 data set is used to test the effectiveness of the proposed system. The results show that even when only four features are used, the proposed system performs better than when all 41 features are used.
Keywords: distributed intrusion detection system, mobile agent, feature selection, bees algorithm, decision tree
Procedia PDF Downloads 408
331 Strategies Used by the Saffron Producers of Taliouine (Morocco) to Adapt to Climate Change
Authors: Aziz Larbi, Widad Sadok
Abstract:
In Morocco, the mountainous regions extend over about 26% of the national territory, where 30% of the total population live. They contain opportunities for agriculture, forestry, pastureland, and mining. The production systems in these zones are characterised by crop diversification. However, these areas have become vulnerable to the effects of climate change. To understand these effects in relation to the population living in these areas, a study was carried out in the zone of Taliouine, in the Anti-Atlas. The vulnerability of crop production to climate change was analysed, and the different adaptation strategies adopted by farmers were identified. The work was done on saffron, the most profitable crop in the target area even though it requires much water. Our results show that the majority of the farmers surveyed had noticed variations in the climate of the region: irregularity of precipitation leading to a decrease in quantity and an uneven distribution throughout the year; rise in temperature; reduction in the cold period; and less snow. These variations had impacts on the cropping system of saffron and its productivity. To cope with these effects, the farmers adopted various strategies: better management and use of water; diversification of agricultural activities; increase in the contribution of non-agricultural activities to their gross income; and seasonal migration.
Keywords: climate change, Taliouine, saffron, perceptions, adaptation strategies
Procedia PDF Downloads 61
330 Combination of Artificial Neural Network Model and Geographic Information System for Prediction Water Quality
Authors: Sirilak Areerachakul
Abstract:
Water quality has prompted serious management efforts in many countries. Artificial Neural Network (ANN) models are developed as forecasting tools for predicting water quality trends based on historical data. This study endeavors to classify water quality automatically. The water quality classes are evaluated using six factor indices: pH value (pH), Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), Nitrate Nitrogen (NO3N), Ammonia Nitrogen (NH3N), and Total Coliform (T-Coliform). The methodology involves applying data mining techniques using multilayer perceptron (MLP) neural network models. The data cover 11 sites of the Saen Saep canal in Bangkok, Thailand, and were obtained from the Department of Drainage and Sewerage, Bangkok Metropolitan Administration, for 2007-2011. The multilayer perceptron neural network achieves a high classification accuracy of 94.23% in classifying the water quality of the Saen Saep canal. Subsequently, this encouraging result could be combined with GIS data to improve the classification accuracy significantly.
Keywords: artificial neural network, geographic information system, water quality, computer science
Procedia PDF Downloads 343
329 A Conv-Long Short-term Memory Deep Learning Model for Traffic Flow Prediction
Authors: Ali Reza Sattarzadeh, Ronny J. Kutadinata, Pubudu N. Pathirana, Van Thanh Huynh
Abstract:
Traffic congestion has become a severe worldwide problem, affecting everyday life, fuel consumption, time, and air pollution. The primary causes of these issues are inadequate transportation infrastructure, poor traffic signal management, and rising population. Traffic flow forecasting is one of the essential and effective methods in urban congestion and traffic management, and it has attracted the attention of researchers. With the development of technology, undeniable progress has been achieved in existing methods. However, there is room for improvement in the extraction of temporal and spatial features to determine the importance of traffic flow sequences and extracted features. In the proposed model, we implement convolutional neural network (CNN) and long short-term memory (LSTM) deep learning models for mining nonlinear correlations and evaluate their effectiveness in increasing the accuracy of traffic flow prediction on a real dataset. According to the experiments, implementing Conv-LSTM networks increases the productivity and accuracy of deep learning models for traffic flow prediction.
Keywords: deep learning algorithms, intelligent transportation systems, spatiotemporal features, traffic flow prediction
Procedia PDF Downloads 171
328 Advancement of Computer Science Research in Nigeria: A Bibliometric Analysis of the Past Three Decades
Authors: Temidayo O. Omotehinwa, David O. Oyewola, Friday J. Agbo
Abstract:
This study aims to gather a proper perspective of the development landscape of Computer Science research in Nigeria. Therefore, a bibliometric analysis of 4,333 bibliographic records of Computer Science research in Nigeria in the last 31 years (1991-2021) was carried out. The bibliographic data were extracted from the Scopus database and analyzed using VOSviewer and the bibliometrix R package through the biblioshiny web interface. The findings of this study revealed that Computer Science research in Nigeria has a growth rate of 24.19%. The most developed and well-studied research areas in the Computer Science field in Nigeria are machine learning, data mining, and deep learning. The social structure analysis result revealed that there is a need for improved international collaborations. Sparsely established collaborations are largely influenced by geographic proximity. The funding analysis result showed that Computer Science research in Nigeria is under-funded. The findings of this study will be useful for researchers conducting Computer Science related research. Experts can gain insights into how to develop a strategic framework that will advance the field in a more impactful manner. Government agencies and policymakers can also utilize the outcome of this research to develop strategies for improved funding for Computer Science research.
Keywords: bibliometric analysis, biblioshiny, computer science, Nigeria, science mapping
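The abstract reports a growth rate without stating how it is defined. One common definition in bibliometric tools is the compound annual growth rate of yearly publication counts; the sketch below assumes that definition, and the example counts are hypothetical, not the study's data.

```python
def annual_growth_rate(first_count, last_count, first_year, last_year):
    """Compound annual growth rate, in percent, between two yearly counts.

    Assumed definition: ((last / first) ** (1 / periods) - 1) * 100,
    where periods is the number of years between the two counts.
    """
    periods = last_year - first_year
    if periods <= 0 or first_count <= 0:
        raise ValueError("need a positive starting count and last_year > first_year")
    return ((last_count / first_count) ** (1 / periods) - 1) * 100


# Hypothetical counts: output quadrupling over two years is 100% growth per year.
print(annual_growth_rate(100, 400, 2019, 2021))  # 100.0
```

Because the rate depends only on the first and last yearly counts, it is sensitive to an unusually small first year, which is worth keeping in mind when reading figures over a 31-year window.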
Procedia PDF Downloads 112
327 Walls, Barriers, and Fences to Informal Political Economy of Land Resource Accesses: A Case of Banyabunagana Along with Uganda–Congo Border, South Western Uganda, Kisoro District
Authors: Niringiye Fred
Abstract:
The Banyabunagana have always had access to land resources for grazing animals, sand mining, and farmland across the border in the Democratic Republic of the Congo (DRC), during pre-colonial and colonial times, usually through informal arrangements facilitated by kinship ties and rent transactions for these resources. However, in recent periods, the government of the DRC has been pursuing a policy of constructing barriers such as walls and fences so that Banyabunagana communities cannot access the land on the DRC side of the border. This is happening against a background of increased and intensified demand for land use on the side of the Ugandan community. This paper attempts to discuss the reasons behind the construction of walls, fences, and other barriers that deny Banyabunagana communities in Bunagana Parish, Muramba Sub-county, Kisoro District, Uganda, access to land. The research attempts to answer the following main questions, among others: what factors explain the construction of walls and fences that could limit or deny access to the informal use of land and other resources, and whether there are policy options to ensure continued access to land and other resources for local communities.
Keywords: border, walls, fences, land resource access
Procedia PDF Downloads 124
326 Atomic Absorption Spectroscopic Analysis of Heavy Metals in Cancerous Breast Tissues among Women in Jos, Nigeria
Authors: Opeyemi Peter Idowu
Abstract:
Breast cancer is prevalent among women in northern Nigeria, especially in Jos, Plateau State, owing to anthropogenic activities such as solid mineral mining, which dates as far back as 1904. In this study, atomic absorption spectrometry was used to determine the concentration of eight heavy metals (Cd, As, Cr, Cu, Fe, Pb, Ni, and Zn) in cancerous and non-cancerous breast tissues of women in Jos, Nigeria. The levels of heavy metals ranged from 1.08 to 29.34 mg/kg, 0.29 to 10.76 mg/kg, 0.35 to 51.93 mg/kg, 5.15 to 62.93 mg/kg, 11.64 to 51.10 mg/kg, 0.42 to 83.16 mg/kg, 2.08 to 43.07 mg/kg, and 1.67 to 71.53 mg/kg for Cd, As, Cr, Cu, Fe, Pb, Ni, and Zn, respectively. Using MATLAB R2016a, significant differences (tᵥ = 0.0041 - 0.0317) were found between the levels of all the heavy metals in cancerous and non-cancerous breast tissues except Fe. At the 0.01 level of significance, a significant positive correlation existed between Pb and Fe, Pb and Cu, Pb and Fe, Ni and Fe, Cr and Pb, as well as Ni and Cr (r = 0.583 to 0.998) in cancerous breast tissues. Using ANOVA, significant differences also occurred in the levels of these heavy metals in cancerous breast tissues (p = 1.910510×10⁻²⁶). The relatively high levels of the cancer-inducing heavy metals (Cd, As, Cr, and Pb) compared with the control indicated contamination or exposure to heavy metals, which could be the major cause of cancer in these female subjects. This was evidence of contamination resulting from exposure, by ingestion, inhalation, or other means, to one anthropogenic activity or another. Therapeutic measures such as gastric lavage, ascorbic acid consumption, and divalent cation treatment are all effective ways to manage heavy metal toxicity in the subjects and lower the risk of breast cancer.
Keywords: breast cancer, heavy metals, spectroscopy, bio-accumulation
Procedia PDF Downloads 26
325 A Methodology for Investigating Public Opinion Using Multilevel Text Analysis
Authors: William Xiu Shun Wong, Myungsu Lim, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Kee-Young Kwahk, Namgyu Kim
Abstract:
Recently, many users have begun to frequently share their opinions on diverse issues using various social media. Therefore, numerous governments have attempted to establish or improve national policies according to the public opinions captured from various social media. In this paper, we identify several limitations of the traditional approaches to analyzing public opinion on science and technology and provide an alternative methodology to overcome these limitations. First, we distinguish between the science and technology analysis phase and the social issue analysis phase to reflect the fact that public opinion can be formed only when a certain science and technology is applied to a specific social issue. Next, we successively apply a start list and a stop list to acquire clarified and interesting results. Finally, to identify the most appropriate documents that fit a given subject, we develop a new logical filter concept that consists not only of mere keywords but also of logical relationships among the keywords. This study then analyzes the possibilities for the practical use of the proposed methodology through its application to discover core issues and public opinions from 1,700,886 documents comprising SNS, blogs, news, and discussions.
Keywords: big data, social network analysis, text mining, topic modeling
Procedia PDF Downloads 294
324 Multi-Criteria Inventory Classification Process Based on Logical Analysis of Data
Authors: Diana López-Soto, Soumaya Yacout, Francisco Ángel-Bello
Abstract:
Although inventories are considered stocks of money sitting on shelves, they are needed in order to secure constant and continuous production. Therefore, companies need to control the amount of inventory in order to find the balance between excess and shortage. The classification of items according to certain criteria, such as the price, the usage rate, and the lead time before arrival, allows a company to concentrate its investment in inventory according to a ranking or priority of items. This makes the decision-making process for inventory management easier and more justifiable. The purpose of this paper is to present a new approach for the classification of new items based on the already existing criteria. This approach is called the Logical Analysis of Data (LAD). It is used in this paper to assist the process of ABC item classification based on multiple criteria. LAD is a data mining technique based on Boolean theory that is used for pattern recognition. This technique has been tested in medicine, industry, credit risk analysis, and engineering with remarkable results. An application to ABC inventory classification is presented for the first time, and the results are compared with those obtained using the well-known AHP technique and the ANN technique. The results show that LAD achieved very good classification accuracy.
Keywords: ABC multi-criteria inventory classification, inventory management, multi-class LAD model, multi-criteria classification
Procedia PDF Downloads 881
323 Evaluation of the Urban Regeneration Project: Land Use Transformation and SNS Big Data Analysis
Authors: Ju-Young Kim, Tae-Heon Moon, Jung-Hun Cho
Abstract:
Urban regeneration projects have been actively promoted in Korea. In particular, Jeonju Hanok Village is regarded as a representative case of utilizing local cultural heritage sites in an urban regeneration project. Recently, however, there has been growing concern in this area over 'gentrification' caused by excessive commercialization and surging numbers of tourists. This trend has changed land and building use and resulted in the loss of the region's identity. In this regard, this study analyzed the land use transformation between 2010 and 2016 to identify the commercialization trend in Jeonju Hanok Village. In addition, it conducted SNS big data analysis on Jeonju Hanok Village from February 14th, 2016 to March 31st, 2016 to identify visitors' awareness of the village. The results demonstrate that rapid commercialization was underway, contrary to the initial intention, so planners and city government officials should reconsider the project direction and develop deliberate management strategies. This study is meaningful in that it analyzed land use transformation and SNS big data to identify the current situation in an urban regeneration area. Furthermore, the results are expected to contribute to the vitalization of the regeneration area.
Keywords: land use, SNS, text mining, urban regeneration
Procedia PDF Downloads 293
322 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification
Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh
Abstract:
Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features relative to the number of samples is high, and the data are imbalanced. A feature selection method is used to select the features best able to distinguish classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm employed to identify optimized hyper-parameters of the CNN. To speed up classification by the CNN, a Graphics Processing Unit (GPU) is recommended for performing the mathematical computations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
Keywords: cancer classification, feature selection, deep learning, genetic algorithm
Procedia PDF Downloads 111
321 CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm
Authors: Ghada Badr, Arwa Alturki
Abstract:
The biological function of an RNA molecule depends on its structure. The objective of alignment is to find the homology between two or more RNA secondary structures. Knowing the common functionalities between two RNA structures allows a better understanding and the discovery of other relationships between them. In addition, identifying non-coding RNAs (RNAs that are not translated into a protein) is a popular application in which RNA structural alignment is the first step. A few methods for RNA structure-to-structure alignment have been developed. Most of these methods perform partial structure-to-structure, sequence-to-structure, or structure-to-sequence alignment. Less attention is given in the literature to the use of efficient RNA structure representations, and structure-to-structure alignment methods are lacking. In this paper, we introduce an O(N²) Component-based Pairwise RNA Structure Alignment (CompPSA) algorithm, where structures are given in a component-based representation and N is the maximum number of components in the two structures. The proposed algorithm compares the two RNA secondary structures based on their weighted component features rather than on their base-pair details. Extensive experiments illustrate the efficiency of the CompPSA algorithm compared to other approaches on different real and simulated datasets. The CompPSA algorithm provides an accurate similarity measure between components. The algorithm gives the user the flexibility to align the two RNA structures based on their weighted features (position, full length, and/or stem length). Moreover, the algorithm demonstrates scalability and efficiency in time and memory performance.
Keywords: alignment, RNA secondary structure, pairwise, component-based, data mining
Procedia PDF Downloads 458
320 Using Closed Frequent Itemsets for Hierarchical Document Clustering
Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu
Abstract:
Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to an information overflow problem. To solve this problem for effective information retrieval, document clustering in text mining has become a popular research topic. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases. They are therefore impractical for real-world document clustering, which requires special handling of high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, a hierarchical clustering method developed for document clustering. The intuition behind FIHC is that each cluster shares some common words. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of only frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
Keywords: FIHC, documents clustering, ontology, closed frequent itemset
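For readers unfamiliar with the term used in this abstract, an itemset is closed when no proper superset has the same support. The sketch below is only an illustration of that definition, not the authors' FIHC implementation; the toy documents, the `min_support` threshold, and both function names are hypothetical, and the brute-force enumeration is suitable only for tiny vocabularies.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Brute-force search: map each itemset found in >= min_support docs to its support."""
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, len(vocab) + 1):
        found = False
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = sum(1 for d in docs if items <= d)
            if support >= min_support:
                result[items] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result

def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        items: sup
        for items, sup in frequent.items()
        if not any(items < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }

# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"mining", "data", "cluster"},
    {"mining", "data"},
    {"mining", "data", "ontology"},
    {"cluster", "ontology"},
]
frequent = frequent_itemsets(docs, min_support=2)
closed = closed_itemsets(frequent)
```

In this toy corpus, `{data}` and `{mining}` are frequent but not closed, because `{data, mining}` occurs in exactly the same three documents; keeping only closed itemsets therefore shrinks the candidate set without losing support information.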
Procedia PDF Downloads 399
319 Application of Latent Class Analysis and Self-Organizing Maps for the Prediction of Treatment Outcomes for Chronic Fatigue Syndrome
Authors: Ben Clapperton, Daniel Stahl, Kimberley Goldsmith, Trudie Chalder
Abstract:
Chronic fatigue syndrome (CFS) is a condition characterised by chronic disabling fatigue and other symptoms that currently cannot be explained by any underlying medical condition. Although clinical trials support the effectiveness of cognitive behaviour therapy (CBT), the success rate for individual patients is modest. Patients vary in their response, and little is known about which factors predict or moderate treatment outcomes. The aim of the project is to develop a prediction model from baseline characteristics of patients, such as demographic, clinical and psychological variables, which may predict the likely treatment outcome, provide guidance for clinical decision making and help clinicians to recommend the best treatment. The project is aimed at identifying subgroups of patients with similar baseline characteristics that are predictive of treatment effects, using modern cluster analyses and data mining and machine learning algorithms. The characteristics of these groups will then be used to inform the types of individuals who benefit from a specific treatment. In addition, the results will provide a better understanding of for whom the treatment works. The suitability of different clustering methods for identifying subgroups of CFS patients and their responses to different treatments is compared.
Keywords: chronic fatigue syndrome, latent class analysis, prediction modelling, self-organizing maps
Procedia PDF Downloads 226
318 A Critical Geography of Reforestation Program in Ghana
Authors: John Narh
Abstract:
There is a high rate of deforestation in Ghana due to agricultural expansion, illegal mining and illegal logging. While it is attempting to address the illegalities, Ghana has also initiated a reforestation program known as the Modified Taungya System (MTS). Within the MTS framework, farmers are allocated degraded forestland and provided with tree seedlings to practice agroforestry until the trees form a canopy. Yet the political, ecological and economic models that inform the selection of tree species, the motivations of participating farmers, and the factors that account for differential access to the land and performance of farmers engaged in the program remain underexplored. Using a sequential explanatory mixed methods approach in five forest-fringe communities in the Eastern Region of Ghana, the study reveals that economic factors and Ghana’s commitment to international conventions on the environment underpin the selection of tree species for the MTS program. Social networks play a critical role in gaining access to the program, and access to remittances enhances poor farmers’ chances in it. Farmers are more motivated by access to degraded forestland to cultivate food crops than by having a share in the trees that they plant. As such, in communities where participating farmers are not informed about their benefit from the trees that they plant, the program is largely unsuccessful.
Keywords: translocality, deforestation, forest management, social network
Procedia PDF Downloads 97
317 Leaching Properties of Phosphate Rocks in the Nile River
Authors: Abdelkader T. Ahmed
Abstract:
Phosphate Rocks (PR) are natural sedimentary rocks. These rocks contain a variety of chemical constituents, including heavy metals and radioactive elements. Mining these rocks and transporting them beside or through natural water streams may lead to water contamination. When PR is in contact with water in the field, as a consequence of precipitation events, changes in the water table or sinking in water streams, elements such as salts and heavy metals may be released to the water. In this work, the leaching properties of PR in Nile River water were investigated by experimental lab work. The study focused on evaluating the potential environmental impacts of some constituents of PR, including phosphorus, cadmium, curium and lead, on the water quality of the Nile by applying tank leaching tests. In these tests, the potential impact of changing conditions, such as the phosphate content of PR, the liquid to solid ratio (L/S) and the pH value, on the long-term release of heavy metals and salts was studied. Experimental results showed that cadmium and lead were released in very low concentrations, but curium and phosphorus were released in high concentrations. The results also showed that the release rate from PR was low for all constituents, even over long periods.
Keywords: leaching tests, Nile river, phosphate rocks, water quality
Procedia PDF Downloads 322
316 Convergence and Stability in Federated Learning with Adaptive Differential Privacy Preservation
Authors: Rizwan Rizwan
Abstract:
This paper provides an overview of Federated Learning (FL) and its application in enhancing data security, privacy, and efficiency. FL utilizes three distinct architectures to ensure privacy is never compromised. It involves training models on individual edge devices and aggregating them on a server without sharing raw data. This approach not only provides secure models without data sharing but also offers a highly efficient privacy-preserving solution with improved security and data access. We also discuss various frameworks used in FL and its integration with machine learning, deep learning, and data mining. To address the challenges of multi-party collaborative modeling scenarios, we briefly review an FL scheme combined with an adaptive gradient descent strategy and a differential privacy mechanism. The adaptive learning rate algorithm adjusts the gradient descent process to avoid issues such as model overfitting and fluctuations, thereby enhancing modeling efficiency and performance in multi-party computation scenarios. Additionally, to cater to ultra-large-scale distributed secure computing, the research introduces a differential privacy mechanism that defends against various background knowledge attacks.
Keywords: federated learning, differential privacy, gradient descent strategy, convergence, stability, threats
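The idea of aggregating client models on a server with a differential privacy mechanism can be sketched generically as follows. This is a common clip-then-add-Gaussian-noise pattern, not the specific adaptive scheme reviewed in the paper; the function names and the parameters `max_norm` and `noise_std` are assumptions made for illustration, and no privacy budget is derived here.

```python
import random

def clip(update, max_norm):
    """Scale an update vector so that its L2 norm is at most max_norm."""
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def dp_federated_average(client_updates, max_norm, noise_std, rng):
    """Average clipped client updates, then add Gaussian noise to each coordinate."""
    clipped = [clip(u, max_norm) for u in client_updates]
    n = len(clipped)
    dim = len(clipped[0])
    average = [sum(u[i] for u in clipped) / n for i in range(dim)]
    return [a + rng.gauss(0.0, noise_std) for a in average]

# Hypothetical updates from two clients; only these vectors, never raw data, reach the server.
rng = random.Random(0)
updates = [[3.0, 4.0], [0.0, 0.0]]
noiseless = dp_federated_average(updates, max_norm=1.0, noise_std=0.0, rng=rng)
noisy = dp_federated_average(updates, max_norm=1.0, noise_std=0.1, rng=rng)
```

Clipping bounds how much any single client can influence the average, which is what lets the added noise mask individual contributions; larger `noise_std` gives stronger masking at the cost of a less accurate aggregate.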
Procedia PDF Downloads 30
315 Balance Transfer of Heavy Metals in Marine Environments Subject to Natural and Anthropogenic Inputs: A Case Study on the Mejerda River Delta
Authors: Mohamed Amine Helali, Walid Oueslati, Ayed Added
Abstract:
Sedimentation rates and total fluxes of heavy metals (Fe, Mn, Pb, Zn and Cu) were measured at three different depths (10 m, 20 m and 40 m) during March and August 2012, offshore of the Mejerda River outlet (Gulf of Tunis, Tunisia). The sedimentation rates, estimated from the fluxes of suspended particulate matter, are 7.32, 5.45 and 4.39 mm y⁻¹ at 10 m, 20 m and 40 m depth, respectively. Heavy metal sequestration in sediments was determined from chemical speciation and the total metal contents of each core collected at 10, 20 and 40 m depth. Heavy metal intake to the sediment was also measured from the suspended particulate matter, while the fluxes from the sediment to the water column were determined using the benthic chambers technique and from the diffusive fluxes in the pore water. The results show that iron is the only metal for which the balance transfer between intake/uptake (45 to 117 / 1.8 to 5.8 g m⁻² y⁻¹) and sequestration (277 to 378 g m⁻² y⁻¹) was negative, in contrast to lead, whose intake fluxes (360 to 480 mg m⁻² y⁻¹) exceed its sequestration fluxes (50 to 92 mg m⁻² y⁻¹). The balance transfer is neutral for Mn, Zn, and Cu. These results clearly indicate that the contributions of the Mejerda have consistently varied over time, probably due to the migration of the river mouth, to changes in mining activity in the Mejerda catchment and to recent human activities that affect the delta area.
Keywords: delta, fluxes, heavy metals, sediments, sedimentation rates
Procedia PDF Downloads 202
314 Data-Driven Decision Making: A Reference Model for Organizational, Educational and Competency-Based Learning Systems
Authors: Emanuel Koseos
Abstract:
Data-Driven Decision Making (DDDM) refers to making decisions based on historical data in order to inform practice, develop strategies and implement policies that benefit organizational settings. In educational technology, DDDM facilitates the implementation of differential educational learning approaches such as Educational Data Mining (EDM) and Competency-Based Education (CBE), which commonly target university classrooms. There is a current need for DDDM models applied to middle and secondary schools, driven by a concern for assessing the needs, progress and performance of students and educators with respect to regional standards, policies and the evolution of curricula. To address these concerns, we propose a DDDM reference model developed using educational key process initiatives as inputs to a machine learning framework implemented with statistical software (SAS, R) to provide a best-practices, complex-free and automated approach for educators at their regional level. We assessed the efficiency of the model over a six-year period using data from 45 schools and grades K-12 in the Langley, BC, Canada regional school district. We concluded that the model has wider appeal, for example in business learning systems.
Keywords: competency-based learning, data-driven decision making, machine learning, secondary schools
Procedia PDF Downloads 172
313 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches
Authors: Aya Salama
Abstract:
Digital Twin is an emerging research topic that has attracted researchers in the last decade. It is used in many fields, such as smart manufacturing and smart healthcare, because it saves time and money. It is usually related to other technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the Human digital twin (HDT) in particular is a novel idea whose feasibility still needs to be proven. HDT expands the idea of the Digital Twin to human beings, who are living beings and differ from inanimate physical entities. The goal of this research was to create a Human digital twin that automates human replies in real time by simulating human behavior. For this reason, clustering, supervised classification, topic extraction, and sentiment analysis were studied in this paper. The feasibility of the HDT for generating personal replies on social messaging applications was demonstrated in this work. The overall accuracy of the proposed approach was 63%, which is a very promising result that can open the way for researchers to expand the idea of HDT. This was achieved by using Random Forest for clustering the question database and matching new questions. K-nearest neighbor was also applied for sentiment analysis.
Keywords: human digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification, clustering
Procedia PDF Downloads 87
312 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method
Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas
Abstract:
To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using an artificial neural network (ANN) and a support vector machine (SVM). All prediction models are built in Python with the tools scikit-learn and PyBrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU) supply air temperature and solar radiation. Solar radiation, which is unavailable a day ahead, is predicted first, and this estimate is then used as an input to predict consumption and demand. Models to predict consumption and demand are trained with both SVM and ANN, and depend on cooling or heating operation and on weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve a coefficient of variation of the root mean square error (CVRMSE) of 15.50% to 20.03% for consumption prediction and 22.89% to 32.42% for demand prediction. To conclude, the presented models have the potential to help building owners purchase electricity at the wholesale market, but they are not robust when used in demand response control.
Keywords: building energy prediction, data mining, demand response, electricity market
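The CVRMSE metric quoted in this abstract is the root mean square error divided by the mean of the measured values, expressed as a percentage. The sketch below shows that calculation under one common convention (dividing the squared errors by n); some guidelines divide by n minus the number of model parameters instead, and the abstract does not say which convention the authors used. The function name and the sample readings are hypothetical.

```python
import math

def cvrmse(actual, predicted):
    """Return the CVRMSE in percent: 100 * RMSE / mean(actual)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    mean_actual = sum(actual) / n
    return 100.0 * rmse / mean_actual

# Hypothetical hourly consumption readings (kWh) and day-ahead predictions.
measured = [100.0, 120.0, 80.0, 100.0]
forecast = [110.0, 110.0, 90.0, 90.0]
score = cvrmse(measured, forecast)  # each error is 10 kWh and the mean is 100 kWh, so 10 percent
```

Because the error is normalized by the mean load, CVRMSE values can be compared across buildings of different sizes, which plain RMSE does not allow.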
Procedia PDF Downloads 316
311 Geomechanical Technologies for Assessing Three-Dimensional Stability of Underground Excavations Utilizing Remote-Sensing, Finite Element Analysis, and Scientific Visualization
Authors: Kwang Chun, John Kemeny
Abstract:
Light detection and ranging (LiDAR) has been a prevalent remote-sensing technology in the geological fields due to its high precision and ease of use. One of its major applications is to use the detailed geometrical information of underground structures as a basis for generating a three-dimensional numerical model that can be used in a geotechnical stability analysis with methods such as FEM or DEM. To date, however, straightforward techniques for reconstructing the numerical model from the scanned data of underground structures have not been well established or tested. In this paper, we propose a comprehensive approach integrating all the various processes, from LiDAR scanning to finite element numerical analysis. The study focuses on converting LiDAR 3D point clouds of geologic structures containing complex surface geometries into a finite element model. This methodology has been applied to Kartchner Caverns in Arizona, where detailed underground and surface point clouds can be used for the analysis of underground stability. Numerical simulations were performed using the finite element code Abaqus and visualized with the 3D visualization tool ParaView. The results are useful in studying the stability of all types of underground excavations, including underground mining and tunneling.
Keywords: finite element analysis, LiDAR, remote-sensing, scientific visualization, underground stability
Procedia PDF Downloads 174