Search results for: biological data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27305

26465 Artificial Intelligence Methods in Estimating the Minimum Miscibility Pressure Required for Gas Flooding

Authors: Emad A. Mohammed

Abstract:

Utilizing the capabilities of data mining and artificial intelligence to predict the minimum miscibility pressure (MMP) required for multi-contact miscible (MCM) displacement of reservoir petroleum by hydrocarbon gas flooding, using fuzzy logic models and artificial neural network models, yields accurate results. As established in the literature and confirmed by the dataset, the factors affecting the MMP are as follows: XC2-6, the intermediate composition of the oil (C2-6, CO2, and H2S), in mole %; XC1, the amount of methane in the oil (%); T, the temperature (°C); MwC7+, the molecular weight of C7+ (g/mol); YC2+, the mole percent of C2+ in the injected gas (%); and MwC2+, the molecular weight of C2+ in the injected gas. Fuzzy logic and neural networks have been used widely for prediction and classification, with relatively high accuracy, in many fields of study. It is well known that a fuzzy inference system can handle uncertainty in the inputs, as in our case. The results of this work show that the proposed models achieve higher performance indices than existing empirical correlations.
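As an illustration of the neural-network side of such an approach, the sketch below fits a small one-hidden-layer network by batch gradient descent to synthetic data with the same six inputs. The data, network size, and learning rate are hypothetical choices for demonstration, not the authors' actual model or dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# six inputs mirroring the abstract's factors (values here are synthetic)
X = rng.uniform(0.0, 1.0, size=(200, 6))
# hypothetical linear-plus-noise "MMP" target, for demonstration only
y = X @ np.array([2.0, -1.0, 1.5, 0.5, -0.5, 1.0]) + 0.05 * rng.normal(size=200)

# one hidden layer with tanh activation, linear output
W1 = rng.normal(scale=0.5, size=(6, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=8);      b2 = 0.0

def forward(X):
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

_, pred = forward(X)
loss_before = np.mean((pred - y) ** 2)

lr = 0.05
for _ in range(500):                      # plain batch gradient descent on MSE
    H, pred = forward(X)
    err = (pred - y) / len(y)
    gW2 = H.T @ err;  gb2 = err.sum()
    dH = np.outer(err, W2) * (1.0 - H ** 2)
    gW1 = X.T @ dH;   gb1 = dH.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(X)
loss_after = np.mean((pred - y) ** 2)
print(loss_after < loss_before)  # training should reduce the fit error
```

A fuzzy inference system would replace the hidden layer with membership functions and rules; the training loop above is only the ANN half of the comparison.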

Keywords: MMP, gas flooding, artificial intelligence, correlation

Procedia PDF Downloads 144
26464 Leachate Discharges: A Review of Treatment Techniques

Authors: Abdelkader Anouzla, Soukaina Bouaouda, Roukaya Bouyakhsass, Salah Souabi, Abdeslam Taleb

Abstract:

During storage, and under the combined action of rainwater and natural fermentation, landfilled wastes produce over 800,000 m3 of landfill leachate. Due to population growth and changing global economic activities, the amount of waste generated constantly increases, producing ever larger volumes of leachate. Leachate that infiltrates the soil can negatively impact soil, surface water, groundwater, and, more broadly, the environment and human life. Because of its high pollutant load, leachate must be treated before being released into the environment. This article reviews leachate treatment techniques reported as of September 2022. Different techniques can be used for this purpose, including biological, physicochemical, and membrane methods. Young leachate is biodegradable; in contrast, biological processes lose their effectiveness as leachate ages, because aged leachates are characterized by high ammonia nitrogen concentrations that inhibit microbial activity. Most physicochemical treatments serve as pre-treatment or post-treatment to complement conventional treatment processes or to remove specific contaminants. After an introduction, the different types of pollutants present in leachates and their impacts are described, followed by a discussion of the advantages and disadvantages of the various treatments, whether biological, physicochemical, or membrane-based. This work concludes that, owing to their simplicity and reasonable cost compared with other treatment procedures, biological treatments offer the most suitable alternative for limiting the effects of the pollutants in landfill leachates.

Keywords: landfill leachate, landfill pollution, impact, wastewater

Procedia PDF Downloads 89
26463 Medicompills Architecture: A Mathematically Precise Tool to Reduce the Risk of Diagnosis Errors in Precision Medicine

Authors: Adriana Haulica

Abstract:

Powered by machine learning, precision medicine is currently tailored to use genetic and molecular profiling, with the aim of optimizing the therapeutic benefit for cohorts of patients. As the majority of machine learning algorithms are heuristic, their outputs have only contextual validity. This is not very restrictive in the sense that medicine itself is not an exact science. Meanwhile, the progress made in molecular biology, bioinformatics, computational biology, and precision medicine, together with the huge amount of human biology data and the increase in computational power, opens new healthcare challenges: a more accurate diagnosis is needed, along with real-time treatment, by processing as much of the available information as possible. The purpose of this paper is to present a deeper vision for the future of artificial intelligence in precision medicine. Current machine learning algorithms use standard mathematical knowledge, mostly Euclidean metrics and standard computation rules, and the loss of information arising from these classical methods prevents obtaining full evidence in the diagnosis process. To overcome these problems, we introduce MEDICOMPILLS, a new architectural concept for information processing in precision medicine that delivers diagnoses and therapy advice. The tool processes multi-field digital resources: global knowledge related to biomedicine, whether direct or indirect, as well as technical databases, natural language processing algorithms, and strong class-optimization functions. As the name suggests, the heart of this tool is a compiler. The approach is completely new and tailored for omics and clinical data. First, the intrinsic biological intuition differs from the well-known "needle in a haystack" approach usually taken when machine learning algorithms process differential genomic or molecular data to find biomarkers. Second, even though the input is drawn from various types of data, the working engine inside MEDICOMPILLS does not search for patterns as an integrative tool would. Instead, it deciphers the biological meaning of the input data down to metabolic and physiological mechanisms, based on a compiler whose grammars issue from bio-algebra-inspired mathematics. It translates input data into bio-semantic units with the help of contextual information, iteratively, until bio-logical operations can be performed according to a "common denominator" rule. The rigor of MEDICOMPILLS comes from the structure of its contextual information on functions, built to be analogous to mathematical proofs. The major impact of this architecture is the high accuracy of the diagnosis. Delivered as a multiple-conditions diagnostic, consisting of main diseases together with unhealthy biological states, this format is highly suitable for therapy proposals and disease prevention. The MEDICOMPILLS architecture would be highly beneficial for the healthcare industry. The expectation is to generate a strategic trend in precision medicine, making medicine more like an exact science and reducing the considerable risk of errors in diagnostics and therapies. The tool could also be used by pharmaceutical laboratories in the discovery of new cures, and it would contribute to better-designed, faster clinical trials.

Keywords: bio-semantic units, multiple conditions diagnosis, NLP, omics

Procedia PDF Downloads 70
26462 A Review of Big Data Movement Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been produced at an increasing rate from various sources, such as social media networks, sensor devices, and other information-serving devices. This massive, complex, and exponentially growing collection of data is called big data. Traditional database systems cannot store and process such data because of its size and complexity. Consequently, cloud computing is a potential solution for data storage and processing, since it can provide a pool of server and storage resources. However, moving large amounts of data to and from the cloud is a challenging issue, since transfers can incur high latency due to the data size. With respect to the big data movement problem, this paper reviews the literature of previous work, discusses research issues, and identifies approaches for dealing with the problem.

Keywords: big data, cloud computing, big data movement, network techniques

Procedia PDF Downloads 86
26461 The “Bright Side” of COVID-19: Effects of Livestream Affordances on Consumer Purchase Willingness: Explicit IT Affordances Perspective

Authors: Isaac Owusu Asante, Yushi Jiang, Hailin Tao

Abstract:

Live streaming marketing, a new element of electronic commerce, became an additional marketing channel following the COVID-19 pandemic, and many sellers have leveraged the features presented by live streaming to increase sales. Prior studies on live streaming have focused on gaming and on consumers' loyalty to brands, typically using interviews and questionnaires. This study, in contrast, was conducted to measure real-time, observable interactions between consumers and sellers. Based on affordance theory, this study conceptualized constructs representing the interactive features of live streaming and examined how they drive consumers' purchase willingness during live streaming sessions, using 1,238 records collected from Amazon Live through manual observation of transaction records. Within a structural equation modeling framework, ordinary least squares regression suggests that live viewers, new followers, live chats, and likes positively affect purchase willingness. Sobel and Monte Carlo tests show that new followers, live chats, and likes significantly mediate the relationship between live viewers and purchase willingness. The study introduces a new way of measuring interactions in live streaming commerce and proposes a way to gather data on consumer behavior manually when a platform's application programming interface (API) does not support data mining algorithms.
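The Sobel test mentioned above can be computed directly from the two path estimates of a simple mediation model. A minimal sketch follows; the coefficient values are hypothetical, for illustration only, not the study's estimates.

```python
import math

def sobel_z(a, se_a, b, se_b):
    """Sobel z-statistic for the indirect effect a*b in simple mediation.

    a, se_a: estimate and standard error of the X -> mediator path
    b, se_b: estimate and standard error of the mediator -> Y path
             (controlling for X)
    """
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

# hypothetical paths: live viewers -> live chats -> purchase willingness
z = sobel_z(a=0.42, se_a=0.05, b=0.31, se_b=0.06)
print(round(z, 2))  # |z| > 1.96 implies a significant indirect effect at p < .05
```

The Monte Carlo alternative draws (a, b) pairs from their sampling distributions and inspects the percentile interval of the simulated products, which avoids the Sobel test's normality assumption for the indirect effect.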

Keywords: livestreaming marketing, live chats, live viewers, likes, new followers, purchase willingness

Procedia PDF Downloads 81
26460 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is a company's most precious asset, and today companies hold large amounts of data. As data volumes grow, retrieving particular information becomes slower day by day, and processing data quickly enough to shape it into information is a major issue. The main problems in distributed databases are the efficiency of data distribution and its response time; the security of data distribution is also a significant concern. To address these problems, we propose a strategy that maximizes the efficiency of data distribution while improving its response time. The technique gives better results for secure data distribution from multiple heterogeneous sources and enables companies to share data securely, efficiently, and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 333
26459 The Effect of Acid Additives on Phytoremediation Efficiency

Authors: G. Hosseini, A. Sadighzadeh, M. Rahimnejad, N. Hosseini, Z. Jamalzadeh

Abstract:

Metal pollutants, especially heavy metals from anthropogenic sources such as metallurgical industry waste (mining, smelting, and casting) and nuclear fuel production (mining, concentrate production, and uranium processing), end up contaminating the environment (water and soil) and pose a risk to human health around such industrial facilities. Various methods can be used to remove these contaminants from water and soil, but they are very expensive and time-consuming; in some cases, people have instead been forced to leave the area and no decontamination is performed. For example, after the Chernobyl accident, an area of 30 km around the plant was emptied of human life. A very efficient and cost-effective method for decontaminating soil and water is phytoremediation. In this method, plants, preferably native plants that are better adapted to the regional climate, are used. In this study, three types of plants, alfalfa, sunflower, and wheat, were used for barium decontamination. Alfalfa and sunflower did not grow well enough in the Saghand mine soil sample, which may be due to their non-native origin, but the growth of wheat in the Saghand uranium mine soil sample was satisfactory. We investigated the effect of four acids, nitric, oxalic, acetic, and citric, on the efficiency of barium removal by wheat. Our results indicate increased barium absorption in the presence of citric acid in the soil. In this paper, we present our research and laboratory results.

Keywords: phytoremediation, heavy metal, wheat, soil

Procedia PDF Downloads 338
26458 Biological Aquaculture System (BAS) Design and Water Quality on Marble Goby (Oxyeleotris marmoratus): A Water Recirculating Technology

Authors: AnnWon Chew, Nik Norulaini Nik Ab Rahman, Mohd Omar Ab Kadir, C. C. Chen, Jaafar Chua

Abstract:

This paper presents an innovative process to solve the ammonia, nitrite, and nitrate build-up problem in recirculating systems using a Biological Aquaculture System (BAS). The novel aspect of the process lies in a series of bioreactors specially arranged and designed to meet the conditions required for water purification. The BAS maximizes the utilization of bio-balls as an ideal surface on which beneficial microbes can flourish; the bio-balls also serve as a physical barrier that traps organic particles, which in turn become a food source for the microbes. Operation of the proposed system maintains excellent water quality, i.e., low levels of ammonia, nitrite, and nitrate, a pH range suitable for aquaculture, and low turbidity. The BAS thus provides a solution for sustainable small-scale urban aquaculture with high water recovery and minimal waste disposal.

Keywords: ammonia, bioreactor, Biological Aquaculture System (BAS), bio-balls, water recirculating technology

Procedia PDF Downloads 592
26457 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in big data and advancements in information and communication technologies, the healthcare industry is transitioning from being clinician-oriented to technology-oriented. Many people around the world die of cancer because the disease was not diagnosed at an early stage. Nowadays, computational methods in the form of machine learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper carries out a comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree), and Artificial Neural Network (ANN). The evaluation is based on the standard metrics precision (P), recall (R), F1-score, and accuracy. The experimental results show that the ANN achieved the highest accuracy (99.4%) when tested on the breast cancer dataset, whereas on the cervical cancer dataset the Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to the other classifiers.
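The four evaluation metrics used here follow directly from a binary confusion matrix. A minimal sketch, with hypothetical counts rather than the paper's results:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy from a binary confusion matrix."""
    precision = tp / (tp + fp)          # of predicted positives, how many are real
    recall = tp / (tp + fn)             # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# hypothetical counts for one classifier on a held-out cancer test set
p, r, f1, acc = classification_metrics(tp=85, fp=5, fn=10, tn=100)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} Acc={acc:.3f}")
```

Note that accuracy alone can be misleading on imbalanced medical datasets, which is why the paper reports precision, recall, and F1 alongside it.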

Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall

Procedia PDF Downloads 277
26456 The Regulation of Reputational Information in the Sharing Economy

Authors: Emre Bayamlıoğlu

Abstract:

This paper aims to provide an account of the legal and regulatory aspects of algorithmic reputation systems, with special emphasis on the sharing economy (e.g., Uber, Airbnb, Lyft) business model. The first section starts with an analysis of the legal and commercial nature of the tripartite relationship among the parties, namely the host platform, the individual sharers/service providers, and the consumers/users. The section further examines to what extent an algorithmic system of reputational information could serve as an alternative to legal regulation; shortcomings are explained and analyzed with specific examples from the Airbnb platform, a pioneering success in the sharing economy. The following section focuses on the governance and control of reputational information. It first analyzes the legal consequences of algorithmic filtering systems that detect undesired comments, and how a delicate balance could be struck between competing interests such as freedom of speech, privacy, and the integrity of commercial reputation. The third section deals with the problem of manipulation by users. Many sharing economy businesses employ data mining and natural language processing techniques to verify the consistency of feedback, while software agents referred to as "bots" are employed by users to "produce" fake reputation values; such automated techniques are deceptive and significantly undermine the trust upon which the reputational system is built. The fourth section explores concerns regarding data mobility, data ownership, and privacy: reputational information provided by consumers in the form of textual comments may be regarded as a writing eligible for copyright protection, and algorithmic reputational systems also contain personal data pertaining to both the individual entrepreneurs and the consumers.
The final section starts with an overview of the notion of reputation as a communitarian and collective form of referential trust, and evaluates the above legal arguments from the perspective of the public interest in the integrity of reputational information. The paper concludes with guidelines and design principles for algorithmic reputation systems to address the legal implications raised above.

Keywords: sharing economy, design principles of algorithmic regulation, reputational systems, personal data protection, privacy

Procedia PDF Downloads 465
26455 Pattern Discovery from Student Feedback: Identifying Factors to Improve Student Emotions in Learning

Authors: Angelina A. Tzacheva, Jaishree Ranganathan

Abstract:

Interest in STEM (science, technology, engineering, and mathematics) education, especially computer science education, has seen a drastic increase across the country, fueling efforts toward recruiting and admitting a diverse population of students. The changing conditions in terms of student population and diversity, and in expected teaching and learning outcomes, provide a platform for the use of innovative teaching models and technologies. It is necessary that the methods adopted also aim at raising the quality of such innovations and have a positive impact on student learning. The Light-Weight Team is an active learning pedagogy that is considered a low-stakes activity and has little or no direct impact on student grades. Emotion plays a major role in students' motivation to learn. In this work, we use student feedback data with emotion classification, collected through surveys at a public research institution in the United States, and apply an actionable pattern discovery method. Actionable patterns are patterns that provide suggestions, in the form of rules, to help the user achieve better outcomes. The proposed method provides meaningful insight into changes that can be incorporated into the Light-Weight Team activities and into the resources utilized in the course. The results suggest how to shift student emotions to a more positive state, focusing in particular on the emotions 'Trust' and 'Joy'.

Keywords: actionable pattern discovery, education, emotion, data mining

Procedia PDF Downloads 98
26454 Evaluation and Assessment of Bioinformatics Methods and Their Applications

Authors: Fatemeh Nokhodchi Bonab

Abstract:

Bioinformatics, in its broad sense, involves the application of computational processes to solve biological problems. A wide range of computational tools is needed to effectively and efficiently process the large amounts of data being generated by recent technological innovations in biology and medicine. A number of computational tools have been developed or adapted to deal with the experimental riches of complex, multivariate data and the transition from data collection to information and knowledge. These bioinformatics tools are being evaluated and applied in various medical areas, including early detection, risk assessment, classification, and prognosis of cancer; the goal of these efforts is to identify and develop bioinformatics methods with optimal sensitivity, specificity, and predictive capability. The recent flood of data from genome sequences and functional genomics has given rise to this new field, which combines elements of biology and computer science: bioinformatics conceptualizes biology in terms of macromolecules (in the sense of physical chemistry) and then applies "informatics" techniques (derived from disciplines such as applied mathematics, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale. Here we propose a definition for this field and review some of the research being pursued, particularly in relation to transcriptional regulatory systems.

Keywords: methods, applications, transcriptional regulatory systems, techniques

Procedia PDF Downloads 127
26453 Plackett-Burman Design to Evaluate the Influence of Operating Parameters on Anaerobic Orthophosphate Release from Enhanced Biological Phosphorus Removal Sludge

Authors: Reza Salehi, Peter L. Dold, Yves Comeau

Abstract:

The aim of the present study was to investigate the effect of six operating parameters, pH (X1), temperature (X2), stirring speed (X3), chemical oxygen demand (COD) (X4), volatile suspended solids (VSS) (X5), and time (X6), on anaerobic orthophosphate release from enhanced biological phosphorus removal (EBPR) sludge. An 8-run Plackett-Burman design was applied, and the statistical analysis of the experimental data was performed using the Minitab 16.2.4 software package. The analysis of variance (ANOVA) revealed that temperature, COD, VSS, and time had significant effects, with p-values of less than 0.05, whereas pH and stirring speed were identified as non-significant parameters that nonetheless influenced orthophosphate release from the EBPR sludge. The first-order multiple linear regression model relating orthophosphate release from the EBPR sludge (Y) to the operating parameters (X1-X6) was Y = 18.59 + 1.16X1 - 3.11X2 - 0.81X3 + 3.79X4 + 9.89X5 + 4.01X6. The model p-value and coefficient of determination (R2) were 0.026 and 99.87%, respectively, which indicates that the model is significant and that the predicted values of orthophosphate release correlate excellently with the observed values.
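The fitted first-order model can be used to predict release at any combination of factor settings. A minimal sketch using the coefficients quoted in the abstract; coded levels of -1/+1 are the usual Plackett-Burman convention and are assumed here:

```python
# coefficients of the fitted first-order model quoted in the abstract
INTERCEPT = 18.59
COEFFS = {"pH": 1.16, "temperature": -3.11, "stirring": -0.81,
          "COD": 3.79, "VSS": 9.89, "time": 4.01}

def predicted_release(levels):
    """Predicted orthophosphate release for coded factor levels (-1 or +1)."""
    return INTERCEPT + sum(COEFFS[f] * levels[f] for f in COEFFS)

# e.g. all six factors at their high (+1) coded level
high = {f: +1 for f in COEFFS}
print(round(predicted_release(high), 2))  # 33.52
```

Because the design is two-level, each coefficient's sign and magnitude directly indicate the direction and strength of that factor's effect; VSS (9.89) dominates, consistent with the ANOVA result.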

Keywords: anaerobic, operating parameters, orthophosphate release, Plackett-Burman design

Procedia PDF Downloads 279
26452 Performance Analysis of Proprietary and Non-Proprietary Tools for Regression Testing Using Genetic Algorithm

Authors: K. Hema Shankari, R. Thirumalaiselvi, N. V. Balasubramanian

Abstract:

The present paper addresses research in the area of regression testing, with emphasis on automated tools and the prioritization of test cases. The uniqueness of regression testing and its cyclic nature are pointed out, and the difference in approach between industry, with the business model as its basis, and academia, with its focus on data mining, is highlighted. Test metrics are discussed as a prelude to our formula for prioritization, and a case study is presented to illustrate this methodology. An industrial case study is also described, in which the number of test cases is so large that they have to be grouped into test suites; in such situations, a genetic algorithm we propose can be used to reconfigure these test suites in each cycle of regression testing. A proprietary tool and an open-source tool are compared using the above-mentioned metrics, and our approach is clarified through several tables.
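The APFD metric listed in the keywords is the standard measure a genetic algorithm would optimize when reordering test cases. A minimal sketch of its computation, with a hypothetical fault matrix:

```python
def apfd(order, faults):
    """Average Percentage of Faults Detected for one test-case ordering.

    order:  test-case ids in execution order (assumed to detect every fault)
    faults: dict mapping fault id -> set of test-case ids that reveal it
    """
    n, m = len(order), len(faults)
    position = {t: i + 1 for i, t in enumerate(order)}    # 1-based ranks
    first_detect = [min(position[t] for t in detectors)   # TF_i terms
                    for detectors in faults.values()]
    return 1.0 - sum(first_detect) / (n * m) + 1.0 / (2 * n)

# hypothetical fault matrix: which tests reveal which faults
faults = {"f1": {"t1"}, "f2": {"t1", "t3"}, "f3": {"t2"}, "f4": {"t4"}}
print(apfd(["t1", "t2", "t3", "t4", "t5"], faults))  # 0.7
```

A genetic algorithm would treat each ordering (or test-suite configuration) as a chromosome and use this APFD value as its fitness, so that selection and crossover push toward orderings that reveal faults earlier.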

Keywords: APFD metric, genetic algorithm, regression testing, RFT tool, test case prioritization, selenium tool

Procedia PDF Downloads 436
26451 Phytoseiid Mite Species (Acari: Mesostigmata) on Blackberry Plants in Florida and Georgia, USA

Authors: Rana Akyazi, Cal Welbourn, Oscar E. Liburd

Abstract:

The family Phytoseiidae is the most common group of plant-inhabiting predatory mites. Phytoseiids are generally considered important biological control agents of pest mites on many crops worldwide, and several species are commercially available in many countries. This study was carried out to determine the phytoseiid mite species on nine different blackberry varieties (Arapaho, Choctaw, Kiowa, Nachez, Navaho, Osage, Ouachita, Von, Watchita). The survey was conducted from June to October 2016. Leaf samples were collected monthly from selected organic and conventional commercial blackberry (Rubus spp.) farms in Florida and Georgia, USA. Nine phytoseiid mite (Acari: Mesostigmata) species were identified during the study. The results also showed that the incidence of Phytoseiidae was greater in organic than in conventional blackberries. Future surveys may detect new species that hold potential for the biological control of economically important pests in key fruit crops.

Keywords: biological control, mite, Phytoseiidae, predator, Rubus spp.

Procedia PDF Downloads 403
26450 Monitoring the Pollution Status of the Goan Coast Using Genotoxicity Biomarkers in the Bivalve, Meretrix ovum

Authors: Avelyno D'Costa, S. K. Shyama, M. K. Praveen Kumar

Abstract:

The coast of Goa, India receives constant anthropogenic stress through its major rivers, which carry mining rejects of iron and manganese ores from upstream mining sites, as well as petroleum hydrocarbons from shipping and harbor-related activities, putting aquatic fauna such as bivalves at risk. The present study reports the pollution status of the Goan coast with respect to these xenobiotics using genotoxicity studies, supplemented by quantification of total petroleum hydrocarbons (TPHs) and trace metals (iron, manganese, copper, cadmium, and lead) in the gills of the estuarine clam Meretrix ovum as well as in the surrounding water and sediment, over a two-year sampling period from January 2013 to December 2014. Bivalves were collected from a probably unpolluted site at Palolem and a probably polluted site at Vasco, chosen based on the anthropogenic activities at these sites. Genotoxicity was assessed in gill cells using the comet assay and the micronucleus test. TPHs and trace metals in gill tissue, water, and sediments were quantified using spectrofluorometry and atomic absorption spectrophotometry (AAS), respectively. Statistical significance was assessed using Student's t-test, and the relationship between DNA damage and pollutant concentrations was evaluated using multiple regression analysis. Significant DNA damage was observed in the bivalves collected from Vasco, a region of high industrial activity. Concentrations of TPHs and trace metals (iron, manganese, and cadmium) were also significantly higher in the gills of bivalves collected from Vasco than in those collected from Palolem, as were the concentrations of these pollutants in the water and sediments, which may reflect the lack of industrial activity at Palolem. A high positive correlation was observed between pollutant levels and DNA damage in the bivalves collected from Vasco, suggesting the genotoxic nature of these pollutants. M. ovum can therefore be used as a bioindicator species for monitoring pollution of estuarine and coastal regions by TPHs and trace metals.
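A two-sample Student's t-test of the kind used here compares mean DNA damage between the two sites. A minimal sketch with a pooled-variance t statistic; the % tail DNA values below are hypothetical, not the study's data:

```python
import math

def students_t(sample1, sample2):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# hypothetical % tail DNA values from comet assays at the two sites
vasco = [18.2, 21.5, 19.8, 22.1, 20.4, 17.9]    # polluted site
palolem = [6.1, 7.4, 5.9, 8.0, 6.8, 7.2]        # reference site

t = students_t(vasco, palolem)
print(round(t, 2))
```

A large positive t (compared against the t distribution with n1 + n2 - 2 degrees of freedom) would indicate significantly higher damage at the polluted site; the multiple regression step would then relate damage to the measured TPH and metal concentrations.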

Keywords: comet assay, metals, micronucleus test, total petroleum hydrocarbons

Procedia PDF Downloads 237
26449 Chemical Composition and Biological Properties of Algerian Honeys

Authors: Ouchemoukh Salim, Amessis-Ouchemoukh Nadia, Guenaoui Nawel, Moumeni Lynda, Zaidi Hicham, Otmani Amar, Sadou Dyhia

Abstract:

Honey is a hive food rich in carbohydrates and water that also contains many nutrients (enzymes, minerals, organic acids, phytochemicals, etc.) and is used in various nutritional and therapeutic fields. Algerian honeys were studied for their physicochemical parameters and nutritional values (moisture, Brix, pH, electrical conductivity, and amounts of HMF, proteins, proline, total phenolic compounds, and flavonoids) and for some biological activities (antioxidant, anti-inflammatory, and enzymatic anti-browning). The antioxidant activities of the samples were estimated using different methods (ABTS and DPPH free-radical scavenging, reducing power, and ferrous-chelating activity). All honeys were acidic (3.45 ≤ pH ≤ 4.65). The color varied from mimosa yellow to dark brown. The specific rotation was levorotatory in most honey samples, and the electrical conductivity, hydroxymethylfurfural, and proline values met the international honey requirements. For anti-inflammatory activity, the results showed that the capacity of the analyzed honeys to inhibit the denaturation of BSA varied from 15 to 75%, with maximum activity at a concentration of 0.5 mg/ml. All honeys exhibited enzymatic anti-browning on slices of different fruits: the controls showed the greatest browning units compared with the honeys studied, and the PPO and POD enzymes had the lowest activity. Highly significant correlations were found between the color of honey, its antioxidant content, and its biological activities (antioxidant, anti-inflammatory, and enzymatic anti-browning). The dark color of honey is a good indicator of the best biological properties and, therefore, of the best nutritional and therapeutic values.

Keywords: honey, physico-chemical parameters, bioactive compounds, biological properties

Procedia PDF Downloads 55
26448 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents, including unstructured data and text, have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually; however, manual categorization not only cannot guarantee accuracy but also requires a large amount of time and considerable cost. Many studies have therefore addressed the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics, because they assume that each document can be assigned to only one category. To overcome this limitation, some studies have attempted to assign each document to multiple categories, but they are limited in that their learning process requires training on a multi-categorized document set; they therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To remove this requirement of traditional multi-categorization algorithms, we previously proposed a methodology that can extend the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.
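The idea of extending a single assigned category to multiple categories can be illustrated, in a much-simplified form, by comparing a document against per-category term centroids built from single-categorized documents and keeping every category above a similarity threshold. The documents, categories, and threshold below are hypothetical, and this is not the authors' actual methodology:

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

# hypothetical single-categorized training documents
train = {
    "sports": ["the team won the football match",
               "players scored goals in the game"],
    "finance": ["the bank raised interest rates",
                "stock market prices fell sharply"],
}
# one centroid per category, built from its documents
centroids = {c: tf_vector(" ".join(docs)) for c, docs in train.items()}

def multi_categorize(doc, threshold=0.15):
    """Assign every category whose centroid is similar enough to the document."""
    v = tf_vector(doc)
    return [c for c, cent in centroids.items() if cosine(v, cent) >= threshold]

# a document mixing both topics can receive both categories
print(multi_categorize("the football team stock prices fell after the match"))
```

The methodology described in the abstract works instead through relationships among categories, topics, and documents (e.g., via topic modeling), but the thresholded-similarity sketch shows why one document can legitimately carry more than one label.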

Keywords: big data analysis, document classification, multi-category, text mining, topic analysis

Procedia PDF Downloads 272
26447 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of that strategy, geological big data management becomes more and more critical. At present, there are still many technological barriers, as well as conceptual confusion, in aspects of geological big data management and application such as data sharing, intellectual property protection, and application technology. It is therefore a key task to make better use of new technologies for deeper exploration and wider application of geological big data. In this paper, we briefly introduce the basic principles of blockchain technology and then analyze the application dilemmas of geological data. Based on this analysis, we bring forward some feasible patterns and scenarios for applying blockchain to geological big data and put forward several suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 89
26446 Learning about the Strengths and Weaknesses of Urban Climate Action Plans

Authors: Prince Dacosta Aboagye, Ayyoob Sharifi

Abstract:

Cities respond to climate concerns mainly through their climate action plans (CAPs). A comprehensive content analysis of the dynamics in existing urban CAPs is not well represented in the literature, and this void makes it difficult to appreciate the strengths and weaknesses of urban CAPs. Here, we perform a qualitative content analysis (QCA) of CAPs from 278 cities worldwide and use text-mining tools to map and visualize the relevant data. Our analysis showed a decline in the number of CAPs developed and published following the global COVID-19 lockdown period. Evidently, megacities are leading the deep decarbonisation agenda. We also observed a transition from developing mainly mitigation-focused CAPs pre-COP21 to developing both mitigation and adaptation CAPs. A lack of inclusiveness in local climate planning was common among European and North American cities. This evidence helps clarify the trends in existing urban CAPs and can shape future urban climate planning.

Keywords: urban, climate action plans, strengths, weaknesses

Procedia PDF Downloads 96
26445 Analysis of Cardiac Health Using Chaotic Theory

Authors: Chandra Mukherjee

Abstract:

The prevalent understanding of biological systems is based on the standard scientific perception of natural equilibrium, determinism, and predictability. Recently, a rethinking of these concepts was presented, and a new scientific perspective emerged that combines complexity theory with deterministic chaos theory, nonlinear dynamics, and the theory of fractals. The unpredictability of chaotic processes may well change our understanding of diseases and their management. Mathematically, chaos is deterministic behavior with irregular patterns that obeys equations critically dependent on initial conditions. Chaos theory is the branch of science concerned with nonlinear dynamics, fractals, bifurcations, periodic oscillations, and complexity. Recently, biomedical interest in this field has made these mathematical concepts available to medical researchers and practitioners. Any biological network system is considered to have a nominal state, recognized as a homeostatic state. In reality, the different physiological systems are not in a stable state of homeostatic balance under normal conditions; rather, they are in a dynamically stable state with chaotic behavior and complexity. Biological systems like heart rhythm and brain electrical activity are dynamical systems that can be classified as chaotic systems with sensitive dependence on initial conditions. In biological systems, the state of disease is characterized by a loss of complexity and chaotic behavior, and by the presence of pathological periodicity and regular behavior. The failure or collapse of nonlinear dynamics is an indication of disease rather than a characteristic of health.
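
Sensitive dependence on initial conditions can be demonstrated with the logistic map, a standard textbook chaotic system (used here only as an illustration, not as one of the physiological signals discussed in the abstract): two trajectories starting a millionth apart diverge to order-one differences within a few dozen iterations.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x_{n+1} = r * x_n * (1 - x_n); fully chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_map(0.300000)
b = logistic_map(0.300001)   # initial condition perturbed by one millionth
divergence = [abs(x - y) for x, y in zip(a, b)]
```

Despite the equation being fully deterministic, the perturbation grows roughly exponentially, which is exactly the property that makes long-term prediction of chaotic physiological signals impossible.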

Keywords: HRV, HRVI, LF, HF, DII

Procedia PDF Downloads 425
26443 Hydrogeophysical Investigations and Mapping of Ingress Channels along the Blesbokspruit Stream in the East Rand Basin of the Witwatersrand, South Africa

Authors: Melvin Sethobya, Sithule Xanga, Sechaba Lenong, Lunga Nolakana, Gbenga Adesola

Abstract:

Mining has been the cornerstone of the South African economy for the last century. Most of the gold mining in South Africa was conducted within the Witwatersrand basin, which contributed to the rapid growth of the city of Johannesburg and propelled the city to become the business and wealth capital of the country. However, with the gradual depletion of resources, the stoppage of underground water extraction from mines, and other factors relating to the survival of mining operations over a lengthy period, most of the mines were abandoned and left to pollute the local waterways and groundwater with toxins and heavy-metal residues, and increased acid mine drainage ensued. The Department of Mineral Resources and Energy commissioned a project whose aim is to monitor, maintain, and mitigate the adverse environmental impacts of polluted mine water flowing into local streams, which affects local ecosystems and livelihoods downstream. As part of these mitigation efforts, the diagnosis and monitoring of sites with polluted groundwater or surface water has become important. Geophysical surveys, in particular resistivity and magnetic surveys, were selected as among the most suitable techniques for investigating local ingress points along one of the major streams cutting through the Witwatersrand basin, the Blesbokspruit, found in the eastern part of the basin. The aim of the surveys was to provide information to assist in determining possible water loss/ingress from the Blesbokspruit stream. Modelling of the geophysical survey results offered in-depth insight into the interaction and pathways of polluted water through the mapping of possible ingress channels near the Blesbokspruit. The resistivity-depth profile of the surveyed site exhibits a three-layered model: a low-resistivity overburden (10 to 200 Ω.m), underlain by a moderate-resistivity weathered layer (>300 Ω.m), which sits on a more resistive crystalline bedrock (>500 Ω.m).
Two locations of potential ingress channels were mapped across the two traverses at the site. The magnetic survey conducted at the site mapped a major NE-SW trending regional lineament with a strong magnetic signature, modeled to a depth beyond 100 m, with the potential to act as a conduit for the dispersion of stream water away from the stream, as it shares a similar orientation with the potential ingress channels mapped using the resistivity method.

Keywords: electrical resistivity, magnetic survey, Blesbokspruit, ingress

Procedia PDF Downloads 63
26443 Evaluation of the Improve Vacuum Blood Collection Tube for Laboratory Tests

Authors: Yoon Kyung Song, Seung Won Han, Sang Hyun Hwang, Do Hoon Lee

Abstract:

Laboratory testing is a significant part of the diagnosis, prognosis, and treatment of diseases. Blood collection is a simple process but can be a potential source of pre-analytical errors. The vacuum blood collection tubes used to collect and store blood specimens are essential for accurate test results. The purpose of this study was to validate the Improve serum separator tube (SST) (Guanzhou Improve Medical Instruments Co., Ltd, China) for routine clinical chemistry laboratory testing. Blood specimens were collected from 100 volunteers in three different serum vacuum tubes (Greiner SST, Becton Dickinson SST, Improve SST). The specimens were evaluated for 16 routine chemistry tests using a TBA-200FR NEO (Toshiba Medical Co., Japan). The results were statistically analyzed by paired t-test and Bland-Altman plot. For the stability test, the initial results for each tube were compared with the results of specimens preserved for 72 hours. Clinical acceptability was evaluated against the biological variation data bank of Ricos. Paired t-test analysis revealed that AST, ALT, K, and Cl showed statistically identical results, but calcium (CA), phosphorus (PHOS), glucose (GLU), BUN, uric acid (UA), cholesterol (CHOL), total protein (TP), albumin (ALB), total bilirubin (TB), ALP, creatinine (CRE), and sodium (NA) differed (P < 0.05) between the Improve SST and the Greiner SST. Likewise, CA, PHOS, TP, TB, AST, ALT, NA, K, and Cl showed statistically identical results, but GLU, BUN, UA, CHOL, ALB, ALP, and CRE differed between the Improve SST and the Becton Dickinson SST. All statistically different cases were clinically acceptable according to the biological variation data bank of Ricos. The Improve SST tubes showed satisfactory results compared with the Greiner SST and Becton Dickinson SST. We conclude that the tubes are acceptable for routine clinical chemistry laboratory testing.
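
The statistical comparison described, a paired t-test plus Bland-Altman 95% limits of agreement, can be sketched as follows; the glucose values are hypothetical, invented purely to illustrate the computation:

```python
import math

def paired_stats(x, y):
    """Paired differences: mean bias, t statistic, Bland-Altman 95% limits."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    t_stat = mean_d / (sd_d / math.sqrt(n))
    limits = (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d)
    return mean_d, t_stat, limits

# Hypothetical glucose results (mg/dL) for the same six specimens in two tubes.
greiner = [92.0, 101.0, 88.0, 110.0, 95.0, 99.0]
improve = [93.0, 102.5, 88.5, 111.0, 96.0, 100.0]
bias, t_stat, (lo, hi) = paired_stats(improve, greiner)
```

A tube pair is judged statistically different when the t statistic exceeds the critical value for the sample size, and clinically acceptable when the bias and limits of agreement fall within the biological-variation criteria of the Ricos data bank.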

Keywords: blood collection, Guanzhou Improve, SST, vacuum tube

Procedia PDF Downloads 244
26442 A Ratio-Weighted Decision Tree Algorithm for Imbalance Dataset Classification

Authors: Doyin Afolabi, Phillip Adewole, Oladipupo Sennaike

Abstract:

Most well-known classifiers, including the decision tree algorithm, can make predictions on balanced datasets efficiently. However, the decision tree algorithm tends to be biased on imbalanced datasets because of the skewness of their class distribution. To overcome this problem, this study proposes a weighted decision tree algorithm that aims to remove the bias toward the majority class and to prevent the reduction of majority observations in imbalanced dataset classification. The proposed weighted decision tree algorithm was tested on three imbalanced datasets: a cancer dataset, the German credit dataset, and a banknote dataset. The specificity, sensitivity, and accuracy metrics were used to evaluate its performance on these datasets. The evaluation results show that, for some of the weights of our proposed decision tree, the specificity, sensitivity, and accuracy metrics gave better results than those of the ID3 decision tree and of a decision tree induced with minority entropy, for all three datasets.
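
The paper's exact ratio-weighting scheme is not reproduced here, but the underlying idea of letting minority samples carry more mass in the split criterion can be sketched with inverse-frequency class weights applied to the Gini impurity:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: minority samples count more in impurity."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

def weighted_gini(labels, weights):
    """Gini impurity computed over weighted class mass instead of raw counts."""
    mass = Counter()
    for y in labels:
        mass[y] += weights[y]
    total = sum(mass.values())
    return 1.0 - sum((m / total) ** 2 for m in mass.values())

labels = ["good"] * 90 + ["bad"] * 10           # 9:1 class imbalance
w = class_weights(labels)
plain = 1.0 - (0.9 ** 2 + 0.1 ** 2)             # unweighted Gini of this node
weighted = weighted_gini(labels, w)
```

With inverse-frequency weights, a 9:1 node is scored as if it were balanced (impurity 0.5 rather than 0.18), so splits that isolate minority samples become correspondingly more attractive to the tree-growing procedure.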

Keywords: data mining, decision tree, classification, imbalance dataset

Procedia PDF Downloads 136
26441 Research on the Risks of Railroad Receiving and Dispatching Trains Operators: Natural Language Processing Risk Text Mining

Authors: Yangze Lan, Ruihua Xv, Feng Zhou, Yijia Shan, Longhao Zhang, Qinghui Xv

Abstract:

Receiving and dispatching trains is an important part of railroad operations, yet the risk evaluation of operating personnel is still reflected only in scores, with little further analysis of wrong answers and operating accidents. Using natural language processing (NLP) technology, this study extracts the keywords and key phrases of 40 relevant risk events concerning receiving and dispatching trains and reclassifies the risk events into 8 categories, such as train approach and signal risks, dispatching command risks, and so on. Based on the historical risk data of personnel, the K-Means clustering method is used to classify the risk level of personnel. The results indicate that high-risk operating personnel need strengthened training in train receiving and dispatching operations for essential trains and abnormal situations.
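
Classifying personnel into risk levels with K-Means can be sketched on one-dimensional risk scores; the scores and the deterministic initialization below are illustrative assumptions, not the study's data or implementation:

```python
def kmeans_1d(values, k=3, iters=50):
    """Minimal 1-D K-Means; initial centers are spread across the sorted range."""
    ordered = sorted(values)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Recompute each center as its cluster mean (keep stale centers if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical per-operator risk scores aggregated from historical records.
scores = [0.10, 0.15, 0.20, 0.45, 0.50, 0.55, 0.85, 0.90, 0.95]
centers, clusters = kmeans_1d(scores, k=3)   # low / medium / high risk levels
```

The resulting cluster centers define the low-, medium-, and high-risk levels, and operators in the highest cluster would be flagged for the strengthened training described above.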

Keywords: receiving and dispatching trains, natural language processing, risk evaluation, K-means clustering

Procedia PDF Downloads 91
26440 Monitoring Air Pollution Effects on Children for Supporting Public Health Policy: Preliminary Results of MAPEC_LIFE Project

Authors: Elisabetta Ceretti, Silvia Bonizzoni, Alberto Bonetti, Milena Villarini, Marco Verani, Maria Antonella De Donno, Sara Bonetta, Umberto Gelatti

Abstract:

Introduction: Air pollution is a global problem. In 2013, the International Agency for Research on Cancer (IARC) classified air pollution and particulate matter as carcinogenic to humans. Studying the health effects of air pollution in children is very important because they are a high-risk group, and early exposure during childhood can increase the risk of developing chronic diseases in adulthood. The MAPEC_LIFE (Monitoring Air Pollution Effects on Children for supporting public health policy) project, funded by the EU Life+ Programme, intends to evaluate the associations between air pollution and early biological effects in children and to propose a model for estimating the global risk of early biological effects due to air pollutants and other factors in children. Methods: The study was carried out on 6-8-year-old children living in five Italian towns in two different seasons. Two biomarkers of early biological effects, primary DNA damage detected with the comet assay and the frequency of micronuclei, were investigated in buccal cells of children. Details of children's diseases, socio-economic status, exposure to other pollutants, and lifestyle were collected using a questionnaire administered to the children's parents. Child exposure to urban air pollution was assessed by analysing PM0.5 samples collected in the school areas for PAH and nitro-PAH concentrations, lung toxicity, and in vitro genotoxicity on bacterial and human cells. Data on the chemical features of the urban air during the study period were obtained from the Regional Agency for Environmental Protection. The project also created the opportunity to approach the issue of air pollution with the children, trying to raise their awareness of air quality, its health effects, and some healthy behaviors by means of an educational intervention in the schools.
Results: 1315 children were recruited for the study and participated in the first sampling campaign in the five towns. The second campaign, on the same children, is still ongoing. The preliminary results of the tests on the children's buccal mucosa cells will be presented at the conference, as will preliminary data on the chemical composition and the toxicity and genotoxicity features of the PM0.5 samples. The educational package was tested on 250 primary-school children and proved very useful, improving children's knowledge about air pollution and its effects and stimulating their interest. Conclusions: The associations between levels of air pollutants, air mutagenicity, and biomarkers of early effects will be investigated. A tentative model to calculate the global absolute risk of early biological effects from air pollution and other variables together will be proposed and may be useful to support policy-making and community interventions to protect children from the possible health effects of air pollutants.

Keywords: air pollution exposure, biomarkers of early effects, children, public health policy

Procedia PDF Downloads 330
26439 Credit Risk Assessment Using Rule Based Classifiers: A Comparative Study

Authors: Salima Smiti, Ines Gasmi, Makram Soui

Abstract:

Credit risk is the most important issue for financial institutions. Its assessment has become an important task used to predict defaulting customers and to classify customers as good or bad payers. To this end, numerous techniques have been applied to credit risk assessment. However, to our knowledge, several evaluation techniques are black-box models, such as neural networks and SVMs; they generate applicant classes without any explanation. In this paper, we propose to assess credit risk using a rule-based classification method, whose output is a set of rules that describe and explain the decision. To this end, we compare seven classification algorithms (JRip, Decision Table, OneR, ZeroR, Fuzzy Rule, PART, and Genetic Programming (GP)), where the goal is to find the best rules satisfying several criteria: accuracy, sensitivity, and specificity. The results obtained confirm the efficiency of the GP algorithm on the German and Australian datasets compared to the other rule-based techniques for predicting credit risk.
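
OneR, the simplest of the compared rule learners, makes the rule-extraction idea concrete: for each attribute it maps every value to its majority class and keeps the single attribute whose rule set makes the fewest training errors. A minimal sketch on hypothetical applicant records (the features and labels below are invented for illustration):

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """OneR: map each value of one attribute to its majority class and keep
    the attribute whose rule set makes the fewest training errors."""
    best = None
    for attr in rows[0]:
        by_value = defaultdict(list)
        for row, y in zip(rows, labels):
            by_value[row[attr]].append(y)
        rules, errors = {}, 0
        for value, ys in by_value.items():
            majority, count = Counter(ys).most_common(1)[0]
            rules[value] = majority
            errors += len(ys) - count        # rows the majority rule gets wrong
        if best is None or errors < best[1]:
            best = (attr, errors, rules)
    return best

# Hypothetical applicant records with two categorical attributes.
rows = [
    {"history": "good", "employed": "yes"},
    {"history": "good", "employed": "no"},
    {"history": "bad", "employed": "yes"},
    {"history": "bad", "employed": "no"},
]
labels = ["accept", "accept", "reject", "reject"]
attr, errors, rules = one_r(rows, labels)
```

Unlike a black-box score, the output is directly readable: here the learned rule set says to accept applicants with a good credit history and reject the rest, which is the kind of explainable decision the paper argues for.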

Keywords: credit risk assessment, classification algorithms, data mining, rule extraction

Procedia PDF Downloads 181
26438 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects that create data-driven applications. These tasks typically demand additional research and analysis, and the proper technique and strategy must be chosen to ensure the success of a data-driven project; otherwise, even with a lot of effort, the necessary development might not be possible. This paper examines the workflow of data-driven software development projects and their implementation process in order to describe how to manage such a project successfully, which will help minimize the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 83
26437 An Automated Approach to the Nozzle Configuration of Polycrystalline Diamond Compact Drill Bits for Effective Cuttings Removal

Authors: R. Suresh, Pavan Kumar Nimmagadda, Ming Zo Tan, Shane Hart, Sharp Ugwuocha

Abstract:

Polycrystalline diamond compact (PDC) drill bits are extensively used in the oil and gas industry as well as the mining industry. Industry engineers continually improve upon PDC drill bit designs and hydraulic conditions, and optimized injection nozzles play a key role in improving the drilling performance and efficiency of these ever-changing PDC drill bits. In the first part of this study, computational fluid dynamics (CFD) modelling is performed to investigate the hydrodynamic characteristics of drilling fluid flow around the PDC drill bit. The open-source CFD software OpenFOAM simulates the flow around the drill bit based on the field input data. A specifically developed console application integrates the entire CFD process, including domain extraction, meshing, solving the governing equations, and post-processing. The results from the OpenFOAM solver are then compared with those of the ANSYS Fluent software, and the data from both software programs agree. The second part of the paper describes the parametric study of the PDC drill bit nozzle to determine the effect of parameters such as the number of nozzles, nozzle velocity, nozzle radial position, and orientation on the flow-field characteristics and bit washing patterns. After analyzing a series of nozzle configurations, the best configuration is identified and recommendations are made for modifying the PDC bit design.
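
The parametric study can be organized as a sweep over nozzle configurations; the sketch below only enumerates the cases that a console application like the one described would feed to the meshing and solver stages. The parameter ranges are hypothetical, not values from the paper:

```python
from itertools import product

# Hypothetical sweep ranges; real values would come from the bit design
# and the drilling-fluid programme.
nozzle_counts = [5, 6, 7]
velocities_ms = [80.0, 100.0, 120.0]     # jet velocity, m/s
radial_positions = [0.3, 0.5, 0.7]       # fraction of bit radius

def build_cases(counts, velocities, radii):
    """Enumerate every nozzle configuration to be meshed and solved."""
    return [
        {"nozzles": n, "velocity_ms": v, "radial_frac": r,
         "case_name": f"n{n}_v{int(v)}_r{round(r * 10)}"}
        for n, v, r in product(counts, velocities, radii)
    ]

cases = build_cases(nozzle_counts, velocities_ms, radial_positions)
# Each case dictionary would drive domain extraction, meshing, a solver
# run, and post-processing that ranks the configurations.
```

Enumerating the full Cartesian product up front keeps the automation simple: every case gets a reproducible name, and post-processing can rank all 27 runs to pick the best configuration.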

Keywords: ANSYS Fluent, computational fluid dynamics, nozzle configuration, OpenFOAM, PDC dill bit

Procedia PDF Downloads 420
26436 Geological Structure Identification in the Semilir Formation: Correlated Geological and Geophysical (Very Low Frequency) Data for Disaster Zonation with Current Density Parameters and Geological Surface Information

Authors: E. M. Rifqi Wilda Pradana, Bagus Bayu Prabowo, Meida Riski Pujiyati, Efraim Maykhel Hagana Ginting, Virgiawan Arya Hangga Reksa

Abstract:

The VLF (Very Low Frequency) method is an electromagnetic method that uses low frequencies between 10 and 30 kHz, which results in fairly deep penetration. In this study, the VLF method was used for the zonation of disaster-prone areas by identifying geological structures in the form of faults. Data acquisition was carried out in the Trimulyo region, Jetis District, Bantul Regency, Special Region of Yogyakarta, Indonesia, along 8 measurement paths. This study uses wave transmitters in Japan and Australia to obtain Tilt and Elipt values that can be used to create RAE (Rapat Arus Ekuivalen, or equivalent current density) sections, which identify areas easily crossed by electric current. Such a section indicates the existence of geological structures in the form of faults in the study area, characterized by high RAE values. Processing the VLF data yielded Tilt vs. Elipt graphs and Moving Average (MA) Tilt vs. MA Elipt graphs for each path, which show fluctuating patterns and no intersections. Data processing in Matlab identified areas with low RAE values of 0%-6%, indicating a medium with low conductivity and high resistivity, interpreted as the sandstone, claystone, and tuff lithologies that are part of the Semilir Formation. In contrast, high RAE values of 10%-16%, indicating a medium with high conductivity and low resistivity, can be interpreted as a fault zone filled with fluid. The existence of the fault zone is supported by the discovery of a normal fault on the surface, with strike N55°W and dip 63°E, at coordinates X = 433256 and Y = 9127722, so that residents' activities in the zone, such as housing and mining, can be avoided to reduce the risk of natural disasters.
Keywords: current density, faults, very low frequency, zonation

Procedia PDF Downloads 175