Search results for: atomic data
24468 Evaluation of Compatibility between Produced and Injected Waters and Identification of the Causes of Well Plugging in a Southern Tunisian Oilfield
Authors: Sonia Barbouchi, Meriem Samcha
Abstract:
Scale deposition during water injection into the aquifers of oil reservoirs is a serious problem in the oil production industry. One of the primary causes of scale formation and injection well plugging is the mixing of two incompatible waters. Considered individually, the waters may be quite stable at system conditions and present no scale problems. However, once they are mixed, reactions between ions dissolved in the individual waters may form insoluble products. The purpose of this study is to identify the causes of well plugging in a southern Tunisian oilfield, where fresh water has been injected into the producing wells to counteract the salinity of the formation waters and inhibit the deposition of halite. X-ray diffraction (XRD) mineralogical analysis was carried out on scale samples collected from the blocked well. Two samples, collected from the formation water and the injected water, were analysed using inductively coupled plasma atomic emission spectroscopy, ion chromatography and other standard laboratory techniques. The results of the complete water analyses served as the input parameters for determining scaling tendency. Saturation index values for CaCO3, CaSO4, BaSO4 and SrSO4 scales were calculated for the water mixtures at different mixing ratios and temperatures using a computerized scale prediction model. The compatibility study showed that mixing the two waters tends to increase the probability of barite deposition. XRD analysis confirmed this result, showing that the analysed deposits consisted predominantly of barite with minor galena. At the temperatures studied, the tendency for barite scale increases significantly as the share of fresh water in the mixture increases. 
The future scale inhibition and removal strategies to be implemented in the oilfield concerned are being derived in large part from the results of the present study.
Keywords: compatibility study, produced water, scaling, water injection
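The saturation index mentioned in this abstract can be illustrated with a minimal sketch. It is not the scale prediction model used by the authors: it assumes ideal linear mixing, ignores activity coefficients and temperature, and uses an illustrative solubility product for barite. The water compositions and the names `mix` and `barite_saturation_index` are hypothetical.

```python
import math

# Illustrative solubility product for barite (BaSO4) near 25 °C; assumed value.
KSP_BARITE = 1.1e-10


def mix(formation, injected, injected_share):
    """Linearly blend ion concentrations (mol/L) of two waters."""
    return {
        ion: (1.0 - injected_share) * formation[ion] + injected_share * injected[ion]
        for ion in formation
    }


def barite_saturation_index(water, ksp=KSP_BARITE):
    """SI = log10(ion product / Ksp); activity coefficients are ignored."""
    ion_product = water["Ba"] * water["SO4"]
    return math.log10(ion_product / ksp)


# Hypothetical compositions: a barium-rich formation water and a
# sulfate-rich injected fresh water.
formation_water = {"Ba": 1e-3, "SO4": 1e-8}
injected_water = {"Ba": 1e-9, "SO4": 5e-3}

for share in (0.0, 0.25, 0.5, 0.75):
    si = barite_saturation_index(mix(formation_water, injected_water, share))
    print(f"injected share {share:.2f}: SI = {si:+.2f}")
```

A positive index indicates supersaturation, which is the condition under which scale can form; each water can be undersaturated on its own and the mixture still supersaturated.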
Procedia PDF Downloads 169
24467 A Web Service Based Sensor Data Management System
Authors: Rose A. Yemson, Ping Jiang, Oyedeji L. Inumoh
Abstract:
The deployment of wireless sensor networks has increased rapidly. However, the growing capacity and diversity of sensors, with applications ranging from biological and environmental to military, generates a tremendous volume of data, and more attention has been paid to distributed sensing than to how the generated data are managed, analyzed, retrieved and understood. This makes it difficult to process live sensor data and to run concurrent control and updates, because sensor data are heavyweight, complex, and slow. This work focuses on developing a web service platform for automatic detection of sensors, acquisition of sensor data, storage of sensor data in a database, and processing of sensor data using reconfigurable software components. It also creates a web service based sensor data management system to monitor the physical movement of an individual wearing wireless network sensor technology (SunSPOT). The sensor detects the individual's movement by sensing acceleration along the X, Y and Z axes and sends the readings to a database that is interfaced with an internet platform. The collected data are used to determine the person's posture, such as standing, sitting or lying down. The system is designed using the Unified Modeling Language (UML) and implemented using Java, JavaScript, HTML and MySQL. It allows an individual to be monitored closely in real time and their physical activity details to be obtained without the observer being physically present for in-situ measurement, so monitoring can be done remotely instead of through time-consuming in-person checks. These details can help in evaluating an individual's physical activity and generating feedback on medication. They can also help in keeping track of any mandatory physical activities the individual is required to perform. 
These evaluations and feedback can help in maintaining a better health status for the individual and in providing improved health care.
Keywords: HTML, Java, JavaScript, MySQL, SunSPOT, UML, web-based, wireless network sensor
Procedia PDF Downloads 212
24466 Unlocking Health Insights: Studying Data for Better Care
Authors: Valentina Marutyan
Abstract:
Healthcare data mining is a rapidly developing field at the intersection of technology and medicine that has the potential to change our understanding of and approach to providing healthcare. Healthcare data mining is the process of examining huge amounts of data to extract useful information that can be applied to improve patient care, treatment effectiveness, and overall healthcare delivery. Using advanced analytical approaches, the field looks for patterns, trends, and correlations in a variety of healthcare datasets, such as electronic health records (EHRs), medical imaging, patient demographics, and treatment histories. Predictive analysis using historical patient data is a major area of interest in healthcare data mining. It enables doctors to intervene early to prevent problems or improve results for patients, and it assists in early disease detection and customized treatment planning for every person. Doctors can customize a patient's care by looking at their medical history, genetic profile, and current and previous therapies. In this way, treatments can be more effective and have fewer negative consequences. Besides helping patients, data mining improves the efficiency of hospitals: it helps them determine the number of beds or doctors they require given the number of patients they expect. This project used models such as logistic regression, random forests, and neural networks for predicting diseases and analyzing medical images. Algorithms such as k-means were applied to patient data, and association rule mining identified connections between treatments and patient responses. Time series techniques helped in resource management by predicting patient admissions. These methods improved healthcare decision-making and personalized treatment. 
Healthcare data mining must also deal with difficulties such as poor data quality, privacy challenges, managing large and complicated datasets, ensuring the reliability of models, managing biases, limited data sharing, and regulatory compliance. Finally, data mining in healthcare helps medical professionals and hospitals make better decisions, treat patients more efficiently, and work more efficiently. It ultimately comes down to using data to improve treatment, make better choices, and simplify hospital operations for all patients.
Keywords: data mining, healthcare, big data, large amounts of data
Procedia PDF Downloads 78
24465 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features
Authors: Bushra Zafar, Usman Qamar
Abstract:
Large sample sizes and high dimensionality reduce the effectiveness of conventional data mining methodologies. Data mining techniques are important tools for extracting useful information from a variety of databases; they provide supervised learning in the form of classification to design models that describe important data classes, where the structure of the classifier is based on the class attribute. Classification efficiency and accuracy are often influenced to a great extent by noisy and undesirable features in real application data sets. The inherent nature of a data set greatly masks analysis of its quality and leaves us with quite few practical approaches to use. To our knowledge, we present for the first time a new approach for investigating the structure and quality of datasets by providing a targeted analysis of the localization of noisy and irrelevant features. Machine learning relies heavily on feature selection as a pre-processing step, which selects a small subset of the available features, reducing the feature space according to a certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching for a small set of important features that may result in good classification performance. For this purpose, a heuristic for wrapper-based feature selection using a genetic algorithm is used, together with an external classifier for discriminative feature selection. A feature is selected based on its number of occurrences in the chosen chromosomes. A sample dataset has been used to demonstrate the proposed idea. The proposed method improved the average accuracy across different datasets to about 95%. Experimental results illustrate that the proposed algorithm increases the accuracy of prediction of different diseases.
Keywords: data mining, genetic algorithm, KNN algorithms, wrapper based feature selection
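The occurrence-based selection step described in this abstract can be sketched as follows. This is only an illustration under assumptions: chromosomes are taken to be bit masks over the features, the genetic algorithm and the external classifier that would produce them are omitted, and the function name `select_by_occurrence` and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def select_by_occurrence(chromosomes, min_count):
    """Return indices of features set in at least `min_count` chromosomes.

    Each chromosome is a list of 0/1 flags, one flag per feature.
    """
    counts = Counter()
    for chromosome in chromosomes:
        for index, bit in enumerate(chromosome):
            if bit:
                counts[index] += 1
    return sorted(index for index, count in counts.items() if count >= min_count)


# Hypothetical "chosen" chromosomes, e.g. the fittest of several GA runs.
chosen = [
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
]

print(select_by_occurrence(chosen, min_count=2))  # [0, 3]
```

Raising `min_count` keeps only features that the search selected consistently, which is one simple way to filter out features picked by chance.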
Procedia PDF Downloads 318
24464 Study of Electro-Chemical Properties of ZnO Nanowires for Various Application
Authors: Meera A. Albloushi, Adel B. Gougam
Abstract:
Developments in the field of piezoelectrics have led to renewed interest in ZnO nanowires (NWs) as a promising material in the nanogenerator device category. They can be used as a power source for self-powered electronic systems with higher density, higher efficiency, longer lifetime, and lower cost of fabrication. Highly aligned ZnO nanowires seem to exhibit higher performance than nonaligned ones. The purpose of this study was to develop ZnO nanowires and to investigate their electrical and chemical properties for various applications. They were grown on silicon (100) and glass substrates using a low-temperature, non-hazardous method: aqueous chemical growth (ACG). ZnO (non-doped) and AZO (aluminum-doped) seed layers were deposited using RF magnetron sputtering under an argon pressure of 3 mTorr and a deposition power of 180 W; the growth times were selected to obtain thicknesses in the range of 30 to 125 nm. Some of the films were subsequently annealed. The substrates were immersed, tilted, in an equimolar solution of zinc nitrate and hexamine (HMTA) at 0.02 M and 0.05 M in the temperature range of 80 to 90 °C for 1.5 to 2 hours. X-ray diffraction shows strong peaks at 2θ = 34.2° for the ZnO films, which indicates that the films have a preferred c-axis wurtzite hexagonal (002) orientation. The surface morphology of the films was investigated by atomic force microscopy (AFM), which confirmed the uniformity of the films, since the roughness is within the 5 nm range. Scanning electron microscopes (SEM) (Quanta FEG 250, Quanta 3D FEG, Nova NanoSEM 650) were used to characterize both the ZnO films and the NWs. SEM images show a forest of vertically grown ZnO NWs with lengths of up to 2000 nm and diameters of 20-300 nm. The SEM images show that the role of the seed layer is to enhance the vertical alignment of ZnO NWs at a solution pH of 5-6. 
The electrical and optical properties of the NWs are also characterized using Electrical Force Microscopy (EFM). After growing the ZnO NWs, developing the nanogenerator is the second step of this study, in order to determine the energy conversion efficiency and the power output.
Keywords: ZnO nanowires (NWs), aqueous chemical growth (ACG), piezoelectric NWs, harvesting energy
Procedia PDF Downloads 323
24463 Sol-Gel Derived 58S Bioglass Substituted by Li and Mg: A Comparative Evaluation on in vitro Bioactivity, MC3T3 Proliferation and Antibacterial Efficiency
Authors: Amir Khaleghipour, Amirhossein Moghanian, Elhamalsadat Ghaffari
Abstract:
Modified bioactive glass has been considered a promising multifunctional candidate for bone repair and regeneration due to its attractive properties. The present study mainly aims to evaluate how the individual substitution of lithium (L-BG) and magnesium (M-BG) for calcium affects the in vitro bioactivity of sol-gel derived substituted 58S bioactive glass (BG), and to present one composition in the 60SiO₂–(36-x)CaO–4P₂O₅–(x)Li₂O and 60SiO₂–(36-x)CaO–4P₂O₅–(x)MgO quaternary systems (where x = 0, 5, 10 mol.%) with improved biocompatibility, enhanced alkaline phosphatase (ALP) activity, and the most efficient antibacterial activity against methicillin-resistant Staphylococcus aureus (MRSA) bacteria. To address these aims and study the effect of CaO/Li₂O and CaO/MgO substitution up to 10 mol.% in 58S-BGs, the samples were characterized by X-ray diffraction, Fourier transform infrared spectroscopy, inductively coupled plasma atomic emission spectrometry and scanning electron microscopy after immersion in simulated body fluid for up to 14 days. Results indicated that both CaO/Li₂O and CaO/MgO substitution had a retarding effect on in vitro hydroxyapatite (HA) formation due to the lower supersaturation degree for nucleation of HA compared with 58S-BG, with magnesium having the more pronounced effect. The 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) and ALP assays showed that both CaO/Li₂O and CaO/MgO substitution up to 5 mol.% in 58S-BGs led to increased biocompatibility and stimulated proliferation of the pre-osteoblast MC3T3 cells with respect to the control. In addition, substitution of either Li or Mg for Ca in the 58S-BG composition resulted in improved bactericidal efficiency against MRSA bacteria. 
Taken together, sample 58S-BG with 5 mol.% CaO/Li₂O substitution (BG-5L) was considered a multifunctional biomaterial for bone repair/regeneration, with improved biocompatibility, enhanced ALP activity, as well as enhanced antibacterial efficiency against methicillin-resistant Staphylococcus aureus (MRSA) bacteria, among all of the synthesized L-BGs and M-BGs.
Keywords: alkaline, alkaline earth, bioactivity, biomedical applications, sol-gel processes
Procedia PDF Downloads 190
24462 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education
Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue
Abstract:
In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information in students' learning behavior, particularly to uncover the early signs of at-risk students. However, data with noise, outliers, and irrelevant information may lead to incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data were gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this study. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are the ensemble learning approaches considered, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are the supervised learning techniques. Hyperparameters for the ensemble learning systems are fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.
Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education
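The Majority Vote idea named in this abstract can be shown with a minimal sketch of hard voting. It is not the authors' pipeline: the three base models, their predictions and the labels are hypothetical, and ties are resolved in favour of the label seen first, which is a choice made here for simplicity.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by hard voting.

    `predictions` holds one list of labels per model, all of equal length.
    On a tie, the label that appears first among the models wins.
    """
    combined = []
    for labels in zip(*predictions):
        winner, _count = Counter(labels).most_common(1)[0]
        combined.append(winner)
    return combined


# Hypothetical outputs of three base classifiers for four students.
model_a = ["pass", "at_risk", "pass", "at_risk"]
model_b = ["pass", "pass", "pass", "at_risk"]
model_c = ["at_risk", "at_risk", "pass", "pass"]

print(majority_vote([model_a, model_b, model_c]))
# ['pass', 'at_risk', 'pass', 'at_risk']
```

Using an odd number of base models avoids ties in a two-class problem such as this one.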
Procedia PDF Downloads 109
24461 Foundation of the Information Model for Connected-Cars
Authors: Hae-Won Seo, Yong-Gu Lee
Abstract:
Recent progress in the next generation of automobile technology is geared towards incorporating information technology into cars. Collectively called smart cars, these vehicles bring intelligence that provides comfort, convenience and safety. One branch of smart cars is the connected-car system. The key concept in connected cars is the sharing of driving information among cars in a decentralized manner, enabling collective intelligence. This paper proposes a foundation for the information model that is necessary to define the driving information for smart cars. Road conditions are modeled through a unique data structure that unambiguously represents the time-variant traffic in the streets. Additionally, the modeled data structure is exemplified in a navigational scenario, with its usage described in UML. Optimal driving route searching under dynamically changing road conditions is also discussed using the proposed data structure.
Keywords: connected-car, data modeling, route planning, navigation system
Procedia PDF Downloads 375
24460 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in the spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors contain important geographical information that can be used for remote sensing applications such as regional planning and disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often treated as a classification problem, so this task can be performed using machine-learning techniques. Although many machine-learning algorithms exist, the classification here is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We propose a classification method that considers neighboring pixels in a region for feature extraction and evaluates classifications according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset was created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrate the benefits of using knowledge discovery and data-mining techniques, which can be applied to image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction
Procedia PDF Downloads 340
24459 Automated Multisensory Data Collection System for Continuous Monitoring of Refrigerating Appliances Recycling Plants
Authors: Georgii Emelianov, Mikhail Polikarpov, Fabian Hübner, Jochen Deuse, Jochen Schiemann
Abstract:
Recycling refrigerating appliances plays a major role in protecting the Earth's atmosphere from ozone depletion and emissions of greenhouse gases. The performance of refrigerator recycling plants in terms of material retention is the subject of strict environmental certifications and is reviewed periodically through specialized audits. The continuous collection of refrigerator data required for the input-output analysis is still mostly manual, error-prone, and not digitalized. In this paper, we propose an automated data collection system for recycling plants in order to deduce the expected material contents of individual end-of-life refrigerating appliances. The system utilizes laser scanner measurements and optical data to extract attributes of individual refrigerators by applying transfer learning with pre-trained vision models and optical character recognition. Based on the recognized features, the system automatically provides material categories and target values of contained material masses, especially foaming and cooling agents. The presented data collection system paves the way for continuous performance monitoring and efficient control of refrigerator recycling plants.
Keywords: automation, data collection, performance monitoring, recycling, refrigerators
Procedia PDF Downloads 165
24458 Sales Patterns Clustering Analysis on Seasonal Product Sales Data
Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho
Abstract:
As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.
Keywords: clustering, distribution, sales pattern, seasonal product
Procedia PDF Downloads 598
24457 Probability Sampling in Matched Case-Control Study in Drug Abuse
Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell
Abstract:
Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using the bootstrapping method and the predictive properties of each model using receiver operating characteristic (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. On the other hand, bootstrap analysis of the random-sample data set showed less variation and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. The area under the ROC curve for the model derived from the random-sample data set was similar when fitted to either data set (0.93 for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.
Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling
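The bootstrapping step used in this abstract to judge precision can be sketched in a simplified form. The study bootstrapped the coefficients of a conditional logistic regression; the sketch below swaps in a much simpler statistic (a sample proportion) purely to show the resampling mechanics. The data, the seed and the name `bootstrap_se` are hypothetical, while the 100 resamples mirror the count quoted in the abstract.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples=100, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Hypothetical binary risk-factor indicator for ten respondents.
exposed = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

se = bootstrap_se(exposed, statistics.mean)
print(f"bootstrap standard error of the proportion: {se:.3f}")
```

A wide spread of the resampled estimates, as reported for the snowball sample, signals that the fitted quantity is imprecise.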
Procedia PDF Downloads 493
24456 Bioinformatics High Performance Computation and Big Data
Authors: Javed Mohammed
Abstract:
Right now, biomedical infrastructure lags well behind the curve. Our healthcare system is dispersed and disjointed; medical records are a bit of a mess; and we do not yet have the capacity to store and process the enormous amounts of data coming our way from widespread whole-genome sequencing. And then there are privacy issues. Despite these infrastructure challenges, some researchers are plunging into biomedical Big Data now, in hopes of extracting new and actionable knowledge. They are delving into molecular-level data to discover biomarkers that help classify patients based on their response to existing treatments, and pushing their results out to physicians in novel and creative ways. Computer scientists and biomedical researchers are able to transform data into models and simulations that will enable scientists, for the first time, to gain a profound understanding of the deepest biological functions. Solving biological problems may require High-Performance Computing (HPC) due either to the massive parallel computation required to solve a particular problem or to algorithmic complexity that may range from difficult to intractable. Many problems involve seemingly well-behaved polynomial-time algorithms (such as all-to-all comparisons) but have massive computational requirements due to the large data sets that must be analyzed. High-throughput techniques for DNA sequencing and analysis of gene expression have led to exponential growth in the amount of publicly available genomic data. With the increased availability of genomic data, traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types. Computing systems are now so powerful that it is possible for researchers to consider modeling the folding of a protein or even simulating an entire human body. This research paper emphasizes computational biology's growing need for high-performance computing and Big Data. 
It illustrates this article’s indispensability in meeting the scientific and engineering challenges of the twenty-first century, and how Protein Folding (the structure and function of proteins) and Phylogeny Reconstruction (evolutionary history of a group of genes) can use HPC that provides sufficient capability for evaluating or solving more limited but meaningful instances. This article also indicates solutions to optimization problems, and benefits Big Data and Computational Biology. The article illustrates the Current State-of-the-Art and Future-Generation Biology of HPC Computing with Big Data.
Keywords: high performance, big data, parallel computation, molecular data, computational biology
Procedia PDF Downloads 365
24455 Investigation of Correlation Between Radon Concentration and Metals in Produced Water from Oilfield Activities
Authors: Nacer Hamza
Abstract:
Natural radiation exposure arises from cosmic rays and from naturally occurring radioactive materials (NORMs), which originate in the earth's crust and are present everywhere in the environment (1). Significant concentrations of NORMs have been reported in produced water, which comes out during the oil extraction process, so the management of this produced water is a challenge for oil and gas companies. Management options include minimization of produced water, which is considered the best option environmentally because the less water is produced, the lower the cost of treating it; recycling and reuse, by reinjecting produced water that fulfills certain requirements to enhance oil recovery; and disposal, where the produced water can be neither minimized nor reused. For produced water management, investigating the activity concentration of the NORMs present is considered the main step toward a better understanding of the radionuclide distribution. Many studies have reported the presence of NORMs in produced water and investigated the correlation between ²²⁶Ra and the different metals present in produced water (2), including the cations and anions Na⁺, Cl⁻, Fe²⁺ and Ca²⁺; lead, nickel, zinc, cadmium, and copper commonly exist as heavy metals in oil and gas field produced water (3). However, there has been no real interest in investigating the correlation between ²²²Rn and the different metals in produced water. In the methods used here, the radon activity concentration in produced water samples is first measured with a RAD7, a radiometric instrument based on solid-state detectors (4), a type of semiconductor detector for the alpha particles emitted by Rn and its progeny; the concentrations of the different metals present in the produced water are then measured using atomic absorption spectrometry (AAS). 
Then, to investigate the correlation between the ²²²Rn activity concentration and the metal concentrations in produced water, the statistical method used is Pearson correlation analysis, which is based on the correlation coefficient obtained between ²²²Rn and the metals. Such an investigation is important for a better understanding of how radionuclides behave in produced water, based on this correlation with metals: first, because ²²²Rn decays through the sequence ²¹⁸Po, ²¹⁴Pb, ²¹⁴Bi, ²¹⁴Po and ²¹⁰Pb, and these daughters are metals, so they will precipitate with the metals present in produced water; second, because the short half-life of ²²²Rn (3.82 days) leads to faster precipitation of its progeny with metals in produced water.
Keywords: NORMs, radon concentration, produced water, heavy metals
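The two quantitative ingredients of this abstract, the Pearson correlation coefficient and the 3.82-day half-life of radon-222, can be sketched as follows. The measurement values are invented for illustration only and are not data from the study; the function names `pearson_r` and `remaining_fraction` are hypothetical.

```python
import math

RN222_HALF_LIFE_DAYS = 3.82  # half-life quoted in the abstract


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def remaining_fraction(days, half_life=RN222_HALF_LIFE_DAYS):
    """Fraction of the initial activity left after `days` of radioactive decay."""
    return 0.5 ** (days / half_life)


# Hypothetical paired measurements from five produced-water samples.
radon_bq_per_l = [4.1, 5.3, 6.8, 7.9, 9.4]
iron_mg_per_l = [0.8, 1.1, 1.3, 1.7, 1.9]

print(f"r = {pearson_r(radon_bq_per_l, iron_mg_per_l):.3f}")
print(f"activity left after 7 days: {remaining_fraction(7):.1%}")
```

Because radon decays so quickly, the delay between sampling and measurement matters, and a decay correction of this kind would normally be applied before correlating activities with metal concentrations.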
Procedia PDF Downloads 150
24454 Evaluating the Effectiveness of Science Teacher Training Programme in National Colleges of Education: a Preliminary Study, Perceptions of Prospective Teachers
Authors: A. S. V Polgampala, F. Huang
Abstract:
This is an overview of what is entailed in an evaluation and of the issues to be aware of when class observation is carried out. This study examined the effects of evaluating the teaching practice of a 7-day ‘block teaching’ session in a pre-service science teacher training program at a reputed National College of Education in Sri Lanka. Effects were assessed in three areas: evaluation of the training process, evaluation of the training impact, and evaluation of the training procedure. Data for this study were collected by class observation of 18 teachers from 9th February to the 16th, 2017. The participants, prospective science teachers, were evaluated using a format newly introduced by the NIE. The data collected were analyzed qualitatively using the Miles and Huberman procedure for analyzing qualitative data: data reduction, data display and conclusion drawing/verification. It was observed that the trainees showed confidence in teaching those competencies and skills. Teacher educators’ dissatisfaction has had a great impact on the evaluation process.
Keywords: evaluation, perceptions & perspectives, pre-service, science teaching
Procedia PDF Downloads 315
24453 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm
Authors: Sukhleen Kaur
Abstract:
The Intrusion Detection System (IDS) has become an important component of security infrastructure and has received increasing attention in recent years. An IDS is one of the effective ways to detect different kinds of attacks and malicious code in a network and helps to secure the network. Data mining techniques can be applied to an IDS to analyse large amounts of data and give better results. Data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. So far, studies have concentrated on finding attacks, whereas this paper detects malicious files. Some intruders do not attack directly but hide harmful code inside files, or corrupt those files, and attack the system that way. These files are detected according to defined parameters, which produce two lists of files: normal files and harmful files. After that, data mining is performed. In this paper, a hybrid classifier combining the Naive Bayes and Ripper classification methods is used. The results show how a file uploaded to the database is tested against the parameters, characterised as either a normal or a harmful file, and then mined. Moreover, when a user tries to mine a harmful file, an exception is generated stating that mining cannot be performed on corrupted or harmful files.
Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper
Procedia PDF Downloads 414
24452 Generalized Approach to Linear Data Transformation
Authors: Abhijith Asok
Abstract:
This paper presents a generalized approach for the simple linear data transformation, Y=bX, through an integration of multidimensional coordinate geometry, vector space theory and polygonal geometry. The scaling is performed by adding a ‘Dummy Dimension’ to the n-dimensional data, which helps plot two-dimensional component-wise straight lines on pairs of dimensions. The end result is a set of scaled extensions of observations in any of the 2^n spatial divisions, where n is the total number of applicable dimensions/dataset variables, created by shifting the n-dimensional plane along the ‘Dummy Axis’. The derived scaling factor was found to depend on the coordinates of the common point of origin for the diverging straight lines and on the plane of extension, chosen on and perpendicular to the ‘Dummy Axis’, respectively. This result gives a geometrical interpretation of a linear data transformation and hence opens opportunities for a more informed choice of the factor ‘b’, based on a better choice of these coordinate values. The paper goes on to identify the effect of this transformation on certain popular distance metrics; for many of them, the distance metric retained the same scaling factor as the features.
Keywords: data transformation, dummy dimension, linear transformation, scaling
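The closing remark about distance metrics can be checked numerically. The following is a minimal sketch, not the paper's geometric construction: it applies Y=bX to two hypothetical observations and compares each distance before and after scaling. The points, the factor b, and the function names are illustrative assumptions.

```python
import math

def scale(point, b):
    """Apply the linear transformation Y = bX to one observation."""
    return [b * x for x in point]

def euclidean(p, q):
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - c) for a, c in zip(p, q))

def cosine_distance(p, q):
    dot = sum(a * c for a, c in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(c * c for c in q))
    return 1.0 - dot / (norm_p * norm_q)

# Hypothetical observations and scaling factor, chosen only for illustration.
p, q, b = [1.0, 2.0, 3.0], [4.0, 6.0, 8.0], 3.0
bp, bq = scale(p, b), scale(q, b)

metrics = [("euclidean", euclidean), ("manhattan", manhattan), ("cosine", cosine_distance)]
for name, metric in metrics:
    # Ratio of the distance after scaling to the distance before scaling.
    ratio = metric(bp, bq) / metric(p, q)
    print(f"{name}: ratio after scaling = {ratio:.6f}")
```

Euclidean and Manhattan distances are multiplied by |b|, while the cosine distance is unchanged for positive b, which is one example of a metric that does not retain the scaling factor.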
Procedia PDF Downloads 299
24451 Blockchain Platform Configuration for MyData Operator in Digital and Connected Health
Authors: Minna Pikkarainen, Yueqiang Xu
Abstract:
The integration of digital technology with existing healthcare processes has been painfully slow; a huge gap exists between the strictly regulated field of official medical care and the quickly moving field of health and wellness technology. We claim that the promises of preventive healthcare can only be fulfilled when this gap is closed and health care and self-care become a seamless continuum, with “correct information, in the correct hands, at the correct time allowing individuals and professionals to make better decisions”, which we call the connected health approach. Currently, issues related to security, privacy, consumer consent and data sharing are hindering the implementation of this new paradigm of healthcare. This could be solved by following the MyData principles, which state that individuals should have the right and practical means to manage their data and privacy. MyData infrastructure enables decentralized management of personal data, improves interoperability, makes it easier for companies to comply with tightening data protection regulations, and allows individuals to change service providers without proprietary data lock-ins. This paper tackles today’s unprecedented challenges of enabling and stimulating multiple healthcare data providers and stakeholders to participate more actively in the digital health ecosystem. First, the paper systematically proposes the MyData approach for the healthcare and preventive health data ecosystem. In this research, the work is targeted at health and wellness ecosystems. Each ecosystem consists of key actors, such as 1) the individual (a citizen or professional controlling/using the services), i.e. the data subject, 2) services providing personal data (e.g. startups providing data collection apps or data collection devices), 3) health and wellness services utilizing the aforementioned data and 4) services authorizing access to this data under the individual’s explicit consent.
Second, the research extends the existing four archetypes of orchestrator-driven healthcare data business models for the healthcare industry and proposes a fifth type of healthcare data model, the MyData Blockchain Platform. This new architecture is developed using the Action Design Research approach, which is a prominent research methodology in the information systems domain. The key novelty of the paper is to expand the health data value chain architecture and design from centralization and pseudo-decentralization to full decentralization, enabled by blockchain, hence the MyData blockchain platform. The study not only broadens the healthcare informatics literature but also contributes to the theoretical development of the digital healthcare and blockchain research domains with a systemic approach.
Keywords: blockchain, health data, platform, action design
Procedia PDF Downloads 100
24450 High-Frequency Acoustic Microscopy Imaging of Pellet/Cladding Interface in Nuclear Fuel Rods
Authors: H. Saikouk, D. Laux, Emmanuel Le Clézio, B. Lacroix, K. Audic, R. Largenton, E. Federici, G. Despaux
Abstract:
Pressurized Water Reactor (PWR) fuel rods are made of ceramic pellets (e.g. UO2 or (U,Pu)O2) assembled in a zirconium cladding tube. By design, an initial gap exists between these two elements. During irradiation, they both undergo transformations leading progressively to the closure of this gap. A local and non-destructive examination of the pellet/cladding interface could help to identify the zones where the two materials are in contact, particularly at high burnups, when strong chemical bonding occurs under nominal operating conditions in PWR fuel rods. The evolution of the pellet/cladding bonding during irradiation is also an area of interest. In this context, the Institute of Electronics and Systems (IES, UMR CNRS 5214), in collaboration with the Alternative Energies and Atomic Energy Commission (CEA), is developing a high-frequency acoustic microscope adapted to the control and high-resolution imaging of the pellet/cladding interface. Because the geometrical, chemical and mechanical nature of the contact interface is neither axially nor radially homogeneous, 2D images of this interface need to be acquired via this ultrasonic system with high-performance signal processing and by means of controlled displacement of the sample rod along both its axis and its circumference. Modeling the multi-layer system (water, cladding, fuel, etc.) is necessary in the present study and aims to take into account all the parameters that influence the resolution of the acquired images.
The first prototype of this microscope and the first results of the visualization of the inner face of the cladding will be presented in a poster in order to highlight the potential of the system, whose final objective is to be introduced into the existing MEGAFOX bench dedicated to the non-destructive examination of irradiated fuel rods at the LECA-STAR facility at CEA-Cadarache.
Keywords: high-frequency acoustic microscopy, multi-layer model, non-destructive testing, nuclear fuel rod, pellet/cladding interface, signal processing
Procedia PDF Downloads 191
24449 Using Learning Apps in the Classroom
Authors: Janet C. Read
Abstract:
UCLan set up a collaboration with Lingokids to assess the impact of the Lingokids learning app on learning outcomes in UK classrooms for children aged 3 to 5 years. Data gathered during the controlled study with 69 children include attitudinal data, engagement, and learning scores. The data show that children’s enjoyment while learning was higher among those using the game-based app than among those using traditional methods. It is worth pointing out that, among older children, engagement when using the learning app was significantly higher than with traditional methods. According to the existing literature, there is a direct correlation between engagement, motivation, and learning. Therefore, this study provides relevant data points to conclude that the Lingokids learning app serves its purpose of encouraging learning through playful and interactive content. That being said, we believe that learning outcomes should be assessed with a wider range of methods in further studies. Likewise, it would be beneficial to assess the usability and playability of the app in order to evaluate it from other angles.
Keywords: learning app, learning outcomes, rapid test activity, Smileyometer, early childhood education, innovative pedagogy
Procedia PDF Downloads 72
24448 An Investigation on the Pulse Electrodeposition of Ni-TiO2/TiO2 Multilayer Structures
Authors: S. Mohajeri
Abstract:
Electrocodeposition of Ni-TiO2 nanocomposite single layers and Ni-TiO2/TiO2 multilayers from a Watts bath containing TiO2 sol was carried out on a copper substrate. Pulse plating and pulse reverse plating techniques were applied to facilitate higher incorporation of TiO2 nanoparticles in Ni-TiO2 nanocomposite single layers, and the results revealed that, by prolonging the current-off durations and the anodic cycles, deposits containing 11.58 wt.% and 13.16 wt.% TiO2 were produced, respectively. Multilayer coatings consisting of Ni-TiO2 and TiO2-rich layers were deposited by pulse potential deposition, limiting the nickel deposition through a diffusion control mechanism. Under the optimum condition, the thickness of the TiO2-rich layers and, accordingly, the content of TiO2 reinforcement reached 104 nm and 18.47 wt.%, respectively. The phase structure and surface morphology of the nanocomposite coatings were characterized by X-ray diffraction (XRD) and scanning electron microscopy (SEM). The cross-sectional morphology and line scans of the layers were studied by field emission scanning electron microscopy (FESEM). It was confirmed that the preferred orientations and the crystallite sizes of the nickel matrix were influenced by the deposition technique parameters, and that higher contents of codeposited TiO2 nanoparticles refined the microstructure. The corrosion behavior of the coatings in 1 M NaCl and 0.5 M H2SO4 electrolytes was compared by means of potentiodynamic polarization and electrochemical impedance spectroscopy (EIS) techniques. Increased corrosion resistance and passivation tendency were favored by TiO2 incorporation, while the degree of passivation declined as embedded particles disturbed the continuity of the passive layer. The role of TiO2 incorporation in the improvement of mechanical properties, including hardness, elasticity, scratch resistance and friction coefficient, was investigated by means of atomic force microscopy (AFM).
Hydrophilicity and wettability of the composite coatings were investigated under UV illumination, and the water contact angle of the multilayer was reduced to 7.23° after 1 hour of UV irradiation.
Keywords: electrodeposition, hydrophilicity, multilayer, pulse-plating
Procedia PDF Downloads 252
24447 Road Safety in the Great Britain: An Exploratory Data Analysis
Authors: Jatin Kumar Choudhary, Naren Rayala, Abbas Eslami Kiasari, Fahimeh Jafari
Abstract:
Great Britain has one of the safest road networks in the world. However, the consequences of any death or serious injury are devastating for loved ones, as well as for those who help the severely injured. This paper aims to analyse Great Britain's road safety situation and to show the response measures for areas where the total damage caused by accidents can be significantly and quickly reduced. In this paper, we carry out an exploratory data analysis using STATS19 data. For the past 30 years, the UK has had a good record in reducing fatalities. The UK ranked third based on the number of road deaths per million inhabitants. Around 165,000 accidents were reported in Great Britain in 2009, and the number decreased every year until 2019, when it was under 120,000. The government continues to scale back road deaths by empowering responsible road users and by identifying and prosecuting the factors that make the roads less safe.
Keywords: road safety, data analysis, openstreetmap, feature expanding
Procedia PDF Downloads 142
24446 Intrusion Detection System Using Linear Discriminant Analysis
Authors: Zyad Elkhadir, Khalid Chougdali, Mohammed Benattou
Abstract:
Most existing intrusion detection systems work on quantitative network traffic data with many irrelevant and redundant features, which makes the detection process more time-consuming and inaccurate. Several feature extraction methods, such as linear discriminant analysis (LDA), have been proposed. However, LDA suffers from the small sample size (SSS) problem, which occurs when the number of training samples is small compared with the sample dimension. Hence, classical LDA cannot be applied directly to high-dimensional data such as network traffic data. In this paper, we propose two solutions to the SSS problem for LDA and apply them to a network IDS. The first method reduces the dimension of the original data using principal component analysis (PCA) and then applies LDA. In the second solution, we propose to use the pseudo-inverse to avoid the singularity of the within-class scatter matrix caused by the SSS problem. After that, the KNN algorithm is used for the classification process. We have chosen two well-known datasets, KDDcup99 and NSLKDD, for testing the proposed approaches. Results showed that the classification accuracy of the (PCA+LDA) method clearly outperforms the pseudo-inverse LDA method when the training set is large.
Keywords: LDA, Pseudoinverse, PCA, IDS, NSL-KDD, KDDcup99
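The SSS problem described above can be made concrete with a toy case. This is a minimal sketch under stated assumptions: the four 3-dimensional samples and the two classes are hypothetical, and dropping the last feature is only a crude stand-in for the PCA reduction step, not the authors' pipeline. It shows that the within-class scatter matrix is singular when the number of samples minus the number of classes is smaller than the dimension, and becomes invertible once the dimension is reduced.

```python
from fractions import Fraction

def within_class_scatter(classes):
    """Sum over classes of (x - class_mean)(x - class_mean)^T, in exact arithmetic."""
    d = len(classes[0][0])
    s_w = [[Fraction(0)] * d for _ in range(d)]
    for samples in classes:
        mean = [Fraction(sum(x[j] for x in samples), len(samples)) for j in range(d)]
        for x in samples:
            dev = [x[j] - mean[j] for j in range(d)]
            for r in range(d):
                for c in range(d):
                    s_w[r][c] += dev[r] * dev[c]
    return s_w

def det(m):
    """Determinant by Laplace expansion; fine for the tiny matrices used here."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c in range(len(m)):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total

# Hypothetical data: 2 classes, 2 samples each, 3 features (N - c = 2 < d = 3).
classes_3d = [[(1, 2, 3), (3, 4, 7)], [(0, 1, 0), (2, 1, 4)]]
s_w_3d = within_class_scatter(classes_3d)

# Crude stand-in for dimensionality reduction: keep only the first two features.
classes_2d = [[x[:2] for x in samples] for samples in classes_3d]
s_w_2d = within_class_scatter(classes_2d)

print("det of 3-D within-class scatter:", det(s_w_3d))
print("det of 2-D within-class scatter:", det(s_w_2d))
```

A zero determinant in the 3-D case means the matrix cannot be inverted as classical LDA requires, which is the situation the PCA step or the pseudo-inverse is meant to handle.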
Procedia PDF Downloads 228
24445 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —
Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno
Abstract:
STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induce if-then rules from a decision table, which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducing true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical dataset size derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to real-world data.
Keywords: rule induction, decision table, missing data, noise
Procedia PDF Downloads 396
24444 Development of an Atmospheric Radioxenon Detection System for Nuclear Explosion Monitoring
Authors: V. Thomas, O. Delaune, W. Hennig, S. Hoover
Abstract:
Measurement of radioactive isotopes of atmospheric xenon is used to detect, locate and identify any confined nuclear tests as part of the Comprehensive Nuclear Test-Ban Treaty (CTBT). In this context, the French Alternative Energies and Atomic Energy Commission (CEA) has developed a fixed device to continuously measure the concentration of these fission products, the SPALAX process. During its atmospheric transport, the radioactive xenon undergoes significant dilution between the source point and the measurement station. Given the distance between the fixed stations located all over the globe, the typical volume activities measured are near 1 mBq m⁻³. To avoid the constraints induced by atmospheric dilution, the development of a mobile detection system is in progress; this system will allow on-site measurements in order to confirm or refute a suspicious measurement detected by a fixed station. Furthermore, this system will use the beta/gamma coincidence measurement technique in order to drastically reduce the environmental background (which masks such activities). The detector prototype consists of a gas cell surrounded by two large silicon wafers, coupled with two square NaI(Tl) detectors. The gas cell has a sample volume of 30 cm³, and the silicon wafers are 500 µm thick with an active surface area of 3600 mm². In order to minimize leakage current, each wafer has been segmented into four independent silicon pixels. This cell is sandwiched between two low-background NaI(Tl) detectors (70 × 70 × 40 mm³ crystal). The expected Minimal Detectable Concentration (MDC) for each radioxenon is on the order of 1-10 mBq m⁻³. Three 4-channel digital acquisition modules (Pixie-NET) are used to process all the signals. Time synchronization is ensured by a dedicated PTP network, using the IEEE 1588 Precision Time Protocol.
We would like to present this system from its simulation to the laboratory tests.
Keywords: beta/gamma coincidence technique, low level measurement, radioxenon, silicon pixels
Procedia PDF Downloads 126
24443 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services
Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme
Abstract:
Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, and user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.
Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing
Procedia PDF Downloads 114
24442 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform
Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu
Abstract:
Given a fixed fund, choosing between fewer hosts of higher capability and more hosts of lower capability is an unavoidable trade-off in practice when building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing consists of SQL queries with aggregate, join, and space-time condition selections executed upon massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of candidate host clusters for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem, HDFS+Hive+Spark. A metric was proposed to measure the performance of Hadoop clusters in HBDP; it was tested and compared with its predicted counterpart on three kinds of typical SQL query tasks. Tests were conducted with respect to the factors of CPU benchmark, memory size, virtual host division, and the number of physical hosts in the cluster. The research has been applied to practical cluster procurement for housing big data computing.
Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance predicting formula, typical SQL query tasks
Procedia PDF Downloads 232
24441 Model Predictive Controller for Pasteurization Process
Authors: Tesfaye Alamirew Dessie
Abstract:
Our study focuses on developing a Model Predictive Controller (MPC) and evaluating it against a traditional PID controller for a pasteurization process. The dynamics of the pasteurization process were estimated by system identification from the experimental data. The quality of several model architectures was evaluated using best fit with data validation, residual analysis, and stability analysis. The auto-regressive with exogenous input (ARX322) model of the pasteurization process fit the validation data by roughly 80.37 percent. The ARX322 model structure was used to create the MPC and PID control techniques. After comparing controller performance based on settling time, overshoot percentage, and stability analysis, it was found that the MPC controller outperforms the PID controller on those parameters.
Keywords: MPC, PID, ARX, pasteurization
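To illustrate what an ARX structure and a percentage fit involve, here is a minimal sketch. It assumes that ARX322 follows the common [na nb nk] ordering (3 output lags, 2 input lags, input delay 2) and that the reported fit is the normalized-root-mean-square-error fit often used in system identification; neither is confirmed by the abstract. The coefficients and the step input are hypothetical, not the identified pasteurizer model.

```python
import math

def arx_step(y, u, t, a, b, nk):
    """One-step ARX output at time t:
    y[t] = -sum(a[i] * y[t-1-i]) + sum(b[j] * u[t-nk-j])."""
    ar_part = -sum(a[i] * y[t - 1 - i] for i in range(len(a)))
    ex_part = sum(b[j] * u[t - nk - j] for j in range(len(b)))
    return ar_part + ex_part

def arx_predict(y, u, a, b, nk):
    """One-step-ahead predictions from measured outputs y and inputs u."""
    start = max(len(a), nk + len(b) - 1)
    return start, [arx_step(y, u, t, a, b, nk) for t in range(start, len(y))]

def fit_percent(measured, predicted):
    """NRMSE-style fit: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    mean = sum(measured) / len(measured)
    num = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)))
    den = math.sqrt(sum((m - mean) ** 2 for m in measured))
    return 100.0 * (1.0 - num / den)

# Hypothetical coefficients (na=3, nb=2, nk=2) and a step input, for illustration only.
a, b, nk = [-0.6, 0.1, -0.02], [0.5, 0.3], 2
u = [0.0] * 5 + [1.0] * 25
y = [0.0] * len(u)
first = max(len(a), nk + len(b) - 1)
for t in range(first, len(u)):
    y[t] = arx_step(y, u, t, a, b, nk)  # simulate noise-free "measurements"

start, preds = arx_predict(y, u, a, b, nk)
print("fit on noise-free data:", fit_percent(y[start:], preds))
```

On real validation data the predictions would not match exactly, and the fit would fall below 100 percent, which is the kind of figure quoted in the abstract.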
Procedia PDF Downloads 164
24440 Point Estimation for the Type II Generalized Logistic Distribution Based on Progressively Censored Data
Authors: Rana Rimawi, Ayman Baklizi
Abstract:
Skewed distributions are important models that are frequently used in applications. Generalized distributions form a class of skewed distributions and have gained widespread use in applications because of their flexibility in data analysis. More specifically, the Generalized Logistic Distribution, with its different types, has received considerable attention recently. In this study, based on progressively type-II censored data, we consider point estimation in the type II Generalized Logistic Distribution (Type II GLD). We develop several estimators for its unknown parameters, including maximum likelihood estimators (MLE), Bayes estimators and best linear unbiased estimators (BLUE). The estimators are compared using simulation based on the criteria of bias and mean squared error (MSE). An illustrative example using a real data set is given.
Keywords: point estimation, type II generalized logistic distribution, progressive censoring, maximum likelihood estimation
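The bias and MSE comparison can be sketched for the simplest case. This is an illustration under explicit assumptions, not the authors' procedure: it uses complete samples rather than progressively censored ones, a single shape parameter alpha, and the commonly cited Type II GLD form with distribution function F(x) = 1 - (1 + e^x)^(-alpha). Under that form, log(1 + e^X) is exponential with rate alpha, which gives both a sampler and a closed-form MLE. The sample size, replication count and seed are arbitrary.

```python
import math
import random

def sample_type2_gld(alpha, n, rng):
    """Draw n values with F(x) = 1 - (1 + e^x)^(-alpha) (assumed Type II GLD form).
    Uses the fact that T = log(1 + e^X) is exponential with rate alpha."""
    xs = []
    while len(xs) < n:
        t = rng.expovariate(alpha)
        if t > 0.0:  # guard against log(0) in the extremely rare case t == 0
            xs.append(math.log(math.expm1(t)))
    return xs

def mle_alpha(xs):
    """Closed-form MLE for a complete sample: n / sum(log(1 + e^x))."""
    return len(xs) / sum(math.log1p(math.exp(x)) for x in xs)

def bias_and_mse(estimates, true_value):
    """Simulation criteria: mean error and mean squared error of the estimates."""
    m = len(estimates)
    bias = sum(estimates) / m - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    return bias, mse

rng = random.Random(0)            # fixed seed so the run is repeatable
alpha, n, reps = 2.0, 20, 2000    # hypothetical simulation settings
estimates = [mle_alpha(sample_type2_gld(alpha, n, rng)) for _ in range(reps)]
bias, mse = bias_and_mse(estimates, alpha)
print(f"bias = {bias:.4f}, MSE = {mse:.4f}")
```

With progressive censoring the likelihood gains extra survival terms for the removed units, so the estimator would no longer take this simple form.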
Procedia PDF Downloads 200
24439 Omni: Data Science Platform for Evaluate Performance of a LoRaWAN Network
Authors: Emanuele A. Solagna, Ricardo S. Tozetto, Roberto dos S. Rabello
Abstract:
Nowadays, physical processes are becoming digitized through the evolution of communication, sensing and storage technologies, which promote the development of smart cities. The evolution of this technology has generated multiple challenges related to the generation of big data and the active participation of electronic devices in society. Devices can send information that is captured and processed over large areas, but there is no guarantee that all the data obtained will be effectively stored and correctly persisted. This is because, depending on the technology used, some parameters have a strong influence on the full delivery of information. This article aims to characterize the project, currently under development, of a data-science-based platform that will evaluate the performance and effectiveness of an industrial network implementing LoRaWAN technology, considering the configuration of its main parameters and relating these parameters to information loss.
Keywords: Internet of Things, LoRa, LoRaWAN, smart cities
Procedia PDF Downloads 148