Search results for: data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25325

Search results for: data mining

23735 Reliable Consensus Problem for Multi-Agent Systems with Sampled-Data

Authors: S. H. Lee, M. J. Park, O. M. Kwon

Abstract:

In this paper, reliable consensus of multi-agent systems with sampled-data is investigated. By using a suitable Lyapunov-Krasovskii functional and some techniques such as Wirtinger Inequality, Schur Complement and Kronecker Product, the results of this systems are obtained by solving a set of Linear Matrix Inequalities(LMIs). One numerical example is included to show the effectiveness of the proposed criteria.

Keywords: multi-agent, linear matrix inequalities (LMIs), kronecker product, sampled-data, Lyapunov method

Procedia PDF Downloads 524
23734 Materialized View Effect on Query Performance

Authors: Yusuf Ziya Ayık, Ferhat Kahveci

Abstract:

Currently, database management systems have various tools such as backup and maintenance, and also provide statistical information such as resource usage and security. In terms of query performance, this paper covers query optimization, views, indexed tables, pre-computation materialized view, query performance analysis in which query plan alternatives can be created and the least costly one selected to optimize a query. Indexes and views can be created for related table columns. The literature review of this study showed that, in the course of time, despite the growing capabilities of the database management system, only database administrators are aware of the need for dealing with archival and transactional data types differently. These data may be constantly changing data used in everyday life, and also may be from the completed questionnaire whose data input was completed. For both types of data, the database uses its capabilities; but as shown in the findings section, instead of repeating similar heavy calculations which are carrying out same results with the same query over a survey results, using materialized view results can be in a more simple way. In this study, this performance difference was observed quantitatively considering the cost of the query.

Keywords: cost of query, database management systems, materialized view, query performance

Procedia PDF Downloads 275
23733 An AK-Chart for the Non-Normal Data

Authors: Chia-Hau Liu, Tai-Yue Wang

Abstract:

Traditional multivariate control charts assume that measurement from manufacturing processes follows a multivariate normal distribution. However, this assumption may not hold or may be difficult to verify because not all the measurement from manufacturing processes are normal distributed in practice. This study develops a new multivariate control chart for monitoring the processes with non-normal data. We propose a mechanism based on integrating the one-class classification method and the adaptive technique. The adaptive technique is used to improve the sensitivity to small shift on one-class classification in statistical process control. In addition, this design provides an easy way to allocate the value of type I error so it is easier to be implemented. Finally, the simulation study and the real data from industry are used to demonstrate the effectiveness of the propose control charts.

Keywords: multivariate control chart, statistical process control, one-class classification method, non-normal data

Procedia PDF Downloads 420
23732 Panel Application for Determining Impact of Real Exchange Rate and Security on Tourism Revenues: Countries with Middle and High Level Tourism Income

Authors: M. Koray Cetin, Mehmet Mert

Abstract:

The purpose of the study is to examine impacts on tourism revenues of the exchange rate and country overall security level. There are numerous studies that examine the bidirectional relation between macroeconomic factors and tourism revenues and tourism demand. Most of the studies support the existence of impact of tourism revenues on growth rate but not vice versa. Few studies examine the impact of factors like real exchange rate or purchasing power parity on the tourism revenues. In this context, firstly impact of real exchange rate on tourism revenues examination is aimed. Because exchange rate is one of the main determinants of international tourism services price in guests currency unit. Another determinant of tourism demand for a country is country’s overall security level. This issue can be handled in the context of the relationship between tourism revenues and overall security including turmoil, terrorism, border problem, political violence. In this study, factors are handled for several countries which have tourism revenues on a certain level. With this structure, it is a panel data, and it is evaluated with panel data analysis techniques. Panel data have at least two dimensions, and one of them is time dimensions. The panel data analysis techniques are applied to data gathered from Worldbank data web page. In this study, it is expected to find impacts of real exchange rate and security factors on tourism revenues for the countries that have noteworthy tourism revenues.

Keywords: exchange rate, panel data analysis, security, tourism revenues

Procedia PDF Downloads 346
23731 The Effect of General Data Protection Regulation on South Asian Data Protection Laws

Authors: Sumedha Ganjoo, Santosh Goswami

Abstract:

The rising reliance on technology places national security at the forefront of 21st-century issues. It complicates the efforts of emerging and developed countries to combat cyber threats and increases the inherent risk factors connected with technology. The inability to preserve data securely might have devastating repercussions on a massive scale. Consequently, it is vital to establish national, regional, and global data protection rules and regulations that penalise individuals who participate in immoral technology usage and exploit the inherent vulnerabilities of technology. This study paper seeks to analyse GDPR-inspired Bills in the South Asian Region and determine their suitability for the development of a worldwide data protection framework, considering that Asian countries are much more diversified than European ones. In light of this context, the objectives of this paper are to identify GDPR-inspired Bills in the South Asian Region, identify their similarities and differences, as well as the obstacles to developing a regional-level data protection mechanism, thereby satisfying the need to develop a global-level mechanism. Due to the qualitative character of this study, the researcher did a comprehensive literature review of prior research papers, journal articles, survey reports, and government publications on the aforementioned topics. Taking into consideration the survey results, the researcher conducted a critical analysis of the significant parameters highlighted in the literature study. Many nations in the South Asian area are in the process of revising their present data protection measures in accordance with GDPR, according to the primary results of this study. Consideration is given to the data protection laws of Thailand, Malaysia, China, and Japan. Significant parallels and differences in comparison to GDPR have been discussed in detail. The conclusion of the research analyses the development of various data protection legislation regimes in South Asia.

Keywords: data privacy, GDPR, Asia, data protection laws

Procedia PDF Downloads 81
23730 Longitudinal Analysis of Internet Speed Data in the Gulf Cooperation Council Region

Authors: Musab Isah

Abstract:

This paper presents a longitudinal analysis of Internet speed data in the Gulf Cooperation Council (GCC) region, focusing on the most populous cities of each of the six countries – Riyadh, Saudi Arabia; Dubai, UAE; Kuwait City, Kuwait; Doha, Qatar; Manama, Bahrain; and Muscat, Oman. The study utilizes data collected from the Measurement Lab (M-Lab) infrastructure over a five-year period from January 1, 2019, to December 31, 2023. The analysis includes downstream and upstream throughput data for the cities, covering significant events such as the launch of 5G networks in 2019, COVID-19-induced lockdowns in 2020 and 2021, and the subsequent recovery period and return to normalcy. The results showcase substantial increases in Internet speeds across the cities, highlighting improvements in both download and upload throughput over the years. All the GCC countries have achieved above-average Internet speeds that can conveniently support various online activities and applications with excellent user experience.

Keywords: internet data science, internet performance measurement, throughput analysis, internet speed, measurement lab, network diagnostic tool

Procedia PDF Downloads 57
23729 A Web Service Based Sensor Data Management System

Authors: Rose A. Yemson, Ping Jiang, Oyedeji L. Inumoh

Abstract:

The deployment of wireless sensor network has rapidly increased, however with the increased capacity and diversity of sensors, and applications ranging from biological, environmental, military etc. generates tremendous volume of data’s where more attention is placed on the distributed sensing and little on how to manage, analyze, retrieve and understand the data generated. This makes it more quite difficult to process live sensor data, run concurrent control and update because sensor data are either heavyweight, complex, and slow. This work will focus on developing a web service platform for automatic detection of sensors, acquisition of sensor data, storage of sensor data into a database, processing of sensor data using reconfigurable software components. This work will also create a web service based sensor data management system to monitor physical movement of an individual wearing wireless network sensor technology (SunSPOT). The sensor will detect movement of that individual by sensing the acceleration in the direction of X, Y and Z axes accordingly and then send the sensed reading to a database that will be interfaced with an internet platform. The collected sensed data will determine the posture of the person such as standing, sitting and lying down. The system is designed using the Unified Modeling Language (UML) and implemented using Java, JavaScript, html and MySQL. This system allows real time monitoring an individual closely and obtain their physical activity details without been physically presence for in-situ measurement which enables you to work remotely instead of the time consuming check of an individual. These details can help in evaluating an individual’s physical activity and generate feedback on medication. It can also help in keeping track of any mandatory physical activities required to be done by the individuals. These evaluations and feedback can help in maintaining a better health status of the individual and providing improved health care.

Keywords: HTML, java, javascript, MySQL, sunspot, UML, web-based, wireless network sensor

Procedia PDF Downloads 210
23728 Foundation of the Information Model for Connected-Cars

Authors: Hae-Won Seo, Yong-Gu Lee

Abstract:

Recent progress in the next generation of automobile technology is geared towards incorporating information technology into cars. Collectively called smart cars are bringing intelligence to cars that provides comfort, convenience and safety. A branch of smart cars is connected-car system. The key concept in connected-cars is the sharing of driving information among cars through decentralized manner enabling collective intelligence. This paper proposes a foundation of the information model that is necessary to define the driving information for smart-cars. Road conditions are modeled through a unique data structure that unambiguously represent the time variant traffics in the streets. Additionally, the modeled data structure is exemplified in a navigational scenario and usage using UML. Optimal driving route searching is also discussed using the proposed data structure in a dynamically changing road conditions.

Keywords: connected-car, data modeling, route planning, navigation system

Procedia PDF Downloads 371
23727 Automated Multisensory Data Collection System for Continuous Monitoring of Refrigerating Appliances Recycling Plants

Authors: Georgii Emelianov, Mikhail Polikarpov, Fabian Hübner, Jochen Deuse, Jochen Schiemann

Abstract:

Recycling refrigerating appliances plays a major role in protecting the Earth's atmosphere from ozone depletion and emissions of greenhouse gases. The performance of refrigerator recycling plants in terms of material retention is the subject of strict environmental certifications and is reviewed periodically through specialized audits. The continuous collection of Refrigerator data required for the input-output analysis is still mostly manual, error-prone, and not digitalized. In this paper, we propose an automated data collection system for recycling plants in order to deduce expected material contents in individual end-of-life refrigerating appliances. The system utilizes laser scanner measurements and optical data to extract attributes of individual refrigerators by applying transfer learning with pre-trained vision models and optical character recognition. Based on Recognized features, the system automatically provides material categories and target values of contained material masses, especially foaming and cooling agents. The presented data collection system paves the way for continuous performance monitoring and efficient control of refrigerator recycling plants.

Keywords: automation, data collection, performance monitoring, recycling, refrigerators

Procedia PDF Downloads 161
23726 Sales Patterns Clustering Analysis on Seasonal Product Sales Data

Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho

Abstract:

As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.

Keywords: clustering, distribution, sales pattern, seasonal product

Procedia PDF Downloads 592
23725 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 489
23724 Bioinformatics High Performance Computation and Big Data

Authors: Javed Mohammed

Abstract:

Right now, bio-medical infrastructure lags well behind the curve. Our healthcare system is dispersed and disjointed; medical records are a bit of a mess; and we do not yet have the capacity to store and process the crazy amounts of data coming our way from widespread whole-genome sequencing. And then there are privacy issues. Despite these infrastructure challenges, some researchers are plunging into bio medical Big Data now, in hopes of extracting new and actionable knowledge. They are doing delving into molecular-level data to discover bio markers that help classify patients based on their response to existing treatments; and pushing their results out to physicians in novel and creative ways. Computer scientists and bio medical researchers are able to transform data into models and simulations that will enable scientists for the first time to gain a profound under-standing of the deepest biological functions. Solving biological problems may require High-Performance Computing HPC due either to the massive parallel computation required to solve a particular problem or to algorithmic complexity that may range from difficult to intractable. Many problems involve seemingly well-behaved polynomial time algorithms (such as all-to-all comparisons) but have massive computational requirements due to the large data sets that must be analyzed. High-throughput techniques for DNA sequencing and analysis of gene expression have led to exponential growth in the amount of publicly available genomic data. With the increased availability of genomic data traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types. Computing systems are now so powerful it is possible for researchers to consider modeling the folding of a protein or even the simulation of an entire human body. This research paper emphasizes the computational biology's growing need for high-performance computing and Big Data. It illustrates this article’s indispensability in meeting the scientific and engineering challenges of the twenty-first century, and how Protein Folding (the structure and function of proteins) and Phylogeny Reconstruction (evolutionary history of a group of genes) can use HPC that provides sufficient capability for evaluating or solving more limited but meaningful instances. This article also indicates solutions to optimization problems, and benefits Big Data and Computational Biology. The article illustrates the Current State-of-the-Art and Future-Generation Biology of HPC Computing with Big Data.

Keywords: high performance, big data, parallel computation, molecular data, computational biology

Procedia PDF Downloads 360
23723 Evaluating the Effectiveness of Science Teacher Training Programme in National Colleges of Education: a Preliminary Study, Perceptions of Prospective Teachers

Authors: A. S. V Polgampala, F. Huang

Abstract:

This is an overview of what is entailed in an evaluation and issues to be aware of when class observation is being done. This study examined the effects of evaluating teaching practice of a 7-day ‘block teaching’ session in a pre -service science teacher training program at a reputed National College of Education in Sri Lanka. Effects were assessed in three areas: evaluation of the training process, evaluation of the training impact, and evaluation of the training procedure. Data for this study were collected by class observation of 18 teachers during 9th February to 16th of 2017. Prospective teachers of science teaching, the participants of the study were evaluated based on newly introduced format by the NIE. The data collected was analyzed qualitatively using the Miles and Huberman procedure for analyzing qualitative data: data reduction, data display and conclusion drawing/verification. It was observed that the trainees showed their confidence in teaching those competencies and skills. Teacher educators’ dissatisfaction has been a great impact on evaluation process.

Keywords: evaluation, perceptions & perspectives, pre-service, science teachering

Procedia PDF Downloads 312
23722 Generalized Approach to Linear Data Transformation

Authors: Abhijith Asok

Abstract:

This paper presents a generalized approach for the simple linear data transformation, Y=bX, through an integration of multidimensional coordinate geometry, vector space theory and polygonal geometry. The scaling is performed by adding an additional ’Dummy Dimension’ to the n-dimensional data, which helps plot two dimensional component-wise straight lines on pairs of dimensions. The end result is a set of scaled extensions of observations in any of the 2n spatial divisions, where n is the total number of applicable dimensions/dataset variables, created by shifting the n-dimensional plane along the ’Dummy Axis’. The derived scaling factor was found to be dependent on the coordinates of the common point of origin for diverging straight lines and the plane of extension, chosen on and perpendicular to the ’Dummy Axis’, respectively. This result indicates the geometrical interpretation of a linear data transformation and hence, opportunities for a more informed choice of the factor ’b’, based on a better choice of these coordinate values. The paper follows on to identify the effect of this transformation on certain popular distance metrics, wherein for many, the distance metric retained the same scaling factor as that of the features.

Keywords: data transformation, dummy dimension, linear transformation, scaling

Procedia PDF Downloads 296
23721 Blockchain Platform Configuration for MyData Operator in Digital and Connected Health

Authors: Minna Pikkarainen, Yueqiang Xu

Abstract:

The integration of digital technology with existing healthcare processes has been painfully slow, a huge gap exists between the fields of strictly regulated official medical care and the quickly moving field of health and wellness technology. We claim that the promises of preventive healthcare can only be fulfilled when this gap is closed – health care and self-care becomes seamless continuum “correct information, in the correct hands, at the correct time allowing individuals and professionals to make better decisions” what we call connected health approach. Currently, the issues related to security, privacy, consumer consent and data sharing are hindering the implementation of this new paradigm of healthcare. This could be solved by following MyData principles stating that: Individuals should have the right and practical means to manage their data and privacy. MyData infrastructure enables decentralized management of personal data, improves interoperability, makes it easier for companies to comply with tightening data protection regulations, and allows individuals to change service providers without proprietary data lock-ins. This paper tackles today’s unprecedented challenges of enabling and stimulating multiple healthcare data providers and stakeholders to have more active participation in the digital health ecosystem. First, the paper systematically proposes the MyData approach for healthcare and preventive health data ecosystem. In this research, the work is targeted for health and wellness ecosystems. Each ecosystem consists of key actors, such as 1) individual (citizen or professional controlling/using the services) i.e. data subject, 2) services providing personal data (e.g. startups providing data collection apps or data collection devices), 3) health and wellness services utilizing aforementioned data and 4) services authorizing the access to this data under individual’s provided explicit consent. Second, the research extends the existing four archetypes of orchestrator-driven healthcare data business models for the healthcare industry and proposes the fifth type of healthcare data model, the MyData Blockchain Platform. This new architecture is developed by the Action Design Research approach, which is a prominent research methodology in the information system domain. The key novelty of the paper is to expand the health data value chain architecture and design from centralization and pseudo-decentralization to full decentralization, enabled by blockchain, thus the MyData blockchain platform. The study not only broadens the healthcare informatics literature but also contributes to the theoretical development of digital healthcare and blockchain research domains with a systemic approach.

Keywords: blockchain, health data, platform, action design

Procedia PDF Downloads 97
23720 Product Features Extraction from Opinions According to Time

Authors: Kamal Amarouche, Houda Benbrahim, Ismail Kassou

Abstract:

Nowadays, e-commerce shopping websites have experienced noticeable growth. These websites have gained consumers’ trust. After purchasing a product, many consumers share comments where opinions are usually embedded about the given product. Research on the automatic management of opinions that gives suggestions to potential consumers and portrays an image of the product to manufactures has been growing recently. After launching the product in the market, the reviews generated around it do not usually contain helpful information or generic opinions about this product (e.g. telephone: great phone...); in the sense that the product is still in the launching phase in the market. Within time, the product becomes old. Therefore, consumers perceive the advantages/ disadvantages about each specific product feature. Therefore, they will generate comments that contain their sentiments about these features. In this paper, we present an unsupervised method to extract different product features hidden in the opinions which influence its purchase, and that combines Time Weighting (TW) which depends on the time opinions were expressed with Term Frequency-Inverse Document Frequency (TF-IDF). We conduct several experiments using two different datasets about cell phones and hotels. The results show the effectiveness of our automatic feature extraction, as well as its domain independent characteristic.

Keywords: opinion mining, product feature extraction, sentiment analysis, SentiWordNet

Procedia PDF Downloads 404
23719 Using Learning Apps in the Classroom

Authors: Janet C. Read

Abstract:

UClan set collaboration with Lingokids to assess the Lingokids learning app's impact on learning outcomes in classrooms in the UK for children with ages ranging from 3 to 5 years. Data gathered during the controlled study with 69 children includes attitudinal data, engagement, and learning scores. Data shows that children enjoyment while learning was higher among those children using the game-based app compared to those children using other traditional methods. It’s worth pointing out that engagement when using the learning app was significantly higher than other traditional methods among older children. According to existing literature, there is a direct correlation between engagement, motivation, and learning. Therefore, this study provides relevant data points to conclude that Lingokids learning app serves its purpose of encouraging learning through playful and interactive content. That being said, we believe that learning outcomes should be assessed with a wider range of methods in further studies. Likewise, it would be beneficial to assess the level of usability and playability of the app in order to evaluate the learning app from other angles.

Keywords: learning app, learning outcomes, rapid test activity, Smileyometer, early childhood education, innovative pedagogy

Procedia PDF Downloads 67
23718 Road Safety in the Great Britain: An Exploratory Data Analysis

Authors: Jatin Kumar Choudhary, Naren Rayala, Abbas Eslami Kiasari, Fahimeh Jafari

Abstract:

The Great Britain has one of the safest road networks in the world. However, the consequences of any death or serious injury are devastating for loved ones, as well as for those who help the severely injured. This paper aims to analyse the Great Britain's road safety situation and show the response measures for areas where the total damage caused by accidents can be significantly and quickly reduced. In this paper, we do an exploratory data analysis using STATS19 data. For the past 30 years, the UK has had a good record in reducing fatalities. The UK ranked third based on the number of road deaths per million inhabitants. There were around 165,000 accidents reported in the Great Britain in 2009 and it has been decreasing every year until 2019 which is under 120,000. The government continues to scale back road deaths empowering responsible road users by identifying and prosecuting the parameters that make the roads less safe.

Keywords: road safety, data analysis, openstreetmap, feature expanding.

Procedia PDF Downloads 135
23717 Intrusion Detection System Using Linear Discriminant Analysis

Authors: Zyad Elkhadir, Khalid Chougdali, Mohammed Benattou

Abstract:

Most of the existing intrusion detection systems works on quantitative network traffic data with many irrelevant and redundant features, which makes detection process more time’s consuming and inaccurate. A several feature extraction methods, such as linear discriminant analysis (LDA), have been proposed. However, LDA suffers from the small sample size (SSS) problem which occurs when the number of the training samples is small compared with the samples dimension. Hence, classical LDA cannot be applied directly for high dimensional data such as network traffic data. In this paper, we propose two solutions to solve SSS problem for LDA and apply them to a network IDS. The first method, reduce the original dimension data using principal component analysis (PCA) and then apply LDA. In the second solution, we propose to use the pseudo inverse to avoid singularity of within-class scatter matrix due to SSS problem. After that, the KNN algorithm is used for classification process. We have chosen two known datasets KDDcup99 and NSLKDD for testing the proposed approaches. Results showed that the classification accuracy of (PCA+LDA) method outperforms clearly the pseudo inverse LDA method when we have large training data.

Keywords: LDA, Pseudoinverse, PCA, IDS, NSL-KDD, KDDcup99

Procedia PDF Downloads 223
23716 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —

Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno

Abstract:

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical size of dataset derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to realworld data.

Keywords: rule induction, decision table, missing data, noise

Procedia PDF Downloads 392
23715 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 107
23714 Regression Approach for Optimal Purchase of Hosts Cluster in Fixed Fund for Hadoop Big Data Platform

Authors: Haitao Yang, Jianming Lv, Fei Xu, Xintong Wang, Yilin Huang, Lanting Xia, Xuewu Zhu

Abstract:

Given a fixed fund, purchasing fewer hosts of higher capability or inversely more of lower capability is a must-be-made trade-off in practices for building a Hadoop big data platform. An exploratory study is presented for a Housing Big Data Platform project (HBDP), where typical big data computing is with SQL queries of aggregate, join, and space-time condition selections executed upon massive data from more than 10 million housing units. In HBDP, an empirical formula was introduced to predict the performance of host clusters potential for the intended typical big data computing, and it was shaped via a regression approach. With this empirical formula, it is easy to suggest an optimal cluster configuration. The investigation was based on a typical Hadoop computing ecosystem HDFS+Hive+Spark. A proper metric was raised to measure the performance of Hadoop clusters in HBDP, which was tested and compared with its predicted counterpart, on executing three kinds of typical SQL query tasks. Tests were conducted with respect to factors of CPU benchmark, memory size, virtual host division, and the number of element physical host in cluster. The research has been applied to practical cluster procurement for housing big data computing.

Keywords: Hadoop platform planning, optimal cluster scheme at fixed-fund, performance predicting formula, typical SQL query tasks

Procedia PDF Downloads 230
23713 Model Predictive Controller for Pasteurization Process

Authors: Tesfaye Alamirew Dessie

Abstract:

Our study focuses on developing a Model Predictive Controller (MPC) and evaluating it against a traditional PID for a pasteurization process. Utilizing system identification from the experimental data, the dynamics of the pasteurization process were calculated. Using best fit with data validation, residual, and stability analysis, the quality of several model architectures was evaluated. The validation data fit the auto-regressive with exogenous input (ARX322) model of the pasteurization process by roughly 80.37 percent. The ARX322 model structure was used to create MPC and PID control techniques. After comparing controller performance based on settling time, overshoot percentage, and stability analysis, it was found that MPC controllers outperform PID for those parameters.

Keywords: MPC, PID, ARX, pasteurization

Procedia PDF Downloads 158
23712 Point Estimation for the Type II Generalized Logistic Distribution Based on Progressively Censored Data

Authors: Rana Rimawi, Ayman Baklizi

Abstract:

Skewed distributions are important models that are frequently used in applications. Generalized distributions form a class of skewed distributions and gain widespread use in applications because of their flexibility in data analysis. More specifically, the Generalized Logistic Distribution with its different types has received considerable attention recently. In this study, based on progressively type-II censored data, we will consider point estimation in type II Generalized Logistic Distribution (Type II GLD). We will develop several estimators for its unknown parameters, including maximum likelihood estimators (MLE), Bayes estimators and linear estimators (BLUE). The estimators will be compared using simulation based on the criteria of bias and Mean square error (MSE). An illustrative example of a real data set will be given.

Keywords: point estimation, type II generalized logistic distribution, progressive censoring, maximum likelihood estimation

Procedia PDF Downloads 194
23711 Omni: Data Science Platform for Evaluate Performance of a LoRaWAN Network

Authors: Emanuele A. Solagna, Ricardo S, Tozetto, Roberto dos S. Rabello

Abstract:

Nowadays, physical processes are becoming digitized by the evolution of communication, sensing and storage technologies which promote the development of smart cities. The evolution of this technology has generated multiple challenges related to the generation of big data and the active participation of electronic devices in society. Thus, devices can send information that is captured and processed over large areas, but there is no guarantee that all the obtained data amount will be effectively stored and correctly persisted. Because, depending on the technology which is used, there are parameters that has huge influence on the full delivery of information. This article aims to characterize the project, currently under development, of a platform that based on data science will perform a performance and effectiveness evaluation of an industrial network that implements LoRaWAN technology considering its main parameters configuration relating these parameters to the information loss.

Keywords: Internet of Things, LoRa, LoRaWAN, smart cities

Procedia PDF Downloads 143
23710 Cybervetting and Online Privacy in Job Recruitment – Perspectives on the Current and Future Legislative Framework Within the EU

Authors: Nicole Christiansen, Hanne Marie Motzfeldt

Abstract:

In recent years, more and more HR professionals have been using cyber-vetting in job recruitment in an effort to find the perfect match for the company. These practices are growing rapidly, accessing a vast amount of data from social networks, some of which is privileged and protected information. Thus, there is a risk that the right to privacy is becoming a duty to manage your private data. This paper investigates to which degree a job applicant's fundamental rights are protected adequately in current and future legislation in the EU. This paper argues that current data protection regulations and forthcoming regulations on the use of AI ensure sufficient protection. However, even though the regulation on paper protects employees within the EU, the recruitment sector may not pay sufficient attention to the regulation as it not specifically targeting this area. Therefore, the lack of specific labor and employment regulation is a concern that the social partners should attend to.

Keywords: AI, cyber vetting, data protection, job recruitment, online privacy

Procedia PDF Downloads 83
23709 Estimation of Reservoirs Fracture Network Properties Using an Artificial Intelligence Technique

Authors: Reda Abdel Azim, Tariq Shehab

Abstract:

The main objective of this study is to develop a subsurface fracture map of naturally fractured reservoirs by overcoming the limitations associated with different data sources in characterising fracture properties. Some of these limitations are overcome by employing a nested neuro-stochastic technique to establish inter-relationship between different data, as conventional well logs, borehole images (FMI), core description, seismic attributes, and etc. and then characterise fracture properties in terms of fracture density and fractal dimension for each data source. Fracture density is an important property of a system of fracture network as it is a measure of the cumulative area of all the fractures in a unit volume of a fracture network system and Fractal dimension is also used to characterize self-similar objects such as fractures. At the wellbore locations, fracture density and fractal dimension can only be estimated for limited sections where FMI data are available. Therefore, artificial intelligence technique is applied to approximate the quantities at locations along the wellbore, where the hard data is not available. It should be noted that Artificial intelligence techniques have proven their effectiveness in this domain of applications.

Keywords: naturally fractured reservoirs, artificial intelligence, fracture intensity, fractal dimension

Procedia PDF Downloads 248
23708 Governance, Risk Management, and Compliance Factors Influencing the Adoption of Cloud Computing in Australia

Authors: Tim Nedyalkov

Abstract:

A business decision to move to the cloud brings fundamental changes in how an organization develops and delivers its Information Technology solutions. The accelerated pace of digital transformation across businesses and government agencies increases the reliance on cloud-based services. They are collecting, managing, and retaining large amounts of data in cloud environments makes information security and data privacy protection essential. It becomes even more important to understand what key factors drive successful cloud adoption following the commencement of the Privacy Amendment Notifiable Data Breaches (NDB) Act 2017 in Australia as the regulatory changes impact many organizations and industries. This quantitative correlational research investigated the governance, risk management, and compliance factors contributing to cloud security success. The factors influence the adoption of cloud computing within an organizational context after the commencement of the NDB scheme. The results and findings demonstrated that corporate information security policies, data storage location, management understanding of data governance responsibilities, and regular compliance assessments are the factors influencing cloud computing adoption. The research has implications for organizations, future researchers, practitioners, policymakers, and cloud computing providers to meet the rapidly changing regulatory and compliance requirements.

Keywords: cloud compliance, cloud security, data governance, privacy protection

Procedia PDF Downloads 112
23707 An Evaluation of Edible Plants for Remediation of Contaminated Soil- Can Edible Plants Be Used to Remove Heavy Metals on Soil?

Authors: Celia Marilia Martins, Sonia I. V. Guilundo, Iris M. Victorino, Antonio O. Quilambo

Abstract:

In Mozambique rapid industrialization (mining, aluminium and cement activities) and urbanization processes has led to the incorporation of heavy metals on soil, thus degrading not only the quality of the environment, but also affecting plants, animals and human healthy. Several methods have been used to remediate contaminated soils, but most of them are costly and difficult to get optimum results. Currently, phytoremediation is an effective and affordable technological solution used to extract or remove inactive metals from contaminated soil. Phytoremediation is the use of plants to clean up a contamination from soils, sediments, and water. This technology is environmental friendly and potentially cost effective. The present investigation summarised the potential of edible vegetable to grow under the high level of heavy metals such as lead and zinc. The plants used in these studies include Tomatoes, lettuce and Soya beans. The studies have shown that edible plants can be grown under the high level of heavy metals on the soil. Further investigations are identifying mechanisms used by plants to ensure a safe and sustainable use for remediation of contaminated soils by heavy metals.

Keywords: contaminated soil, edible plants, heavy metals, phytoremediation

Procedia PDF Downloads 372
23706 Simulations to Predict Solar Energy Potential by ERA5 Application at North Africa

Authors: U. Ali Rahoma, Nabil Esawy, Fawzia Ibrahim Moursy, A. H. Hassan, Samy A. Khalil, Ashraf S. Khamees

Abstract:

The design of any solar energy conversion system requires the knowledge of solar radiation data obtained over a long period. Satellite data has been widely used to estimate solar energy where no ground observation of solar radiation is available, yet there are limitations on the temporal coverage of satellite data. Reanalysis is a “retrospective analysis” of the atmosphere parameters generated by assimilating observation data from various sources, including ground observation, satellites, ships, and aircraft observation with the output of NWP (Numerical Weather Prediction) models, to develop an exhaustive record of weather and climate parameters. The evaluation of the performance of reanalysis datasets (ERA-5) for North Africa against high-quality surface measured data was performed using statistical analysis. The estimation of global solar radiation (GSR) distribution over six different selected locations in North Africa during ten years from the period time 2011 to 2020. The root means square error (RMSE), mean bias error (MBE) and mean absolute error (MAE) of reanalysis data of solar radiation range from 0.079 to 0.222, 0.0145 to 0.198, and 0.055 to 0.178, respectively. The seasonal statistical analysis was performed to study seasonal variation of performance of datasets, which reveals the significant variation of errors in different seasons—the performance of the dataset changes by changing the temporal resolution of the data used for comparison. The monthly mean values of data show better performance, but the accuracy of data is compromised. The solar radiation data of ERA-5 is used for preliminary solar resource assessment and power estimation. The correlation coefficient (R2) varies from 0.93 to 99% for the different selected sites in North Africa in the present research. The goal of this research is to give a good representation for global solar radiation to help in solar energy application in all fields, and this can be done by using gridded data from European Centre for Medium-Range Weather Forecasts ECMWF and producing a new model to give a good result.

Keywords: solar energy, solar radiation, ERA-5, potential energy

Procedia PDF Downloads 207