Search results for: data acquisition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24990

Search results for: data acquisition

23850 Simulation of a Cost Model Response Requests for Replication in Data Grid Environment

Authors: Kaddi Mohammed, A. Benatiallah, D. Benatiallah

Abstract:

Data grid is a technology that has full emergence of new challenges, such as the heterogeneity and availability of various resources and geographically distributed, fast data access, minimizing latency and fault tolerance. Researchers interested in this technology address the problems of the various systems related to the industry such as task scheduling, load balancing and replication. The latter is an effective solution to achieve good performance in terms of data access and grid resources and better availability of data cost. In a system with duplication, a coherence protocol is used to impose some degree of synchronization between the various copies and impose some order on updates. In this project, we present an approach for placing replicas to minimize the cost of response of requests to read or write, and we implement our model in a simulation environment. The placement techniques are based on a cost model which depends on several factors, such as bandwidth, data size and storage nodes.

Keywords: response time, query, consistency, bandwidth, storage capacity, CERN

Procedia PDF Downloads 255
23849 Comparison of Different Methods to Produce Fuzzy Tolerance Relations for Rainfall Data Classification in the Region of Central Greece

Authors: N. Samarinas, C. Evangelides, C. Vrekos

Abstract:

The aim of this paper is the comparison of three different methods, in order to produce fuzzy tolerance relations for rainfall data classification. More specifically, the three methods are correlation coefficient, cosine amplitude and max-min method. The data were obtained from seven rainfall stations in the region of central Greece and refers to 20-year time series of monthly rainfall height average. Three methods were used to express these data as a fuzzy relation. This specific fuzzy tolerance relation is reformed into an equivalence relation with max-min composition for all three methods. From the equivalence relation, the rainfall stations were categorized and classified according to the degree of confidence. The classification shows the similarities among the rainfall stations. Stations with high similarity can be utilized in water resource management scenarios interchangeably or to augment data from one to another. Due to the complexity of calculations, it is important to find out which of the methods is computationally simpler and needs fewer compositions in order to give reliable results.

Keywords: classification, fuzzy logic, tolerance relations, rainfall data

Procedia PDF Downloads 300
23848 Customer Satisfaction and Effective HRM Policies: Customer and Employee Satisfaction

Authors: S. Anastasiou, C. Nathanailides

Abstract:

The purpose of this study is to examine the possible link between employee and customer satisfaction. The service provided by employees, help to build a good relationship with customers and can help at increasing their loyalty. Published data for job satisfaction and indicators of customer services were gathered from relevant published works which included data from five different countries. The reviewed data indicate a significant correlation between indicators of customer and employee satisfaction in the Banking sector. There was a significant correlation between the two parameters (Pearson correlation R2=0.52 P<0.05) The reviewed data provide evidence that there is some practical evidence which links these two parameters.

Keywords: job satisfaction, job performance, customer’ service, banks, human resources management

Procedia PDF Downloads 306
23847 Evaluation of Australian Open Banking Regulation: Balancing Customer Data Privacy and Innovation

Authors: Suman Podder

Abstract:

As Australian ‘Open Banking’ allows customers to share their financial data with accredited Third-Party Providers (‘TPPs’), it is necessary to evaluate whether the regulators have achieved the balance between protecting customer data privacy and promoting data-related innovation. Recognising the need to increase customers’ influence on their own data, and the benefits of data-related innovation, the Australian Government introduced ‘Consumer Data Right’ (‘CDR’) to the banking sector through Open Banking regulation. Under Open Banking, TPPs can access customers’ banking data that allows the TPPs to tailor their products and services to meet customer needs at a more competitive price. This facilitated access and use of customer data will promote innovation by providing opportunities for new products and business models to emerge and grow. However, the success of Open Banking depends on the willingness of the customers to share their data, so the regulators have augmented the protection of data by introducing new privacy safeguards to instill confidence and trust in the system. The dilemma in policymaking is that, on the one hand, lenient data privacy laws will help the flow of information, but at the risk of individuals’ loss of privacy, on the other hand, stringent laws that adequately protect privacy may dissuade innovation. Using theoretical and doctrinal methods, this paper examines whether the privacy safeguards under Open Banking will add to the compliance burden of the participating financial institutions, resulting in the undesirable effect of stifling other policy objectives such as innovation. The contribution of this research is three-fold. In the emerging field of customer data sharing, this research is one of the few academic studies on the objectives and impact of Open Banking in the Australian context. Additionally, Open Banking is still in the early stages of implementation, so this research traces the evolution of Open Banking through policy debates regarding the desirability of customer data-sharing. Finally, the research focuses not only on the customers’ data privacy and juxtaposes it with another important objective of promoting innovation, but it also highlights the critical issues facing the data-sharing regime. This paper argues that while it is challenging to develop a regulatory framework for protecting data privacy without impeding innovation and jeopardising yet unknown opportunities, data privacy and innovation promote different aspects of customer welfare. This paper concludes that if a regulation is appropriately designed and implemented, the benefits of data-sharing will outweigh the cost of compliance with the CDR.

Keywords: consumer data right, innovation, open banking, privacy safeguards

Procedia PDF Downloads 128
23846 Generation of Automated Alarms for Plantwide Process Monitoring

Authors: Hyun-Woo Cho

Abstract:

Earlier detection of incipient abnormal operations in terms of plant-wide process management is quite necessary in order to improve product quality and process safety. And generating warning signals or alarms for operating personnel plays an important role in process automation and intelligent plant health monitoring. Various methodologies have been developed and utilized in this area such as expert systems, mathematical model-based approaches, multivariate statistical approaches, and so on. This work presents a nonlinear empirical monitoring methodology based on the real-time analysis of massive process data. Unfortunately, the big data includes measurement noises and unwanted variations unrelated to true process behavior. Thus the elimination of such unnecessary patterns of the data is executed in data processing step to enhance detection speed and accuracy. The performance of the methodology was demonstrated using simulated process data. The case study showed that the detection speed and performance was improved significantly irrespective of the size and the location of abnormal events.

Keywords: detection, monitoring, process data, noise

Procedia PDF Downloads 233
23845 Meanings and Concepts of Standardization in Systems Medicine

Authors: Imme Petersen, Wiebke Sick, Regine Kollek

Abstract:

In systems medicine, high-throughput technologies produce large amounts of data on different biological and pathological processes, including (disturbed) gene expressions, metabolic pathways and signaling. The large volume of data of different types, stored in separate databases and often located at different geographical sites have posed new challenges regarding data handling and processing. Tools based on bioinformatics have been developed to resolve the upcoming problems of systematizing, standardizing and integrating the various data. However, the heterogeneity of data gathered at different levels of biological complexity is still a major challenge in data analysis. To build multilayer disease modules, large and heterogeneous data of disease-related information (e.g., genotype, phenotype, environmental factors) are correlated. Therefore, a great deal of attention in systems medicine has been put on data standardization, primarily to retrieve and combine large, heterogeneous datasets into standardized and incorporated forms and structures. However, this data-centred concept of standardization in systems medicine is contrary to the debate in science and technology studies (STS) on standardization that rather emphasizes the dynamics, contexts and negotiations of standard operating procedures. Based on empirical work on research consortia that explore the molecular profile of diseases to establish systems medical approaches in the clinic in Germany, we trace how standardized data are processed and shaped by bioinformatics tools, how scientists using such data in research perceive such standard operating procedures and which consequences for knowledge production (e.g. modeling) arise from it. Hence, different concepts and meanings of standardization are explored to get a deeper insight into standard operating procedures not only in systems medicine, but also beyond.

Keywords: data, science and technology studies (STS), standardization, systems medicine

Procedia PDF Downloads 324
23844 Big Data in Construction Project Management: The Colombian Northeast Case

Authors: Sergio Zabala-Vargas, Miguel Jiménez-Barrera, Luz VArgas-Sánchez

Abstract:

In recent years, information related to project management in organizations has been increasing exponentially. Performance data, management statistics, indicator results have forced the collection, analysis, traceability, and dissemination of project managers to be essential. In this sense, there are current trends to facilitate efficient decision-making in emerging technology projects, such as: Machine Learning, Data Analytics, Data Mining, and Big Data. The latter is the most interesting in this project. This research is part of the thematic line Construction methods and project management. Many authors present the relevance that the use of emerging technologies, such as Big Data, has taken in recent years in project management in the construction sector. The main focus is the optimization of time, scope, budget, and in general mitigating risks. This research was developed in the northeastern region of Colombia-South America. The first phase was aimed at diagnosing the use of emerging technologies (Big-Data) in the construction sector. In Colombia, the construction sector represents more than 50% of the productive system, and more than 2 million people participate in this economic segment. The quantitative approach was used. A survey was applied to a sample of 91 companies in the construction sector. Preliminary results indicate that the use of Big Data and other emerging technologies is very low and also that there is interest in modernizing project management. There is evidence of a correlation between the interest in using new data management technologies and the incorporation of Building Information Modeling BIM. The next phase of the research will allow the generation of guidelines and strategies for the incorporation of technological tools in the construction sector in Colombia.

Keywords: big data, building information modeling, tecnology, project manamegent

Procedia PDF Downloads 114
23843 Analysis of the Feasibility of Using a Solar Spiral Type Water Heater for Swimming Pool Application in Physiotherapy and Sports Centers

Authors: G. B. M. Carvalho, V. A. C. Vale, E. T. L. Cöuras Ford

Abstract:

A heated pool makes it possible to use it during all hours of the day and in the seasons, especially in physiotherapies and sports centers. However, the cost of installation, operation and maintenance often makes it difficult to deploy. In addition, the current global policy for the use of natural resources from energy sources contradicts the most common means of heating swimming pools, such as the use of gas (Natural Gas and Liquefied Petroleum Gas), the use of firewood or oil and the use of electricity (heat pumps and electrical resistances). In this sense, this work focuses on the use of solar water heaters to be used in swimming pools of physiotherapy centers, in order to analyze their viability for this purpose in view of the costs linked to the medium and/or long term heating. For this, materials of low cost, low weight, easy commercial acquisition were used besides easy manufacture. Parameters such as flow, temperature distribution, efficiency and technical-economic feasibility were evaluated.

Keywords: heating, water, pool, solar energy, solar collectors, temperature, efficiency

Procedia PDF Downloads 155
23842 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that the voice imprint of monozygotic twins is individual. According to the conclusion of the experiment, we can conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: phonogram, speech signal, temporal characteristics, fundamental frequency, biometric fingerprints

Procedia PDF Downloads 123
23841 A Non-parametric Clustering Approach for Multivariate Geostatistical Data

Authors: Francky Fouedjio

Abstract:

Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are more similar while clusters are different from each other, in some sense. Spatially contiguous clusters can significantly improve the interpretation that turns the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using bivariate synthetic dataset and multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.

Keywords: clustering, geostatistics, multivariate data, non-parametric

Procedia PDF Downloads 466
23840 Big Data in Telecom Industry: Effective Predictive Techniques on Call Detail Records

Authors: Sara ElElimy, Samir Moustafa

Abstract:

Mobile network operators start to face many challenges in the digital era, especially with high demands from customers. Since mobile network operators are considered a source of big data, traditional techniques are not effective with new era of big data, Internet of things (IoT) and 5G; as a result, handling effectively different big datasets becomes a vital task for operators with the continuous growth of data and moving from long term evolution (LTE) to 5G. So, there is an urgent need for effective Big data analytics to predict future demands, traffic, and network performance to full fill the requirements of the fifth generation of mobile network technology. In this paper, we introduce data science techniques using machine learning and deep learning algorithms: the autoregressive integrated moving average (ARIMA), Bayesian-based curve fitting, and recurrent neural network (RNN) are employed for a data-driven application to mobile network operators. The main framework included in models are identification parameters of each model, estimation, prediction, and final data-driven application of this prediction from business and network performance applications. These models are applied to Telecom Italia Big Data challenge call detail records (CDRs) datasets. The performance of these models is found out using a specific well-known evaluation criteria shows that ARIMA (machine learning-based model) is more accurate as a predictive model in such a dataset than the RNN (deep learning model).

Keywords: big data analytics, machine learning, CDRs, 5G

Procedia PDF Downloads 124
23839 A Data Mining Approach for Analysing and Predicting the Bank's Asset Liability Management Based on Basel III Norms

Authors: Nidhin Dani Abraham, T. K. Sri Shilpa

Abstract:

Asset liability management is an important aspect in banking business. Moreover, the today’s banking is based on BASEL III which strictly regulates on the counterparty default. This paper focuses on prediction and analysis of counter party default risk, which is a type of risk occurs when the customers fail to repay the amount back to the lender (bank or any financial institutions). This paper proposes an approach to reduce the counterparty risk occurring in the financial institutions using an appropriate data mining technique and thus predicts the occurrence of NPA. It also helps in asset building and restructuring quality. Liability management is very important to carry out banking business. To know and analyze the depth of liability of bank, a suitable technique is required. For that a data mining technique is being used to predict the dormant behaviour of various deposit bank customers. Various models are implemented and the results are analyzed of saving bank deposit customers. All these data are cleaned using data cleansing approach from the bank data warehouse.

Keywords: data mining, asset liability management, BASEL III, banking

Procedia PDF Downloads 535
23838 Language Developmental Trends of Mandarin-Speaking Preschoolers in Beijing

Authors: Nga Yui Tong

Abstract:

Mandarin, the official language of China, is based on the Beijing dialect and is spoken by more than one billion people from all over the world. To investigate the trends of Mandarin acquisition, 192 preschoolers are recruited by stratified random sampling. They are from 4 different districts in Beijing, 2 schools in each district, with 4 age groups, both genders, and 3 children in each stratum. The children are paired up to conduct semi-structured free play for 30 minutes. Their language output is videotaped, transcribed, and coded for the calculation of Mean Length of Utterance (MLU). Two-way ANOVA showed that the variation of MLU is significantly contributed by age, which is coherent to previous findings of other languages. This first large-scale study to investigate the developmental trend of Mandarin in young children in Beijing provides empirical evidence to the development of standards and curriculum planning for early Mandarin education. Interestingly, the gender effect in the study is insignificant, with boys showing a slightly higher MLU than girls across all age groups and settings, except the 4.5 years same-gender dyads. The societal factors in the Chinese context on parenting and gender bias are worth looking into.

Keywords: Beijing, language development, Mandarin, preschoolers

Procedia PDF Downloads 106
23837 A Dynamic Ensemble Learning Approach for Online Anomaly Detection in Alibaba Datacenters

Authors: Wanyi Zhu, Xia Ming, Huafeng Wang, Junda Chen, Lu Liu, Jiangwei Jiang, Guohua Liu

Abstract:

Anomaly detection is a first and imperative step needed to respond to unexpected problems and to assure high performance and security in large data center management. This paper presents an online anomaly detection system through an innovative approach of ensemble machine learning and adaptive differentiation algorithms, and applies them to performance data collected from a continuous monitoring system for multi-tier web applications running in Alibaba data centers. We evaluate the effectiveness and efficiency of this algorithm with production traffic data and compare with the traditional anomaly detection approaches such as a static threshold and other deviation-based detection techniques. The experiment results show that our algorithm correctly identifies the unexpected performance variances of any running application, with an acceptable false positive rate. This proposed approach has already been deployed in real-time production environments to enhance the efficiency and stability in daily data center operations.

Keywords: Alibaba data centers, anomaly detection, big data computation, dynamic ensemble learning

Procedia PDF Downloads 181
23836 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: early warning system, knowledge management, market prediction, topic modeling.

Procedia PDF Downloads 322
23835 The Role of Synthetic Data in Aerial Object Detection

Authors: Ava Dodd, Jonathan Adams

Abstract:

The purpose of this study is to explore the characteristics of developing a machine learning application using synthetic data. The study is structured to develop the application for the purpose of deploying the computer vision model. The findings discuss the realities of attempting to develop a computer vision model for practical purpose, and detail the processes, tools, and techniques that were used to meet accuracy requirements. The research reveals that synthetic data represents another variable that can be adjusted to improve the performance of a computer vision model. Further, a suite of tools and tuning recommendations are provided.

Keywords: computer vision, machine learning, synthetic data, YOLOv4

Procedia PDF Downloads 208
23834 Comparative Analysis of Data Gathering Protocols with Multiple Mobile Elements for Wireless Sensor Network

Authors: Bhat Geetalaxmi Jairam, D. V. Ashoka

Abstract:

Wireless Sensor Networks are used in many applications to collect sensed data from different sources. Sensed data has to be delivered through sensors wireless interface using multi-hop communication towards the sink. The data collection in wireless sensor networks consumes energy. Energy consumption is the major constraints in WSN .Reducing the energy consumption while increasing the amount of generated data is a great challenge. In this paper, we have implemented two data gathering protocols with multiple mobile sinks/elements to collect data from sensor nodes. First, is Energy-Efficient Data Gathering with Tour Length-Constrained Mobile Elements in Wireless Sensor Networks (EEDG), in which mobile sinks uses vehicle routing protocol to collect data. Second is An Intelligent Agent-based Routing Structure for Mobile Sinks in WSNs (IAR), in which mobile sinks uses prim’s algorithm to collect data. Authors have implemented concepts which are common to both protocols like deployment of mobile sinks, generating visiting schedule, collecting data from the cluster member. Authors have compared the performance of both protocols by taking statistics based on performance parameters like Delay, Packet Drop, Packet Delivery Ratio, Energy Available, Control Overhead. Authors have concluded this paper by proving EEDG is more efficient than IAR protocol but with few limitations which include unaddressed issues likes Redundancy removal, Idle listening, Mobile Sink’s pause/wait state at the node. In future work, we plan to concentrate more on these limitations to avail a new energy efficient protocol which will help in improving the life time of the WSN.

Keywords: aggregation, consumption, data gathering, efficiency

Procedia PDF Downloads 476
23833 Status and Results from EXO-200

Authors: Ryan Maclellan

Abstract:

EXO-200 has provided one of the most sensitive searches for neutrinoless double-beta decay utilizing 175 kg of enriched liquid xenon in an ultra-low background time projection chamber. This detector has demonstrated excellent energy resolution and background rejection capabilities. Using the first two years of data, EXO-200 has set a limit of 1.1x10^25 years at 90% C.L. on the neutrinoless double-beta decay half-life of Xe-136. The experiment has experienced a brief hiatus in data taking during a temporary shutdown of its host facility: the Waste Isolation Pilot Plant. EXO-200 expects to resume data taking in earnest this fall with upgraded detector electronics. Results from the analysis of EXO-200 data and an update on the current status of EXO-200 will be presented.

Keywords: double-beta, Majorana, neutrino, neutrinoless

Procedia PDF Downloads 398
23832 Using the Timepix Detector at CERN Accelerator Facilities

Authors: Andrii Natochii

Abstract:

The UA9 collaboration in the last two years has installed two different types of detectors to investigate the channeling effect in the bent silicon crystals with high-energy particles beam on the CERN accelerator facilities: Cherenkov detector CpFM and silicon pixel detector Timepix. In the current work, we describe the main performances of the Timepix detector operation at the SPS and H8 extracted beamline at CERN. We are presenting some detector calibration results and tuning. Our research topics also cover a cluster analysis algorithm for the particle hits reconstruction. We describe the optimal acquisition setup for the Timepix device and the edges of its functionality for the high energy and flux beam monitoring. The measurements of the crystal parameters are very important for the future bent crystal applications and needs a track reconstruction apparatus. Thus, it was decided to construct a short range (1.2 m long) particle telescope based on the Timepix sensors and test it at H8 SPS extraction beamline. The obtained results will be shown as well.

Keywords: beam monitoring, channeling, particle tracking, Timepix detector

Procedia PDF Downloads 167
23831 Remaining Useful Life (RUL) Assessment Using Progressive Bearing Degradation Data and ANN Model

Authors: Amit R. Bhende, G. K. Awari

Abstract:

Remaining useful life (RUL) prediction is one of key technologies to realize prognostics and health management that is being widely applied in many industrial systems to ensure high system availability over their life cycles. The present work proposes a data-driven method of RUL prediction based on multiple health state assessment for rolling element bearings. Bearing degradation data at three different conditions from run to failure is used. A RUL prediction model is separately built in each condition. Feed forward back propagation neural network models are developed for prediction modeling.

Keywords: bearing degradation data, remaining useful life (RUL), back propagation, prognosis

Procedia PDF Downloads 417
23830 Spatio-Temporal Data Mining with Association Rules for Lake Van

Authors: Tolga Aydin, M. Fatih Alaeddinoğlu

Abstract:

People, throughout the history, have made estimates and inferences about the future by using their past experiences. Developing information technologies and the improvements in the database management systems make it possible to extract useful information from knowledge in hand for the strategic decisions. Therefore, different methods have been developed. Data mining by association rules learning is one of such methods. Apriori algorithm, one of the well-known association rules learning algorithms, is not commonly used in spatio-temporal data sets. However, it is possible to embed time and space features into the data sets and make Apriori algorithm a suitable data mining technique for learning spatio-temporal association rules. Lake Van, the largest lake of Turkey, is a closed basin. This feature causes the volume of the lake to increase or decrease as a result of change in water amount it holds. In this study, evaporation, humidity, lake altitude, amount of rainfall and temperature parameters recorded in Lake Van region throughout the years are used by the Apriori algorithm and a spatio-temporal data mining application is developed to identify overflows and newly-formed soil regions (underflows) occurring in the coastal parts of Lake Van. Identifying possible reasons of overflows and underflows may be used to alert the experts to take precautions and make the necessary investments.

Keywords: apriori algorithm, association rules, data mining, spatio-temporal data

Procedia PDF Downloads 357
23829 Building Data Infrastructure for Public Use and Informed Decision Making in Developing Countries-Nigeria

Authors: Busayo Fashoto, Abdulhakeem Shaibu, Justice Agbadu, Samuel Aiyeoribe

Abstract:

Data has gone from just rows and columns to being an infrastructure itself. The traditional medium of data infrastructure has been managed by individuals in different industries and saved on personal work tools; one of such is the laptop. This hinders data sharing and Sustainable Development Goal (SDG) 9 for infrastructure sustainability across all countries and regions. However, there has been a constant demand for data across different agencies and ministries by investors and decision-makers. The rapid development and adoption of open-source technologies that promote the collection and processing of data in new ways and in ever-increasing volumes are creating new data infrastructure in sectors such as lands and health, among others. This paper examines the process of developing data infrastructure and, by extension, a data portal to provide baseline data for sustainable development and decision making in Nigeria. This paper employs the FAIR principle (Findable, Accessible, Interoperable, and Reusable) of data management using open-source technology tools to develop data portals for public use. eHealth Africa, an organization that uses technology to drive public health interventions in Nigeria, developed a data portal which is a typical data infrastructure that serves as a repository for various datasets on administrative boundaries, points of interest, settlements, social infrastructure, amenities, and others. This portal makes it possible for users to have access to datasets of interest at any point in time at no cost. A skeletal infrastructure of this data portal encompasses the use of open-source technology such as Postgres database, GeoServer, GeoNetwork, and CKan. These tools made the infrastructure sustainable, thus promoting the achievement of SDG 9 (Industries, Innovation, and Infrastructure). As of 6th August 2021, a wider cross-section of 8192 users had been created, 2262 datasets had been downloaded, and 817 maps had been created from the platform. This paper shows the use of rapid development and adoption of technologies that facilitates data collection, processing, and publishing in new ways and in ever-increasing volumes. In addition, the paper is explicit on new data infrastructure in sectors such as health, social amenities, and agriculture. Furthermore, this paper reveals the importance of cross-sectional data infrastructures for planning and decision making, which in turn can form a central data repository for sustainable development across developing countries.

Keywords: data portal, data infrastructure, open source, sustainability

Procedia PDF Downloads 78
23828 Process Data-Driven Representation of Abnormalities for Efficient Process Control

Authors: Hyun-Woo Cho

Abstract:

Unexpected operational events or abnormalities of industrial processes have a serious impact on the quality of final product of interest. In terms of statistical process control, fault detection and diagnosis of processes is one of the essential tasks needed to run the process safely. In this work, nonlinear representation of process measurement data is presented and evaluated using a simulation process. The effect of using different representation methods on the diagnosis performance is tested in terms of computational efficiency and data handling. The results have shown that the nonlinear representation technique produced more reliable diagnosis results and outperforms linear methods. The use of data filtering step improved computational speed and diagnosis performance for test data sets. The presented scheme is different from existing ones in that it attempts to extract the fault pattern in the reduced space, not in the original process variable space. Thus this scheme helps to reduce the sensitivity of empirical models to noise.

Keywords: fault diagnosis, nonlinear technique, process data, reduced spaces

Procedia PDF Downloads 231
23827 Text-to-Speech in Azerbaijani Language via Transfer Learning in a Low Resource Environment

Authors: Dzhavidan Zeinalov, Bugra Sen, Firangiz Aslanova

Abstract:

Most text-to-speech models cannot operate well in low-resource languages and require a great amount of high-quality training data to be considered good enough. Yet, with the improvements made in ASR systems, it is now much easier than ever to collect data for the design of custom text-to-speech models. In this work, our work on using the ASR model to collect data to build a viable text-to-speech system for one of the leading financial institutions of Azerbaijan will be outlined. NVIDIA’s implementation of the Tacotron 2 model was utilized along with the HiFiGAN vocoder. As for the training, the model was first trained with high-quality audio data collected from the Internet, then fine-tuned on the bank’s single speaker call center data. The results were then evaluated by 50 different listeners and got a mean opinion score of 4.17, displaying that our method is indeed viable. With this, we have successfully designed the first text-to-speech model in Azerbaijani and publicly shared 12 hours of audiobook data for everyone to use.

Keywords: Azerbaijani language, HiFiGAN, Tacotron 2, text-to-speech, transfer learning, whisper

Procedia PDF Downloads 23
23826 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.

Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics

Procedia PDF Downloads 404
23825 PlayTrain: A Research and Intervention Project for Early Childhood Teacher Education

Authors: Dalila Lino, Maria Joao Hortas, Carla Rocha, Clarisse Nunes, Natalia Vieira, Marina Fuertes, Kátia Sa

Abstract:

The value of play is recognized worldwide and is considered a fundamental right of all children, as defined in Article 31 of the United Nations Children’s Rights. It is consensual among the scientific community that play, and toys are of vital importance for children’s learning and development. Play promotes the acquisition of language, enhances creativity and improves social, affective, emotional, cognitive and motor development of young children. Young children ages 0 to 6 who have had many opportunities to get involved in play show greater competence to adapt to new and unexpected situations and more easily overcome the pain and suffering caused by traumatic situations. The PlayTrain Project aims to understand the places/spaces of play in the education of children from 0 to 6 years and promoting the training of preschool teachers to become capable of developing practices that enhance children’s agency, experimentation in the physical and social world and the development of imagination and creativity. This project follows the Design-Based-Research (DBR) and has two dimensions: research and intervention. The participants are 120 students from the Master in Pre-school Education of the Higher School of Education, Polytechnic Institute of Lisbon enrolled in the academic year 2018/2019. The development of workshops focused on the role of play and toys for young children’s learning promotes the participants reflection and the development of skills and knowledge to construct developmentally appropriated practices in early childhood education. Data was collected through an online questionnaire and focal groups. Results show that the PlayTrain Project contribute to the development of a body of knowledge about the role of play for early childhood education. It was possible to identify the needs of preschool teacher education and to enhance the discussion among the scientific and academic community about the importance of deepening the role of play and toys in the study plans of the masters in pre-school education.

Keywords: children's learning, early childhood education, play, teacher education, toys

Procedia PDF Downloads 127
23824 Assessing the Utility of Unmanned Aerial Vehicle-Borne Hyperspectral Image and Photogrammetry Derived 3D Data for Wetland Species Distribution Quick Mapping

Authors: Qiaosi Li, Frankie Kwan Kit Wong, Tung Fung

Abstract:

Lightweight unmanned aerial vehicle (UAV) loading with novel sensors offers a low cost approach for data acquisition in complex environment. This study established a framework for applying UAV system in complex environment quick mapping and assessed the performance of UAV-based hyperspectral image and digital surface model (DSM) derived from photogrammetric point clouds for 13 species classification in wetland area Mai Po Inner Deep Bay Ramsar Site, Hong Kong. The study area was part of shallow bay with flat terrain and the major species including reedbed and four mangroves: Kandelia obovata, Aegiceras corniculatum, Acrostichum auerum and Acanthus ilicifolius. Other species involved in various graminaceous plants, tarbor, shrub and invasive species Mikania micrantha. In particular, invasive species climbed up to the mangrove canopy caused damage and morphology change which might increase species distinguishing difficulty. Hyperspectral images were acquired by Headwall Nano sensor with spectral range from 400nm to 1000nm and 0.06m spatial resolution image. A sequence of multi-view RGB images was captured with 0.02m spatial resolution and 75% overlap. Hyperspectral image was corrected for radiative and geometric distortion while high resolution RGB images were matched to generate maximum dense point clouds. Furtherly, a 5 cm grid digital surface model (DSM) was derived from dense point clouds. Multiple feature reduction methods were compared to identify the efficient method and to explore the significant spectral bands in distinguishing different species. Examined methods including stepwise discriminant analysis (DA), support vector machine (SVM) and minimum noise fraction (MNF) transformation. Subsequently, spectral subsets composed of the first 20 most importance bands extracted by SVM, DA and MNF, and multi-source subsets adding extra DSM to 20 spectrum bands were served as input in maximum likelihood classifier (MLC) and SVM classifier to compare the classification result. Classification results showed that feature reduction methods from best to worst are MNF transformation, DA and SVM. MNF transformation accuracy was even higher than all bands input result. Selected bands frequently laid along the green peak, red edge and near infrared. Additionally, DA found that chlorophyll absorption red band and yellow band were also important for species classification. In terms of 3D data, DSM enhanced the discriminant capacity among low plants, arbor and mangrove. Meanwhile, DSM largely reduced misclassification due to the shadow effect and morphological variation of inter-species. In respect to classifier, nonparametric SVM outperformed than MLC for high dimension and multi-source data in this study. SVM classifier tended to produce higher overall accuracy and reduce scattered patches although it costs more time than MLC. The best result was obtained by combining MNF components and DSM in SVM classifier. This study offered a precision species distribution survey solution for inaccessible wetland area with low cost of time and labour. In addition, findings relevant to the positive effect of DSM as well as spectral feature identification indicated that the utility of UAV-borne hyperspectral and photogrammetry deriving 3D data is promising in further research on wetland species such as bio-parameters modelling and biological invasion monitoring.

Keywords: digital surface model (DSM), feature reduction, hyperspectral, photogrammetric point cloud, species mapping, unmanned aerial vehicle (UAV)

Procedia PDF Downloads 243
23823 Variance-Aware Routing and Authentication Scheme for Harvesting Data in Cloud-Centric Wireless Sensor Networks

Authors: Olakanmi Oladayo Olufemi, Bamifewe Olusegun James, Badmus Yaya Opeyemi, Adegoke Kayode

Abstract:

The wireless sensor network (WSN) has made a significant contribution to the emergence of various intelligent services or cloud-based applications. Most of the time, these data are stored on a cloud platform for efficient management and sharing among different services or users. However, the sensitivity of the data makes them prone to various confidentiality and performance-related attacks during and after harvesting. Various security schemes have been developed to ensure the integrity and confidentiality of the WSNs' data. However, their specificity towards particular attacks and the resource constraint and heterogeneity of WSNs make most of these schemes imperfect. In this paper, we propose a secure variance-aware routing and authentication scheme with two-tier verification to collect, share, and manage WSN data. The scheme is capable of classifying WSN into different subnets, detecting any attempt of wormhole and black hole attack during harvesting, and enforcing access control on the harvested data stored in the cloud. The results of the analysis showed that the proposed scheme has more security functionalities than other related schemes, solves most of the WSNs and cloud security issues, prevents wormhole and black hole attacks, identifies the attackers during data harvesting, and enforces access control on the harvested data stored in the cloud at low computational, storage, and communication overheads.

Keywords: data block, heterogeneous IoT network, data harvesting, wormhole attack, blackhole attack access control

Procedia PDF Downloads 57
23822 Effects of Unfamiliar Orthography on the Lexical Encoding of Novel Phonological Features

Authors: Asmaa Shehata

Abstract:

Prior research indicates that second language (L2) learners encounter difficulty in the distinguishing novel L2 contrasting sounds that are not contrastive in their native languages. L2 orthographic information, however, is found to play a positive role in the acquisition of non-native phoneme contrasts. While most studies have mainly involved a familiar written script (i.e., the Roman script), the influence of a foreign, unfamiliar script is still unknown. Therefore, the present study asks: Does unfamiliar L2 script play a role in creating distinct phonological representations of novel contrasting phonemes? It is predicted that subjects’ performance in the unfamiliar orthography group will outperform their counterparts’ performance in the control group. Thus, training that entails orthographic inputs can yield a significant improvement in L2 adult learners’ identification and lexical encoding of novel L2 consonant contrasts. Results are discussed in terms of their implications for the type of input introduced to L2 learners to improve their language learning.

Keywords: Arabic, consonant contrasts, foreign script, lexical encoding, orthography, word learning

Procedia PDF Downloads 244
23821 Contribution of Word Decoding and Reading Fluency on Reading Comprehension in Young Typical Readers of Kannada Language

Authors: Vangmayee V. Subban, Suzan Deelan. Pinto, Somashekara Haralakatta Shivananjappa, Shwetha Prabhu, Jayashree S. Bhat

Abstract:

Introduction and Need: During early years of schooling, the instruction in the schools mainly focus on children’s word decoding abilities. However, the skilled readers should master all the components of reading such as word decoding, reading fluency and comprehension. Nevertheless, the relationship between each component during the process of learning to read is less clear. The studies conducted in alphabetical languages have mixed opinion on relative contribution of word decoding and reading fluency on reading comprehension. However, the scenarios in alphasyllabary languages are unexplored. Aim and Objectives: The aim of the study was to explore the role of word decoding, reading fluency on reading comprehension abilities in children learning to read Kannada between the age ranges of 5.6 to 8.6 years. Method: In this cross sectional study, a total of 60 typically developing children, 20 each from Grade I, Grade II, Grade III maintaining equal gender ratio between the age range of 5.6 to 6.6 years, 6.7 to 7.6 years and 7.7 to 8.6 years respectively were selected from Kannada medium schools. The reading fluency and reading comprehension abilities of the children were assessed using Grade level passages selected from the Kannada text book of children core curriculum. All the passages consist of five questions to assess reading comprehension. The pseudoword decoding skills were assessed using 40 pseudowords with varying syllable length and their Akshara composition. Pseudowords are formed by interchanging the syllables within the meaningful word while maintaining the phonotactic constraints of Kannada language. The assessment material was subjected to content validation and reliability measures before collecting the data on the study samples. The data were collected individually, and reading fluency was assessed for words correctly read per minute. Pseudoword decoding was scored for the accuracy of reading. Results: The descriptive statistics indicated that the mean pseudoword reading, reading comprehension, words accurately read per minute increased with the Grades. The performance of Grade III children found to be higher, Grade I lower and Grade II remained intermediate of Grade III and Grade I. The trend indicated that reading skills gradually improve with the Grades. Pearson’s correlation co-efficient showed moderate and highly significant (p=0.00) positive co-relation between the variables, indicating the interdependency of all the three components required for reading. The hierarchical regression analysis revealed 37% variance in reading comprehension was explained by pseudoword decoding and was highly significant. Subsequent entry of reading fluency measure, there was no significant change in R-square and was only change 3%. Therefore, pseudoword-decoding evolved as a single most significant predictor of reading comprehension during early Grades of reading acquisition. Conclusion: The present study concludes that the pseudoword decoding skills contribute significantly to reading comprehension than reading fluency during initial years of schooling in children learning to read Kannada language.

Keywords: alphasyllabary, pseudo-word decoding, reading comprehension, reading fluency

Procedia PDF Downloads 245