Search results for: heterogeneous massive data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25995


24705 Prompt Design for Code Generation in Data Analysis Using Large Language Models

Authors: Lu Song Ma Li Zhi

Abstract:

With the rapid advancement of artificial intelligence technology, large language models (LLMs) have become a milestone in the field of natural language processing, demonstrating remarkable capabilities in semantic understanding, intelligent question answering, and text generation. These models are gradually penetrating various industries, particularly showcasing significant application potential in the data analysis domain. However, retraining or fine-tuning these models requires substantial computational resources and ample downstream task datasets, which poses a significant challenge for many enterprises and research institutions. Without modifying the internal parameters of the large models, prompt engineering techniques can rapidly adapt these models to new domains. This paper proposes a prompt design strategy aimed at leveraging the capabilities of large language models to automate the generation of data analysis code. By carefully designing prompts, data analysis requirements can be described in natural language, which the large language model can then understand and convert into executable data analysis code, thereby greatly enhancing the efficiency and convenience of data analysis. This strategy not only lowers the threshold for using large models but also significantly improves the accuracy and efficiency of data analysis. Our approach includes requirements for the precision of natural language descriptions, coverage of diverse data analysis needs, and mechanisms for immediate feedback and adjustment. Experimental results show that with this prompt design strategy, large language models perform exceptionally well in multiple data analysis tasks, generating high-quality code and significantly shortening the data analysis cycle. This method provides an efficient and convenient tool for the data analysis field and demonstrates the enormous potential of large language models in practical applications.
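A minimal sketch of the kind of prompt template the strategy describes is shown below; the wording, the schema placeholder and the build_prompt helper are illustrative assumptions rather than the authors' actual prompts, and the resulting string would be sent to whatever LLM chat endpoint is available.

```python
# Illustrative prompt template for LLM-based data analysis code generation.
# The prompt wording and the helper are assumptions, not the paper's own prompts.

PROMPT_TEMPLATE = """You are a data analysis assistant.
Dataset description: {schema}
Task (natural language): {task}
Constraints:
- Return only executable Python (pandas) code, no explanations.
- Validate column names against the schema before using them.
- Print or return the final result so it can be inspected.
"""

def build_prompt(schema: str, task: str) -> str:
    """Fill the template with a dataset schema and a natural-language request."""
    return PROMPT_TEMPLATE.format(schema=schema, task=task)

if __name__ == "__main__":
    prompt = build_prompt(
        schema="sales.csv with columns: date, region, product, revenue",
        task="Compute monthly total revenue per region and plot the trend.",
    )
    print(prompt)  # send this string to any LLM chat endpoint of your choice
```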

Keywords: large language models, prompt design, data analysis, code generation

Procedia PDF Downloads 42
24704 Comparison of Different Methods to Produce Fuzzy Tolerance Relations for Rainfall Data Classification in the Region of Central Greece

Authors: N. Samarinas, C. Evangelides, C. Vrekos

Abstract:

The aim of this paper is to compare three different methods for producing fuzzy tolerance relations for rainfall data classification, namely the correlation coefficient, cosine amplitude and max-min methods. The data were obtained from seven rainfall stations in the region of central Greece and consist of 20-year time series of average monthly rainfall height. Each of the three methods was used to express these data as a fuzzy tolerance relation, which was then transformed into an equivalence relation by max-min composition. From the equivalence relation, the rainfall stations were categorized and classified according to the degree of confidence. The classification shows the similarities among the rainfall stations; stations with high similarity can be used interchangeably in water resource management scenarios or to augment data from one station with data from another. Because of the complexity of the calculations, it is important to determine which of the methods is computationally simpler and needs fewer compositions to give reliable results.
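As a rough illustration of one of the three methods, the sketch below (an assumption, not the authors' code) builds a fuzzy tolerance relation with the cosine amplitude method and converts it into an equivalence relation by repeated max-min composition.

```python
import numpy as np

def cosine_amplitude(X):
    """Fuzzy tolerance relation r_ij = |sum_k x_ik x_jk| / sqrt(sum_k x_ik^2 * sum_k x_jk^2).
    X has one row per rainfall station, one column per observation."""
    norms = np.sqrt((X ** 2).sum(axis=1))
    R = np.abs(X @ X.T) / np.outer(norms, norms)
    np.fill_diagonal(R, 1.0)
    return R

def max_min_composition(A, B):
    """(A o B)_ij = max_k min(A_ik, B_kj)."""
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

def transitive_closure(R, max_iter=20):
    """Compose until the relation stops changing -> fuzzy equivalence relation."""
    for _ in range(max_iter):
        R2 = max_min_composition(R, R)
        if np.allclose(R2, R):
            break
        R = R2
    return R

stations = np.random.rand(7, 240)      # e.g. 7 stations x 20 years of monthly values
E = transitive_closure(cosine_amplitude(stations))
clusters = (E >= 0.9)                  # lambda-cut at a chosen degree of confidence
print(clusters.astype(int))
```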

Keywords: classification, fuzzy logic, tolerance relations, rainfall data

Procedia PDF Downloads 314
24703 Customer Satisfaction and Effective HRM Policies: Customer and Employee Satisfaction

Authors: S. Anastasiou, C. Nathanailides

Abstract:

The purpose of this study is to examine the possible link between employee and customer satisfaction. The service provided by employees helps to build a good relationship with customers and can increase their loyalty. Published data on job satisfaction and indicators of customer service were gathered from relevant published works covering five different countries. The reviewed data indicate a significant correlation between indicators of customer and employee satisfaction in the banking sector (Pearson correlation, R² = 0.52, P < 0.05), providing practical evidence of a link between these two parameters.

Keywords: job satisfaction, job performance, customer service, banks, human resources management

Procedia PDF Downloads 321
24702 Using the UK as a Case Study to Assess the Current State of Large Woody Debris Restoration as a Tool for Improving the Ecological Status of Natural Watercourses Globally

Authors: Isabelle Barrett

Abstract:

Natural watercourses provide a range of vital ecosystem services, notably freshwater provision. They also offer highly heterogeneous habitat which supports an extreme diversity of aquatic life. Exploitation of rivers, changing land use and flood prevention measures have led to habitat degradation and subsequent biodiversity loss; indeed, freshwater species currently face a disproportionate rate of extinction compared to their terrestrial and marine counterparts. Large woody debris (LWD) encompasses the trees, large branches and logs which fall into watercourses, and is responsible for important habitat characteristics. Historically, natural LWD has been removed from streams under the assumption that it is not aesthetically pleasing and is thus ecologically unfavourable, despite extensive evidence contradicting this. Restoration efforts aim to replace lost LWD in order to reinstate habitat heterogeneity. This paper aims to assess the current state of such restoration schemes for improving fluvial ecological health in the UK. A detailed review of the scientific literature was conducted alongside a meta-analysis of 25 UK-based projects involving LWD restoration. Projects were chosen for which sufficient information was attainable for analysis, covering a broad range of budgets and scales. The most effective strategies for river restoration encompass ecological success, stakeholder engagement and scientific advancement; however, few projects surveyed showed sensitivity to all three; for example, only 32% of projects stated biological aims. Focus tended to be on stakeholder engagement and public approval, since this is often a key funding driver. Consequently, there is a tendency to focus on the aesthetic outcomes of a project; however, physical habitat restoration does not necessarily lead to direct biodiversity increases. This highlights the significance of rivers as highly heterogeneous environments with multiple interlinked processes, and emphasises a need for a stronger scientific presence in project planning. Poor scientific rigour means monitoring is often lacking, with varying, if any, definitions of success, which are rarely pre-determined. A tendency to overlook negative or neutral results was apparent, with unjustified focus often put on qualitative results. The temporal scale of monitoring is typically inadequate to facilitate scientific conclusions, with only 20% of projects surveyed reporting any pre-restoration monitoring. Furthermore, monitoring is often limited to a few variables, with biotic monitoring often fish-focussed. Due to their longer life cycles and dispersal capability, fish are usually poor indicators of environmental change, making it difficult to attribute any changes in ecological health to restoration efforts. Although the potential impact of LWD restoration may be positive, this method of restoration could simply be making short-term, small-scale improvements; without addressing the underlying causes of degradation, for example water quality, the issue cannot be fully resolved. Promotion of standardised monitoring for LWD projects could help establish a deeper understanding of the ecology surrounding the practice, supporting movement towards adaptive management in which scientific evidence feeds back to practitioners, enabling the design of more efficient projects with greater ecological success. By highlighting LWD, this study hopes to address the difficulties faced within river management, and emphasise the need for a more holistic international and inter-institutional approach to tackling problems associated with degradation.

Keywords: biological monitoring, ecological health, large woody debris, river management, river restoration

Procedia PDF Downloads 216
24701 Evaluation of Australian Open Banking Regulation: Balancing Customer Data Privacy and Innovation

Authors: Suman Podder

Abstract:

As Australian ‘Open Banking’ allows customers to share their financial data with accredited Third-Party Providers (‘TPPs’), it is necessary to evaluate whether the regulators have achieved a balance between protecting customer data privacy and promoting data-related innovation. Recognising the need to increase customers’ influence on their own data, and the benefits of data-related innovation, the Australian Government introduced the ‘Consumer Data Right’ (‘CDR’) to the banking sector through the Open Banking regulation. Under Open Banking, TPPs can access customers’ banking data, which allows the TPPs to tailor their products and services to meet customer needs at a more competitive price. This facilitated access and use of customer data will promote innovation by providing opportunities for new products and business models to emerge and grow. However, the success of Open Banking depends on the willingness of customers to share their data, so the regulators have augmented the protection of data by introducing new privacy safeguards to instill confidence and trust in the system. The dilemma in policymaking is that, on the one hand, lenient data privacy laws will help the flow of information, but at the risk of individuals’ loss of privacy; on the other hand, stringent laws that adequately protect privacy may dissuade innovation. Using theoretical and doctrinal methods, this paper examines whether the privacy safeguards under Open Banking will add to the compliance burden of the participating financial institutions, resulting in the undesirable effect of stifling other policy objectives such as innovation. The contribution of this research is three-fold. In the emerging field of customer data sharing, this research is one of the few academic studies on the objectives and impact of Open Banking in the Australian context. Additionally, Open Banking is still in the early stages of implementation, so this research traces the evolution of Open Banking through policy debates regarding the desirability of customer data-sharing. Finally, the research not only focuses on customers’ data privacy, juxtaposing it with the other important objective of promoting innovation, but also highlights the critical issues facing the data-sharing regime. This paper argues that while it is challenging to develop a regulatory framework for protecting data privacy without impeding innovation and jeopardising yet unknown opportunities, data privacy and innovation promote different aspects of customer welfare. This paper concludes that if the regulation is appropriately designed and implemented, the benefits of data-sharing will outweigh the cost of compliance with the CDR.

Keywords: consumer data right, innovation, open banking, privacy safeguards

Procedia PDF Downloads 141
24700 Integrated On-Board Diagnostic-II and Direct Controller Area Network Access for Vehicle Monitoring System

Authors: Kavian Khosravinia, Mohd Khair Hassan, Ribhan Zafira Abdul Rahman, Syed Abdul Rahman Al-Haddad

Abstract:

The CAN (controller area network) bus is a multi-master, message-broadcast system. The messages sent on the CAN bus communicate state information, referred to as signals, between different ECUs, which provides data consistency at every node of the system. OBD-II dongles, which are based on a request-response method, are the widespread solution among researchers for extracting sensor data from cars. Unfortunately, most past research does not consider the resolution and quantity of the input data extracted through OBD-II technology. The maximum feasible scan rate is only 9 queries per second, which provides 8 data points per second when using the well-known ELM327 OBD-II dongle. This study aims to design and develop a programmable, latency-sensitive vehicle data acquisition system with improved modularity and flexibility to extract exact, trustworthy, and fresh car sensor data at higher frequency rates. Working directly with the internal network of the vehicle requires the researcher to break apart, thoroughly inspect, and observe it, which may cause severe damage to the expensive ECUs of the vehicle due to intrinsic vulnerabilities of the CAN bus during initial research. The desired sensor data were collected from various vehicles utilizing a Raspberry Pi 3 as the computing and processing unit, using the OBD (request-response) and direct CAN methods at the same time. Two types of data were collected for this study: CAN bus frame data, collected for each line of hex data sent from an ECU, and OBD data, a limited set of values requested from the ECU under standard conditions. The proposed system is a reconfigurable, human-readable, multi-task telematics device that can be fitted into any vehicle with minimum effort and minimum time lag in the data extraction process. A standard-operational-procedure experimental vehicle network test bench was developed and can be used for future vehicle network testing experiments.
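A simplified sketch of the two collection paths on a Raspberry Pi is given below; the use of the python-OBD and python-can libraries, the channel name and the chosen PIDs are assumptions for illustration, not the authors' implementation.

```python
# Reading the same engine data two ways: via OBD-II request/response (python-OBD)
# and via direct CAN frames (python-can). Library choice and channel names are assumptions.
import obd          # pip install obd
import can          # pip install python-can

# --- OBD-II request/response path (limited to ~single-digit queries per second) ---
connection = obd.OBD()                      # auto-detects an ELM327-style adapter
rpm = connection.query(obd.commands.RPM)    # one request, one response
speed = connection.query(obd.commands.SPEED)
print("OBD-II:", rpm.value, speed.value)

# --- Direct CAN path (every broadcast frame, no polling latency) ---
bus = can.interface.Bus(channel="can0", bustype="socketcan")
for _ in range(10):
    frame = bus.recv(timeout=1.0)           # raw frame: arbitration ID + up to 8 data bytes
    if frame is not None:
        print(f"CAN 0x{frame.arbitration_id:03X}: {frame.data.hex()}")
```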

Keywords: CAN bus, OBD-II, vehicle data acquisition, connected cars, telemetry, Raspberry Pi3

Procedia PDF Downloads 205
24699 Big Data in Construction Project Management: The Colombian Northeast Case

Authors: Sergio Zabala-Vargas, Miguel Jiménez-Barrera, Luz Vargas-Sánchez

Abstract:

In recent years, information related to project management in organizations has been increasing exponentially. Performance data, management statistics, and indicator results have made the collection, analysis, traceability, and dissemination of information essential for project managers. In this sense, there are current trends toward facilitating efficient decision-making with emerging technologies such as Machine Learning, Data Analytics, Data Mining, and Big Data; the latter is the focus of this project. This research is part of the thematic line of construction methods and project management. Many authors highlight the relevance that emerging technologies such as Big Data have gained in recent years in project management in the construction sector, with the main focus on optimizing time, scope, and budget and, in general, mitigating risks. This research was developed in the northeastern region of Colombia, South America. The first phase was aimed at diagnosing the use of emerging technologies (Big Data) in the construction sector. In Colombia, the construction sector represents more than 50% of the productive system, and more than 2 million people participate in this economic segment. A quantitative approach was used: a survey was applied to a sample of 91 companies in the construction sector. Preliminary results indicate that the use of Big Data and other emerging technologies is very low, and also that there is interest in modernizing project management. There is evidence of a correlation between the interest in using new data management technologies and the incorporation of Building Information Modeling (BIM). The next phase of the research will allow the generation of guidelines and strategies for the incorporation of technological tools in the construction sector in Colombia.

Keywords: big data, building information modeling, technology, project management

Procedia PDF Downloads 128
24698 Minimum Data of a Speech Signal as Special Indicators of Identification in Phonoscopy

Authors: Nazaket Gazieva

Abstract:

Voice biometric data associated with physiological, psychological and other factors are widely used in forensic phonoscopy. There are various methods for identifying and verifying a person by voice. This article explores the minimum speech signal data as individual parameters of a speech signal. Monozygotic twins are believed to be genetically identical. Using the minimum data of the speech signal, we came to the conclusion that even the voice imprint of monozygotic twins is individual. From the experiment, we conclude that the minimum indicators of the speech signal are more stable and reliable for phonoscopic examinations.

Keywords: phonogram, speech signal, temporal characteristics, fundamental frequency, biometric fingerprints

Procedia PDF Downloads 144
24697 Optimize Study and Optical Characterization of Bilayer Structures from Silicon Nitride

Authors: Beddiaf Abdelaziz

Abstract:

The optical characteristics of thin films of silicon oxynitride (SiOₓNy) prepared by the low-pressure chemical vapor deposition (LPCVD) technique have been studied. The films are grown from SiH₂Cl₂, N₂O and NH₃ gaseous mixtures, with flows of SiH₂Cl₂ and (N₂O+NH₃) of 200 sccm and 160 sccm, respectively. The deposited films have been characterized by ellipsometry. To model our silicon oxynitride films, we have proposed two theoretical models, the Maxwell Garnett and the Bruggeman effective medium approximation (BEMA), both applied to silicon oxynitride by treating the material as a heterogeneous medium formed by silicon oxide and silicon nitride. The models were validated by comparing the theoretical spectra with those measured by ellipsometry, which allows us to obtain the refractive index of these films and their thickness. Ellipsometric analysis of the optical properties of the SiOₓNy films shows that the SiO₂ fraction decreases when the gaseous ratio NH₃/N₂O increases, whereas the silicon nitride Si₃N₄ fraction increases. The study also shows that an increasing gaseous ratio leads to a strong incorporation of nitrogen atoms in the films. In addition, the increase of the SiOₓNy refractive index up to the SiO₂ value shows that this insulating material has good dielectric quality.
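For reference, the Bruggeman effective medium approximation mentioned above can be sketched numerically as follows; the refractive index values and volume fractions are nominal assumptions, not the measured film data.

```python
# Rough numerical sketch of the Bruggeman effective medium approximation (BEMA)
# for a SiO2/Si3N4 mixture; index values and fractions are nominal assumptions.
from scipy.optimize import brentq

def bruggeman_eps(f1, eps1, eps2):
    """Solve f1*(eps1-e)/(eps1+2e) + (1-f1)*(eps2-e)/(eps2+2e) = 0 for the effective eps."""
    if f1 <= 0.0:
        return eps2
    if f1 >= 1.0:
        return eps1
    g = lambda e: f1 * (eps1 - e) / (eps1 + 2 * e) + (1 - f1) * (eps2 - e) / (eps2 + 2 * e)
    lo, hi = min(eps1, eps2), max(eps1, eps2)
    return brentq(g, lo, hi)

n_SiO2, n_Si3N4 = 1.46, 2.00                 # nominal refractive indices
for f_SiO2 in (0.0, 0.25, 0.5, 0.75, 1.0):   # SiO2 fraction decreases as NH3/N2O increases
    eps_eff = bruggeman_eps(f_SiO2, n_SiO2 ** 2, n_Si3N4 ** 2)
    print(f"f(SiO2) = {f_SiO2:.2f} -> n_eff = {eps_eff ** 0.5:.3f}")
```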

Keywords: ellipsometry, silicon oxynitride, model, refractive index, effective medium

Procedia PDF Downloads 19
24696 The Regulation of the Pro-inflammatory Cytokine Interleukin 6 (IL6) by Epstein-Barr Virus (EBV)

Authors: Liu Xiaohan

Abstract:

Epstein–Barr virus (EBV) is a human herpesvirus closely associated with many malignancies of lymphocyte and epithelial origin, such as gastric cancer, Burkitt’s lymphoma, and nasopharyngeal carcinoma (NPC). NPC is a malignant epithelial tumor that is 100% associated with latent EBV infection. Most NPC cases are concentrated in southern China, especially in Guangdong and Hong Kong. Overexpression of pro-inflammatory cytokines may unbalance the immune system and cause damage to the human body. Interleukin-6 (IL6) is a pro-inflammatory cytokine that plays an important role in tumor progression. Gene expression is regulated by both transcriptional and post-transcriptional pathways, and post-transcriptional regulation is an important mechanism for modulating mature mRNA levels in mammalian cells. AU-rich element binding factor 1 (AUF1)/heterogeneous nuclear RNP D (hnRNP D) is known for its function in destabilizing mRNAs, including those of cytokines and cell cycle regulators. Previous studies have found that overexpression of hnRNP D can lead to tumorigenesis. In this project, our aim is to determine the role played by hnRNP D in EBV-infected cells and how our anti-EBV agents affect the function of hnRNP D. The results of this study will provide new insight into how pro-inflammatory cytokine expression is regulated by EBV.

Keywords: interleukin 6 (IL6), Epstein-Barr virus (EBV), nasopharyngeal carcinoma (NPC), Epstein-Barr nuclear antigen-1 (EBNA1)

Procedia PDF Downloads 62
24695 A Non-parametric Clustering Approach for Multivariate Geostatistical Data

Authors: Francky Fouedjio

Abstract:

Multivariate geostatistical data have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations within the same cluster are, in some sense, more similar to each other than to those in other clusters. Spatially contiguous clusters can significantly improve interpretation by turning the resulting clusters into meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the spatial dependence structure of the data. It integrates existing methods to find the optimal cluster number and to evaluate the contribution of variables to the clustering. The capability of the proposed approach to provide spatially compact, connected and meaningful clusters is assessed using a bivariate synthetic dataset and a multivariate geochemical dataset. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.
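A simplified sketch of the general agglomerative workflow is given below; the ad hoc dissimilarity combining attribute and spatial distances is a placeholder, not the paper's non-parametric kernel estimator.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))   # data locations (x, y)
values = rng.normal(size=(200, 3))            # multivariate attributes at each location

# Placeholder dissimilarity: attribute distance inflated by spatial separation,
# so that nearby, similar locations tend to end up in the same cluster.
d_attr = squareform(pdist(values))
d_geo = squareform(pdist(coords))
dissim = d_attr * (1.0 + d_geo / d_geo.max())

Z = linkage(squareform(dissim, checks=False), method="average")  # agglomerative clustering
labels = fcluster(Z, t=4, criterion="maxclust")                  # cut into 4 clusters
print(np.bincount(labels))
```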

Keywords: clustering, geostatistics, multivariate data, non-parametric

Procedia PDF Downloads 477
24694 Big Data in Telecom Industry: Effective Predictive Techniques on Call Detail Records

Authors: Sara ElElimy, Samir Moustafa

Abstract:

Mobile network operators are starting to face many challenges in the digital era, especially with high demands from customers. Since mobile network operators are a major source of big data, traditional techniques are not effective in the new era of big data, the Internet of Things (IoT), and 5G; as a result, handling different big datasets effectively becomes a vital task for operators with the continuous growth of data and the move from long term evolution (LTE) to 5G. There is therefore an urgent need for effective big data analytics to predict future demands, traffic, and network performance to fulfill the requirements of the fifth generation of mobile network technology. In this paper, we introduce data science techniques using machine learning and deep learning algorithms: the autoregressive integrated moving average (ARIMA), Bayesian-based curve fitting, and recurrent neural networks (RNN) are employed in a data-driven application for mobile network operators. The main framework for these models includes identification of each model's parameters, estimation, prediction, and a final data-driven application of the prediction to business and network performance use cases. The models are applied to the Telecom Italia Big Data Challenge call detail records (CDRs) datasets. The performance of these models, assessed using well-known evaluation criteria, shows that ARIMA (a machine-learning-based model) is more accurate as a predictive model on such a dataset than the RNN (a deep learning model).
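A minimal ARIMA forecasting sketch in the spirit of the first model is shown below; the synthetic hourly series and the (p, d, q) order are assumptions, since the Telecom Italia CDR data and the study's settings are not reproduced here.

```python
# Identification/estimation/prediction steps of an ARIMA model on a stand-in
# traffic series; the series and the (2, 1, 2) order are assumptions, not the study's.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
t = np.arange(24 * 28)                 # four weeks of hourly samples
traffic = 100 + 30 * np.sin(2 * np.pi * (t % 24) / 24) + rng.normal(0, 5, t.size)
series = pd.Series(traffic)            # stand-in for aggregated CDR call volumes

train, test = series[:-24], series[-24:]
model = ARIMA(train, order=(2, 1, 2)).fit()   # estimation step
forecast = model.forecast(steps=24)           # prediction step

mae = np.mean(np.abs(forecast.values - test.values))
print(f"24-hour-ahead MAE: {mae:.2f}")
```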

Keywords: big data analytics, machine learning, CDRs, 5G

Procedia PDF Downloads 139
24693 A Data Mining Approach for Analysing and Predicting the Bank's Asset Liability Management Based on Basel III Norms

Authors: Nidhin Dani Abraham, T. K. Sri Shilpa

Abstract:

Asset liability management is an important aspect of the banking business. Moreover, today's banking is governed by Basel III, which strictly regulates counterparty default. This paper focuses on the prediction and analysis of counterparty default risk, a type of risk that occurs when customers fail to repay the amount owed to the lender (a bank or other financial institution). It proposes an approach to reducing the counterparty risk occurring in financial institutions using an appropriate data mining technique, and thus predicts the occurrence of non-performing assets (NPAs). It also helps in asset building and in improving restructuring quality. Liability management is also essential to carrying out the banking business; to know and analyze the depth of a bank's liabilities, a suitable technique is required. For that, a data mining technique is used to predict the dormant behaviour of various deposit customers. Various models are implemented, and the results for savings deposit customers are analyzed. All data are drawn from the bank's data warehouse and cleaned using a data cleansing approach.

Keywords: data mining, asset liability management, BASEL III, banking

Procedia PDF Downloads 553
24692 Parallel Coordinates on a Spiral Surface for Visualizing High-Dimensional Data

Authors: Chris Suma, Yingcai Xiao

Abstract:

This paper presents Parallel Coordinates on a Spiral Surface (PCoSS), a parallel coordinate based interactive visualization method for high-dimensional data, and a test implementation of the method. Plots generated by the test system are compared with those generated by XDAT, a software implementing traditional parallel coordinates. Traditional parallel coordinate plots can be cluttered when the number of data points is large or when the dimensionality of the data is high. PCoSS plots display multivariate data on a 3D spiral surface and allow users to see the whole picture of high-dimensional data with less cluttering. Taking advantage of the 3D display environment in PCoSS, users can further reduce cluttering by zooming into an axis of interest for a closer view or by moving vantage points and by reorienting the viewing angle to obtain a desired view of the plots.
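The abstract does not give implementation details, so the sketch below is only one possible reading of the idea: each coordinate axis is placed at successive angles along a spiral, and each record is drawn as a 3D polyline over those axes.

```python
# One possible interpretation of parallel coordinates on a spiral surface:
# axis placement, spiral parameters and data are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.random((50, 8))                        # 50 records, 8 dimensions, already in [0, 1]
n_dims = data.shape[1]

# Place axis i on a spiral: angle grows linearly, radius grows with the angle.
theta = np.linspace(0, 1.5 * np.pi, n_dims)
radius = 1.0 + 0.5 * theta
ax_x, ax_y = radius * np.cos(theta), radius * np.sin(theta)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for i in range(n_dims):                           # vertical axes along the spiral
    ax.plot([ax_x[i], ax_x[i]], [ax_y[i], ax_y[i]], [0, 1], color="black")
for row in data:                                  # each record is a polyline over the axes
    ax.plot(ax_x, ax_y, row, alpha=0.4)
plt.show()
```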

Keywords: human computer interaction, parallel coordinates, spiral surface, visualization

Procedia PDF Downloads 12
24691 Highly Selective Conversion of CO2 to CO on Cu Nanoparticles

Authors: Rauf Razzaq, Kaiwu Dong, Muhammad Sharif, Ralf Jackstell, Matthias Beller

Abstract:

Carbon dioxide (CO2), a key greenhouse gas produced from both anthropogenic and natural sources, has recently been considered an important C1 building block for the synthesis of many industrial fuels and chemicals. Catalytic hydrogenation of CO2 using a heterogeneous system is regarded as an efficient process for CO2 valorization. In this regard, CO2 reduction to CO via the reverse water gas shift reaction (RWGSR) has attracted much attention as a viable process for large-scale commercial CO2 utilization. This process can generate syngas (CO+H2), which provides an alternative route to direct CO2 conversion to methanol and/or liquid hydrocarbons via the Fischer-Tropsch (FT) reaction. Herein, we report a highly active and selective silica-supported copper catalyst with efficient CO2 reduction to CO in a slurry-bed batch autoclave reactor. The reactions were carried out at 200°C and 60 bar initial pressure with a CO2/H2 ratio of 1:3, with varying temperature, pressure and feed-gas ratio. Both the gaseous-phase and liquid products were analyzed using FID detectors. It was found that the Cu/SiO2 catalyst prepared using a novel ammonia precipitation-urea gelation method achieved 26% CO2 conversion with CO and methanol selectivities of 98% and 2%, respectively. The high catalytic activity could be attributed to a strong metal-support interaction with highly dispersed and stabilized Cu+ species active for the RWGSR. It can be concluded that reduction of CO2 to CO via the RWGSR could address the problem of using CO2 gas in C1 chemistry.

Keywords: CO2 reduction, methanol, slurry reactor, synthesis gas

Procedia PDF Downloads 327
24690 A Dynamic Ensemble Learning Approach for Online Anomaly Detection in Alibaba Datacenters

Authors: Wanyi Zhu, Xia Ming, Huafeng Wang, Junda Chen, Lu Liu, Jiangwei Jiang, Guohua Liu

Abstract:

Anomaly detection is a first and imperative step needed to respond to unexpected problems and to assure high performance and security in large data center management. This paper presents an online anomaly detection system built on an innovative combination of ensemble machine learning and adaptive differentiation algorithms, applied to performance data collected from a continuous monitoring system for multi-tier web applications running in Alibaba data centers. We evaluate the effectiveness and efficiency of this algorithm with production traffic data and compare it with traditional anomaly detection approaches such as static thresholds and other deviation-based detection techniques. The experimental results show that our algorithm correctly identifies the unexpected performance variances of any running application, with an acceptable false positive rate. This proposed approach has already been deployed in real-time production environments to enhance the efficiency and stability of daily data center operations.
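A toy sketch of the general idea, with two simple detectors voting, is shown below; the detectors, weights and thresholds are illustrative only and are not Alibaba's production algorithm.

```python
# Toy ensemble in the spirit described: simple detectors vote, and their weights
# could be adapted online; detectors, weights and thresholds are illustrative only.
import numpy as np

def zscore_detector(window, x, k=3.0):
    mu, sigma = window.mean(), window.std() + 1e-9
    return abs(x - mu) > k * sigma

def ewma_detector(window, x, alpha=0.3, k=3.0):
    ewma = window[0]
    for v in window[1:]:
        ewma = alpha * v + (1 - alpha) * ewma
    return abs(x - ewma) > k * (window.std() + 1e-9)

def ensemble_is_anomaly(window, x, weights=(0.5, 0.5), threshold=0.5):
    votes = np.array([zscore_detector(window, x), ewma_detector(window, x)], dtype=float)
    return float(np.dot(weights, votes)) >= threshold

rng = np.random.default_rng(7)
series = rng.normal(200, 10, 500)      # e.g. response-time samples from a web tier
series[400] = 450                      # injected anomaly
flags = [ensemble_is_anomaly(series[i - 60:i], series[i]) for i in range(60, len(series))]
print("anomalies at:", [i + 60 for i, f in enumerate(flags) if f])
```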

Keywords: Alibaba data centers, anomaly detection, big data computation, dynamic ensemble learning

Procedia PDF Downloads 201
24689 Role of Direct Immunofluorescence in Diagnosing Vesiculobullous Lesions

Authors: Mitakshara Sharma, Sonal Sharma

Abstract:

Vesiculobullous diseases are a heterogeneous group of dermatological disorders with protean manifestations. The most important techniques for patients with vesiculobullous diseases are conventional histopathology and confirmatory tests like direct immunofluorescence (DIF) and indirect immunofluorescence (IIF). DIF has been used for decades to investigate pathophysiology and in diagnosis. It detects molecules such as immunoglobulins and complement components and is performed on the perilesional skin. The diagnostic value of the DIF test depends on features like the primary site of the immune deposits, the class of immunoglobulin, the number of immune deposits and deposition at other sites. The aim of the study is to correlate DIF with clinical and histopathological findings and to analyze the utility of DIF in the diagnosis of these disorders. It is a retrospective descriptive study conducted over 2 years, from 2015 to 2017, in the Department of Pathology, GTB Hospital, on perilesional punch biopsies of vesiculobullous lesions. Biopsies were sent in Michel's medium. The specimens were washed, frozen and incubated with fluorescein isothiocyanate (FITC) tagged antihuman antibodies IgA, IgG, IgM, C3 & F and were viewed under a fluorescent microscope. Out of 401 skin biopsies submitted for DIF, 285 were vesiculobullous diseases, of which the most common was Pemphigus vulgaris (34%), followed by Bullous pemphigoid (21.5%), Dermatitis herpetiformis (16%), Pemphigus foliaceus (11.9%), Linear IgA disease (11.9%), Epidermolysis bullosa (2.39%) and Pemphigus herpetiformis (1.7%). We present the DIF findings in all these vesiculobullous diseases. DIF in conjunction with histopathology gives the best diagnostic yield in these lesions. It also helps in the diagnosis whenever there is a clinical and histopathological overlap.

Keywords: antibodies, direct immunofluorescence, pemphigus, vesiculobullous

Procedia PDF Downloads 364
24688 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crises are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against a market crash and spikes in the event of a crisis. In this study, news data are used to predict whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal, evaluated against VIX index data from the CBOE.
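A bare-bones version of such a pipeline is sketched below; the toy headlines, labels and hyperparameters are placeholders, not the Wall Street Journal corpus or the study's settings.

```python
# Topic distributions from news text feed a classifier of next-day VIX spikes;
# corpus, labels and hyperparameters below are placeholders for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

headlines = [
    "bank failure fears rattle markets",
    "tech earnings beat expectations",
    "credit crisis spreads to europe",
    "calm trading ahead of fed meeting",
]
vix_spike_next_day = [1, 0, 1, 0]        # 1 = fear index jumped the following day

counts = CountVectorizer(stop_words="english").fit_transform(headlines)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

clf = LogisticRegression().fit(topics, vix_spike_next_day)
print(clf.predict(topics))               # in-sample sanity check only
```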

Keywords: early warning system, knowledge management, market prediction, topic modeling

Procedia PDF Downloads 338
24687 The Role of Synthetic Data in Aerial Object Detection

Authors: Ava Dodd, Jonathan Adams

Abstract:

The purpose of this study is to explore the characteristics of developing a machine learning application using synthetic data. The study is structured to develop the application for the purpose of deploying the computer vision model. The findings discuss the realities of attempting to develop a computer vision model for a practical purpose, and detail the processes, tools, and techniques that were used to meet accuracy requirements. The research reveals that synthetic data represents another variable that can be adjusted to improve the performance of a computer vision model. Further, a suite of tools and tuning recommendations is provided.

Keywords: computer vision, machine learning, synthetic data, YOLOv4

Procedia PDF Downloads 225
24686 Perception-Oriented Model Driven Development for Designing Data Acquisition Process in Wireless Sensor Networks

Authors: K. Indra Gandhi

Abstract:

Wireless Sensor Networks (WSNs) have always been characterized by application-specific sensing, relaying and collection of information for further analysis. However, software development has not been considered as a separate entity in this data collection process, which has posed severe limitations on software development for WSNs. Software development for WSNs is a complex process since the components involved are data-driven, network-driven and application-driven in nature. This implies a tremendous need for separation of concerns from the software development perspective. A layered approach for developing the data acquisition design based on Model Driven Development (MDD) is proposed, as the sensed data collection process itself varies depending upon the application under consideration. This work focuses on the layered view of the data acquisition process so as to ease software development. A metamodel is proposed that enables reusability and realization of the software as an adaptable component for WSN systems. Further, observation of users' perception indicates that the proposed model helps improve the programmer's productivity by realizing the collaborative system involved.

Keywords: data acquisition, model-driven development, separation of concern, wireless sensor networks

Procedia PDF Downloads 434
24685 Safety Tolerance Zone for Driver-Vehicle-Environment Interactions under Challenging Conditions

Authors: Matjaž Šraml, Marko Renčelj, Tomaž Tollazzi, Chiara Gruden

Abstract:

Road safety is a worldwide issue with numerous and heterogeneous factors influencing it. On one side, the driver state, comprising distraction/inattention, fatigue, drowsiness, extreme emotions, and socio-cultural factors, highly affects road safety. On the other side, the vehicle state has an important role in mitigating (or not) the road risk. Finally, the road environment is still one of the main determinants of road safety, defining driving task complexity. At the same time, thanks to technological development, a lot of detailed data is easily available, creating opportunities for the detection of driver state, vehicle characteristics and road conditions and, consequently, for the design of ad hoc interventions aimed at improving driver performance, increasing awareness and mitigating road risks. This is the challenge faced by the i-DREAMS project. i-DREAMS, which stands for a smart Driver and Road Environment Assessment and Monitoring System, is a 3-year project funded by the European Union’s Horizon 2020 research and innovation program. It aims to set up a platform to define, develop, test and validate a ‘Safety Tolerance Zone’ that prevents drivers from getting too close to the boundaries of unsafe operation by mitigating risks in real time and after the trip. After the definition and development of the Safety Tolerance Zone concept and its concretization in an advanced driver-assistance system (ADAS) platform, the system was first tested for 2 months in a driving simulator environment in 5 different countries. After that, naturalistic driving studies started for a 10-month period (comprising a 1-month pilot study, a 3-month baseline study and a 6-month study implementing interventions). The project team has now approved a common evaluation approach and is developing the assessment of the usage and outcomes of the i-DREAMS system, which is yielding positive insights. The i-DREAMS consortium consists of 13 partners: 7 engineering universities and research groups, 4 industry partners and 2 partners (the European Transport Safety Council - ETSC - and POLIS, cities and regions for transport innovation) closely linked to transport safety stakeholders, covering 8 different countries altogether.

Keywords: advanced driver assistant systems, driving simulator, safety tolerance zone, traffic safety

Procedia PDF Downloads 67
24684 Comparative Analysis of Data Gathering Protocols with Multiple Mobile Elements for Wireless Sensor Network

Authors: Bhat Geetalaxmi Jairam, D. V. Ashoka

Abstract:

Wireless sensor networks are used in many applications to collect sensed data from different sources. Sensed data have to be delivered through the sensors' wireless interface using multi-hop communication towards the sink. Data collection in wireless sensor networks consumes energy, and energy consumption is the major constraint in WSNs; reducing energy consumption while increasing the amount of generated data is a great challenge. In this paper, we have implemented two data gathering protocols with multiple mobile sinks/elements to collect data from sensor nodes. The first is Energy-Efficient Data Gathering with Tour Length-Constrained Mobile Elements in Wireless Sensor Networks (EEDG), in which mobile sinks use a vehicle routing protocol to collect data. The second is An Intelligent Agent-based Routing Structure for Mobile Sinks in WSNs (IAR), in which mobile sinks use Prim's algorithm to collect data. The authors implemented concepts common to both protocols, such as deployment of mobile sinks, generation of the visiting schedule, and collection of data from the cluster members. The performance of both protocols was compared using statistics based on performance parameters such as delay, packet drop, packet delivery ratio, available energy, and control overhead. The paper concludes that EEDG is more efficient than the IAR protocol, but with a few limitations, including unaddressed issues such as redundancy removal, idle listening, and the mobile sink's pause/wait state at the node. In future work, we plan to concentrate on these limitations to obtain a new energy-efficient protocol that will help improve the lifetime of the WSN.
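For the IAR-style structure, Prim's algorithm can be sketched as follows; the link costs and topology are placeholders, not the protocols' actual network model.

```python
# Prim's algorithm on an energy-cost graph, rooted at the mobile sink's current
# position, as a stand-in for the routing structure built in the IAR protocol.
import heapq

def prim_mst(graph, root):
    """graph: {node: {neighbor: link_cost}}; returns {child: parent} routing tree."""
    visited = {root}
    parent = {}
    heap = [(cost, root, nbr) for nbr, cost in graph[root].items()]
    heapq.heapify(heap)
    while heap:
        cost, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        parent[v] = u                      # v forwards its data toward u
        for nbr, c in graph[v].items():
            if nbr not in visited:
                heapq.heappush(heap, (c, v, nbr))
    return parent

# Placeholder sensor network: link costs could model transmission energy.
network = {
    "sink": {"a": 2, "b": 4},
    "a": {"sink": 2, "b": 1, "c": 7},
    "b": {"sink": 4, "a": 1, "c": 3},
    "c": {"a": 7, "b": 3},
}
print(prim_mst(network, "sink"))          # e.g. {'a': 'sink', 'b': 'a', 'c': 'b'}
```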

Keywords: aggregation, consumption, data gathering, efficiency

Procedia PDF Downloads 497
24683 Status and Results from EXO-200

Authors: Ryan Maclellan

Abstract:

EXO-200 has provided one of the most sensitive searches for neutrinoless double-beta decay utilizing 175 kg of enriched liquid xenon in an ultra-low background time projection chamber. This detector has demonstrated excellent energy resolution and background rejection capabilities. Using the first two years of data, EXO-200 has set a limit of 1.1x10^25 years at 90% C.L. on the neutrinoless double-beta decay half-life of Xe-136. The experiment has experienced a brief hiatus in data taking during a temporary shutdown of its host facility: the Waste Isolation Pilot Plant. EXO-200 expects to resume data taking in earnest this fall with upgraded detector electronics. Results from the analysis of EXO-200 data and an update on the current status of EXO-200 will be presented.

Keywords: double-beta, Majorana, neutrino, neutrinoless

Procedia PDF Downloads 414
24682 Remaining Useful Life (RUL) Assessment Using Progressive Bearing Degradation Data and ANN Model

Authors: Amit R. Bhende, G. K. Awari

Abstract:

Remaining useful life (RUL) prediction is one of the key technologies for realizing prognostics and health management, which is being widely applied in many industrial systems to ensure high system availability over their life cycles. The present work proposes a data-driven method of RUL prediction based on multiple health state assessment for rolling element bearings. Run-to-failure bearing degradation data at three different conditions are used, and a RUL prediction model is built separately for each condition. Feed-forward back-propagation neural network models are developed for prediction modeling.
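A simplified sketch of a feed-forward back-propagation network mapping degradation features to RUL is given below; the synthetic health indicator and network size are assumptions, not the study's bearing data or architecture.

```python
# Feed-forward back-propagation network mapping degradation features to RUL;
# the rising vibration-like indicator and the layer sizes are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
life = 1000                                           # total run-to-failure samples
t = np.arange(life)
health_indicator = 0.001 * t ** 1.2 + rng.normal(0, 0.05, life)  # grows toward failure
X = np.column_stack([health_indicator, np.gradient(health_indicator)])
y = life - t                                          # remaining useful life at each sample

model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X[:800], y[:800])                           # train on the earlier portion of life
print("predicted RUL near end of life:", model.predict(X[950:955]).round())
```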

Keywords: bearing degradation data, remaining useful life (RUL), back propagation, prognosis

Procedia PDF Downloads 436
24681 Spatio-Temporal Data Mining with Association Rules for Lake Van

Authors: Tolga Aydin, M. Fatih Alaeddinoğlu

Abstract:

People, throughout history, have made estimates and inferences about the future by using their past experiences. Developing information technologies and improvements in database management systems make it possible to extract useful information from the data at hand for strategic decisions, and different methods have been developed for this purpose. Data mining by association rule learning is one such method. The Apriori algorithm, one of the well-known association rule learning algorithms, is not commonly applied to spatio-temporal data sets. However, it is possible to embed time and space features into the data sets and make the Apriori algorithm a suitable data mining technique for learning spatio-temporal association rules. Lake Van, the largest lake in Turkey, is a closed basin. This feature causes the volume of the lake to increase or decrease with changes in the amount of water it holds. In this study, evaporation, humidity, lake altitude, amount of rainfall and temperature parameters recorded in the Lake Van region over the years are used by the Apriori algorithm, and a spatio-temporal data mining application is developed to identify overflows and newly formed soil regions (underflows) occurring in the coastal parts of Lake Van. Identifying the possible reasons for overflows and underflows can be used to alert experts to take precautions and make the necessary investments.
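A minimal Apriori run of the kind described can be sketched as follows; the discretized items (season, rainfall level, lake state) and the mlxtend implementation are placeholders for the study's actual Lake Van parameters.

```python
# Minimal spatio-temporal-flavoured Apriori run with mlxtend; the discretized items
# below (season, rainfall level, lake state) are placeholders for the study's real data.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["spring", "rain_high", "temp_low", "overflow"],
    ["spring", "rain_high", "temp_mid", "overflow"],
    ["summer", "rain_low", "temp_high", "underflow"],
    ["summer", "rain_low", "temp_high", "underflow"],
    ["autumn", "rain_mid", "temp_mid", "stable"],
]
onehot = TransactionEncoder().fit(transactions)
df = pd.DataFrame(onehot.transform(transactions), columns=onehot.columns_)

frequent = apriori(df, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```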

Keywords: apriori algorithm, association rules, data mining, spatio-temporal data

Procedia PDF Downloads 374
24680 Building Data Infrastructure for Public Use and Informed Decision Making in Developing Countries-Nigeria

Authors: Busayo Fashoto, Abdulhakeem Shaibu, Justice Agbadu, Samuel Aiyeoribe

Abstract:

Data has gone from just rows and columns to being an infrastructure itself. The traditional medium of data infrastructure has been managed by individuals in different industries and saved on personal work tools, such as laptops. This hinders data sharing and works against Sustainable Development Goal (SDG) 9 on sustainable infrastructure across all countries and regions. At the same time, there has been constant demand for data across different agencies and ministries from investors and decision-makers. The rapid development and adoption of open-source technologies that promote the collection and processing of data in new ways and in ever-increasing volumes are creating new data infrastructure in sectors such as lands and health, among others. This paper examines the process of developing data infrastructure and, by extension, a data portal to provide baseline data for sustainable development and decision making in Nigeria. This paper employs the FAIR principles (Findable, Accessible, Interoperable, and Reusable) of data management, using open-source technology tools to develop data portals for public use. eHealth Africa, an organization that uses technology to drive public health interventions in Nigeria, developed a data portal, a typical data infrastructure that serves as a repository for various datasets on administrative boundaries, points of interest, settlements, social infrastructure, amenities, and others. This portal makes it possible for users to access datasets of interest at any point in time at no cost. The skeletal infrastructure of this data portal is built on open-source technologies such as the Postgres database, GeoServer, GeoNetwork, and CKAN. These tools made the infrastructure sustainable, thus promoting the achievement of SDG 9 (Industry, Innovation, and Infrastructure). As of 6th August 2021, accounts for a wide cross-section of 8,192 users had been created, 2,262 datasets had been downloaded, and 817 maps had been created from the platform. This paper shows the use of rapid development and adoption of technologies that facilitate data collection, processing, and publishing in new ways and in ever-increasing volumes. In addition, the paper explicitly addresses new data infrastructure in sectors such as health, social amenities, and agriculture. Furthermore, this paper reveals the importance of cross-sectional data infrastructures for planning and decision making, which in turn can form a central data repository for sustainable development across developing countries.

Keywords: data portal, data infrastructure, open source, sustainability

Procedia PDF Downloads 98
24679 Process Data-Driven Representation of Abnormalities for Efficient Process Control

Authors: Hyun-Woo Cho

Abstract:

Unexpected operational events or abnormalities of industrial processes have a serious impact on the quality of the final product of interest. In terms of statistical process control, fault detection and diagnosis of processes is one of the essential tasks needed to run a process safely. In this work, a nonlinear representation of process measurement data is presented and evaluated using a simulated process. The effect of using different representation methods on diagnosis performance is tested in terms of computational efficiency and data handling. The results show that the nonlinear representation technique produced more reliable diagnosis results and outperformed linear methods. The use of a data filtering step improved computational speed and diagnosis performance for the test data sets. The presented scheme differs from existing ones in that it attempts to extract the fault pattern in the reduced space, not in the original process variable space. Thus, this scheme helps to reduce the sensitivity of empirical models to noise.
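The abstract does not name its nonlinear representation technique; kernel PCA is one common choice and is used below purely as an illustrative stand-in on simulated process data.

```python
# Kernel PCA as an assumed example of a nonlinear representation: the fault pattern
# is sought in the reduced space rather than in the original process variable space.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(5)
normal = rng.normal(0, 1, size=(300, 10))                            # in-control measurements
faulty = rng.normal(0, 1, size=(50, 10)) + np.array([3] + [0] * 9)   # shifted variable 1

kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.1).fit(normal)
scores_normal = kpca.transform(normal)
scores_faulty = kpca.transform(faulty)

center = scores_normal.mean(axis=0)
d_normal = np.linalg.norm(scores_normal - center, axis=1)
d_faulty = np.linalg.norm(scores_faulty - center, axis=1)
threshold = np.percentile(d_normal, 99)
print(f"flagged {np.mean(d_faulty > threshold):.0%} of faulty samples as abnormal")
```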

Keywords: fault diagnosis, nonlinear technique, process data, reduced spaces

Procedia PDF Downloads 247
24678 Text-to-Speech in Azerbaijani Language via Transfer Learning in a Low Resource Environment

Authors: Dzhavidan Zeinalov, Bugra Sen, Firangiz Aslanova

Abstract:

Most text-to-speech models cannot operate well in low-resource languages and require a great amount of high-quality training data to be considered good enough. Yet, with the improvements made in ASR systems, it is now much easier than ever to collect data for the design of custom text-to-speech models. This paper outlines our work on using an ASR model to collect data to build a viable text-to-speech system for one of the leading financial institutions of Azerbaijan. NVIDIA’s implementation of the Tacotron 2 model was utilized along with the HiFiGAN vocoder. For training, the model was first trained with high-quality audio data collected from the Internet, then fine-tuned on the bank’s single-speaker call center data. The results were then evaluated by 50 different listeners and obtained a mean opinion score of 4.17, showing that our method is indeed viable. With this, we have successfully designed the first text-to-speech model for Azerbaijani and publicly shared 12 hours of audiobook data for everyone to use.

Keywords: Azerbaijani language, HiFiGAN, Tacotron 2, text-to-speech, transfer learning, whisper

Procedia PDF Downloads 45
24677 An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

Authors: Ruchika Malhotra, Megha Khanna

Abstract:

The development of change prediction models can help software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of software quality data. Data with very few instances of the minority outcome categories lead to an inefficient learning process, and a classification model developed from imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with imbalanced data. In order to empirically validate the different alternatives, the study uses change data from three application packages of an open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling methods and robust performance measures.
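One concrete resampling alternative of the kind compared in the study is SMOTE oversampling; the sketch below uses an illustrative synthetic dataset and learner, not the Android change data or the study's MetaCost settings.

```python
# SMOTE oversampling before training, with a threshold-independent metric;
# dataset, learner and metric choice here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE     # pip install imbalanced-learn

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # balance the minority class
clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)

# AUC is preferred over plain accuracy for imbalanced data, as the abstract argues.
print("ROC-AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```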

Keywords: change proneness, empirical validation, imbalanced learning, machine learning techniques, object-oriented metrics

Procedia PDF Downloads 418
24676 Quality of Age Reporting from Tanzania 2012 Census Results: An Assessment Using Whipple’s Index, Myer’s Blended Index, and Age-Sex Accuracy Index

Authors: A. Sathiya Susuman, Hamisi F. Hamisi

Abstract:

Background: Many socio-economic and demographic data are attributed by age and sex. However, a variety of irregularities and misstatements are noted with respect to age-related data, and less so for sex data because of the biological differences between the genders. Noting the misstatement/misreporting of age data despite its significant importance in demographic and epidemiological studies, this study aims at assessing the quality of the 2012 Tanzania Population and Housing Census results. Methods: Data for the analysis were downloaded from the Tanzania National Bureau of Statistics. Age heaping and digit preference were measured using summary indices, viz., Whipple’s index, Myers’ blended index, and the age-sex accuracy index. Results: The recorded Whipple’s index for both sexes was 154.43; males had the lowest index, of about 152.65, while females had the highest index, of about 156.07. For Myers’ blended index, the preferences were at digits ‘0’ and ‘5’ while the avoidances were at digits ‘1’ and ‘3’ for both sexes. Finally, the age-sex accuracy index stood at 59.8, where the sex ratio score was 5.82 and the age ratio scores were 20.89 and 21.4 for males and females, respectively. Conclusion: The evaluation of the 2012 PHC data using these demographic techniques has shown the data to be inaccurate as a result of systematic heaping and digit preference/avoidance. Thus, innovative methods in data collection, along with measuring and minimizing errors using statistical techniques, should be used to ensure the accuracy of age data.
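For reference, Whipple's index as conventionally defined can be computed as in the sketch below; the age counts are dummy values, not the census data.

```python
# Whipple's index: persons aged 23-62 reporting ages ending in 0 or 5, relative to
# one fifth of all persons aged 23-62, times 100. Counts below are dummy values.
import numpy as np

def whipples_index(counts_by_age):
    """counts_by_age: dict {age: reported count} from a census single-year age table."""
    ages = range(23, 63)
    total = sum(counts_by_age.get(a, 0) for a in ages)
    ending_0_or_5 = sum(counts_by_age.get(a, 0) for a in ages if a % 5 == 0)
    return 100.0 * ending_0_or_5 / (total / 5.0)

rng = np.random.default_rng(0)
counts = {age: 1000 + int(rng.normal(0, 30)) for age in range(0, 100)}
for age in range(0, 100, 5):
    counts[age] += 400            # simulate heaping on digits 0 and 5
print(round(whipples_index(counts), 1))   # 100 = no heaping; higher values indicate preference
```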

Keywords: age heaping, digit preference/avoidance, summary indices, Whipple’s index, Myers’ index, age-sex accuracy index

Procedia PDF Downloads 476