Search results for: large amounts of data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 29260

28990 Social Data Aggregator and Locator of Knowledge (STALK)

Authors: Rashmi Raghunandan, Sanjana Shankar, Rakshitha K. Bhat

Abstract:

Social media contributes a vast amount of data and information about individuals to the internet. This project will greatly reduce the need for manual analysis of large and diverse social media profiles by extracting and combining the useful information from various profiles and eliminating irrelevant data. It differs from the existing social media aggregators in that it does not provide a consolidated view of various profiles. Instead, it provides consolidated INFORMATION derived from the subject’s posts and other activities. It also allows analysis and analytics across multiple profiles. We aim to provide a query system that returns a natural-language answer to a question when a user does not wish to go through the entire profile. The information provided can be filtered according to the use case.

Keywords: social network, analysis, Facebook, LinkedIn, git, big data

Procedia PDF Downloads 417
28989 Business-Intelligence Mining of Large Decentralized Multimedia Datasets with a Distributed Multi-Agent System

Authors: Karima Qayumi, Alex Norta

Abstract:

The rapid generation of a high volume and broad variety of data from the application of new technologies poses challenges for the generation of business intelligence. Most organizations and business owners need to extract data from multiple sources and apply analytical methods for the purposes of developing their business. Therefore, the recently decentralized data management environment relies on a distributed computing paradigm. While data are stored in highly distributed systems, the implementation of distributed data-mining techniques is a challenge. The aim of this technique is to gather knowledge from every domain and all the datasets stemming from distributed resources. As agent technologies offer significant contributions for managing the complexity of distributed systems, we consider this for next-generation data-mining processes. To demonstrate agent-based business intelligence operations, we use agent-oriented modeling techniques to develop a new artifact for mining massive datasets.

Keywords: agent-oriented modeling (AOM), business intelligence model (BIM), distributed data mining (DDM), multi-agent system (MAS)

Procedia PDF Downloads 395
28988 Automated Testing to Detect Instance Data Loss in Android Applications

Authors: Anusha Konduru, Zhiyong Shan, Preethi Santhanam, Vinod Namboodiri, Rajiv Bagai

Abstract:

The number of mobile applications is growing significantly, each addressing the requirements of many users. However, rapid development and enhancement result in many underlying defects. Android apps create and handle a large variety of 'instance' data that has to persist across runs, such as the current navigation route, workout results, antivirus settings, or game state. Due to the nature of Android, an app can be paused, sent into the background, or killed at any time. If the instance data is not saved and restored between runs, in addition to data loss, partially-saved or corrupted data can crash the app upon resume or restart. However, it is difficult for the programmer to manually test this issue for all the activities. The result is data loss: the data entered by the user are not saved when there is an interruption. This issue can degrade user experience because the user needs to reenter the information each time there is an interruption. Automated testing to detect such data loss is important to improve the user experience. This research proposes a tool, DroidDL, a data loss detector for Android, which detects instance data loss in a given Android application. We have tested 395 applications and found 12 applications with the issue of data loss. This approach proved highly accurate and reliable in finding the apps with this defect, and can be used by Android developers to avoid such errors.

Keywords: Android, automated testing, activity, data loss

Procedia PDF Downloads 211
28987 Genodata: The Human Genome Variation Using BigData

Authors: Surabhi Maiti, Prajakta Tamhankar, Prachi Uttam Mehta

Abstract:

Since the accomplishment of the Human Genome Project, there has been an unparalleled escalation in the sequencing of genomic data. This project has been the first major vault in the field of medical research, especially in genomics. This project won accolades by using a concept called Bigdata, which was earlier used extensively to gain value for business. Bigdata makes use of data sets which are generally in the form of files of size terabytes, petabytes, or exabytes, and these data sets were traditionally used and managed using Excel sheets and RDBMS. The voluminous data made the process tedious and time consuming, and hence a stronger framework called Hadoop was introduced in the field of genetic sciences to make data processing faster and more efficient. This paper focuses on using SPARK, which is gaining momentum with the advancement of BigData technologies. Cloud Storage is an effective medium for storage of the large data sets which are generated from genetic research and of the resultant sets produced from SPARK analysis.

Keywords: human genome project, Bigdata, genomic data, SPARK, cloud storage, Hadoop

Procedia PDF Downloads 227
28986 Conductivity-Depth Inversion of Large Loop Transient Electromagnetic Sounding Data over Layered Earth Models

Authors: Ravi Ande, Mousumi Hazari

Abstract:

One of the common geophysical techniques for mapping subsurface geo-electrical structures, extensive hydro-geological research, and engineering and environmental geophysics applications is the use of time domain electromagnetic (TDEM)/transient electromagnetic (TEM) soundings. A large loop TEM system consists of a large transmitter loop for energising the ground and a small receiver loop or magnetometer for recording the transient voltage or magnetic field in the air or on the surface of the earth, with the receiver at the center of the loop or at an arbitrary point inside or outside the source loop. In general, one can acquire data using one of the configurations with a large loop source, namely, with the receiver at the center point of the loop (central loop method), at an arbitrary in-loop point (in-loop method), coincident with the transmitter loop (coincident-loop method), or at an arbitrary offset loop point (offset-loop method). Because of the mathematical simplicity associated with the expressions of EM fields, as compared to the in-loop and offset-loop systems, the central loop system (for ground surveys) and coincident loop system (for ground as well as airborne surveys) have been developed and used extensively for the exploration of mineral and geothermal resources, for mapping groundwater contaminated by hazardous waste, and for mapping the thickness of the permafrost layer. Because a proper analytical expression for the TEM response over the layered earth model for the large loop TEM system does not exist, the forward problem used in this inversion scheme is first formulated in the frequency domain and then transformed into the time domain using Fourier cosine or sine transforms. Using the EMLCLLER algorithm, the forward computation is initially carried out in the frequency domain. As a result, EMLCLLER modified the forward calculation scheme in NLSTCI to compute frequency-domain responses before converting them to the time domain using Fourier cosine and/or sine transforms.

Keywords: time domain electromagnetic (TDEM), TEM system, geoelectrical sounding structure, Fourier cosine

Procedia PDF Downloads 66
28985 Landslide Susceptibility Analysis in the St. Lawrence Lowlands Using High Resolution Data and Failure Plane Analysis

Authors: Kevin Potoczny, Katsuichiro Goda

Abstract:

The St. Lawrence lowlands extend from Ottawa to Quebec City and are known for large deposits of sensitive Leda clay. Leda clay deposits are responsible for many large landslides, such as the 1993 Lemieux and 2010 St. Jude (4 fatalities) landslides. Due to the large extent and sensitivity of Leda clay, regional hazard analysis for landslides is an important tool in risk management. A 2018 regional study by Farzam et al. on the susceptibility of Leda clay slopes to landslide hazard uses 1 arc second topographical data. A qualitative method known as Hazus is used to estimate susceptibility by checking for various criteria in a location and determining a susceptibility rating on a scale of 0 (no susceptibility) to 10 (very high susceptibility). These criteria are slope angle, geological group, soil wetness, and distance from waterbodies. Given the flat nature of the St. Lawrence lowlands, the current assessment fails to capture local slopes, such as the St. Jude site. Additionally, the data did not allow one to analyze failure planes accurately. This study substantially improves the analysis performed by Farzam et al. in two aspects. First, regional assessment with high resolution data allows for identification of local sites that may have been previously identified as low susceptibility. Second, this provides the opportunity to conduct a more refined analysis of the failure plane of the slope. Slopes derived from 1 arc second data are relatively gentle (0-10 degrees) across the region; however, the 1- and 2-meter resolution 2022 HRDEM provided by NRCAN shows that short, steep slopes are present. At a regional level, 1 arc second data can underestimate the susceptibility of short, steep slopes, which can be dangerous as Leda clay landslides behave retrogressively and travel upwards into flatter terrain. At the location of the St. Jude landslide, slope differences are significant. The 1 arc second data shows a maximum slope of 12.80 degrees and a mean slope of 4.72 degrees, while the HRDEM data shows a maximum slope of 56.67 degrees and a mean slope of 10.72 degrees. This equates to a difference of three susceptibility levels when the soil is dry and one susceptibility level when wet. GIS software is used to create a regional susceptibility map across the St. Lawrence lowlands at 1- and 2-meter resolutions. Failure planes are necessary to differentiate between small and large landslides, which have so far been ignored in regional analysis. Leda clay failures can only retrogress as far as their failure planes, so the regional analysis must be able to transition smoothly into a more robust local analysis. It is expected that slopes within the region, previously assessed at low susceptibility scores, contain local areas of high susceptibility. The goal is to create opportunities for local failure plane analysis to be undertaken, which has not been possible before. Due to the low resolution of previous regional analyses, any slope near a waterbody could be considered hazardous. However, high-resolution regional analysis would allow for more precise determination of hazard sites.

Keywords: Hazus, high-resolution DEM, Leda clay, regional analysis, susceptibility

Procedia PDF Downloads 48
28984 Design and Development of a Platform for Analyzing Spatio-Temporal Data from Wireless Sensor Networks

Authors: Walid Fantazi

Abstract:

The development of sensor technology (such as microelectromechanical systems (MEMS), wireless communications, embedded systems, distributed processing and wireless sensor applications) has contributed to a broad range of WSN applications which are capable of collecting a large amount of spatiotemporal data in real time. These systems require real-time data processing to manage storage and to query the data they process. In order to cover these needs, we propose in this paper a Snapshot spatiotemporal data model based on object-oriented concepts. This model saves storage and reduces data redundancy, which makes it easier to execute spatiotemporal queries and saves analysis time. Further, to ensure the robustness of the system as well as the elimination of congestion from the main access memory, we propose a spatiotemporal indexing technique in RAM called Captree*. As a result, we offer an RIA (Rich Internet Application)-based SOA application architecture which allows remote monitoring and control.

Keywords: WSN, indexing data, SOA, RIA, geographic information system

Procedia PDF Downloads 226
28983 Determination of the Bank's Customer Risk Profile: Data Mining Applications

Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge

Abstract:

In this study, the clients who applied to a bank branch for a loan were analyzed through data mining. The data covered the period between 2010 and 2013 and comprised information such as the amounts of loans received by personal and SME clients working with the bank branch, installment numbers, the number of delays in loan installments, payments available in other banks, and the number of banks to which the clients are in debt. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers were determined on the decision tree. These customer types were classified by rating those posing a risk for the bank branch, and the customers were classified according to the risk ratings.
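As a rough illustration of how a CART-style tree picks a split, the sketch below scores candidate thresholds on a single feature by weighted Gini impurity. The feature (number of delayed installments), the risk labels and all values are invented for illustration; they are not the study's data, and the abstract does not state which split criterion or software the authors used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on values <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Return the candidate threshold with the lowest weighted impurity."""
    # Splitting on the maximum value would leave the right side empty.
    candidates = sorted(set(values))[:-1]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


# Hypothetical data: delayed installments per client and an assumed risk label.
delays = [0, 0, 1, 1, 4, 5, 6, 7]
risk = ["low", "low", "low", "low", "high", "high", "high", "high"]
print(best_threshold(delays, risk))
```

A full CART implementation repeats this search over every feature and recurses on each side of the chosen split until a stopping rule is met.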

Keywords: client classification, loan suitability, risk rating, CART analysis

Procedia PDF Downloads 317
28982 Fast Bayesian Inference of Multivariate Block-Nearest Neighbor Gaussian Process (NNGP) Models for Large Data

Authors: Carlos Gonzales, Zaida Quiroz, Marcos Prates

Abstract:

Several spatial variables collected at the same location that share a common spatial distribution can be modeled simultaneously through a multivariate geostatistical model that takes into account the correlation between these variables and the spatial autocorrelation. The main goal of this model is to perform spatial prediction of these variables in the region of study. Here we focus on a geostatistical multivariate formulation that relies on sharing common spatial random effect terms. In particular, the first response variable can be modeled by a mean that incorporates a shared random spatial effect, while the other response variables depend on this shared spatial term, in addition to specific random spatial effects. Each spatial random effect is defined through a Gaussian process with a valid covariance function, but in order to improve the computational efficiency when the data are large, each Gaussian process is approximated by a Gaussian random Markov field (GRMF), specifically by the block nearest neighbor Gaussian process (Block-NNGP). This approach involves dividing the spatial domain into several dependent blocks under certain constraints, where the cross blocks allow capturing the spatial dependence on a large scale, while each individual block captures the spatial dependence on a smaller scale. The multivariate geostatistical model belongs to the class of Latent Gaussian Models; thus, to achieve fast Bayesian inference, the integrated nested Laplace approximation (INLA) method is used. The good performance of the proposed model is shown through simulations and applications for massive data.

Keywords: Block-NNGP, geostatistics, Gaussian process, GRMF, INLA, multivariate models

Procedia PDF Downloads 66
28981 User Intention Generation with Large Language Models Using Chain-of-Thought Prompting

Authors: Gangmin Li, Fan Yang

Abstract:

Personalized recommendation is crucial for any recommendation system. One of the techniques for personalized recommendation is to identify the user's intention. Traditional user intention identification uses the user’s selection when facing multiple items. This modeling relies primarily on historical behaviour data, resulting in challenges such as the cold start, unintended choice, and failure to capture intention when items are new. Motivated by recent advancements in Large Language Models (LLMs) like ChatGPT, we present an approach for user intention identification by embracing LLMs with Chain-of-Thought (CoT) prompting. We use the initial user profile as input to LLMs and design a collection of prompts to align the LLM's response through various recommendation tasks encompassing rating prediction, search and browse history, user clarification, etc. Our tests on real-world datasets demonstrate the improvements in recommendation achieved by explicit user intention identification, with that intention merged into a user model.

Keywords: personalized recommendation, generative user modelling, user intention identification, large language models, chain-of-thought prompting

Procedia PDF Downloads 12
28980 Large-Scale Electroencephalogram Biometrics through Contrastive Learning

Authors: Mostafa ‘Neo’ Mohsenvand, Mohammad Rasool Izadi, Pattie Maes

Abstract:

EEG-based biometrics (user identification) has been explored on small datasets of no more than 157 subjects. Here we show that the accuracy of modern supervised methods falls rapidly as the number of users increases to a few thousand. Moreover, supervised methods require a large amount of labeled data for training, which limits their applications in real-world scenarios where acquiring data for training should not take more than a few minutes. We show that using contrastive learning for pre-training, it is possible to maintain high accuracy on a dataset of 2130 subjects while only using a fraction of labels. We compare 5 different self-supervised tasks for pre-training of the encoder, where our proposed method achieves an accuracy of 96.4%, improving the baseline supervised models by 22.75% and the competing self-supervised model by 3.93%. We also study the effects of the length of the signal and the number of channels on the accuracy of the user-identification models. Our results reveal that signals from temporal and frontal channels contain more identifying features compared to other channels.

Keywords: brainprint, contrastive learning, electroencephalogram, self-supervised learning, user identification

Procedia PDF Downloads 132
28979 Efficient Principal Components Estimation of Large Factor Models

Authors: Rachida Ouysse

Abstract:

This paper proposes a constrained principal components (CnPC) estimator for efficient estimation of large-dimensional factor models when errors are cross-sectionally correlated and the number of cross-sections (N) may be larger than the number of observations (T). Although the principal components (PC) method is consistent for any path of the panel dimensions, it is inefficient as the errors are treated as homoskedastic and uncorrelated. The new CnPC exploits the assumption of bounded cross-sectional dependence, which defines Chamberlain and Rothschild’s (1983) approximate factor structure, as an explicit constraint and solves a constrained PC problem. The CnPC method is computationally equivalent to the PC method applied to a regularized form of the data covariance matrix. Unlike maximum likelihood type methods, the CnPC method does not require inverting a large covariance matrix and thus is valid for panels with N ≥ T. The paper derives a convergence rate and an asymptotic normality result for the CnPC estimators of the common factors. We provide feasible estimators and show in a simulation study that they are more accurate than the PC estimator, especially for panels with N larger than T, and the generalized PC type estimators, especially for panels with N almost as large as T.

Keywords: high dimensionality, unknown factors, principal components, cross-sectional correlation, shrinkage regression, regularization, pseudo-out-of-sample forecasting

Procedia PDF Downloads 123
28978 Degradation of EE2 by Different Consortium of Enriched Nitrifying Activated Sludge

Authors: Pantip Kayee

Abstract:

17α-ethinylestradiol (EE2) is a recalcitrant micropollutant which is found in small amounts in municipal wastewater, but even these small amounts adversely affect the reproductive function of aquatic organisms. Past evidence suggests that full-scale WWTPs equipped with a nitrification process enhance the removal of EE2 from municipal wastewater. EE2 has been shown to be transformed by ammonia oxidizing bacteria (AOB) via co-metabolism. This research aims to clarify the EE2 degradation pattern of different consortia of ammonia oxidizing microorganisms (AOM), including ammonia oxidizing archaea (AOA), and to investigate the relative contributions of the existing ammonia monooxygenase (AMO) and newly synthesized AMO. The results showed that AOA or AOB of the N. oligotropha cluster in enriched nitrifying activated sludge (NAS) from 2 mM and 5 mM, commonly found in municipal WWTPs, could degrade EE2 in wastewater via co-metabolism. Moreover, the investigation of the contributions of the existing AMO and newly synthesized AMO demonstrated that the newly synthesized AMO enzyme, rather than the existing AMO enzyme, may perform ammonia oxidation, or that the existing AMO enzyme may contribute only a small amount to ammonia oxidation.

Keywords: 17α-ethinylestradiol, nitrification, ammonia oxidizing bacteria, ammonia oxidizing archaea

Procedia PDF Downloads 260
28977 Developing an Intervention Program to Promote Healthy Eating in a Catering System Based on Qualitative Research Results

Authors: O. Katz-Shufan, T. Simon-Tuval, L. Sabag, L. Granek, D. R. Shahar

Abstract:

Meals provided at catering systems are a common source of workers' nutrition and were found to contribute high amounts of calories and fat. Thus, eating catering food daily can lead to overweight and chronic diseases. On the other hand, the institutional dining room may be an ideal environment for implementation of intervention programs that promote healthy eating. This may improve diners' lifestyle and reduce their prevalence of overweight, obesity and chronic diseases. The significance of this study is in developing an intervention program based on the diners’ dietary habits, preferences and their attitudes towards various intervention programs. In addition, a successful catering-based intervention program may have a significant effect simultaneously on a large group of diners, leading to improved nutrition, healthier lifestyle, and disease prevention on a large scale. In order to develop the intervention program, we conducted a qualitative study. We interviewed 13 diners who eat regularly at catering systems, using a semi-structured interview. The interviews were recorded, transcribed and then analyzed by the thematic method, which identifies, analyzes and reports themes within the data. The interviews revealed several major themes, including the diners' expectation to be provided with healthy food choices; their request for nutrition-expert involvement in planning the meals; and the diners' feeling that there is a conflict between the sensory attractiveness of the food and its nutritional quality. In the context of catering-based intervention programs, the diners preferred scientific and clear messages focusing on labeling healthy dishes only, as opposed to the labeling of unhealthy dishes; they were interested in a nutritional education program to accompany the intervention program. Based on these findings, we have developed an intervention program that includes changes in the food served, such as replacing several menu items and improving the nutritional quality of some of the recipes, as well as environmental changes, such as changing the location of some food items presented on the buffet, placing positive nutritional labels on healthy dishes and an ongoing healthy nutrition campaign, all accompanied by a nutrition education program. The intervention program is currently being tested for its impact on health outcomes and its cost-effectiveness.

Keywords: catering system, food services, intervention, nutrition policy, public health, qualitative research

Procedia PDF Downloads 164
28976 Nazca: A Context-Based Matching Method for Searching Heterogeneous Structures

Authors: Karine B. de Oliveira, Carina F. Dorneles

Abstract:

Structure-level matching is the problem of combining elements of a structure, which can be represented as entities, classes, XML elements, web forms, and so on. This is a challenge due to the large number of distinct representations of semantically similar structures. This paper describes a structure-based matching method applied to search for different representations in data sources, considering the similarity between elements of two structures and the data source context. Using real data sources, we have conducted an experimental study comparing our approach with our baseline implementation and with another important schema matching approach. We demonstrate that our proposal reaches higher precision than the baseline.

Keywords: context, data source, index, matching, search, similarity, structure

Procedia PDF Downloads 334
28975 Large Neural Networks Learning From Scratch With Very Few Data and Without Explicit Regularization

Authors: Christoph Linse, Thomas Martinetz

Abstract:

Recent findings have shown that Neural Networks generalize also in over-parametrized regimes with zero training error. This is surprising, since it is completely against traditional machine learning wisdom. In our empirical study we fortify these findings in the domain of fine-grained image classification. We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization or pretraining. We train the architectures ResNet018, ResNet101 and VGG19 on subsets of the difficult benchmark datasets Caltech101, CUB_200_2011, FGVCAircraft, Flowers102 and StanfordCars with 100 classes and more, perform a comprehensive comparative study and draw implications for the practical application of CNNs. Finally, we show that VGG19 with 140 million weights learns to distinguish airplanes and motorbikes with up to 95% accuracy using only 20 training samples per class.

Keywords: convolutional neural networks, fine-grained image classification, generalization, image recognition, over-parameterized, small data sets

Procedia PDF Downloads 58
28974 Integration of Knowledge and Metadata for Complex Data Warehouses and Big Data

Authors: Jean Christian Ralaivao, Fabrice Razafindraibe, Hasina Rakotonirainy

Abstract:

This document constitutes a resumption of work carried out in the field of complex data warehouses (DW) relating to the management and formalization of knowledge and metadata. It offers a methodological approach for integrating two concepts, knowledge and metadata, within the framework of a complex DW architecture. The objective of the work considers the use of the technique of knowledge representation by description logics and the extension of Common Warehouse Metamodel (CWM) specifications. This will lead to a fallout in terms of the performance of a complex DW. Three essential aspects of this work are expected, including the representation of knowledge in description logics and the declination of this knowledge into consistent UML diagrams while respecting or extending the CWM specifications and using XML as a pivot. The field of application is large but will be adapted to systems with heterogeneous, complex and unstructured content that moreover require a great (re)use of knowledge, such as medical data warehouses.

Keywords: data warehouse, description logics, integration, knowledge, metadata

Procedia PDF Downloads 107
28973 Study of Management of Waste Construction Materials in Civil Engineering Projects

Authors: Jalindar R. Patil, Harish P. Gayakwad

Abstract:

The increased economic growth across the globe, as well as urbanization in developing countries, has led to extensive construction activities that generate large amounts of waste. Material wastage in construction projects results in huge financial setbacks for builders and contractors. In addition, it may also significantly affect aesthetics, health, and the general environment. However, construction waste management is still a problem in many cities across the globe. This paper discusses methods for the management of waste construction materials. The objectives of this seminar are to identify the significant sources of construction waste globally, to improve the performance of construction waste management by identifying its major barriers, and to determine the cost impact on the construction project. These wastes need to be managed, and their impacts need to be ascertained, to pave the way for their proper management. The seminar includes the details of construction waste management with reference to a construction project. The application of construction waste management in civil engineering projects is described with respect to the reduction in construction waste.

Keywords: civil engineering, construction materials, waste management, construction activities

Procedia PDF Downloads 483
28972 Regulation on the Protection of Personal Data Versus Quality Data Assurance in the Healthcare System Case Report

Authors: Elizabeta Krstić Vukelja

Abstract:

Digitization of personal data is a consequence of the development of information and communication technologies that create a new work environment with many advantages and challenges, but also potential threats to privacy and personal data protection. Regulation (EU) 2016/679 of the European Parliament and of the Council is becoming a law and obligation that should address the issues of personal data protection and information security. The existence of the Regulation leads to the conclusion that national legislation in the field of virtual environment, protection of the rights of EU citizens and processing of their personal data is insufficiently effective. In the health system, special emphasis is placed on the processing of special categories of personal data, such as health data. The healthcare industry is recognized as a particularly sensitive area in which a large amount of medical data is processed, the digitization of which enables quick access and quick identification of the health insured. The protection of the individual requires quality IT solutions that guarantee the technical protection of personal categories. However, the real problems are of a technical and human nature, together with the spatial limitations of the application of the Regulation. Some conclusions will be drawn by analyzing the implementation of the basic principles of the Regulation on the example of the Croatian health care system and comparing it with similar activities in other EU member states.

Keywords: regulation, healthcare system, personal data protection, quality data assurance

Procedia PDF Downloads 11
28971 Enhancing Large Language Models' Data Analysis Capability with Planning-and-Execution and Code Generation Agents: A Use Case for Southeast Asia Real Estate Market Analytics

Authors: Kien Vu, Jien Min Soh, Mohamed Jahangir Abubacker, Piyawut Pattamanon, Soojin Lee, Suvro Banerjee

Abstract:

Recent advances in Generative Artificial Intelligence (GenAI), in particular Large Language Models (LLMs), have shown promise to disrupt multiple industries at scale. However, LLMs also present unique challenges, notably so-called "hallucination", the generation of outputs that are not grounded in the input data, which hinders their adoption in production. A common practice to mitigate the hallucination problem is to use a Retrieval Augmented Generation (RAG) system to ground LLMs' responses in ground truth. RAG converts the grounding documents into embeddings, retrieves the relevant parts using vector similarity between the user's query and the documents, then generates a response that is based not only on its pre-trained knowledge but also on the specific information from the retrieved documents. However, the RAG system is not suitable for tabular data and subsequent data analysis tasks for multiple reasons, such as information loss, data format, and retrieval mechanism. In this study, we have explored a novel methodology that combines planning-and-execution and code generation agents to enhance LLMs' data analysis capabilities. The approach enables LLMs to autonomously dissect a complex analytical task into simpler sub-tasks and requirements, then convert them into executable segments of code. In the final step, it generates the complete response from the output of the executed code. When deployed as a beta version on DataSense, the property insight tool of PropertyGuru, the approach yielded promising results: it was able to meet market insight and data visualization needs with high accuracy and extensive coverage by abstracting the complexities for real-estate agents and developers from non-programming backgrounds. In essence, the methodology not only refines the analytical process but also serves as a strategic tool for real estate professionals, aiding in market understanding and enhancement without the need for programming skills. The implication extends beyond immediate analytics, paving the way for a new era in the real estate industry characterized by efficiency and advanced data utilization.
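The retrieval step described in the abstract (embed the documents, then rank them by vector similarity to the query) can be sketched as follows. This is a minimal illustration, not the authors' system: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, and the sample documents are invented.

```python
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# Invented example documents, for illustration only.
docs = [
    "median condo price rose in district nine",
    "rental yields for landed houses fell",
    "new train line opens next year",
]
top = retrieve("condo price in district nine", docs, k=1)
print(top)
```

In a full RAG pipeline the retrieved text would then be placed in the LLM prompt. The abstract's point is that this kind of text-chunk retrieval fits tabular data poorly, which is why the authors turn to code generation instead.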

Keywords: large language model, reasoning, planning and execution, code generation, natural language processing, prompt engineering, data analysis, real estate, data sense, PropertyGuru

Procedia PDF Downloads 42
28970 Adaption Model for Building Agile Pronunciation Dictionaries Using Phonemic Distance Measurements

Authors: Akella Amarendra Babu, Rama Devi Yellasiri, Natukula Sainath

Abstract:

Whereas human beings can easily learn and adopt pronunciation variations, machines need training before being put into use. Also, humans keep a minimal vocabulary, and its pronunciation variations are stored in the front end of their memory for ready reference, while machines keep the entire pronunciation dictionary for ready reference. Supervised methods are used to prepare pronunciation dictionaries; these take large amounts of manual effort, cost, and time and are not suitable for real-time use. This paper presents an unsupervised adaptation model for building agile and dynamic pronunciation dictionaries online. These methods mimic the human approach to learning new pronunciations in real time. A new algorithm for measuring sound distances, called Dynamic Phone Warping, is presented and tested. The performance of the system is measured using an adaptation model, and the precision metric is found to be better than 86 percent.
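The abstract does not give the details of Dynamic Phone Warping, so the sketch below is not that algorithm. It shows only the generic dynamic-programming idea that the keywords point to: a weighted edit distance between two phoneme sequences. The function name, the unit insertion and deletion costs, and the default 0/1 substitution cost are all assumptions made for illustration.

```python
def phoneme_distance(a, b, sub_cost=None):
    """Weighted edit distance between two phoneme sequences (generic DP sketch)."""
    if sub_cost is None:
        # Assumed default: identical phonemes cost 0, any substitution costs 1.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = float(i)          # delete all of a[:i]
    for j in range(1, cols):
        d[0][j] = float(j)          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                              # deletion
                d[i][j - 1] + 1.0,                              # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1])  # substitution or match
            )
    return d[-1][-1]

# Two pronunciations of one word that differ in a single vowel.
variant_1 = ["t", "ah", "m", "ey", "t", "ow"]
variant_2 = ["t", "ah", "m", "aa", "t", "ow"]
print(phoneme_distance(variant_1, variant_2))
```

A phonemic distance measure would normally replace the 0/1 substitution cost with a cost that reflects how acoustically close two phonemes are, and this is the role the `sub_cost` parameter is meant to suggest.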

Keywords: pronunciation variations, dynamic programming, machine learning, natural language processing

Procedia PDF Downloads 147
28969 Quantitative Analysis of Nutrient Inflow from River and Groundwater to Imazu Bay in Fukuoka, Japan

Authors: Keisuke Konishi, Yoshinari Hiroshiro, Kento Terashima, Atsushi Tsutsumi

Abstract:

Imazu Bay plays an important role for endangered species such as horseshoe crabs and black-faced spoonbills, which stay in the bay for spawning or overwintering. However, this bay is semi-enclosed with slow water exchange, which could lead to eutrophication under conditions of excess nutrient inflow to the bay. Therefore, quantification of nutrient inflow is of great importance. Generally, analysis of nutrient inflow to bays takes into consideration nutrient inflow from the river only, but inflow from groundwater should not be ignored if more accurate results are required. The main objective of this study is to estimate the amounts of nutrient inflow from the river and groundwater to Imazu Bay by analyzing the water budget in the Zuibaiji River Basin and the loads of T-N, T-P, NO3-N and NH4-N. The water budget computation in the basin is performed using a groundwater recharge model and a quasi three-dimensional two-phase groundwater flow model, and the multiplication of the measured amount of nutrient inflow by the computed discharge gives the total amount of nutrient inflow to the bay. In addition, in order to evaluate nutrient inflow to the bay, the result is compared with nutrient inflow from geologically similar river basins. The result shows that the discharge is 3.50×10^7 m^3/year from the river and 1.04×10^7 m^3/year from groundwater. The submarine groundwater discharge accounts for approximately 23 % of the total discharge, which is large compared to the other river basins. It is also revealed that the total nutrient inflow is not particularly large. The sum of NO3-N and NH4-N loadings from groundwater is less than 10 % of that from the river because of denitrification in groundwater. The Shin Seibu Sewage Treatment Plant, located below the observation points, discharges 15,400 m^3/day of treated water and plans to increase this amount. However, the loads of T-N and T-P from the treatment plant are 3.9 mg/L and 0.19 mg/L, so the plant does not contribute much to eutrophication.
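The 23 % figure can be checked directly from the two discharges quoted in the abstract. The short sketch below also illustrates the "concentration times discharge" idea with the treatment plant numbers. Treating 3.9 mg/L and 0.19 mg/L as concentrations in the plant's effluent is an assumption on our part, and the variable and function names are hypothetical.

```python
# Discharges quoted in the abstract, in m^3/year.
river_m3_per_year = 3.50e7
groundwater_m3_per_year = 1.04e7

total_m3_per_year = river_m3_per_year + groundwater_m3_per_year
groundwater_share = groundwater_m3_per_year / total_m3_per_year

def daily_load_kg(conc_mg_per_l, flow_m3_per_day):
    """Load = concentration x discharge.

    1 mg/L equals 1 g/m^3, so mg/L times m^3/day gives g/day;
    dividing by 1000 converts to kg/day.
    """
    return conc_mg_per_l * flow_m3_per_day / 1000.0

# Assumption: the quoted mg/L values are effluent concentrations.
plant_flow_m3_per_day = 15_400
tn_kg_per_day = daily_load_kg(3.9, plant_flow_m3_per_day)
tp_kg_per_day = daily_load_kg(0.19, plant_flow_m3_per_day)

print(f"groundwater share of total discharge: {groundwater_share:.1%}")
print(f"T-N load: {tn_kg_per_day:.2f} kg/day, T-P load: {tp_kg_per_day:.3f} kg/day")
```

The computed share rounds to the "approximately 23 %" stated in the abstract.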

Keywords: Eutrophication, groundwater recharge model, nutrient inflow, quasi three-dimensional two-phase groundwater flow model, submarine groundwater discharge

Procedia PDF Downloads 431
28968 Efficient Motion Estimation by Fast Three Step Search Algorithm

Authors: S. M. Kulkarni, D. S. Bormane, S. L. Nalbalwar

Abstract:

Rapid developments in technology have had a dramatic impact on the medical health care field. Medical databases obtained with the latest machines, such as CT and MRI scanners, require a large amount of memory for storage and a large bandwidth for data transmission in telemedicine applications. Thus, there is a need for video compression. As a database of medical images contains a number of frames (slices), coding these images requires motion estimation. Motion estimation finds the movement of objects in an image sequence and produces motion vectors that represent the estimated motion of an object in the frame. Motion compensation is performed in order to reduce temporal redundancy between successive frames of a video sequence. In this paper, the three step search (TSS) block matching algorithm is implemented on different types of video sequences. It is shown that the three step search algorithm produces better quality performance and requires less computational time compared with the exhaustive full search algorithm.
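A minimal sketch of three step search block matching follows, assuming a sum of absolute differences (SAD) cost, a 4×4 block, and step sizes 4, 2, 1 (a search range of ±7 pixels). These choices, the function names, and the synthetic test frames are assumptions; the abstract does not state the authors' block size or cost function.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of `cur` at (cx, cy)
    and the n x n block of `ref` at (rx, ry). Frames are lists of rows."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def three_step_search(ref, cur, cx, cy, n=4, step=4):
    """Estimate the displacement from the block at (cx, cy) in `cur` to its
    best match in `ref`. Returns ((dx, dy), cost)."""
    h, w = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = sad(ref, cur, cx, cy, cx, cy, n)
    while step >= 1:
        centre_dx, centre_dy = best_dx, best_dy
        # Test the 3 x 3 grid of candidates around the current best position.
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = centre_dx + dx, centre_dy + dy
                rx, ry = cx + mx, cy + my
                if 0 <= rx <= w - n and 0 <= ry <= h - n:
                    cost = sad(ref, cur, rx, ry, cx, cy, n)
                    if cost < best_cost:
                        best_cost, best_dx, best_dy = cost, mx, my
        step //= 2  # halve the step: 4 -> 2 -> 1 -> stop
    return (best_dx, best_dy), best_cost

# Synthetic 16 x 16 frames: `cur` is `ref` shifted 4 pixels to the left.
ref = [[(7 * x + 13 * y) % 32 for x in range(16)] for y in range(16)]
cur = [[ref[y][x + 4] if x + 4 < 16 else 0 for x in range(16)] for y in range(16)]
vector, cost = three_step_search(ref, cur, cx=4, cy=4)
print(vector, cost)
```

The search evaluates at most 25 candidate positions (the centre plus 8 neighbours for each of the three steps), compared with 225 positions for an exhaustive search over the same ±7 range. That difference is the source of the reduced computation time. The trade-off is that the coarse-to-fine search can settle on a local minimum.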

Keywords: block matching, exhaustive search motion estimation, three step search, video compression

Procedia PDF Downloads 456
28967 Managing HR Knowledge in a Large Privately Owned Enterprise: An Empirical Case Analysis

Authors: Cindy Wang-Cowham, Judy Ningyu Tang

Abstract:

The paper contributes towards the development of the scarce literature on HR knowledge management. Drawing on the knowledge management literature, the authors define the meaning of HR knowledge and propose that there are social mechanisms in organizations that facilitate the management and sharing of HR knowledge. Instead of investigating the subject in large multinational corporations, the present paper examines it in a large Chinese privately owned enterprise, which has an international standing. The main finding of the case analysis is that communication and feedback play a pivotal role when managing HR knowledge. Social mechanisms can stimulate communication and feedback between employees, thus facilitating knowledge exchange.

Keywords: HR knowledge, knowledge management, large privately owned enterprises, China

Procedia PDF Downloads 502
28966 Identification of Potential Large Scale Floating Solar Sites in Peninsular Malaysia

Authors: Nur Iffika Ruslan, Ahmad Rosly Abbas, Munirah Stapah@Salleh, Nurfaziera Rahim

Abstract:

Increased concern about and awareness of the environmental hazards of burning fossil fuels for energy have become the major factor driving the transition toward green energy. An additional 2,000 MW of renewable energy is expected from renewable sources by 2025 following the implementation of Large Scale Solar projects in Peninsular Malaysia, including Large Scale Floating Solar projects. Floating Solar has advantages over its land-based counterparts; for example, the requirement for land acquisition is relatively insignificant. As part of the site selection process established by TNB Research Sdn. Bhd., a set of mandatory and rejection criteria has been developed in order to identify only sites that are feasible for the future development of Large Scale Floating Solar power plants. A total of 85 lakes and reservoirs were identified within Peninsular Malaysia. Only lakes and reservoirs with a minimum surface area of 120 acres are considered as potential sites for the development of Large Scale Floating Solar power plants. The result indicates a total of 10 potential Large Scale Floating Solar sites, located in Selangor, Johor, Perak, Pulau Pinang, Perlis and Pahang. This paper elaborates on the various mandatory and rejection criteria, as well as on the site selection process required to identify potential (suitable) Large Scale Floating Solar sites in Peninsular Malaysia.

Keywords: Large Scale Floating Solar, Peninsular Malaysia, Potential Sites, Renewable Energy

Procedia PDF Downloads 156
28965 Managing Construction and Demolition Wastes - A Case Study of Multi Triagem, Lda

Authors: Cláudia Moço, Maria Santos, Carlos Arsénio, Débora Mendes, Miguel Oliveira, José Paulo Da Silva

Abstract:

The construction industry generates large amounts of waste all over the world. About 450 million tons of construction and demolition wastes (C&DW) are produced annually in the European Union. C&DW are highly heterogeneous materials in size and composition, which makes their management very difficult. Directive n.º 2008/98/CE of the European Parliament and of the Council of 6 November establishes that 70 % of the C&DW have to be recycled by 2020. To evaluate possible applications of these materials, a detailed physical, chemical and environmental characterization is necessary. Multi Triagem, Lda. is a company located in Algarve (Portugal) and was supported by the European Regional Development Fund (grant QREN 30307 Multivalor) to quantify and characterize the received C&DW, in order to evaluate their possible applications. This evaluation, performed in collaboration with the University of Algarve, involves a detailed physical, chemical and environmental characterization of the received C&DW. In this work we report on the amounts, trial procedures and properties of the C&DW received over a period of fifteen months. In this period the company received C&DW coming from 393 different origins. The total amount was 32.458 tons, mostly mixtures containing concrete, masonry/mortar and soil/rock. Most of the C&DW came from demolished constructions and diggings. The organic/inert component, namely metal, glass, wood and plastics, was screened first and accounts for about 3 % of the received materials. The remaining materials were screened and grouped according to their origin and contents, the latter evaluated by visual inspection. Twenty-five samples were prepared and submitted to a detailed physical, chemical and environmental analysis. The C&DW aggregates show lower quality properties than natural aggregates for concrete preparation and unbound layers of road pavements. However, chemical analyses indicated that most samples are environmentally safe. Continuous monitoring of the presence of heavy metals and organic compounds is needed in order to perform a proper screening of the C&DW. C&DW aggregates provide a good alternative to natural aggregates.

Keywords: construction and demolition wastes, waste classification, waste composition, waste screening

Procedia PDF Downloads 326
28964 Bit Error Rate Analysis of Multiband OFCDM UWB System in UWB Fading Channel

Authors: Sanjay M. Gulhane, Athar Ravish Khan, Umesh W. Kaware

Abstract:

Orthogonal frequency and code division multiplexing (OFCDM) has received considerable attention as a modulation scheme for realizing high data rate transmission. The Multiband (MB) Orthogonal Frequency Division Multiplexing (OFDM) Ultra Wide Band (UWB) system has become a promising technique for high data rates because of its many advantages over the single-band UWB system, but it suffers from a coherent frequency diversity problem. In this paper we propose an MB-OFCDM UWB system in which two-dimensional (2D) spreading (time and frequency domain spreading) is introduced; by combining OFDM with 2D spreading, the proposed system can provide frequency diversity. This paper presents the basic structure and main functions of the MB-OFCDM system and evaluates the bit error rate (BER) performance of the MB-OFDM and MB-OFCDM systems under a UWB indoor multi-path channel model. It is observed that the MB-OFCDM UWB system improves BER performance by 2 dB compared with the MB-OFDM UWB system.

Keywords: MB-OFDM UWB system, MB-OFCDM UWB system, UWB IEEE channel model, BER

Procedia PDF Downloads 518
28963 Numerical Modeling of Large Scale Dam Break Flows

Authors: Amanbek Jainakov, Abdikerim Kurbanaliev

Abstract:

The work presents the results of mathematical modeling of large-scale flows in areas with a complex topographic relief. The Reynolds-averaged Navier-Stokes equations constitute the basis of the three-dimensional unsteady modeling. The well-known Volume of Fluid method implemented in the solver interFoam of the open package OpenFOAM 2.3 is used to track the free-boundary location. The adequacy of the mathematical model is checked by comparison with experimental data. The efficiency of the applied technology is illustrated by the example of modeling the breakthrough of the dams of the Andijan (Uzbekistan) and Papan (near the town of Osh, Kyrgyzstan) reservoirs.

Keywords: three-dimensional modeling, free boundary, the volume-of-fluid method, dam break, flood, OpenFOAM

Procedia PDF Downloads 374
28962 Addressing the Exorbitant Cost of Labeling Medical Images with Active Learning

Authors: Saba Rahimi, Ozan Oktay, Javier Alvarez-Valle, Sujeeth Bharadwaj

Abstract:

Successful application of deep learning in medical image analysis necessitates unprecedented amounts of labeled training data. Unlike conventional 2D applications, radiological images can be three-dimensional (e.g., CT, MRI), consisting of many instances within each image. The problem is exacerbated when expert annotations are required for effective pixel-wise labeling, which incurs exorbitant labeling effort and cost. Active learning is an established research domain that aims to reduce labeling workload by prioritizing a subset of informative unlabeled examples to annotate. Our contribution is a cost-effective approach for U-Net 3D models that uses Monte Carlo sampling to analyze pixel-wise uncertainty. Experiments on the AAPM 2017 lung CT segmentation challenge dataset show that our proposed framework can achieve promising segmentation results by using only 42% of the training data.
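The idea of ranking unlabeled scans by pixel-wise uncertainty from Monte Carlo sampling can be sketched as below. This is a generic illustration, not the authors' framework: it assumes each scan has several stochastic forward passes of a segmentation model, each giving a per-pixel foreground probability, and it scores a scan by the mean binary entropy of the averaged prediction. The function names and the tiny two-pixel "scans" are invented for the example.

```python
import math

def pixel_entropy(p):
    """Binary entropy (in nats) of a foreground probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def image_uncertainty(samples):
    """Mean pixel-wise entropy of the prediction averaged over Monte Carlo samples.

    `samples` is a list of probability maps (flat lists of equal length),
    one per stochastic forward pass.
    """
    n_pixels = len(samples[0])
    mean_map = [sum(s[i] for s in samples) / len(samples) for i in range(n_pixels)]
    return sum(pixel_entropy(p) for p in mean_map) / n_pixels

# Invented pool: two passes per scan, two pixels per scan.
pool = {
    "scan_a": [[0.9, 0.1], [0.95, 0.05]],  # passes agree: low uncertainty
    "scan_b": [[0.9, 0.2], [0.1, 0.8]],    # passes disagree: high uncertainty
}
ranked = sorted(pool, key=lambda name: image_uncertainty(pool[name]), reverse=True)
print(ranked)
```

In an active learning loop, the scans at the top of `ranked` would be sent to the expert annotator first, which is how a model can reach useful accuracy with only part of the training data labeled.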

Keywords: image segmentation, active learning, convolutional neural network, 3D U-Net

Procedia PDF Downloads 122
28961 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop

Authors: Anuta Mukherjee, Saswati Mukherjee

Abstract:

Data is growing every day. Social networking sites such as Twitter are becoming an integral part of our daily lives, contributing substantially to the growth of data. Twitter is a rich source, especially for sentiment detection or mining, since people often express honest opinions through tweets. However, although sentiment analysis is a well-researched topic in text, this analysis poses additional challenges on Twitter data, since these are unstructured data with abbreviations and without strict grammatical correctness. We have employed collision theory to achieve sentiment analysis in Twitter data. We have also incorporated discourse analysis in the collision theory based model to detect accurate sentiment from tweets. We have also used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.

Keywords: sentiment analysis, twitter, collision theory, discourse analysis

Procedia PDF Downloads 503