Search results for: large amounts of data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 29176

Search results for: large amounts of data

29176 Healthcare Big Data Analytics Using Hadoop

Authors: Chellammal Surianarayanan

Abstract:

Healthcare industry is generating large amounts of data driven by various needs such as record keeping, physician’s prescription, medical imaging, sensor data, Electronic Patient Record(EPR), laboratory, pharmacy, etc. Healthcare data is so big and complex that they cannot be managed by conventional hardware and software. The complexity of healthcare big data arises from large volume of data, the velocity with which the data is accumulated and different varieties such as structured, semi-structured and unstructured nature of data. Despite the complexity of big data, if the trends and patterns that exist within the big data are uncovered and analyzed, higher quality healthcare at lower cost can be provided. Hadoop is an open source software framework for distributed processing of large data sets across clusters of commodity hardware using a simple programming model. The core components of Hadoop include Hadoop Distributed File System which offers way to store large amount of data across multiple machines and MapReduce which offers way to process large data sets with a parallel, distributed algorithm on a cluster. Hadoop ecosystem also includes various other tools such as Hive (a SQL-like query language), Pig (a higher level query language for MapReduce), Hbase(a columnar data store), etc. In this paper an analysis has been done as how healthcare big data can be processed and analyzed using Hadoop ecosystem.

Keywords: big data analytics, Hadoop, healthcare data, towards quality healthcare

Procedia PDF Downloads 376
29175 Impact of Extended Enterprise Resource Planning in the Context of Cloud Computing on Industries and Organizations

Authors: Gholamreza Momenzadeh, Forough Nematolahi

Abstract:

The Extended Enterprise Resource Planning (ERPII) system usually requires massive amounts of storage space, powerful servers, and large upfront and ongoing investments to purchase and manage the software and the related hardware which are not affordable for organizations. In recent decades, organizations prefer to adapt their business structures with new technologies for remaining competitive in the world economy. Therefore, cloud computing (which is one of the tools of information technology (IT)) is a modern system that reveals the next-generation application architecture. Also, cloud computing has had some advantages that reduce costs in many ways such as: lower upfront costs for all computing infrastructure and lower cost of maintaining and supporting. On the other hand, traditional ERPII is not responding for huge amounts of data and relations between the organizations. In this study, based on a literature study, ERPII is investigated in the context of cloud computing where the organizations operate more efficiently. Also, ERPII conditions have a response to needs of organizations in large amounts of data and relations between the organizations.

Keywords: extended enterprise resource planning, cloud computing, business process, enterprise information integration

Procedia PDF Downloads 186
29174 Main Cause of Children's Deaths in Indigenous Wayuu Community from Department of La Guajira: A Research Developed through Data Mining Use

Authors: Isaura Esther Solano Núñez, David Suarez

Abstract:

The main purpose of this research is to discover what causes death in children of the Wayuu community, and deeply analyze those results in order to take corrective measures to properly control infant mortality. We consider important to determine the reasons that are producing early death in this specific type of population, since they are the most vulnerable to high risk environmental conditions. In this way, the government, through competent authorities, may develop prevention policies and the right measures to avoid an increase of this tragic fact. The methodology used to develop this investigation is data mining, which consists in gaining and examining large amounts of data to produce new and valuable information. Through this technique it has been possible to determine that the child population is dying mostly from malnutrition. In short, this technique has been very useful to develop this study; it has allowed us to transform large amounts of information into a conclusive and important statement, which has made it easier to take appropriate steps to resolve a particular situation.

Keywords: malnutrition, data mining, analytical, descriptive, population, Wayuu, indigenous

Procedia PDF Downloads 134
29173 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 365
29172 Modified InVEST for Whatsapp Messages Forensic Triage and Search through Visualization

Authors: Agria Rhamdhan

Abstract:

WhatsApp as the most popular mobile messaging app has been used as evidence in many criminal cases. As the use of mobile messages generates large amounts of data, forensic investigation faces the challenge of large data problems. The hardest part of finding this important evidence is because current practice utilizes tools and technique that require manual analysis to check all messages. That way, analyze large sets of mobile messaging data will take a lot of time and effort. Our work offers methodologies based on forensic triage to reduce large data to manageable sets resulting easier to do detailed reviews, then show the results through interactive visualization to show important term, entities and relationship through intelligent ranking using Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA) Model. By implementing this methodology, investigators can improve investigation processing time and result's accuracy.

Keywords: forensics, triage, visualization, WhatsApp

Procedia PDF Downloads 126
29171 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 80
29170 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data has been producing at increasing rate from various resources such as social media networks, sensor devices, and other information serving devices. This large collection of massive, complex and exponential growth of dataset is called big data. The traditional database systems cannot store and process such data due to large and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amount of data to and from is a challenging issue since it can encounter a high latency due to large data size. With respect to big data movement problem, this paper reviews the literature of previous works, discusses about research issues, finds out approaches for dealing with big data movement problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 51
29169 Li-Fi Technology: Data Transmission through Visible Light

Authors: Shahzad Hassan, Kamran Saeed

Abstract:

People are always in search of Wi-Fi hotspots because Internet is a major demand nowadays. But like all other technologies, there is still room for improvement in the Wi-Fi technology with regards to the speed and quality of connectivity. In order to address these aspects, Harald Haas, a professor at the University of Edinburgh, proposed what we know as the Li-Fi (Light Fidelity). Li-Fi is a new technology in the field of wireless communication to provide connectivity within a network environment. It is a two-way mode of wireless communication using light. Basically, the data is transmitted through Light Emitting Diodes which can vary the intensity of light very fast, even faster than the blink of an eye. From the research and experiments conducted so far, it can be said that Li-Fi can increase the speed and reliability of the transfer of data. This paper pays particular attention on the assessment of the performance of this technology. In other words, it is a 5G technology which uses LED as the medium of data transfer. For coverage within the buildings, Wi-Fi is good but Li-Fi can be considered favorable in situations where large amounts of data are to be transferred in areas with electromagnetic interferences. It brings a lot of data related qualities such as efficiency, security as well as large throughputs to the table of wireless communication. All in all, it can be said that Li-Fi is going to be a future phenomenon where the presence of light will mean access to the Internet as well as speedy data transfer.

Keywords: communication, LED, Li-Fi, Wi-Fi

Procedia PDF Downloads 313
29168 The Impact of Electronic Marketing on the Quality Banking Services

Authors: Ahmed Ghalem

Abstract:

The research to be explained is a collection of information about several public and private economic institutions. This information is represented in highlighting the large and useful role in adopting the method of electronic marketing. Which is widespread and easy to use among community members at the local and international levels. Which generates large sums of money with little effort and little time, and also satisfies the customers. Do these things, despite what we have said, run the risk of losing large amounts of money in a moment or a short time.

Keywords: economic, finance, bank, development, marketing

Procedia PDF Downloads 55
29167 Meanings and Concepts of Standardization in Systems Medicine

Authors: Imme Petersen, Wiebke Sick, Regine Kollek

Abstract:

In systems medicine, high-throughput technologies produce large amounts of data on different biological and pathological processes, including (disturbed) gene expressions, metabolic pathways and signaling. The large volume of data of different types, stored in separate databases and often located at different geographical sites have posed new challenges regarding data handling and processing. Tools based on bioinformatics have been developed to resolve the upcoming problems of systematizing, standardizing and integrating the various data. However, the heterogeneity of data gathered at different levels of biological complexity is still a major challenge in data analysis. To build multilayer disease modules, large and heterogeneous data of disease-related information (e.g., genotype, phenotype, environmental factors) are correlated. Therefore, a great deal of attention in systems medicine has been put on data standardization, primarily to retrieve and combine large, heterogeneous datasets into standardized and incorporated forms and structures. However, this data-centred concept of standardization in systems medicine is contrary to the debate in science and technology studies (STS) on standardization that rather emphasizes the dynamics, contexts and negotiations of standard operating procedures. Based on empirical work on research consortia that explore the molecular profile of diseases to establish systems medical approaches in the clinic in Germany, we trace how standardized data are processed and shaped by bioinformatics tools, how scientists using such data in research perceive such standard operating procedures and which consequences for knowledge production (e.g. modeling) arise from it. Hence, different concepts and meanings of standardization are explored to get a deeper insight into standard operating procedures not only in systems medicine, but also beyond.

Keywords: data, science and technology studies (STS), standardization, systems medicine

Procedia PDF Downloads 309
29166 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data to be ready for data mining exploration take up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected from an actual harmonic monitoring system in a distribution system in Australia for three years. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes to better understanding of the data have been presented. Underlying classes in this data has then been identified using clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by the engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 216
29165 Verification & Validation of Map Reduce Program Model for Parallel K-Mediod Algorithm on Hadoop Cluster

Authors: Trapti Sharma, Devesh Kumar Srivastava

Abstract:

This paper is basically a analysis study of above MapReduce implementation and also to verify and validate the MapReduce solution model for Parallel K-Mediod algorithm on Hadoop Cluster. MapReduce is a programming model which authorize the managing of huge amounts of data in parallel, on a large number of devices. It is specially well suited to constant or moderate changing set of data since the implementation point of a position is usually high. MapReduce has slowly become the framework of choice for “big data”. The MapReduce model authorizes for systematic and instant organizing of large scale data with a cluster of evaluate nodes. One of the primary affect in Hadoop is how to minimize the completion length (i.e. makespan) of a set of MapReduce duty. In this paper, we have verified and validated various MapReduce applications like wordcount, grep, terasort and parallel K-Mediod clustering algorithm. We have found that as the amount of nodes increases the completion time decreases.

Keywords: hadoop, mapreduce, k-mediod, validation, verification

Procedia PDF Downloads 337
29164 Characterization of Optical Communication Channels as Non-Deterministic Model

Authors: Valentina Alessandra Carvalho do Vale, Elmo Thiago Lins Cöuras Ford

Abstract:

Increasingly telecommunications sectors are adopting optical technologies, due to its ability to transmit large amounts of data over long distances. However, as in all systems of data transmission, optical communication channels suffer from undesirable and non-deterministic effects, being essential to know the same. Thus, this research allows the assessment of these effects, as well as their characterization and beneficial uses of these effects.

Keywords: optical communication, optical fiber, non-deterministic effects, telecommunication

Procedia PDF Downloads 755
29163 Unsupervised Domain Adaptive Text Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, unsupervised training, text retrieval

Procedia PDF Downloads 36
29162 Remotely Sensed Data Fusion to Extract Vegetation Cover in the Cultural Park of Tassili, South of Algeria

Authors: Y. Fekir, K. Mederbal, M. A. Hammadouche, D. Anteur

Abstract:

The cultural park of the Tassili, occupying a large area of Algeria, is characterized by a rich vegetative biodiversity to be preserved and managed both in time and space. The management of a large area (case of Tassili), by its complexity, needs large amounts of data, which for the most part, are spatially localized (DEM, satellite images and socio-economic information etc.), where the use of conventional and traditional methods is quite difficult. The remote sensing, by its efficiency in environmental applications, became an indispensable solution for this kind of studies. Multispectral imaging sensors have been very useful in the last decade in very interesting applications of remote sensing. They can aid in several domains such as the de¬tection and identification of diverse surface targets, topographical details, and geological features. In this work, we try to extract vegetative areas using fusion techniques between data acquired from sensor on-board the Earth Observing 1 (EO-1) satellite and Landsat ETM+ and TM sensors. We have used images acquired over the Oasis of Djanet in the National Park of Tassili in the south of Algeria. Fusion technqiues were applied on the obtained image to extract the vegetative fraction of the different classes of land use. We compare the obtained results in vegetation end member extraction with vegetation indices calculated from both Hyperion and other multispectral sensors.

Keywords: Landsat ETM+, EO1, data fusion, vegetation, Tassili, Algeria

Procedia PDF Downloads 405
29161 Unlocking Health Insights: Studying Data for Better Care

Authors: Valentina Marutyan

Abstract:

Healthcare data mining is a rapidly developing field at the intersection of technology and medicine that has the potential to change our understanding and approach to providing healthcare. Healthcare and data mining is the process of examining huge amounts of data to extract useful information that can be applied in order to improve patient care, treatment effectiveness, and overall healthcare delivery. This field looks for patterns, trends, and correlations in a variety of healthcare datasets, such as electronic health records (EHRs), medical imaging, patient demographics, and treatment histories. To accomplish this, it uses advanced analytical approaches. Predictive analysis using historical patient data is a major area of interest in healthcare data mining. This enables doctors to get involved early to prevent problems or improve results for patients. It also assists in early disease detection and customized treatment planning for every person. Doctors can customize a patient's care by looking at their medical history, genetic profile, current and previous therapies. In this way, treatments can be more effective and have fewer negative consequences. Moreover, helping patients, it improves the efficiency of hospitals. It helps them determine the number of beds or doctors they require in regard to the number of patients they expect. In this project are used models like logistic regression, random forests, and neural networks for predicting diseases and analyzing medical images. Patients were helped by algorithms such as k-means, and connections between treatments and patient responses were identified by association rule mining. Time series techniques helped in resource management by predicting patient admissions. These methods improved healthcare decision-making and personalized treatment. Also, healthcare data mining must deal with difficulties such as bad data quality, privacy challenges, managing large and complicated datasets, ensuring the reliability of models, managing biases, limited data sharing, and regulatory compliance. Finally, secret code of data mining in healthcare helps medical professionals and hospitals make better decisions, treat patients more efficiently, and work more efficiently. It ultimately comes down to using data to improve treatment, make better choices, and simplify hospital operations for all patients.

Keywords: data mining, healthcare, big data, large amounts of data

Procedia PDF Downloads 29
29160 The Use of Remotely Sensed Data to Extract Wetlands Area in the Cultural Park of Ahaggar, South of Algeria

Authors: Y. Fekir, K. Mederbal, M. A. Hammadouche, D. Anteur

Abstract:

The cultural park of the Ahaggar, occupying a large area of Algeria, is characterized by a rich wetlands area to be preserved and managed both in time and space. The management of a large area, by its complexity, needs large amounts of data, which for the most part, are spatially localized (DEM, satellite images and socio-economic information...), where the use of conventional and traditional methods is quite difficult. The remote sensing, by its efficiency in environmental applications, became an indispensable solution for this kind of studies. Remote sensing imaging data have been very useful in the last decade in very interesting applications. They can aid in several domains such as the detection and identification of diverse wetland surface targets, topographical details, and geological features... In this work, we try to extract automatically wetlands area using multispectral remotely sensed data on-board the Earth Observing 1 (EO-1) and Landsat satellite. Both are high-resolution multispectral imager with a 30 m resolution. The instrument images an interesting surface area. We have used images acquired over the several area of interesting in the National Park of Ahaggar in the south of Algeria. An Extraction Algorithm is applied on the several spectral index obtained from combination of different spectral bands to extract wetlands fraction occupation of land use. The obtained results show an accuracy to distinguish wetlands area from the other lad use themes using a fine exploitation on spectral index.

Keywords: multispectral data, EO1, landsat, wetlands, Ahaggar, Algeria

Procedia PDF Downloads 350
29159 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent Item sets play an essential role in many data Mining tasks that try to find interesting patterns from the database. Typically it refers to a set of items that frequently appear together in transaction dataset. There are several mining algorithm being used for frequent item set mining, yet most do not scale to the type of data we presented with today, so called “BIG DATA”. Big Data is a collection of large data sets. Our approach is to work on the frequent item set mining over the large dataset with scalable and speedy way. Big Data basically works with Map Reduce along with HDFS is used to find out frequent item sets from Big Data on large cluster. This paper focuses on using pre-processing & mining algorithm as hybrid approach for big data over Hadoop platform.

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 386
29158 Fabrication of Nanoengineered Radiation Shielding Multifunctional Polymeric Sandwich Composites

Authors: Nasim Abuali Galehdari, Venkat Mani, Ajit D. Kelkar

Abstract:

Space Radiation has become one of the major factors in successful long duration space exploration. Exposure to space radiation not only can affect the health of astronauts but also can disrupt or damage materials and electronics. Hazards to materials include degradation of properties, such as, modulus, strength, or glass transition temperature. Electronics may experience single event effects, gate rupture, burnout of field effect transistors and noise. Presently aluminum is the major component in most of the space structures due to its lightweight and good structural properties. However, aluminum is ineffective at blocking space radiation. Therefore, most of the past research involved studying at polymers which contain large amounts of hydrogen. Again, these materials are not structural materials and would require large amounts of material to achieve the structural properties needed. One of the materials to alleviate this problem is polymeric composite materials, which has good structural properties and use polymers that contained large amounts of hydrogen. This paper presents steps involved in fabrication of multi-functional hybrid sandwich panels that can provide beneficial radiation shielding as well as structural strength. Multifunctional hybrid sandwich panels were manufactured using vacuum assisted resin transfer molding process and were subjected to radiation treatment. Study indicates that various nanoparticles including Boron Nano powder, Boron Carbide and Gadolinium nanoparticles can be successfully used to block the space radiation without sacrificing the structural integrity.

Keywords: multi-functional, polymer composites, radiation shielding, sandwich composites

Procedia PDF Downloads 252
29157 Sustainable Treatment of Vegetable Oil Industry Wastewaters by Xanthomonas campestris

Authors: Bojana Ž. Bajić, Siniša N. Dodić, Vladimir S. Puškaš, Jelena M. Dodić

Abstract:

Increasing industrialization as a response to the demands of the consumer society greatly exploits resources and generates large amounts of waste effluents in addition to the desired product. This means it is a priority to implement technologies with the maximum utilization of raw materials and energy, minimum generation of waste effluents and/or their recycling (secondary use). Considering the process conditions and the nature of the raw materials used by the vegetable oil industry, its wastewaters can be used as substrates for the biotechnological production which requires large amounts of water. This way the waste effluents of one branch of industry become raw materials for another branch which produces a new product while reducing wastewater pollution and thereby reducing negative environmental impacts. Vegetable oil production generates wastewaters during the process of rinsing oils and fats which contain mainly fatty acid pollutants. The vegetable oil industry generates large amounts of waste effluents, especially in the processes of degumming, deacidification, deodorization and neutralization. Wastewaters from the vegetable oil industry are generated during the whole year in significant amounts, based on the capacity of the vegetable oil production. There are no known alternative applications for these wastewaters as raw materials for the production of marketable products. Since the literature has no data on the potential negative impact of fatty acids on the metabolism of the bacterium Xanthomonas campestris, these wastewaters were considered as potential raw materials for the biotechnological production of xanthan. In this research, vegetable oil industry wastewaters were used as the basis for the cultivation media for xanthan production with Xanthomonas campestris ATCC 13951. Examining the process of biosynthesis of xanthan on vegetable oil industry wastewaters as the basis for the cultivation media was performed to obtain insight into the possibility of its use in the aforementioned biotechnological process. Additionally, it was important to experimentally determine the absence of substances that have an inhibitory effect on the metabolism of the production microorganism. Xanthan content, rheological parameters of the cultivation media, carbon conversion into xanthan and conversions of the most significant nutrients for biosynthesis (carbon, nitrogen and phosphorus sources) were determined as indicators of the success of biosynthesis. The obtained results show that biotechnological production of the biopolymer xanthan by bacterium Xanthomonas campestris on vegetable oil industry wastewaters based cultivation media simultaneously provides preservation of the environment and economic benefits which is a sustainable solution to the problem of wastewater treatment.

Keywords: biotechnology, sustainable bioprocess, vegetable oil industry wastewaters, Xanthomonas campestris

Procedia PDF Downloads 118
29156 Analyzing On-Line Process Data for Industrial Production Quality Control

Authors: Hyun-Woo Cho

Abstract:

The monitoring of industrial production quality has to be implemented to alarm early warning for unusual operating conditions. Furthermore, identification of their assignable causes is necessary for a quality control purpose. For such tasks many multivariate statistical techniques have been applied and shown to be quite effective tools. This work presents a process data-based monitoring scheme for production processes. For more reliable results some additional steps of noise filtering and preprocessing are considered. It may lead to enhanced performance by eliminating unwanted variation of the data. The performance evaluation is executed using data sets from test processes. The proposed method is shown to provide reliable quality control results, and thus is more effective in quality monitoring in the example. For practical implementation of the method, an on-line data system must be available to gather historical and on-line data. Recently large amounts of data are collected on-line in most processes and implementation of the current scheme is feasible and does not give additional burdens to users.

Keywords: detection, filtering, monitoring, process data

Procedia PDF Downloads 516
29155 Domain Adaptive Dense Retrieval with Query Generation

Authors: Rui Yin, Haojie Wang, Xun Li

Abstract:

Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then, the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. We also explore contrastive learning as a method for training domain-adapted dense retrievers and show that it leads to strong performance in various retrieval settings. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.

Keywords: dense retrieval, query generation, contrastive learning, unsupervised training

Procedia PDF Downloads 59
29154 Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Authors: Ulas Vural, M. Ergun Okay, E. Mesut Yildiz

Abstract:

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed by using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing duration. The proposed model successfully predicts the churn probabilities at 83% accuracy for only three months expenditure data and the prediction accuracy increases up to 89% when the nine month data is used. The experiments also show that the accuracy of ANN model increases on an extended feature set with information of the changes on the bill amounts.

Keywords: customer relationship management, churn prediction, telecom industry, deep learning, artificial neural networks

Procedia PDF Downloads 117
29153 Grid and Market Integration of Large Scale Wind Farms using Advanced Predictive Data Mining Techniques

Authors: Umit Cali

Abstract:

The integration of intermittent energy sources like wind farms into the electricity grid has become an important challenge for the utilization and control of electric power systems, because of the fluctuating behaviour of wind power generation. Wind power predictions improve the economic and technical integration of large amounts of wind energy into the existing electricity grid. Trading, balancing, grid operation, controllability and safety issues increase the importance of predicting power output from wind power operators. Therefore, wind power forecasting systems have to be integrated into the monitoring and control systems of the transmission system operator (TSO) and wind farm operators/traders. The wind forecasts are relatively precise for the time period of only a few hours, and, therefore, relevant with regard to Spot and Intraday markets. In this work predictive data mining techniques are applied to identify a statistical and neural network model or set of models that can be used to predict wind power output of large onshore and offshore wind farms. These advanced data analytic methods helps us to amalgamate the information in very large meteorological, oceanographic and SCADA data sets into useful information and manageable systems. Accurate wind power forecasts are beneficial for wind plant operators, utility operators, and utility customers. An accurate forecast allows grid operators to schedule economically efficient generation to meet the demand of electrical customers. This study is also dedicated to an in-depth consideration of issues such as the comparison of day ahead and the short-term wind power forecasting results, determination of the accuracy of the wind power prediction and the evaluation of the energy economic and technical benefits of wind power forecasting.

Keywords: renewable energy sources, wind power, forecasting, data mining, big data, artificial intelligence, energy economics, power trading, power grids

Procedia PDF Downloads 481
29152 Identification of the Usage of Some Special Places in the Prehistoric Site of Tapeh Zagheh through Multi-Elemental Chemical Analysis of the Soil Samples

Authors: Iraj Rezaei, Kamal Al Din Niknami

Abstract:

Tapeh Zagheh is an important prehistoric site located in the central plateau of Iran, which has settlement layers of the Neolithic and Chalcolithic periods. For this research, 38 soil samples were collected from different parts of the site, as well as two samples from its outside as witnesses. Then the samples were analyzed by XRF. The purpose of this research was to identify some places with special usage for human activities in Tapeh Zagheh by measuring the amount of some special elements in the soil. The result of XRF analysis shows a significant amount of P and K in samples No.3 (fourth floor) and No.4 (third floor), probably due to certain activities such as food preparation and consumption. Samples No.9 and No.10 can be considered suitable examples of the hearths of the prehistoric period in the central plateau of Iran. The color of these samples was completely darkened due to the presence of ash, charcoal, and burnt materials. According to the XRF results, the soil of these hearths has very high amounts of elements such as P, Ca, Mn, S, K, and significant amounts of Ti, Fe, and Na. In addition, the elemental composition of sample No. 14, which was taken from a home waster, also has very high amounts of P, Mn, Mg, Ti, and Fe and high amounts of K and Ca. Sample No. 11, which is related to soil containing large amounts of waster of the kiln, along with a very strong increase in Cl and Na, the amount of elements such as K, Mg, and S has also increased significantly. It seems that the reason for the increase of elements such as Ti and Fe in some Tapeh Zagheh floors (for example, samples number 1, 2, 3, 4, 5) was the use of materials such as ocher mud or fire ash in the composition of these floors. Sample No. 13, which was taken from an oven located in the FIX trench, has very high amounts of Mn, Ti, and Fe and high amounts of P and Ca. Sample No. 15, which is related to House No. VII (probably related to a pen or a place where animals were kept) has much more phosphate compared to the control samples, which is probably due to the addition of animal excrement and urine to the soil. Sample No. 29 was taken from the north of the industrial area of Zagheh village (place of pottery kilns). The very low amount of index elements in sample No. 29 shows that the industrial activities did not extend to the mentioned point, and therefore, the range of this point can be considered as the boundary between the residential part of the Zagheh village and its industrial part.

Keywords: prehistory, multi-elemental analysis, Tapeh Zagheh, XRF

Procedia PDF Downloads 65
29151 REDUCER: An Architectural Design Pattern for Reducing Large and Noisy Data Sets

Authors: Apkar Salatian

Abstract:

To relieve the burden of reasoning on a point to point basis, in many domains there is a need to reduce large and noisy data sets into trends for qualitative reasoning. In this paper we propose and describe a new architectural design pattern called REDUCER for reducing large and noisy data sets that can be tailored for particular situations. REDUCER consists of 2 consecutive processes: Filter which takes the original data and removes outliers, inconsistencies or noise; and Compression which takes the filtered data and derives trends in the data. In this seminal article, we also show how REDUCER has successfully been applied to 3 different case studies.

Keywords: design pattern, filtering, compression, architectural design

Procedia PDF Downloads 179
29150 Evaluation and Assessment of Bioinformatics Methods and Their Applications

Authors: Fatemeh Nokhodchi Bonab

Abstract:

Bioinformatics, in its broad sense, involves application of computer processes to solve biological problems. A wide range of computational tools are needed to effectively and efficiently process large amounts of data being generated as a result of recent technological innovations in biology and medicine. A number of computational tools have been developed or adapted to deal with the experimental riches of complex and multivariate data and transition from data collection to information or knowledge. These bioinformatics tools are being evaluated and applied in various medical areas including early detection, risk assessment, classification, and prognosis of cancer. The goal of these efforts is to develop and identify bioinformatics methods with optimal sensitivity, specificity, and predictive capabilities. The recent flood of data from genome sequences and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems.

Keywords: methods, applications, transcriptional regulatory systems, techniques

Procedia PDF Downloads 91
29149 Recommender System Based on Mining Graph Databases for Data-Intensive Applications

Authors: Mostafa Gamal, Hoda K. Mohamed, Islam El-Maddah, Ali Hamdi

Abstract:

In recent years, many digital documents on the web have been created due to the rapid growth of ’social applications’ communities or ’Data-intensive applications’. The evolution of online-based multimedia data poses new challenges in storing and querying large amounts of data for online recommender systems. Graph data models have been shown to be more efficient than relational data models for processing complex data. This paper will explain the key differences between graph and relational databases, their strengths and weaknesses, and why using graph databases is the best technology for building a realtime recommendation system. Also, The paper will discuss several similarity metrics algorithms that can be used to compute a similarity score of pairs of nodes based on their neighbourhoods or their properties. Finally, the paper will discover how NLP strategies offer the premise to improve the accuracy and coverage of realtime recommendations by extracting the information from the stored unstructured knowledge, which makes up the bulk of the world’s data to enrich the graph database with this information. As the size and number of data items are increasing rapidly, the proposed system should meet current and future needs.

Keywords: graph databases, NLP, recommendation systems, similarity metrics

Procedia PDF Downloads 70
29148 Data Recording for Remote Monitoring of Autonomous Vehicles

Authors: Rong-Terng Juang

Abstract:

Autonomous vehicles offer the possibility of significant benefits to social welfare. However, fully automated cars might not be going to happen in the near further. To speed the adoption of the self-driving technologies, many governments worldwide are passing laws requiring data recorders for the testing of autonomous vehicles. Currently, the self-driving vehicle, (e.g., shuttle bus) has to be monitored from a remote control center. When an autonomous vehicle encounters an unexpected driving environment, such as road construction or an obstruction, it should request assistance from a remote operator. Nevertheless, large amounts of data, including images, radar and lidar data, etc., have to be transmitted from the vehicle to the remote center. Therefore, this paper proposes a data compression method of in-vehicle networks for remote monitoring of autonomous vehicles. Firstly, the time-series data are rearranged into a multi-dimensional signal space. Upon the arrival, for controller area networks (CAN), the new data are mapped onto a time-data two-dimensional space associated with the specific CAN identity. Secondly, the data are sampled based on differential sampling. Finally, the whole set of data are encoded using existing algorithms such as Huffman, arithmetic and codebook encoding methods. To evaluate system performance, the proposed method was deployed on an in-house built autonomous vehicle. The testing results show that the amount of data can be reduced as much as 1/7 compared to the raw data.

Keywords: autonomous vehicle, data compression, remote monitoring, controller area networks (CAN), Lidar

Procedia PDF Downloads 129
29147 Determination of the Risks of Heart Attack at the First Stage as Well as Their Control and Resource Planning with the Method of Data Mining

Authors: İbrahi̇m Kara, Seher Arslankaya

Abstract:

Frequently preferred in the field of engineering in particular, data mining has now begun to be used in the field of health as well since the data in the health sector have reached great dimensions. With data mining, it is aimed to reveal models from the great amounts of raw data in agreement with the purpose and to search for the rules and relationships which will enable one to make predictions about the future from the large amount of data set. It helps the decision-maker to find the relationships among the data which form at the stage of decision-making. In this study, it is aimed to determine the risk of heart attack at the first stage, to control it, and to make its resource planning with the method of data mining. Through the early and correct diagnosis of heart attacks, it is aimed to reveal the factors which affect the diseases, to protect health and choose the right treatment methods, to reduce the costs in health expenditures, and to shorten the durations of patients’ stay at hospitals. In this way, the diagnosis and treatment costs of a heart attack will be scrutinized, which will be useful to determine the risk of the disease at the first stage, to control it, and to make its resource planning.

Keywords: data mining, decision support systems, heart attack, health sector

Procedia PDF Downloads 328