Search results for: Data Mining
24531 Microarray Gene Expression Data Dimensionality Reduction Using PCA
Authors: Fuad M. Alkoot
Abstract:
Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension reaching thousands. Therefore, hindering all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with is generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. The classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim at reducing the data dimension and improve it for classification. Here, we experiment with applying a multistage PCA on the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates which increases the possibility of building an automated system for autism detection.Keywords: PCA, gene expression, dimensionality reduction, classification, autism
Procedia PDF Downloads 56024530 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic
Authors: Fei Gao, Rodolfo C. Raga Jr.
Abstract:
This research proposal will ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve diabetes early detection and management by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. The phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability using Diabetes Health Indicators Dataset from Kaggle’s data as the research data. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess their performance using five key metrics like accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle
Procedia PDF Downloads 7524529 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0
Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini
Abstract:
Nowadays, companies are facing lots of challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which can create challenges around data management and governance. Furthermore, it is also challenged to integrate data from multiple systems and technologies. Although with these pains, companies are still pursuing digitalization because by embracing advanced technologies, companies can improve efficiency, quality, decision-making, and customer experience while also creating different business models and revenue streams. In this paper, the issue that data is stored in data silos with different schema and structures is focused. The conventional approaches to addressing this issue involve utilizing data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily focus on the grammar and structure of the data and neglect the importance of semantic modeling and semantic standardization, which are essential for achieving data interoperability. In this session, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models as an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates the Asset Administration Shell technology to model and map the company’s data and utilize a knowledge graph for data storage and exploration.Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling
Procedia PDF Downloads 9424528 Economic Characteristics of Bitcoin: "An Analytical Study"
Authors: Abdelhalem Shahen
Abstract:
The world is now experiencing a digital revolution and greatly accelerated technological developments, in addition to the transition from the economy in its traditional form to the digital economy, which has resulted in the emergence of new tools that are appropriate to those developments, and from this, this paper attempts to explore the economic characteristics of the bitcoin currency that circulated recently. Due to the many advantages that distinguish it from money in its traditional forms, which have a range of economic effects. The study found that Bitcoin is among the technological innovations, which contain a set of characteristics that are worth studying, those that make it the focus of attention, such as the digital currency, the peer-to-peer property, Lower and Faster Transaction Costs, transparency, decentralized control, privacy, and Double-Spending, as well as security and Cryptographic, and finally mining.Keywords: Digital Economics, Digital Currencies, Bitcoin, Features of Bitcoin
Procedia PDF Downloads 13824527 Enhancing Plant Throughput in Mineral Processing Through Multimodal Artificial Intelligence
Authors: Muhammad Bilal Shaikh
Abstract:
Mineral processing plants play a pivotal role in extracting valuable minerals from raw ores, contributing significantly to various industries. However, the optimization of plant throughput remains a complex challenge, necessitating innovative approaches for increased efficiency and productivity. This research paper investigates the application of Multimodal Artificial Intelligence (MAI) techniques to address this challenge, aiming to improve overall plant throughput in mineral processing operations. The integration of multimodal AI leverages a combination of diverse data sources, including sensor data, images, and textual information, to provide a holistic understanding of the complex processes involved in mineral extraction. The paper explores the synergies between various AI modalities, such as machine learning, computer vision, and natural language processing, to create a comprehensive and adaptive system for optimizing mineral processing plants. The primary focus of the research is on developing advanced predictive models that can accurately forecast various parameters affecting plant throughput. Utilizing historical process data, machine learning algorithms are trained to identify patterns, correlations, and dependencies within the intricate network of mineral processing operations. This enables real-time decision-making and process optimization, ultimately leading to enhanced plant throughput. Incorporating computer vision into the multimodal AI framework allows for the analysis of visual data from sensors and cameras positioned throughout the plant. This visual input aids in monitoring equipment conditions, identifying anomalies, and optimizing the flow of raw materials. The combination of machine learning and computer vision enables the creation of predictive maintenance strategies, reducing downtime and improving the overall reliability of mineral processing plants. Furthermore, the integration of natural language processing facilitates the extraction of valuable insights from unstructured textual data, such as maintenance logs, research papers, and operator reports. By understanding and analyzing this textual information, the multimodal AI system can identify trends, potential bottlenecks, and areas for improvement in plant operations. This comprehensive approach enables a more nuanced understanding of the factors influencing throughput and allows for targeted interventions. The research also explores the challenges associated with implementing multimodal AI in mineral processing plants, including data integration, model interpretability, and scalability. Addressing these challenges is crucial for the successful deployment of AI solutions in real-world industrial settings. To validate the effectiveness of the proposed multimodal AI framework, the research conducts case studies in collaboration with mineral processing plants. The results demonstrate tangible improvements in plant throughput, efficiency, and cost-effectiveness. The paper concludes with insights into the broader implications of implementing multimodal AI in mineral processing and its potential to revolutionize the industry by providing a robust, adaptive, and data-driven approach to optimizing plant operations. In summary, this research contributes to the evolving field of mineral processing by showcasing the transformative potential of multimodal artificial intelligence in enhancing plant throughput. The proposed framework offers a holistic solution that integrates machine learning, computer vision, and natural language processing to address the intricacies of mineral extraction processes, paving the way for a more efficient and sustainable future in the mineral processing industry.Keywords: multimodal AI, computer vision, NLP, mineral processing, mining
Procedia PDF Downloads 6824526 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption
Authors: Waziri Victor Onomza, John K. Alhassan, Idris Ismaila, Noel Dogonyaro Moses
Abstract:
This paper describes the problem of building secure computational services for encrypted information in the Cloud Computing without decrypting the encrypted data; therefore, it meets the yearning of computational encryption algorithmic aspiration model that could enhance the security of big data for privacy, confidentiality, availability of the users. The cryptographic model applied for the computational process of the encrypted data is the Fully Homomorphic Encryption Scheme. We contribute theoretical presentations in high-level computational processes that are based on number theory and algebra that can easily be integrated and leveraged in the Cloud computing with detail theoretic mathematical concepts to the fully homomorphic encryption models. This contribution enhances the full implementation of big data analytics based cryptographic security algorithm.Keywords: big data analytics, security, privacy, bootstrapping, homomorphic, homomorphic encryption scheme
Procedia PDF Downloads 38024525 Protecting Privacy and Data Security in Online Business
Authors: Bilquis Ferdousi
Abstract:
With the exponential growth of the online business, the threat to consumers’ privacy and data security has become a serious challenge. This literature review-based study focuses on a better understanding of those threats and what legislative measures have been taken to address those challenges. Research shows that people are increasingly involved in online business using different digital devices and platforms, although this practice varies based on age groups. The threat to consumers’ privacy and data security is a serious hindrance in developing trust among consumers in online businesses. There are some legislative measures taken at the federal and state level to protect consumers’ privacy and data security. The study was based on an extensive review of current literature on protecting consumers’ privacy and data security and legislative measures that have been taken.Keywords: privacy, data security, legislation, online business
Procedia PDF Downloads 10624524 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm
Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan
Abstract:
This study presents a new parallel approach clustering of GPS data. Evaluation has been made by comparing execution time of various clustering algorithms on GPS data. This paper aims to propose a parallel based on neighborhood K-means algorithm to make it faster. The proposed parallelization approach assumes that each GPS data represents a vehicle and to communicate between vehicles close to each other after vehicles are clustered. This parallelization approach has been examined on different sized continuously changing GPS data and compared with serial K-means algorithm and other serial clustering algorithms. The results demonstrated that proposed parallel K-means algorithm has been shown to work much faster than other clustering algorithms.Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data
Procedia PDF Downloads 22224523 An Analysis of Privacy and Security for Internet of Things Applications
Authors: Dhananjay Singh, M. Abdullah-Al-Wadud
Abstract:
The Internet of Things is a concept of a large scale ecosystem of wireless actuators. The actuators are defined as things in the IoT, those which contribute or produces some data to the ecosystem. However, ubiquitous data collection, data security, privacy preserving, large volume data processing, and intelligent analytics are some of the key challenges into the IoT technologies. In order to solve the security requirements, challenges and threats in the IoT, we have discussed a message authentication mechanism for IoT applications. Finally, we have discussed data encryption mechanism for messages authentication before propagating into IoT networks.Keywords: Internet of Things (IoT), message authentication, privacy, security
Procedia PDF Downloads 38224522 The Influence of the Regional Sectoral Structure on the Socio-Economic Development of the Arkhangelsk Region
Authors: K. G. Sorokozherdyev, E. A. Efimov
Abstract:
The socio-economic development of regions and countries is an important research issue. Today, in the face of many negative events in the global and regional economies, it is especially important to identify those areas that can serve as sources of economic growth and the basis for the well-being of the population. This study aims to identify the most important sectors of the economy of the Arkhangelsk region that can contribute to the socio-economic development of the region as a whole. For research, the Arkhangelsk region was taken as one of the typical Russian regions that do not have significant reserves of hydrocarbons nor there are located any large industrial complexes. In this regard, the question of possible origins of economic growth seems especially relevant. The basis of this study constitutes the distributed lag regression model (ADL model) developed by the authors, which is based on quarterly data on the socio-economic development of the Arkhangelsk region for the period 2004-2016. As a result, we obtained three equations reflecting the dynamics of three indicators of the socio-economic development of the region -the average wage, the regional GRP, and the birth rate. The influencing factors are the shares in GRP of such sectors as agriculture, mining, manufacturing, construction, wholesale and retail trade, hotels and restaurants, as well as the financial sector. The study showed that the greatest influence on the socio-economic development of the region is exerted by such industries as wholesale and retail trade, construction, and industrial sectors. The study can be the basis for forecasting and modeling the socio-economic development of the Arkhangelsk region in the short and medium term. It also can be helpful while analyzing the effectiveness of measures aimed at stimulating those or other industries of the region. The model can be used in developing a regional development strategy.Keywords: regional economic development, regional sectoral structure, ADL model, Arkhangelsk region
Procedia PDF Downloads 10024521 Frequent Pattern Mining for Digenic Human Traits
Authors: Atsuko Okazaki, Jurg Ott
Abstract:
Some genetic diseases (‘digenic traits’) are due to the interaction between two DNA variants. For example, certain forms of Retinitis Pigmentosa (a genetic form of blindness) occur in the presence of two mutant variants, one in the ROM1 gene and one in the RDS gene, while the occurrence of only one of these mutant variants leads to a completely normal phenotype. Detecting such digenic traits by genetic methods is difficult. A common approach to finding disease-causing variants is to compare 100,000s of variants between individuals with a trait (cases) and those without the trait (controls). Such genome-wide association studies (GWASs) have been very successful but hinge on genetic effects of single variants, that is, there should be a difference in allele or genotype frequencies between cases and controls at a disease-causing variant. Frequent pattern mining (FPM) methods offer an avenue at detecting digenic traits even in the absence of single-variant effects. The idea is to enumerate pairs of genotypes (genotype patterns) with each of the two genotypes originating from different variants that may be located at very different genomic positions. What is needed is for genotype patterns to be significantly more common in cases than in controls. Let Y = 2 refer to cases and Y = 1 to controls, with X denoting a specific genotype pattern. We are seeking association rules, ‘X → Y’, with high confidence, P(Y = 2|X), significantly higher than the proportion of cases, P(Y = 2) in the study. Clearly, generally available FPM methods are very suitable for detecting disease-associated genotype patterns. We use fpgrowth as the basic FPM algorithm and built a framework around it to enumerate high-frequency digenic genotype patterns and to evaluate their statistical significance by permutation analysis. Application to a published dataset on opioid dependence furnished results that could not be found with classical GWAS methodology. There were 143 cases and 153 healthy controls, each genotyped for 82 variants in eight genes of the opioid system. The aim was to find out whether any of these variants were disease-associated. The single-variant analysis did not lead to significant results. Application of our FPM implementation resulted in one significant (p < 0.01) genotype pattern with both genotypes in the pattern being heterozygous and originating from two variants on different chromosomes. This pattern occurred in 14 cases and none of the controls. Thus, the pattern seems quite specific to this form of substance abuse and is also rather predictive of disease. An algorithm called Multifactor Dimension Reduction (MDR) was developed some 20 years ago and has been in use in human genetics ever since. This and our algorithms share some similar properties, but they are also very different in other respects. The main difference seems to be that our algorithm focuses on patterns of genotypes while the main object of inference in MDR is the 3 × 3 table of genotypes at two variants.Keywords: digenic traits, DNA variants, epistasis, statistical genetics
Procedia PDF Downloads 12224520 Cognitive Science Based Scheduling in Grid Environment
Authors: N. D. Iswarya, M. A. Maluk Mohamed, N. Vijaya
Abstract:
Grid is infrastructure that allows the deployment of distributed data in large size from multiple locations to reach a common goal. Scheduling data intensive applications becomes challenging as the size of data sets are very huge in size. Only two solutions exist in order to tackle this challenging issue. First, computation which requires huge data sets to be processed can be transferred to the data site. Second, the required data sets can be transferred to the computation site. In the former scenario, the computation cannot be transferred since the servers are storage/data servers with little or no computational capability. Hence, the second scenario can be considered for further exploration. During scheduling, transferring huge data sets from one site to another site requires more network bandwidth. In order to mitigate this issue, this work focuses on incorporating cognitive science in scheduling. Cognitive Science is the study of human brain and its related activities. Current researches are mainly focused on to incorporate cognitive science in various computational modeling techniques. In this work, the problem solving approach of human brain is studied and incorporated during the data intensive scheduling in grid environments. Here, a cognitive engine is designed and deployed in various grid sites. The intelligent agents present in CE will help in analyzing the request and creating the knowledge base. Depending upon the link capacity, decision will be taken whether to transfer data sets or to partition the data sets. Prediction of next request is made by the agents to serve the requesting site with data sets in advance. This will reduce the data availability time and data transfer time. Replica catalog and Meta data catalog created by the agents assist in decision making process.Keywords: data grid, grid workflow scheduling, cognitive artificial intelligence
Procedia PDF Downloads 39424519 Heritage and Tourism in the Era of Big Data: Analysis of Chinese Cultural Tourism in Catalonia
Authors: Xinge Liao, Francesc Xavier Roige Ventura, Dolores Sanchez Aguilera
Abstract:
With the development of the Internet, the study of tourism behavior has rapidly expanded from the traditional physical market to the online market. Data on the Internet is characterized by dynamic changes, and new data appear all the time. In recent years the generation of a large volume of data was characterized, such as forums, blogs, and other sources, which have expanded over time and space, together they constitute large-scale Internet data, known as Big Data. This data of technological origin that derives from the use of devices and the activity of multiple users is becoming a source of great importance for the study of geography and the behavior of tourists. The study will focus on cultural heritage tourist practices in the context of Big Data. The research will focus on exploring the characteristics and behavior of Chinese tourists in relation to the cultural heritage of Catalonia. Geographical information, target image, perceptions in user-generated content will be studied through data analysis from Weibo -the largest social networks of blogs in China. Through the analysis of the behavior of heritage tourists in the Big Data environment, this study will understand the practices (activities, motivations, perceptions) of cultural tourists and then understand the needs and preferences of tourists in order to better guide the sustainable development of tourism in heritage sites.Keywords: Barcelona, Big Data, Catalonia, cultural heritage, Chinese tourism market, tourists’ behavior
Procedia PDF Downloads 13824518 Towards A Framework for Using Open Data for Accountability: A Case Study of A Program to Reduce Corruption
Authors: Darusalam, Jorish Hulstijn, Marijn Janssen
Abstract:
Media has revealed a variety of corruption cases in the regional and local governments all over the world. Many governments pursued many anti-corruption reforms and have created a system of checks and balances. Three types of corruption are faced by citizens; administrative corruption, collusion and extortion. Accountability is one of the benchmarks for building transparent government. The public sector is required to report the results of the programs that have been implemented so that the citizen can judge whether the institution has been working such as economical, efficient and effective. Open Data is offering solutions for the implementation of good governance in organizations who want to be more transparent. In addition, Open Data can create transparency and accountability to the community. The objective of this paper is to build a framework of open data for accountability to combating corruption. This paper will investigate the relationship between open data, and accountability as part of anti-corruption initiatives. This research will investigate the impact of open data implementation on public organization.Keywords: open data, accountability, anti-corruption, framework
Procedia PDF Downloads 33724517 Analysis of Urban Population Using Twitter Distribution Data: Case Study of Makassar City, Indonesia
Authors: Yuyun Wabula, B. J. Dewancker
Abstract:
In the past decade, the social networking app has been growing very rapidly. Geolocation data is one of the important features of social media that can attach the user's location coordinate in the real world. This paper proposes the use of geolocation data from the Twitter social media application to gain knowledge about urban dynamics, especially on human mobility behavior. This paper aims to explore the relation between geolocation Twitter with the existence of people in the urban area. Firstly, the study will analyze the spread of people in the particular area, within the city using Twitter social media data. Secondly, we then match and categorize the existing place based on the same individuals visiting. Then, we combine the Twitter data from the tracking result and the questionnaire data to catch the Twitter user profile. To do that, we used the distribution frequency analysis to learn the visitors’ percentage. To validate the hypothesis, we compare it with the local population statistic data and land use mapping released by the city planning department of Makassar local government. The results show that there is the correlation between Twitter geolocation and questionnaire data. Thus, integration the Twitter data and survey data can reveal the profile of the social media users.Keywords: geolocation, Twitter, distribution analysis, human mobility
Procedia PDF Downloads 31424516 Data-Centric Anomaly Detection with Diffusion Models
Authors: Sheldon Liu, Gordon Wang, Lei Liu, Xuefeng Liu
Abstract:
Anomaly detection, also referred to as one-class classification, plays a crucial role in identifying product images that deviate from the expected distribution. This study introduces Data-centric Anomaly Detection with Diffusion Models (DCADDM), presenting a systematic strategy for data collection and further diversifying the data with image generation via diffusion models. The algorithm addresses data collection challenges in real-world scenarios and points toward data augmentation with the integration of generative AI capabilities. The paper explores the generation of normal images using diffusion models. The experiments demonstrate that with 30% of the original normal image size, modeling in an unsupervised setting with state-of-the-art approaches can achieve equivalent performances. With the addition of generated images via diffusion models (10% equivalence of the original dataset size), the proposed algorithm achieves better or equivalent anomaly localization performance.Keywords: diffusion models, anomaly detection, data-centric, generative AI
Procedia PDF Downloads 8224515 Regulation on the Protection of Personal Data Versus Quality Data Assurance in the Healthcare System Case Report
Authors: Elizabeta Krstić Vukelja
Abstract:
Digitization of personal data is a consequence of the development of information and communication technologies that create a new work environment with many advantages and challenges, but also potential threats to privacy and personal data protection. Regulation (EU) 2016/679 of the European Parliament and of the Council is becoming a law and obligation that should address the issues of personal data protection and information security. The existence of the Regulation leads to the conclusion that national legislation in the field of virtual environment, protection of the rights of EU citizens and processing of their personal data is insufficiently effective. In the health system, special emphasis is placed on the processing of special categories of personal data, such as health data. The healthcare industry is recognized as a particularly sensitive area in which a large amount of medical data is processed, the digitization of which enables quick access and quick identification of the health insured. The protection of the individual requires quality IT solutions that guarantee the technical protection of personal categories. However, the real problems are the technical and human nature and the spatial limitations of the application of the Regulation. Some conclusions will be drawn by analyzing the implementation of the basic principles of the Regulation on the example of the Croatian health care system and comparing it with similar activities in other EU member states.Keywords: regulation, healthcare system, personal dana protection, quality data assurance
Procedia PDF Downloads 3824514 Parallel Vector Processing Using Multi Level Orbital DATA
Authors: Nagi Mekhiel
Abstract:
Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.Keywords: Memory Organization, Parallel Processors, Serial Code, Vector Processing
Procedia PDF Downloads 27024513 Reconstructability Analysis for Landslide Prediction
Authors: David Percy
Abstract:
Landslides are a geologic phenomenon that affects a large number of inhabited places and are constantly being monitored and studied for the prediction of future occurrences. Reconstructability analysis (RA) is a methodology for extracting informative models from large volumes of data that work exclusively with discrete data. While RA has been used in medical applications and social science extensively, we are introducing it to the spatial sciences through applications like landslide prediction. Since RA works exclusively with discrete data, such as soil classification or bedrock type, working with continuous data, such as porosity, requires that these data are binned for inclusion in the model. RA constructs models of the data which pick out the most informative elements, independent variables (IVs), from each layer that predict the dependent variable (DV), landslide occurrence. Each layer included in the model retains its classification data as a primary encoding of the data. Unlike other machine learning algorithms that force the data into one-hot encoding type of schemes, RA works directly with the data as it is encoded, with the exception of continuous data, which must be binned. The usual physical and derived layers are included in the model, and testing our results against other published methodologies, such as neural networks, yields accuracy that is similar but with the advantage of a completely transparent model. The results of an RA session with a data set are a report on every combination of variables and their probability of landslide events occurring. In this way, every combination of informative state combinations can be examined.Keywords: reconstructability analysis, machine learning, landslides, raster analysis
Procedia PDF Downloads 6624512 Data Analytics in Hospitality Industry
Authors: Tammy Wee, Detlev Remy, Arif Perdana
Abstract:
In the recent years, data analytics has become the buzzword in the hospitality industry. The hospitality industry is another example of a data-rich industry that has yet fully benefited from the insights of data analytics. Effective use of data analytics can change how hotels operate, market and position themselves competitively in the hospitality industry. However, at the moment, the data obtained by individual hotels remain under-utilized. This research is a preliminary research on data analytics in the hospitality industry, using an in-depth face-to-face interview on one hotel as a start to a multi-level research. The main case study of this research, hotel A, is a chain brand of international hotel that has been systematically gathering and collecting data on its own customer for the past five years. The data collection points begin from the moment a guest book a room until the guest leave the hotel premises, which includes room reservation, spa booking, and catering. Although hotel A has been gathering data intelligence on its customer for some time, they have yet utilized the data to its fullest potential, and they are aware of their limitation as well as the potential of data analytics. Currently, the utilization of data analytics in hotel A is limited in the area of customer service improvement, namely to enhance the personalization of service for each individual customer. Hotel A is able to utilize the data to improve and enhance their service which in turn, encourage repeated customers. According to hotel A, 50% of their guests returned to their hotel, and 70% extended nights because of the personalized service. Apart from using the data analytics for enhancing customer service, hotel A also uses the data in marketing. Hotel A uses the data analytics to predict or forecast the change in consumer behavior and demand, by tracking their guest’s booking preference, payment preference and demand shift between properties. However, hotel A admitted that the data they have been collecting was not fully utilized due to two challenges. The first challenge of using data analytics in hotel A is the data is not clean. At the moment, the data collection of one guest profile is meaningful only for one department in the hotel but meaningless for another department. Cleaning up the data and getting standards correctly for usage by different departments are some of the main concerns of hotel A. The second challenge of using data analytics in hotel A is the non-integral internal system. At the moment, the internal system used by hotel A do not integrate with each other well, limiting the ability to collect data systematically. Hotel A is considering another system to replace the current one for more comprehensive data collection. Hotel proprietors recognized the potential of data analytics as reported in this research, however, the current challenges of implementing a system to collect data come with a cost. This research has identified the current utilization of data analytics and the challenges faced when it comes to implementing data analytics.Keywords: data analytics, hospitality industry, customer relationship management, hotel marketing
Procedia PDF Downloads 18024511 Evaluation of Ensemble Classifiers for Intrusion Detection
Authors: M. Govindarajan
Abstract:
One of the major developments in machine learning in the past decade is the ensemble method, which finds highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed with homogeneous ensemble classifier using bagging and heterogeneous ensemble classifier using arcing and their performances are analyzed in terms of accuracy. A Classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by the means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: preprocessing phase, classification phase, and combining phase. A wide range of comparative experiments is conducted for standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers are compared to the performance of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include Error correcting output codes, Dagging and heterogeneous ensemble methods include majority voting, stacking. The proposed ensemble methods provide significant improvement of accuracy compared to individual classifiers and the proposed bagged RBF and SVM performs significantly better than ECOC and Dagging and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also heterogeneous models exhibit better results than homogeneous models for standard datasets of intrusion detection.Keywords: data mining, ensemble, radial basis function, support vector machine, accuracy
Procedia PDF Downloads 24824510 Realization of a (GIS) for Drilling (DWS) through the Adrar Region
Authors: Djelloul Benatiallah, Ali Benatiallah, Abdelkader Harouz
Abstract:
Geographic Information Systems (GIS) include various methods and computer techniques to model, capture digitally, store, manage, view and analyze. Geographic information systems have the characteristic to appeal to many scientific and technical field, and many methods. In this article we will present a complete and operational geographic information system, following the theoretical principles of data management and adapting to spatial data, especially data concerning the monitoring of drinking water supply wells (DWS) Adrar region. The expected results of this system are firstly an offer consulting standard features, updating and editing beneficiaries and geographical data, on the other hand, provides specific functionality contractors entered data, calculations parameterized and statistics.Keywords: GIS, DWS, drilling, Adrar
Procedia PDF Downloads 30924509 Generic Data Warehousing for Consumer Electronics Retail Industry
Authors: S. Habte, K. Ouazzane, P. Patel, S. Patel
Abstract:
The dynamic and highly competitive nature of the consumer electronics retail industry means that businesses in this industry are experiencing different decision making challenges in relation to pricing, inventory control, consumer satisfaction and product offerings. To overcome the challenges facing retailers and create opportunities, we propose a generic data warehousing solution which can be applied to a wide range of consumer electronics retailers with a minimum configuration. The solution includes a dimensional data model, a template SQL script, a high level architectural descriptions, ETL tool developed using C#, a set of APIs, and data access tools. It has been successfully applied by ASK Outlets Ltd UK resulting in improved productivity and enhanced sales growth.Keywords: consumer electronics, data warehousing, dimensional data model, generic, retail industry
Procedia PDF Downloads 41324508 Sequential Data Assimilation with High-Frequency (HF) Radar Surface Current
Authors: Lei Ren, Michael Hartnett, Stephen Nash
Abstract:
The abundant measured surface current from HF radar system in coastal area is assimilated into model to improve the modeling forecasting ability. A simple sequential data assimilation scheme, Direct Insertion (DI), is applied to update model forecast states. The influence of Direct Insertion data assimilation over time is analyzed at one reference point. Vector maps of surface current from models are compared with HF radar measurements. Root-Mean-Squared-Error (RMSE) between modeling results and HF radar measurements is calculated during the last four days with no data assimilation.Keywords: data assimilation, CODAR, HF radar, surface current, direct insertion
Procedia PDF Downloads 57424507 Measured versus Default Interstate Traffic Data in New Mexico, USA
Authors: M. A. Hasan, M. R. Islam, R. A. Tarefder
Abstract:
This study investigates how the site specific traffic data differs from the Mechanistic Empirical Pavement Design Software default values. Two Weigh-in-Motion (WIM) stations were installed in Interstate-40 (I-40) and Interstate-25 (I-25) to developed site specific data. A computer program named WIM Data Analysis Software (WIMDAS) was developed using Microsoft C-Sharp (.Net) for quality checking and processing of raw WIM data. A complete year data from November 2013 to October 2014 was analyzed using the developed WIM Data Analysis Program. After that, the vehicle class distribution, directional distribution, lane distribution, monthly adjustment factor, hourly distribution, axle load spectra, average number of axle per vehicle, axle spacing, lateral wander distribution, and wheelbase distribution were calculated. Then a comparative study was done between measured data and AASHTOWare default values. It was found that the measured general traffic inputs for I-40 and I-25 significantly differ from the default values.Keywords: AASHTOWare, traffic, weigh-in-motion, axle load distribution
Procedia PDF Downloads 34324506 Design of Knowledge Management System with Geographic Information System
Authors: Angga Hidayah Ramadhan, Luciana Andrawina, M. Azani Hasibuan
Abstract:
Data will be as a core of the decision if it has a good treatment or process, which is process that data into information, and information into knowledge to make a wisdom or decision. Today, many companies have not realize it include XYZ University Admission Directorate as executor of National Admission called Seleksi Masuk Bersama (SMB) that during the time, the workers only uses their feeling to make a decision. Whereas if it done, then that company can analyze the data to make a right decision to get a pin sales from student candidate or registrant that follow SMB as many as possible. Therefore, needs Knowledge Management System (KMS) with Geographic Information System (GIS) use 5C4C that can process that company data becomes more useful and can help make decisions. This information system can process data into information based on the pin sold data with 5C (Contextualized, Categorize, Calculation, Correction, Condensed) and convert information into knowledge with 4C (Comparing, Consequence, Connection, Conversation) that has been several steps until these data can be useful to make easier to take a decision or wisdom, resolve problems, communicate, and quicker to learn to the employees have not experience and also for ease of viewing/visualization based on spatial data that equipped with GIS functionality that can be used to indicate events in each province with indicator that facilitate in this system. The system also have a function to save the tacit on the system then to be proceed into explicit in expert system based on the problems that will be found from the consequences of information. With the system each team can make a decision with same ways, structured, and the important is based on the actual event/data.Keywords: 5C4C, data, information, knowledge
Procedia PDF Downloads 46224505 In situ Stabilization of Arsenic in Soils with Birnessite and Goethite
Authors: Saeed Bagherifam, Trevor Brown, Chris Fellows, Ravi Naidu
Abstract:
Over the last century, rapid urbanization, industrial emissions, and mining activities have resulted in widespread contamination of the environment by heavy metal(loid)s. Arsenic (As) is a toxic metalloid belonging to group 15 of the periodic table, which occurs naturally at low concentrations in soils and the earth’s crust, although concentrations can be significantly elevated in natural systems as a result of dispersion from anthropogenic sources, e.g., mining activities. Bioavailability is the fraction of a contaminant in soils that is available for uptake by plants, food chains, and humans and therefore presents the greatest risk to terrestrial ecosystems. Numerous attempts have been made to establish in situ and ex-situ technologies of remedial action for remediation of arsenic-contaminated soils. In situ stabilization techniques are based on deactivation or chemical immobilization of metalloid(s) in soil by means of soil amendments, which consequently reduce the bioavailability (for biota) and bioaccessibility (for humans) of metalloids due to the formation of low-solubility products or precipitates. This study investigated the effectiveness of two different types of synthetic manganese and iron oxides (birnessite and goethite) for stabilization of As in a soil spiked with 1000 mg kg⁻¹ of As and treated with 10% dosages of soil amendments. Birnessite was made using HCl and KMnO₄, and goethite was synthesized by the dropwise addition of KOH into Fe(NO₃) solution. The resulting contaminated soils were subjected to a series of chemical extraction studies including sequential extraction (BCR method), single-step extraction with distilled (DI) water, 2M HNO₃ and simplified bioaccessibility extraction tests (SBET) for estimation of bioaccessible fractions of As in two different soil fractions ( < 250 µm and < 2 mm). Concentrations of As in samples were measured using inductively coupled plasma mass spectrometry (ICP-MS). The results showed that soil with birnessite reduced bioaccessibility of As by up to 92% in both soil fractions. Furthermore, the results of single-step extractions revealed that the application of both birnessite and Goethite reduced DI water and HNO₃ extractable amounts of arsenic by 75, 75, 91, and 57%, respectively. Moreover, the results of the sequential extraction studies showed that both birnessite and goethite dramatically reduced the exchangeable fraction of As in soils. However, the amounts of recalcitrant fractions were higher in birnessite, and Goethite amended soils. The results revealed that the application of both birnessite and goethite significantly reduced bioavailability and the exchangeable fraction of As in contaminated soils, and therefore birnessite and Goethite amendments might be considered as promising adsorbents for stabilization and remediation of As contaminated soils.Keywords: arsenic, bioavailability, in situ stabilisation, metalloid(s) contaminated soils
Procedia PDF Downloads 13524504 A Policy Strategy for Building Energy Data Management in India
Authors: Shravani Itkelwar, Deepak Tewari, Bhaskar Natarajan
Abstract:
The energy consumption data plays a vital role in energy efficiency policy design, implementation, and impact assessment. Any demand-side energy management intervention's success relies on the availability of accurate, comprehensive, granular, and up-to-date data on energy consumption. The Building sector, including residential and commercial, is one of the largest consumers of energy in India after the Industrial sector. With economic growth and increasing urbanization, the building sector is projected to grow at an unprecedented rate, resulting in a 5.6 times escalation in energy consumption till 2047 compared to 2017. Therefore, energy efficiency interventions will play a vital role in decoupling the floor area growth and associated energy demand, thereby increasing the need for robust data. In India, multiple institutions are involved in the collection and dissemination of data. This paper focuses on energy consumption data management in the building sector in India for both residential and commercial segments. It evaluates the robustness of data available through administrative and survey routes to estimate the key performance indicators and identify critical data gaps for making informed decisions. The paper explores several issues in the data, such as lack of comprehensiveness, non-availability of disaggregated data, the discrepancy in different data sources, inconsistent building categorization, and others. The identified data gaps are justified with appropriate examples. Moreover, the paper prioritizes required data in order of relevance to policymaking and groups it into "available," "easy to get," and "hard to get" categories. The paper concludes with recommendations to address the data gaps by leveraging digital initiatives, strengthening institutional capacity, institutionalizing exclusive building energy surveys, and standardization of building categorization, among others, to strengthen the management of building sector energy consumption data.Keywords: energy data, energy policy, energy efficiency, buildings
Procedia PDF Downloads 18524503 A Survey on Data-Centric and Data-Aware Techniques for Large Scale Infrastructures
Authors: Silvina Caíno-Lores, Jesús Carretero
Abstract:
Large scale computing infrastructures have been widely developed with the core objective of providing a suitable platform for high-performance and high-throughput computing. These systems are designed to support resource-intensive and complex applications, which can be found in many scientific and industrial areas. Currently, large scale data-intensive applications are hindered by the high latencies that result from the access to vastly distributed data. Recent works have suggested that improving data locality is key to move towards exascale infrastructures efficiently, as solutions to this problem aim to reduce the bandwidth consumed in data transfers, and the overheads that arise from them. There are several techniques that attempt to move computations closer to the data. In this survey we analyse the different mechanisms that have been proposed to provide data locality for large scale high-performance and high-throughput systems. This survey intends to assist scientific computing community in understanding the various technical aspects and strategies that have been reported in recent literature regarding data locality. As a result, we present an overview of locality-oriented techniques, which are grouped in four main categories: application development, task scheduling, in-memory computing and storage platforms. Finally, the authors include a discussion on future research lines and synergies among the former techniques.Keywords: data locality, data-centric computing, large scale infrastructures, cloud computing
Procedia PDF Downloads 25924502 Wind Speed Data Analysis in Colombia in 2013 and 2015
Authors: Harold P. Villota, Alejandro Osorio B.
Abstract:
The energy meteorology is an area for study energy complementarity and the use of renewable sources in interconnected systems. Due to diversify the energy matrix in Colombia with wind sources, is necessary to know the data bases about this one. However, the time series given by 260 automatic weather stations have empty, and no apply data, so the purpose is to fill the time series selecting two years to characterize, impute and use like base to complete the data between 2005 and 2020.Keywords: complementarity, wind speed, renewable, colombia, characteri, characterization, imputation
Procedia PDF Downloads 164