Search results for: multi-source data fusion
24832 Multivariate Assessment of Mathematics Test Scores of Students in Qatar
Authors: Ali Rashash Alzahrani, Elizabeth Stojanovski
Abstract:
Data on various aspects of education are collected regularly at the institutional and government level. In Australia, for example, students at various levels of schooling undertake examinations in numeracy and literacy as part of NAPLAN testing, enabling longitudinal assessment of such data as well as comparisons between schools and states within Australia. Another source of educational data, collected internationally, is the PISA study, which collects data from several countries when students are approximately 15 years of age and enables comparisons of performance in science, mathematics and English between countries as well as ranking of countries based on performance in these standardised tests. As well as student and school outcomes based on the tests taken as part of the PISA study, there is a wealth of other data collected in the study, including parental demographic data and data related to teaching strategies used by educators. Overall, an abundance of educational data is available which has the potential to be used to help improve educational attainment and teaching of content in order to improve learning outcomes. A multivariate assessment of such data enables multiple variables to be considered simultaneously and will be used in the present study to help develop profiles of students based on performance in mathematics using data obtained from the PISA study.
Keywords: cluster analysis, education, mathematics, profiles
Procedia PDF Downloads 123
24831 Impact of Western Music Instruments on Indian Classical Music
Authors: Hukam Chand
Abstract:
Over the past few years, the performance of Indian classical music has improved considerably owing to the technical inclusion of western instruments. In fact, Indian classical music is all about raags, which portray a mood and sentiments expressed through a microtonal scale based on the natural harmonic series. Most western instruments are not based on the natural harmonic series, and the tonal system is the only system that has had considerable influence on Indian classical music. However, the use of western instruments by Indian artists has been growing day by day, in one way or another, because of their quality of harmony. As a result, some instruments, such as the harmonium, violin, guitar, saxophone, and synthesizer, are now commonly used by both Indian and western artists. In addition, a lot of fusion has taken place in the music of both sides owing to the similar characteristics of their instruments. For example, the harmonium, which was originally a western instrument, has now acquired an important position in Indian classical music for performing raags. Besides, artists have made many suggestions for technical modifications to western instruments to cater to the needs of Indian music through a melodic approach. Pt. Vishav Mohan Bhatt, an Indian musician, has developed the Mohan Veena (a modified guitar) to perform raags. N. Rajam, an Indian violinist, has done remarkable work in Indian classical music through violin accompaniment of vocal music. The purpose of the present research paper is to highlight the changes in Indian classical music brought about by performance on modified western music instruments.
Keywords: Indian classical music, Western instruments, harmonium, guitar, Violin and impact
Procedia PDF Downloads 519
24830 Dataset Quality Index: Development of Composite Indicator Based on Standard Data Quality Indicators
Authors: Sakda Loetpiparwanich, Preecha Vichitthamaros
Abstract:
Nowadays, poor data quality is considered one of the major costs of a data project. A data project with data quality awareness spends considerable time on data quality processes, while a data project without data quality awareness suffers negative impacts on financial resources, efficiency, productivity, and credibility. One of the processes that takes a long time is defining the expectations and measurements of data quality, because the expectations differ according to the purpose of each data project. This is especially true of big data projects, which may involve many datasets and stakeholders and therefore take a long time to discuss and define quality expectations and measurements. Therefore, this study aimed at developing meaningful indicators that describe the overall data quality of each dataset for quick comparison and prioritization. The objectives of this study were to: (1) develop practical data quality indicators and measurements, (2) develop data quality dimensions based on statistical characteristics, and (3) develop a Composite Indicator that can describe the overall data quality of each dataset. The sample consisted of more than 500 datasets from public sources obtained by random sampling. After the datasets were collected, the Dataset Quality Index (SDQI) was developed in five steps. First, we defined standard data quality expectations. Second, we identified indicators that can be measured directly on the data within datasets. Third, the indicators were aggregated into dimensions using factor analysis. Next, the indicators and dimensions were weighted by the effort required for the data preparation process and by usability. Finally, the dimensions were aggregated into the Composite Indicator. The results of these analyses showed that: (1) the developed indicators and measurements comprised ten indicators; (2) for the data quality dimensions based on statistical characteristics, the ten indicators can be reduced to 4 dimensions; and (3) the SDQI can describe the overall quality of each dataset and can be separated into 3 levels: Good Quality, Acceptable Quality, and Poor Quality. In conclusion, the SDQI provides an overall description of data quality within datasets together with a meaningful composition. The SDQI can be used to assess all data in a data project and to support effort estimation and prioritization. The SDQI also works well with Agile methods, where it is used for assessment in the first sprint. After the initial evaluation is passed, more specific data quality indicators can be added in the next sprint.
Keywords: data quality, dataset quality, data quality management, composite indicator, factor analysis, principal component analysis
Procedia PDF Downloads 138
24829 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm
Authors: Ameur Abdelkader, Abed Bouarfa Hafida
Abstract:
Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis when faced with a large amount of data. In fact, because of the volume, nature (semi-structured or unstructured) and variety of big data, it is impossible to analyze it efficiently via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm Classification And Regression Trees (CART) in order to adapt it for big data analysis. The major changes to this algorithm are presented, and a version of the extended algorithm is then defined in order to make it applicable to a huge quantity of data.
Keywords: predictive analysis, big data, predictive analysis algorithms, CART algorithm
Procedia PDF Downloads 139
24828 Hierarchical Clustering Algorithms in Data Mining
Authors: Z. Abdullah, A. R. Hamdan
Abstract:
Clustering is a process of grouping objects and data into clusters so that data objects from the same cluster are similar to each other. Clustering algorithms are one of the areas of data mining and can be classified into partition-based, hierarchical, density-based, and grid-based. In this paper, we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON, and BIRCH. The resulting state of the art of these algorithms will help in eliminating the current problems, as well as in deriving more robust and scalable algorithms for clustering.
Keywords: clustering, unsupervised learning, algorithms, hierarchical
Procedia PDF Downloads 883
24827 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering
Authors: K. Umbleja, M. Ichino
Abstract:
Symbolic data mining has been developed to analyze data in very large datasets. It is also useful in cases when entry-specific details should remain hidden. Symbolic data mining is quickly gaining popularity as the datasets that need analyzing become ever larger. One type of symbolic data is the histogram, which makes it possible to store huge amounts of information in a single variable with a high level of granularity. Other types of symbolic data can also be described as histograms, which makes the histogram a very important and general symbolic data type: a method developed for histograms can also be applied to other types of symbolic data. Owing to their complex structure, histograms are complicated to analyze. This paper proposes a method that makes it possible to compare two histogram-valued variables and thereby find a dissimilarity between two histograms. The proposed method uses the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, which makes it possible to compare histograms with different numbers of bins and bin widths (so-called general histograms). The proposed dissimilarity measure is then used as a measure for clustering. Furthermore, a linkage method based on weighted averages is proposed, with the concept of cluster compactness used to measure the quality of clustering. The method is then validated by application to real datasets. As a result, the proposed dissimilarity measure is found to produce adequate and comparable results with general histograms without loss of detail or the need to transform the data.
Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis
Procedia PDF Downloads 160
24826 WiFi Data Offloading: Bundling Method in a Canvas Business Model
Authors: Majid Mokhtarnia, Alireza Amini
Abstract:
Mobile operators face the increase in data traffic as a critical issue. As a result, a vital responsibility of the operators is to deal with this trend in order to create added value. This paper addresses a bundling method in a Canvas business model under a WiFi Data Offloading (WDO) strategy, by which some elements of the model may be affected. The proposed method is to sell subscribers a number of data packages, some of which include a given volume of complimentary WiFi-offloaded data. The present paper analyses this method in terms of attractiveness and profitability. The results demonstrate that the quality of implementation of the WDO strongly affects the final result and helps the decision maker to make the best decision.
Keywords: bundling, canvas business model, telecommunication, WiFi data offloading
Procedia PDF Downloads 198
24825 Functional Cell Surface Display Using Ice Nucleation Protein from Erwinia ananas on Escherichia coli
Authors: Mei Yuin Joanne Wee, Rosli Md. Illias
Abstract:
Cell surface display is the expression of a protein with an anchoring motif on the surface of the cell. This approach offers advantages when used in bioconversion in terms of easier purification steps and more efficient enzymatic reaction. A surface display system using ice nucleation protein (InaA) from Erwinia ananas as an anchoring motif has been constructed to display xylanase (xyl) on the surface of Escherichia coli. The InaA was truncated so that it is made up of the N- and C-terminal domains (INPANC-xyl), and it successfully directed xylanase to the surface of the cell. A study was also done on xylanase fused to two other ice nucleation proteins, InaK (INPKNC-xyl) and InaZ (INPZNC-xyl), from Pseudomonas syringae KCTC 1832 and Pseudomonas syringae S203 respectively. Surface localization of the fusion protein was verified using SDS-PAGE and Western blot on the cell fractions, and all anchoring motifs were successfully displayed on the outer membrane of E. coli. Upon comparison, the whole-cell activity of INPANC-xyl was more than six and five times higher than that of INPKNC-xyl and INPZNC-xyl respectively. Furthermore, the expression of INPANC-xyl on the surface of E. coli did not inhibit the growth of the cell. This is the first report of a surface display system using the ice nucleation protein InaA from E. ananas. This study shows that this anchoring motif offers an attractive alternative to the current surface display systems.
Keywords: cell surface display, Escherichia coli, ice nucleation protein, xylanase
Procedia PDF Downloads 389
24824 Distributed Perceptually Important Point Identification for Time Series Data Mining
Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung
Abstract:
In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process was first introduced in 2001. This process was originally used for financial time series pattern matching and was later found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data contributes a major proportion, especially the data generated by sensors in the Internet of Things (IoT) environment. Given the nature of PIP identification and its successful cases, it is worthwhile to explore further the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification is always considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches remove the bottleneck of running the PIP identification process on a standalone computer. The distributed versions deliver an improvement in terms of speed.
Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining
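The abstract does not spell out the identification procedure, so the following is only a minimal sketch of the commonly described serial PIP process, not the authors' distributed SB-Tree versions. It assumes the vertical-distance variant: start with the first and last points, then repeatedly add the point lying farthest, measured vertically, from the line joining its two neighbouring PIPs. The names `vertical_distance` and `find_pips` are hypothetical.

```python
from bisect import insort


def vertical_distance(p, a, b):
    """Vertical distance from point p to the straight line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    y_on_line = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    return abs(y - y_on_line)


def find_pips(series, n_pips):
    """Return the sorted indices of up to n_pips perceptually important points.

    Illustrative serial version (an assumption, not the paper's algorithm):
    each round rescans every gap between adjacent PIPs, so it is simple
    rather than fast.
    """
    n = len(series)
    if n < 2 or n_pips < 2:
        raise ValueError("need at least two points and two PIPs")
    n_pips = min(n_pips, n)
    pips = [0, n - 1]  # the first and last points are always PIPs
    while len(pips) < n_pips:
        best_idx, best_dist = None, -1.0
        # Scan every non-PIP point between each pair of adjacent PIPs.
        for left, right in zip(pips, pips[1:]):
            a, b = (left, series[left]), (right, series[right])
            for i in range(left + 1, right):
                d = vertical_distance((i, series[i]), a, b)
                if d > best_dist:
                    best_idx, best_dist = i, d
        if best_idx is None:  # no candidate points remain
            break
        insort(pips, best_idx)  # keep the index list sorted
    return pips


if __name__ == "__main__":
    prices = [1, 2, 8, 3, 2, 1, 5, 1]
    print(find_pips(prices, 4))
```

Because every round rescans the whole series, the cost of this serial form grows quickly with series length, which is the kind of bottleneck the abstract says the distributed versions target.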
Procedia PDF Downloads 431
24823 Knowledge Discovery and Data Mining Techniques in Textile Industry
Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler
Abstract:
This paper addresses issues in the textile industry using data mining techniques. Data mining was applied to data on the stitching of garment products obtained from a textile company. The data mining techniques applied to the data were the CHAID algorithm, the CART algorithm, regression analysis, and artificial neural networks. Classification-based analyses were used for the data mining, and a decision model for production per person and the variables affecting production was obtained by this method. The results of the study show that as the daily working time increases, the production per person decreases. In addition, total daily working time is negatively related to production per person, and this is the strongest negative relationship found.
Keywords: data mining, textile production, decision trees, classification
Procedia PDF Downloads 349
24822 Numerical Analysis of Crack's Effects in a Dissimilar Welded Joint
Authors: Daniel N. L. Alves, Marcelo C. Rodrigues, Jose G. de Almeida
Abstract:
The search for structural efficiency in mechanical systems has been pursued strongly with the aim of economic optimization and structural safety. Thus, understanding the response of materials subjected to adverse conditions is essential to designing a safe project. This work investigates the presence of cracks in dissimilar welded joints (DWJ). Their fracture toughness responses depend upon the heterogeneity present in these joints. This work therefore aims to analyze the behavior of the crack tip zone located in a buttered dissimilar welded joint (ASTM A-36, Inconel, and AISI 8630 M) used to join pipes in offshore oil production lines. The crack was placed 1 mm from the Inconel-AISI 8630 M fusion line (FL), toward the AISI 8630 M. The Finite Element Method (FEM) was used to analyze the stress and strain fields generated during the loading imposed on the specimen. The numerical tool made it possible to observe the critical stress area, and a preferential plastic flow was also observed in the dissimilar welded joint sample, which can be considered a harbinger of the crack growth path. The results obtained through numerical analysis showed convergent behavior in relation to the plastic flow, qualitatively and quantitatively in agreement with previously performed work.
Keywords: crack, dissimilar welded joint, numerical analysis, strain field, stress field
Procedia PDF Downloads 170
24821 Investigation of Delivery of Triple Play Data in GE-PON Fiber to the Home Network
Authors: Ashima Anurag Sharma
Abstract:
Optical fiber-based networks can deliver performance that supports the increasing demand for high-speed connections. One of the new technologies that have emerged in recent years is the Passive Optical Network. This research paper aims to show the simultaneous delivery of triple play services (data, voice, and video). A comparison between various data rates is presented. It is demonstrated that as the data rate increases, the number of users that can be supported decreases owing to the increase in bit error rate.
Keywords: BER, PON, TDMPON, GPON, CWDM, OLT, ONT
Procedia PDF Downloads 527
24820 Microarray Gene Expression Data Dimensionality Reduction Using PCA
Authors: Fuad M. Alkoot
Abstract:
Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension, reaching thousands, which hinders all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with was generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. The classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim to reduce the data dimension and improve the data for classification. Here, we experiment with applying a multistage PCA to the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates, which increases the possibility of building an automated system for autism detection.
Keywords: PCA, gene expression, dimensionality reduction, classification, autism
Procedia PDF Downloads 559
24819 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic
Authors: Fei Gao, Rodolfo C. Raga Jr.
Abstract:
This research proposal aims to ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve early detection and management of diabetes by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. With the Diabetes Health Indicators Dataset from Kaggle as the research data, the phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess their performance using five key metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, we investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.
Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle
Procedia PDF Downloads 73
24818 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0
Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini
Abstract:
Nowadays, companies face many challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which can create challenges around data management and governance. Furthermore, it is challenging to integrate data from multiple systems and technologies. Despite these difficulties, companies are still pursuing digitalization because, by embracing advanced technologies, they can improve efficiency, quality, decision-making, and customer experience while also creating different business models and revenue streams. This paper focuses on the issue of data being stored in silos with different schemas and structures. The conventional approaches to addressing this issue involve data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily focus on the grammar and structure of the data and neglect the importance of semantic modeling and semantic standardization, which are essential for achieving data interoperability. In this work, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models as an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates the Asset Administration Shell technology to model and map the company’s data and utilizes a knowledge graph for data storage and exploration.
Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling
Procedia PDF Downloads 92
24817 Big Data Analytics and Data Security in the Cloud via Fully Homomorphic Encryption
Authors: Waziri Victor Onomza, John K. Alhassan, Idris Ismaila, Noel Dogonyaro Moses
Abstract:
This paper describes the problem of building secure computational services that operate on encrypted information in the Cloud without decrypting it. It thereby addresses the demand for a computational encryption model that could enhance the security of big data with respect to the privacy, confidentiality, and availability required by users. The cryptographic model applied to computation on the encrypted data is the Fully Homomorphic Encryption Scheme. We contribute theoretical presentations of high-level computational processes based on number theory and algebra that can easily be integrated and leveraged in Cloud computing, together with the detailed mathematical concepts underlying fully homomorphic encryption models. This contribution supports the full implementation of a cryptographic security algorithm for big data analytics.
Keywords: big data analytics, security, privacy, bootstrapping, homomorphic, homomorphic encryption scheme
Procedia PDF Downloads 376
24816 Protecting Privacy and Data Security in Online Business
Authors: Bilquis Ferdousi
Abstract:
With the exponential growth of the online business, the threat to consumers’ privacy and data security has become a serious challenge. This literature review-based study focuses on a better understanding of those threats and what legislative measures have been taken to address those challenges. Research shows that people are increasingly involved in online business using different digital devices and platforms, although this practice varies based on age groups. The threat to consumers’ privacy and data security is a serious hindrance in developing trust among consumers in online businesses. There are some legislative measures taken at the federal and state level to protect consumers’ privacy and data security. The study was based on an extensive review of current literature on protecting consumers’ privacy and data security and legislative measures that have been taken.
Keywords: privacy, data security, legislation, online business
Procedia PDF Downloads 104
24815 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm
Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan
Abstract:
This study presents a new parallel approach to clustering GPS data. The evaluation compares the execution time of various clustering algorithms on GPS data. The paper proposes a neighborhood-based parallel K-means algorithm in order to make K-means faster. The proposed parallelization approach assumes that each GPS data point represents a vehicle and that vehicles close to each other communicate after the vehicles are clustered. This parallelization approach was examined on continuously changing GPS data of different sizes and compared with the serial K-means algorithm and other serial clustering algorithms. The results demonstrate that the proposed parallel K-means algorithm works much faster than the other clustering algorithms.
Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data
Procedia PDF Downloads 219
24814 An Analysis of Privacy and Security for Internet of Things Applications
Authors: Dhananjay Singh, M. Abdullah-Al-Wadud
Abstract:
The Internet of Things is the concept of a large-scale ecosystem of wireless actuators. The actuators are defined as the things in the IoT: those that contribute or produce some data to the ecosystem. However, ubiquitous data collection, data security, privacy preservation, large-volume data processing, and intelligent analytics are some of the key challenges for IoT technologies. In order to address the security requirements, challenges and threats in the IoT, we discuss a message authentication mechanism for IoT applications. Finally, we discuss a data encryption mechanism for authenticating messages before they propagate into IoT networks.
Keywords: Internet of Things (IoT), message authentication, privacy, security
Procedia PDF Downloads 381
24813 High Temperature Properties of Diffusion Brazed Joints of IN 939 Ni-Base Superalloy
Authors: Hyunki Kang, Hi Won Jeong
Abstract:
The gas turbine operates for long periods under harsh, cyclic conditions of high temperature and pressure, where the turbine inlet temperature (TIT) can range from 1273 to 1873 K. Therefore, Ni-base superalloys such as IN738, IN939, Rene 45, Rene 71, Rene 80, Mar M 247, CM 247, and CMSX-4, which have excellent mechanical properties and resistance to creep, corrosion and oxidation at high temperatures, are used. Among the alloying additions for these alloys, aluminum (Al) and titanium (Ti) form gamma prime and enhance the high-temperature properties. However, when crack-damaged high-temperature turbine components such as blades and vanes are repaired by fusion welding, cracks form. For example, when arc welding is applied to certain superalloys that contain more than 3 wt.% Al and 3.5 wt.% Ti, such as IN738, IN939, Rene 80, Mar M 247, and CM 247, aging cracks occur. Therefore, repair technologies using diffusion brazing, which puts less heat into the base material, are being developed. The microstructural evolution of joints brazed with different filler metals on an IN 939 Ni-base superalloy base metal was analyzed using X-ray diffraction, OEM, SEM-EDS, and EPMA. Stress rupture and high-temperature tensile strength properties were also measured to analyze the effects of different brazing heat cycles. The amount of boron in the diffusion-affected zone (DAZ) decreased towards the base metal, and the formation of borides at grain boundaries was detected through EPMA.
Keywords: gas turbine, diffusion brazing, superalloy, gas turbine repair
Procedia PDF Downloads 39
24812 Cognitive Science Based Scheduling in Grid Environment
Authors: N. D. Iswarya, M. A. Maluk Mohamed, N. Vijaya
Abstract:
A grid is an infrastructure that allows large volumes of distributed data from multiple locations to be deployed to reach a common goal. Scheduling data-intensive applications becomes challenging because the data sets are huge. Only two solutions exist to tackle this challenging issue. First, a computation that requires huge data sets can be transferred to the data site. Second, the required data sets can be transferred to the computation site. In the former scenario, the computation cannot be transferred, since the servers are storage/data servers with little or no computational capability. Hence, the second scenario is considered for further exploration. During scheduling, transferring huge data sets from one site to another requires more network bandwidth. To mitigate this issue, this work focuses on incorporating cognitive science in scheduling. Cognitive science is the study of the human brain and its related activities. Current research is mainly focused on incorporating cognitive science in various computational modeling techniques. In this work, the problem-solving approach of the human brain is studied and incorporated into data-intensive scheduling in grid environments. Here, a cognitive engine (CE) is designed and deployed at various grid sites. The intelligent agents present in the CE help in analyzing requests and creating the knowledge base. Depending upon the link capacity, a decision is taken whether to transfer the data sets or to partition them. The agents predict the next request so that the requesting site can be served with data sets in advance. This reduces the data availability time and the data transfer time. The replica catalog and metadata catalog created by the agents assist in the decision-making process.
Keywords: data grid, grid workflow scheduling, cognitive artificial intelligence
Procedia PDF Downloads 392
24811 Heritage and Tourism in the Era of Big Data: Analysis of Chinese Cultural Tourism in Catalonia
Authors: Xinge Liao, Francesc Xavier Roige Ventura, Dolores Sanchez Aguilera
Abstract:
With the development of the Internet, the study of tourism behavior has rapidly expanded from the traditional physical market to the online market. Data on the Internet is characterized by dynamic change, and new data appear all the time. In recent years, large volumes of data have been generated by forums, blogs, and other sources, which have expanded over time and space; together they constitute large-scale Internet data, known as Big Data. This data of technological origin, which derives from the use of devices and the activity of multiple users, is becoming a source of great importance for the study of geography and the behavior of tourists. The study will focus on cultural heritage tourist practices in the context of Big Data. The research will explore the characteristics and behavior of Chinese tourists in relation to the cultural heritage of Catalonia. Geographical information, target image, and perceptions in user-generated content will be studied through analysis of data from Weibo, the largest blogging social network in China. Through the analysis of the behavior of heritage tourists in the Big Data environment, this study will seek to understand the practices (activities, motivations, perceptions) of cultural tourists, and then the needs and preferences of tourists, in order to better guide the sustainable development of tourism at heritage sites.
Keywords: Barcelona, Big Data, Catalonia, cultural heritage, Chinese tourism market, tourists’ behavior
Procedia PDF Downloads 137
24810 Towards A Framework for Using Open Data for Accountability: A Case Study of A Program to Reduce Corruption
Authors: Darusalam, Jorish Hulstijn, Marijn Janssen
Abstract:
The media have revealed a variety of corruption cases in regional and local governments all over the world. Many governments have pursued anti-corruption reforms and created systems of checks and balances. Citizens face three types of corruption: administrative corruption, collusion, and extortion. Accountability is one of the benchmarks for building transparent government. The public sector is required to report the results of the programs it has implemented so that citizens can judge whether the institution has been working economically, efficiently, and effectively. Open Data offers solutions for implementing good governance in organizations that want to be more transparent. In addition, Open Data can create transparency and accountability to the community. The objective of this paper is to build a framework of open data for accountability to combat corruption. This paper investigates the relationship between open data and accountability as part of anti-corruption initiatives, and the impact of open data implementation on public organizations.
Keywords: open data, accountability, anti-corruption, framework
Procedia PDF Downloads 336
24809 The Taste of Macau: An Exploratory Study of Destination Food Image
Authors: Jianlun Zhang, Christine Lim
Abstract:
Local food is one of the most attractive elements to tourists. The role of local cuisine in destination branding is very important because it is the distinctive identity that helps tourists remember the destination. The objectives of this study are to: (1) test the direct relation between the cognitive image of destination food and tourists’ intention to eat local food; (2) examine the mediating effect of tourists’ desire to try destination food on the relationship between the cognitive image of local food and tourists’ intention to eat destination food; and (3) study the moderating effect of tourists’ perceived difficulties in finding local food on the relationship between tourists’ desire to try destination food and tourists’ intention to eat local food. To achieve these goals, Macanese cuisine is selected as the destination food. Macau is located in Southeastern China and is a former colonial city of Portugal. The taste and texture of Macanese cuisine are unique because it is a fusion of cuisine from many countries and regions of mainland China. As people travel to seek an authentically exotic experience, it is important to investigate whether the food image of Macau leaves a good impression on tourists and motivates them to try local cuisine. A total of 449 Chinese tourists were involved in this study. To analyze the data collected, the partial least squares structural equation modelling (PLS-SEM) technique is employed. Results suggest that the cognitive image of Macanese cuisine has a direct effect on tourists’ intention to eat Macanese cuisine. Tourists’ desire to try Macanese cuisine mediates the cognitive image-intention relationship. Tourists’ perceived difficulty of finding Macanese cuisine moderates the desire-intention relationship: the lower the perceived difficulty in finding Macanese cuisine, the stronger the desire-intention relationship. There are several practical implications of this study.
First, the government tourism website can develop an authentic storyline about the evolution of local cuisine, which provides an opportunity for tourists to taste the history of the destination and creates a novel experience for them. Second, the government should consider the development of food events, restaurants, and hawker businesses. Third, to lower tourists’ perceived difficulty in finding local cuisine, the locations of restaurants and hawker stalls, with clear instructions for finding them, should be provided on the websites of the government tourism office, at popular tourism sites, and at public transportation stations in the destination. Fourth, in the post-COVID-19 era, travel risk will be a major concern for tourists. Therefore, when promoting local food, the government tourism website should post images that show food safety and hygiene.
Keywords: cognitive image of destination food, desire to try destination food, intention to eat food in the destination, perceived difficulties of finding local cuisine, PLS-SEM
Procedia PDF Downloads 189
24808 Syndromic Surveillance Framework Using Tweets Data Analytics
Authors: David Ming Liu, Benjamin Hirsch, Bashir Aden
Abstract:
Syndromic surveillance aims to detect or predict disease outbreaks through the analysis of medical data sources. Using social media data such as tweets for syndromic surveillance is becoming more and more popular, aided by open platforms for collecting data and by the advantages of microblogging text and mobile geographic location features. In this paper, a Syndromic Surveillance Framework with a machine learning kernel is presented using tweet data analytics. Influenza and three cities of the United Arab Emirates (Abu Dhabi, Al Ain, and Dubai) are used as the test disease and trial areas. Hospital case data provided by the Health Authority of Abu Dhabi (HAAD) are used for correlation purposes. In our model, a Latent Dirichlet allocation (LDA) engine is adapted to perform supervised learning classification, and N-fold cross-validation confusion matrices are given as the simulation results, with an overall system recall of 85.595% achieved.
Keywords: Syndromic surveillance, Tweets, Machine Learning, data mining, Latent Dirichlet allocation (LDA), Influenza
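As a rough illustration of how recall can be read off an N-fold cross-validation confusion matrix, the sketch below pools the counts over contiguous folds. It does not reproduce the paper's LDA engine: the `fit` and `predict` callables, the keyword rule, and the toy tweets are hypothetical stand-ins.

```python
from collections import Counter


def k_fold_splits(n_samples, k):
    """Return k (train_idx, test_idx) pairs using contiguous folds."""
    base, extra = divmod(n_samples, k)
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train_idx, test_idx))
        start += size
    return splits


def cross_validated_confusion(texts, labels, k, fit, predict):
    """Pool (actual, predicted) counts over k folds.
    `fit` and `predict` are caller-supplied, hypothetical callables."""
    counts = Counter()
    for train_idx, test_idx in k_fold_splits(len(texts), k):
        model = fit([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            counts[(labels[i], predict(model, texts[i]))] += 1
    return counts


def recall(counts, positive=1):
    """Recall = TP / (TP + FN) for the chosen positive class."""
    tp = counts[(positive, positive)]
    fn = sum(v for (actual, pred), v in counts.items()
             if actual == positive and pred != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Toy data and a trivial keyword rule, for illustration only.
tweets = ["flu today", "nice day", "got the flu", "fever bad", "sunny", "flu again"]
labels = [1, 0, 1, 1, 0, 1]
counts = cross_validated_confusion(
    tweets, labels, k=3,
    fit=lambda texts, ys: None,                        # no training needed here
    predict=lambda model, text: 1 if "flu" in text else 0,
)
overall_recall = recall(counts)
```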
Procedia PDF Downloads 115
24807 Analysis of Urban Population Using Twitter Distribution Data: Case Study of Makassar City, Indonesia
Authors: Yuyun Wabula, B. J. Dewancker
Abstract:
Over the past decade, social networking apps have grown very rapidly. Geolocation data is one of the important features of social media, as it can record the user's location coordinates in the real world. This paper proposes the use of geolocation data from the Twitter social media application to gain knowledge about urban dynamics, especially human mobility behavior. It aims to explore the relation between Twitter geolocation data and the presence of people in urban areas. First, the study analyzes the spread of people in particular areas within the city using Twitter data. Second, we match and categorize existing places based on the same individuals visiting them. Then, we combine the Twitter data from the tracking result with the questionnaire data to capture the Twitter user profile. To do that, we use frequency distribution analysis to learn the visitors’ percentage. To validate the hypothesis, we compare the results with local population statistics and the land use mapping released by the city planning department of the Makassar local government. The results show that there is a correlation between Twitter geolocation and questionnaire data. Thus, integrating Twitter data and survey data can reveal the profile of social media users.
Keywords: geolocation, Twitter, distribution analysis, human mobility
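The frequency-distribution step might look roughly like the sketch below, which counts distinct visitors per place category and converts the counts to percentages. The input layout of `(user_id, place_category)` pairs, the function name `visitor_share`, and the sample records are assumptions; the abstract does not describe the paper's actual data format.

```python
def visitor_share(checkins):
    """checkins: iterable of (user_id, place_category) pairs (assumed layout).
    Returns each category's percentage of the distinct-visitor counts."""
    visitors = {}
    for user, place in checkins:
        # A set keeps each user once per category, however often they post.
        visitors.setdefault(place, set()).add(user)
    total = sum(len(users) for users in visitors.values())
    return {place: 100 * len(users) / total for place, users in visitors.items()}


# Hypothetical geotagged tweets already mapped to place categories.
checkins = [
    ("a", "mall"), ("a", "mall"), ("b", "mall"),
    ("b", "park"), ("c", "park"), ("c", "campus"),
]
shares = visitor_share(checkins)
```

Note that a user who visits several categories is counted once in each, so the percentages describe visits by distinct users, not a partition of the population.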
Procedia PDF Downloads 314
24806 Analysis and Rule Extraction of Coronary Artery Disease Data Using Data Mining
Authors: Rezaei Hachesu Peyman, Oliyaee Azadeh, Salahzadeh Zahra, Alizadeh Somayyeh, Safaei Naser
Abstract:
Coronary Artery Disease (CAD) is a major cause of disability in adults and a main cause of death in developed countries. In this study, data mining techniques including Decision Trees, Artificial Neural Networks (ANNs), and Support Vector Machines (SVMs) are used to analyze CAD data. Data from 4948 patients who had suffered from heart disease were included in the analysis. CAD is the target variable, and 24 input (predictor) variables are used for the classification. The performance of these techniques is compared in terms of sensitivity, specificity, and accuracy. The most significant factor influencing CAD is chest pain. Elderly males (age > 53) have a high probability of being diagnosed with CAD. The SVM algorithm is the most useful for evaluating and predicting CAD patients as compared to non-CAD ones. Applying data mining techniques to coronary artery disease data is a good method for investigating the existing relationships between variables.
Keywords: classification, coronary artery disease, data-mining, knowledge discovery, extract
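For reference, the three comparison metrics named in this abstract have the standard definitions below. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, with CAD as the positive class; the abstract does not spell out this notation, so it is assumed here.

```latex
\begin{align}
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN}
\end{align}
```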
Procedia PDF Downloads 657
24805 Sensor Data Analysis for a Large Mining Major
Authors: Sudipto Shanker Dasgupta
Abstract:
One of the largest mining companies wanted to look at health analytics for its driverless trucks. These trucks were the key to its supply chain logistics. The automated trucks had multi-level sub-assemblies that would send out sensor information. The use case worked on was to capture the sensor signals from the truck subcomponents and analyze the health of the trucks from a repair and replacement perspective. Open source software was used to stream the data into a clustered Hadoop setup in the Amazon Web Services cloud, and Apache Spark SQL was used to analyze the data. All of this was achieved on a 10-node Amazon 32-core, 64 GB RAM setup, with real-time analytics achieved on ‘300 million records’. To check the scalability of the system, the cluster was increased to a 100-node setup. This talk will highlight how open source software was used to achieve the above use case, along with insights on high data throughput in a cloud setup.
Keywords: streaming analytics, data science, big data, Hadoop, high throughput, sensor data
Procedia PDF Downloads 402
24804 Effects of in silico (Virtual Lab) And in vitro (inside the Classroom) Labs in the Academic Performance of Senior High School Students in General Biology
Authors: Mark Archei O. Javier
Abstract:
The Fourth Industrial Revolution (FIR) is a major industrial era characterized by a fusion of technologies that is blurring the lines between the physical, digital, and biological spheres. Since this era teaches us how to thrive in a fast-paced, developing world, it is important to be able to adapt. Accordingly, there is a need to make learning and teaching in the bioscience laboratory more challenging and engaging. The goal of the research is to find out whether using in silico and in vitro laboratory activities, compared to the conventional conduct of laboratory activities, has a positive impact on the academic performance of learners. The potential contribution of the research is that it would improve teachers’ methods of delivering content to students for topics that need laboratory activities. This study will develop a method by which teachers can provide learning materials to students. A one-tailed t-test for independent samples was used to determine the significant difference in the pre- and post-test scores of students. The tests of hypotheses were done at a 0.05 level of significance. Based on the results of the study, the gain scores of the experimental group are greater than the gain scores of the control group. This implies that using in silico and in vitro labs for the experimental group is more effective than the conventional method of doing laboratory activities.
Keywords: academic performance, general biology, in silico laboratory, in vivo laboratory, virtual laboratory
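A one-tailed t-test for independent samples on gain scores could be sketched as follows. The gain scores are invented for illustration, and the equal-variance (pooled) form of the statistic is an assumption, since the abstract does not say which variant was used. The critical value 1.860 is the standard one-tailed table value for a 0.05 significance level with 8 degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's t statistic and degrees of freedom for two independent
    samples, assuming equal population variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical gain scores (post-test minus pre-test), not data from the study.
experimental = [12, 15, 11, 14, 13]
control = [8, 9, 7, 10, 6]

t_stat, df = pooled_t(experimental, control)
T_CRITICAL = 1.860  # one-tailed, alpha = 0.05, df = 8 (from a t table)
# H1: the experimental group's mean gain is greater than the control group's.
reject_null = t_stat > T_CRITICAL
```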
Procedia PDF Downloads 188
24803 Data-Centric Anomaly Detection with Diffusion Models
Authors: Sheldon Liu, Gordon Wang, Lei Liu, Xuefeng Liu
Abstract:
Anomaly detection, also referred to as one-class classification, plays a crucial role in identifying product images that deviate from the expected distribution. This study introduces Data-centric Anomaly Detection with Diffusion Models (DCADDM), presenting a systematic strategy for data collection and further diversifying the data with image generation via diffusion models. The algorithm addresses data collection challenges in real-world scenarios and points toward data augmentation through the integration of generative AI capabilities. The paper explores the generation of normal images using diffusion models. The experiments demonstrate that with 30% of the original normal image size, modeling in an unsupervised setting with state-of-the-art approaches can achieve equivalent performance. With the addition of images generated via diffusion models (equivalent to 10% of the original dataset size), the proposed algorithm achieves better or equivalent anomaly localization performance.
Keywords: diffusion models, anomaly detection, data-centric, generative AI
Procedia PDF Downloads 81