Search results for: learning using labeled and unlabelled data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8800

Search results for: learning using labeled and unlabelled data

7360 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method

Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas

Abstract:

To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural network (ANN) and support vector machine (SVM). All prediction models are built in Python, with tool Scikit-learn and Pybrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day-ahead, is predicted at first, and then this estimation is used as an input to predict consumption and demand. Models to predict consumption and demand are trained in both SVM and ANN, and depend on cooling or heating, weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve 15.50% to 20.03% coefficient of variance of root mean square error (CVRMSE) for consumption prediction and 22.89% to 32.42% CVRMSE for demand prediction, respectively. To conclude, the presented models have potential to help building owners to purchase electricity at the wholesale market, but they are not robust when used in demand response control.

Keywords: Building energy prediction, data mining, demand response, electricity market.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2212
7359 The Design of English Materials to Communicate the Identity of Mueang District, Samut Songkram for Ecotourism

Authors: Kitda Praraththajariya

Abstract:

The main purpose of this research was to study how to communicate the identity of the Mueang district, SamutSongkram province for ecotourism. The qualitative data was collected through studying related materials, exploring the area, in-depth interviews with three groups of people: three directly responsible officers who were key informants of the district, twenty foreign tourists and five Thai tourist guides. A content analysis was used to analyze the qualitative data. The two main findings of the study were as follows: 1. The identity of Amphur (District) Mueang, SamutSongkram province. This establishment was near the Mouth of Maekong River for normal people and tourists, consisting of rest accommodations. There are restaurants where food and drinks are served, rich mangrove forests, Hoy Lod (Razor Clam) and mangrove trees. Don Hoy Lod, is characterized by muddy beaches, is a coastal wetland for Ramsar Site. It is at 1099th ranging where the greatest number of Hoy Lod (Razor Clam) can be seen from March to May each year. 2. The communication of the identity of AmphurMueang, SamutSongkram province which the researcher could find and design to present in English materials can be summed up in 4 items: 1) The history of AmphurMueang, SamutSongkram province 2) WatPhetSamutWorrawihan 3) The Learning source of Ecotourism: Don Hoy Lod and Mangrove forest 4) How to keep AmphurMueang, SamutSongkram province for ecotourism.

Keywords: Foreigner tourists, signified, semiotics, ecotourism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1854
7358 Development of Multimodal e-Slide Presentation to Support Self-Learning for the Visually Impaired

Authors: Rustam Asnawi, Wan Fatimah Wan Ahmad

Abstract:

Currently electronic slide (e-slide) is one of the most common styles in educational presentation. Unfortunately, the utilization of e-slide for the visually impaired is uncommon since they are unable to see the content of such e-slides which are usually composed of text, images and animation. This paper proposes a model for presenting e-slide in multimodal presentation i.e. using conventional slide concurrent with voicing, in both languages Malay and English. At the design level, live multimedia presentation concept is used, while at the implementation level several components are used. The text content of each slide is extracted using COM component, Microsoft Speech API for voicing the text in English language and the text in Malay language is voiced using dictionary approach. To support the accessibility, an auditory user interface is provided as an additional feature. A prototype of such model named as VSlide has been developed and introduced.

Keywords: presentation, self-learning, slide, visually impaired

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1579
7357 Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

Authors: S.Aranganayagi, K.Thangavel

Abstract:

K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.

Keywords: Clustering, categorical data, K-Modes, weighted dissimilarity measure

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3698
7356 Collaborative Stylistic Group Project: A Drama Practical Analysis Application

Authors: Omnia F. Elkommos

Abstract:

In the course of teaching stylistics to undergraduate students of the Department of English Language and Literature, Faculty of Arts and Humanities, the linguistic tool kit of theories comes in handy and useful for the better understanding of the different literary genres: Poetry, drama, and short stories. In the present paper, a model of teaching of stylistics is compiled and suggested. It is a collaborative group project technique for use in the undergraduate diverse specialisms (Literature, Linguistics and Translation tracks) class. Students initially are introduced to the different linguistic tools and theories suitable for each literary genre. The second step is to apply these linguistic tools to texts. Students are required to watch videos performing the poems or play, for example, and search the net for interpretations of the texts by other authorities. They should be using a template (prepared by the researcher) that has guided questions leading students along in their analysis. Finally, a practical analysis would be written up using the practical analysis essay template (also prepared by the researcher). As per collaborative learning, all the steps include activities that are student-centered addressing differentiation and considering their three different specialisms. In the process of selecting the proper tools, the actual application and analysis discussion, students are given tasks that request their collaboration. They also work in small groups and the groups collaborate in seminars and group discussions. At the end of the course/module, students present their work also collaboratively and reflect and comment on their learning experience. The module/course uses a drama play that lends itself to the task: ‘The Bond’ by Amy Lowell and Robert Frost. The project results in an interpretation of its theme, characterization and plot. The linguistic tools are drawn from pragmatics, and discourse analysis among others.

Keywords: Applied linguistic theories, collaborative learning, cooperative principle, discourse analysis, drama analysis, group project, online acting performance, pragmatics, speech act theory, stylistics, technology enhanced learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1090
7355 Mobile Phone as a Tool for Data Collection in Field Research

Authors: Sandro Mourão, Karla Okada

Abstract:

The necessity of accurate and timely field data is shared among organizations engaged in fundamentally different activities, public services or commercial operations. Basically, there are three major components in the process of the qualitative research: data collection, interpretation and organization of data, and analytic process. Representative technological advancements in terms of innovation have been made in mobile devices (mobile phone, PDA-s, tablets, laptops, etc). Resources that can be potentially applied on the data collection activity for field researches in order to improve this process. This paper presents and discuss the main features of a mobile phone based solution for field data collection, composed of basically three modules: a survey editor, a server web application and a client mobile application. The data gathering process begins with the survey creation module, which enables the production of tailored questionnaires. The field workforce receives the questionnaire(s) on their mobile phones to collect the interviews responses and sending them back to a server for immediate analysis.

Keywords: Data Gathering, Field Research, Mobile Phone, Survey.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2067
7354 Factors Influencing Rote Student's Intention to Use WBL: Thailand Study

Authors: Watcharawalee Lertlum, Borworn Papasratorn

Abstract:

Conventional WBL is effective for meaningful student, because rote student learn by repeating without thinking or trying to understand. It is impossible to have full benefit from conventional WBL. Understanding of rote student-s intention and what influences it becomes important. Poorly designed user interface will discourage rote student-s cultivation and intention to use WBL. Thus, user interface design is an important factor especially when WBL is used as comprehensive replacement of conventional teaching. This research proposes the influencing factors that can enhance student-s intention to use the system. The enhanced TAM is used for evaluating the proposed factors. The research result points out that factors influencing rote student-s intention are Perceived Usefulness of Homepage Content Structure, Perceived User Friendly Interface, Perceived Hedonic Component, and Perceived (homepage) Visual Attractiveness.

Keywords: E-learning, Web-Based learning, Intention to use, Rote student, Influencing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1631
7353 On Pooling Different Levels of Data in Estimating Parameters of Continuous Meta-Analysis

Authors: N. R. N. Idris, S. Baharom

Abstract:

A meta-analysis may be performed using aggregate data (AD) or an individual patient data (IPD). In practice, studies may be available at both IPD and AD level. In this situation, both the IPD and AD should be utilised in order to maximize the available information. Statistical advantages of combining the studies from different level have not been fully explored. This study aims to quantify the statistical benefits of including available IPD when conducting a conventional summary-level meta-analysis. Simulated meta-analysis were used to assess the influence of the levels of data on overall meta-analysis estimates based on IPD-only, AD-only and the combination of IPD and AD (mixed data, MD), under different study scenario. The percentage relative bias (PRB), root mean-square-error (RMSE) and coverage probability were used to assess the efficiency of the overall estimates. The results demonstrate that available IPD should always be included in a conventional meta-analysis using summary level data as they would significantly increased the accuracy of the estimates.On the other hand, if more than 80% of the available data are at IPD level, including the AD does not provide significant differences in terms of accuracy of the estimates. Additionally, combining the IPD and AD has moderating effects on the biasness of the estimates of the treatment effects as the IPD tends to overestimate the treatment effects, while the AD has the tendency to produce underestimated effect estimates. These results may provide some guide in deciding if significant benefit is gained by pooling the two levels of data when conducting meta-analysis.

Keywords: Aggregate data, combined-level data, Individual patient data, meta analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1746
7352 Electricity Consumption Prediction Model using Neuro-Fuzzy System

Authors: Rahib Abiyev, Vasif H. Abiyev, C. Ardil

Abstract:

In this paper the development of neural network based fuzzy inference system for electricity consumption prediction is considered. The electricity consumption depends on number of factors, such as number of customers, seasons, type-s of customers, number of plants, etc. It is nonlinear process and can be described by chaotic time-series. The structure and algorithms of neuro-fuzzy system for predicting future values of electricity consumption is described. To determine the unknown coefficients of the system, the supervised learning algorithm is used. As a result of learning, the rules of neuro-fuzzy system are formed. The developed system is applied for predicting future values of electricity consumption of Northern Cyprus. The simulation of neuro-fuzzy system has been performed.

Keywords: Fuzzy logic, neural network, neuro-fuzzy system, neuro-fuzzy prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2024
7351 Experimental Model for Instruction of Pre-Service Teachers in ICT Tools and E-learning Environments

Authors: Rachel Baruch

Abstract:

This article describes the implementation of an experimental model for teaching ICT tools and digital environments in teachers training college. In most educational systems in the Western world, new programs were developed in order to bridge the digital gap between teachers and students. In spite of their achievements, these programs are limited due to several factors: The teachers in the schools implement new methods incorporating technological tools into the curriculum, but meanwhile the technology changes and advances. The interface of tools changes frequently, some tools disappear and new ones are invented. These conditions require an experimental model of training the pre-service teachers. The appropriate method for instruction within the domain of ICT tools should be based on exposing the learners to innovations, helping them to gain experience, teaching them how to deal with challenges and difficulties on their own, and training them. This study suggests some principles for this approach and describes step by step the implementation of this model.

Keywords: ICT tools, e-learning, pre-service teachers.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1099
7350 Climate Safe House: A Community Housing Project Tackling Catastrophic Sea Level Rise in Coastal Communities

Authors: Chris Fersterer, Col Fay, Tobias Danielmeier, Kat Achterberg, Scott Willis

Abstract:

New Zealand, an island nation, has an extensive coastline peppered with small communities of iconic buildings known as Bachs. Post WWII, these modest buildings were constructed by their owners as retreats and generally were small, low cost, often using recycled material and often they fell below current acceptable building standards. In the latter part of the 20th century, real estate prices in many of these communities remained low and these areas became permanent residences for people attracted to this affordable lifestyle choice. The Blueskin Resilient Communities Trust (BRCT) is an organisation that recognises the vulnerability of communities in low lying settlements as now being prone to increased flood threat brought about by climate change and sea level rise. Some of the inhabitants of Blueskin Bay, Otago, NZ have already found their properties to be un-insurable because of increased frequency of flood events and property values have slumped accordingly. Territorial authorities also acknowledge this increased risk and have created additional compliance measures for new buildings that are less than 2 m above tidal peaks. Community resilience becomes an additional concern where inhabitants are attracted to a lifestyle associated with a specific location and its people when this lifestyle is unable to be met in a suburban or city context. Traditional models of social housing fail to provide the sense of community connectedness and identity enjoyed by the current residents of Blueskin Bay. BRCT have partnered with the Otago Polytechnic Design School to design a new form of community housing that can react to this environmental change. It is a longitudinal project incorporating participatory approaches as a means of getting people ‘on board’, to understand complex systems and co-develop solutions. In the first period, they are seeking industry support and funding to develop a transportable and fully self-contained housing model that exploits current technologies. BRCT also hope that the building will become an educational tool to highlight climate change issues facing us today. This paper uses the Climate Safe House (CSH) as a case study for education in architectural sustainability through experiential learning offered as part of the Otago Polytechnics Bachelor of Design. Students engage with the project with research methodologies, including site surveys, resident interviews, data sourced from government agencies and physical modelling. The process involves collaboration across design disciplines including product and interior design but also includes connections with industry, both within the education institution and stakeholder industries introduced through BRCT. This project offers a rich learning environment where students become engaged through project based learning within a community of practice, including architecture, construction, energy and other related fields. The design outcomes are expressed in a series of public exhibitions and forums where community input is sought in a truly participatory process.

Keywords: Community resilience, problem based learning, project based learning, case study.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 971
7349 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1182
7348 DIVAD: A Dynamic and Interactive Visual Analytical Dashboard for Exploring and Analyzing Transport Data

Authors: Tin Seong Kam, Ketan Barshikar, Shaun Tan

Abstract:

The advances in location-based data collection technologies such as GPS, RFID etc. and the rapid reduction of their costs provide us with a huge and continuously increasing amount of data about movement of vehicles, people and goods in an urban area. This explosive growth of geospatially-referenced data has far outpaced the planner-s ability to utilize and transform the data into insightful information thus creating an adverse impact on the return on the investment made to collect and manage this data. Addressing this pressing need, we designed and developed DIVAD, a dynamic and interactive visual analytics dashboard to allow city planners to explore and analyze city-s transportation data to gain valuable insights about city-s traffic flow and transportation requirements. We demonstrate the potential of DIVAD through the use of interactive choropleth and hexagon binning maps to explore and analyze large taxi-transportation data of Singapore for different geographic and time zones.

Keywords: Geographic Information System (GIS), MovementData, GeoVisual Analytics, Urban Planning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2394
7347 Image Modeling Using Gibbs-Markov Random Field and Support Vector Machines Algorithm

Authors: Refaat M Mohamed, Ayman El-Baz, Aly A. Farag

Abstract:

This paper introduces a novel approach to estimate the clique potentials of Gibbs Markov random field (GMRF) models using the Support Vector Machines (SVM) algorithm and the Mean Field (MF) theory. The proposed approach is based on modeling the potential function associated with each clique shape of the GMRF model as a Gaussian-shaped kernel. In turn, the energy function of the GMRF will be in the form of a weighted sum of Gaussian kernels. This formulation of the GMRF model urges the use of the SVM with the Mean Field theory applied for its learning for estimating the energy function. The approach has been tested on synthetic texture images and is shown to provide satisfactory results in retrieving the synthesizing parameters.

Keywords: Image Modeling, MRF, Parameters Estimation, SVM Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1643
7346 A Game Design Framework for Vocational Education

Authors: Heide Lukosch, Roy Van Bussel, Sebastiaan Meijer

Abstract:

Serious games have proven to be a useful instrument to engage learners and increase motivation. Nevertheless, a broadly accepted, practical instructional design approach to serious games does not exist. In this paper, we introduce the use of an instructional design model that has not been applied to serious games yet, and has some advantages compared to other design approaches. We present the case of mechanics mechatronics education to illustrate the close match with timing and role of knowledge and information that the instructional design model prescribes and how this has been translated to a rigidly structured game design. The structured approach answers the learning needs of applicable knowledge within the target group. It combines advantages of simulations with strengths of entertainment games to foster learner-s motivation in the best possible way. A prototype of the game will be evaluated along a well-respected evaluation method within an advanced test setting including test and control group.

Keywords: Serious Gaming, Simulation, Complex Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
7345 Determining Cluster Boundaries Using Particle Swarm Optimization

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.

Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1724
7344 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis in case of a large amount of data. In fact, because of their volumes, their nature (semi or unstructured) and their variety, it is impossible to analyze efficiently big data via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm, Classification And Regression Trees (CART), in order to adapt it for big data analysis. The major changes of this algorithm are presented and then a version of the extended algorithm is defined in order to make it applicable for a huge quantity of data.

Keywords: Predictive analysis, big data, predictive analysis algorithms. CART algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1083
7343 Comparative Quantitative Study on Learning Outcomes of Major Study Groups of an Information and Communication Technology Bachelor Educational Program

Authors: Kari Björn, Mikael Soini

Abstract:

Higher Education system reforms, especially Finnish system of Universities of Applied Sciences in 2014 are discussed. The new steering model is based on major legislative changes, output-oriented funding and open information. The governmental steering reform, especially the financial model and the resulting institutional level responses, such as a curriculum reforms are discussed, focusing especially in engineering programs. The paper is motivated by management need to establish objective steering-related performance indicators and to apply them consistently across all educational programs. The close relationship to governmental steering and funding model imply that internally derived indicators can be directly applied. Metropolia University of Applied Sciences (MUAS) as a case institution is briefly introduced, focusing on engineering education in Information and Communications Technology (ICT), and its related programs. The reform forced consolidation of previously separate smaller programs into fewer units of student application. New curriculum ICT students have a common first year before they apply for a Major. A framework of parallel and longitudinal comparisons is introduced and used across Majors in two campuses. The new externally introduced performance criteria are applied internally on ICT Majors using data ex-ante and ex-post of program merger.  A comparative performance of the Majors after completion of joint first year is established, focusing on previously omitted Majors for completeness of analysis. Some new research questions resulting from transfer of Majors between campuses and quota setting are discussed. Practical orientation identifies best practices to share or targets needing most attention for improvement. This level of analysis is directly applicable at student group and teaching team level, where corrective actions are possible, when identified. The analysis is quantitative and the nature of the corrective actions are not discussed. Causal relationships and factor analysis are omitted, because campuses, their staff and various pedagogical implementation details contain still too many undetermined factors for our limited data. Such qualitative analysis is left for further research. Further study must, however, be guided by the relevance of the observations.

Keywords: Engineering education, integrated curriculum, learning outcomes, performance measurement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 900
7342 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Authors: Ghada A. Alfattni

Abstract:

Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates. 

Keywords: Imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1322
7341 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain

Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami

Abstract:

To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of the manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. Blockchain mechanism such as Bitcoin using Public Key Infrastructure (PKI) requires plaintext to be shared between companies in order to verify the identity of the company that sent the data. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems, this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is top-secret. In this scenario, we show an implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.

Keywords: Business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 841
7340 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach

Authors: Sarisa Pinkham, Kanyarat Bussaban

Abstract:

The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.

Keywords: Daily rainfall, Image processing, Approximation, Pixel value data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1766
7339 Automatic Generation of Ontology from Data Source Directed by Meta Models

Authors: Widad Jakjoud, Mohamed Bahaj, Jamal Bakkas

Abstract:

Through this paper we present a method for automatic generation of ontological model from any data source using Model Driven Architecture (MDA), this generation is dedicated to the cooperation of the knowledge engineering and software engineering. Indeed, reverse engineering of a data source generates a software model (schema of data) that will undergo transformations to generate the ontological model. This method uses the meta-models to validate software and ontological models.

Keywords: Meta model, model, ontology, data source.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2004
7338 Temporal Signal Processing by Inference Bayesian Approach for Detection of Abrupt Variation of Statistical Characteristics of Noisy Signals

Authors: Farhad Asadi, Hossein Sadati

Abstract:

In fields such as neuroscience and especially in cognition modeling of mental processes, uncertainty processing in temporal zone of signal is vital. In this paper, Bayesian online inferences in estimation of change-points location in signal are constructed. This method separated the observed signal into independent series and studies the change and variation of the regime of data locally with related statistical characteristics. We give conditions on simulations of the method when the data characteristics of signals vary, and provide empirical evidence to show the performance of method. It is verified that correlation between series around the change point location and its characteristics such as Signal to Noise Ratios and mean value of signal has important factor on fluctuating in finding proper location of change point. And one of the main contributions of this study is related to representing of these influences of signal statistical characteristics for finding abrupt variation in signal. There are two different structures for simulations which in first case one abrupt change in temporal section of signal is considered with variable position and secondly multiple variations are considered. Finally, influence of statistical characteristic for changing the location of change point is explained in details in simulation results with different artificial signals.

Keywords: Time series, fluctuation in statistical characteristics, optimal learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 575
7337 Steps towards the Development of National Health Data Standards in Developing Countries: An Exploratory Qualitative Study in Saudi Arabia

Authors: Abdullah I. Alkraiji, Thomas W. Jackson, Ian R. Murray

Abstract:

The proliferation of health data standards today is somewhat overlapping and conflicting, resulting in market confusion and leading to increasing proprietary interests. The government role and support in standardization for health data are thought to be crucial in order to establish credible standards for the next decade, to maximize interoperability across the health sector, and to decrease the risks associated with the implementation of non-standard systems. The normative literature missed out the exploration of the different steps required to be undertaken by the government towards the development of national health data standards. Based on the lessons learned from a qualitative study investigating the different issues to the adoption of health data standards in the major tertiary hospitals in Saudi Arabia and the opinions and feedback from different experts in the areas of data exchange and standards and medical informatics in Saudi Arabia and UK, a list of steps required towards the development of national health data standards was constructed. Main steps are the existence of: a national formal reference for health data standards, an agreed national strategic direction for medical data exchange, a national medical information management plan and a national accreditation body, and more important is the change management at the national and organizational level. The outcome of this study can be used by academics and practitioners to develop the planning of health data standards, and in particular those in developing countries.

Keywords: Interoperability, Case Study, Health Data Standards, Medical Data Exchange, Saudi Arabia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2008
7336 Test Data Compression Using a Hybrid of Bitmask Dictionary and 2n Pattern Runlength Coding Methods

Authors: C. Kalamani, K. Paramasivam

Abstract:

In VLSI, testing plays an important role. Major problem in testing are test data volume and test power. The important solution to reduce test data volume and test time is test data compression. The Proposed technique combines the bit maskdictionary and 2n pattern run length-coding method and provides a substantial improvement in the compression efficiency without introducing any additional decompression penalty. This method has been implemented using Mat lab and HDL Language to reduce test data volume and memory requirements. This method is applied on various benchmark test sets and compared the results with other existing methods. The proposed technique can achieve a compression ratio up to 86%.

Keywords: Bit Mask dictionary, 2n pattern run length code, system-on-chip, SOC, test data compression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1933
7335 Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features

Authors: Tharini N. de Silva, Xiao Zhibo, Zhao Rui, Mao Kezhi

Abstract:

Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.

Keywords: Causal relation identification, convolutional neural networks, natural Language Processing, Machine Learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2269
7334 A Hybrid Data Mining Method for the Medical Classification of Chest Pain

Authors: Sung Ho Ha, Seong Hyeon Joo

Abstract:

Data mining techniques have been used in medical research for many years and have been known to be effective. In order to solve such problems as long-waiting time, congestion, and delayed patient care, faced by emergency departments, this study concentrates on building a hybrid methodology, combining data mining techniques such as association rules and classification trees. The methodology is applied to real-world emergency data collected from a hospital and is evaluated by comparing with other techniques. The methodology is expected to help physicians to make a faster and more accurate classification of chest pain diseases.

Keywords: Data mining, medical decisions, medical domainknowledge, chest pain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2234
7333 A Methodology for Automatic Diversification of Document Categories

Authors: Dasom Kim, Chen Liu, Myungsu Lim, Soo-Hyeon Jeon, Byeoung Kug Jeon, Kee-Young Kwahk, Namgyu Kim

Abstract:

Recently, numerous documents including large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. Usually, these documents are categorized for the convenience of users. Because the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs. Many studies on automatic categorization have been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorize complex documents with multiple topics because they work on the assumption that individual documents can be categorized into single categories only. Therefore, to overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, the learning process employed in these studies involves training using a multi-categorized document set. These methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets using traditional multi-categorization algorithms are provided. To overcome this limitation, in this study, we review our novel methodology for extending the category of a single-categorized document to multiple categorizes, and then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.

Keywords: Big Data Analysis, Document Classification, Text Mining, Topic Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1751
7332 Knowledge Discovery and Data Mining Techniques in Textile Industry

Authors: Filiz Ersoz, Taner Ersoz, Erkin Guler

Abstract:

This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.

Keywords: Data mining, textile production, decision trees, classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1544
7331 Game based Learning to Enhance Cognitive and Physical Capabilities of Elderly People: Concepts and Requirements

Authors: Aurelie Aurilla Bechina Arntzen

Abstract:

The last decade has seen an early majority of people The last decade, the role of the of the information communication technologies has increased in improving the social and business life of people. Today, it is recognized that game could contribute to enhance virtual rehabilitation by better engaging patients. Our research study aims to develop a game based system enhancing cognitive and physical capabilities of elderly people. To this end, the project aims to develop a low cost hand held system based on existing game such as Wii, PSP, or Xbox. This paper discusses the concepts and requirements for developing such game for elderly people. Based on the requirement elicitation, we intend to develop a prototype related to sport and dance activities.

Keywords: Elderly people, Game based learning system, Health systems, rehabilitation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2524