Search results for: Data integration
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7914

Search results for: Data integration

7464 Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Authors: Yogita, Durga Toshniwal

Abstract:

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.

Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2598
7463 The Effect of Measurement Distribution on System Identification and Detection of Behavior of Nonlinearities of Data

Authors: Mohammad Javad Mollakazemi, Farhad Asadi, Aref Ghafouri

Abstract:

In this paper, we considered and applied parametric modeling for some experimental data of dynamical system. In this study, we investigated the different distribution of output measurement from some dynamical systems. Also, with variance processing in experimental data we obtained the region of nonlinearity in experimental data and then identification of output section is applied in different situation and data distribution. Finally, the effect of the spanning the measurement such as variance to identification and limitation of this approach is explained.

Keywords: Gaussian process, Nonlinearity distribution, Particle filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1676
7462 Exponentially Weighted Simultaneous Estimation of Several Quantiles

Authors: Valeriy Naumov, Olli Martikainen

Abstract:

In this paper we propose new method for simultaneous generating multiple quantiles corresponding to given probability levels from data streams and massive data sets. This method provides a basis for development of single-pass low-storage quantile estimation algorithms, which differ in complexity, storage requirement and accuracy. We demonstrate that such algorithms may perform well even for heavy-tailed data.

Keywords: Quantile estimation, data stream, heavy-taileddistribution, tail index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1490
7461 Enhanced Data Access Control of Cooperative Environment used for DMU Based Design

Authors: Wei Lifan, Zhang Huaiyu, Yang Yunbin, Li Jia

Abstract:

Through the analysis of the process digital design based on digital mockup, the fact indicates that a distributed cooperative supporting environment is the foundation conditions to adopt design approach based on DMU. Data access authorization is concerned firstly because the value and sensitivity of the data for the enterprise. The access control for administrators is often rather weak other than business user. So authors established an enhanced system to avoid the administrators accessing the engineering data by potential approach and without authorization. Thus the data security is improved.

Keywords: access control, DMU, PLM, virtual prototype.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1432
7460 Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.

Keywords: Die-Map Clustering, Feature Extraction, Pattern Recognition, Semiconductor Manufacturing Process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3107
7459 Speed Characteristics of Mixed Traffic Flow on Urban Arterials

Authors: Ashish Dhamaniya, Satish Chandra

Abstract:

Speed and traffic volume data are collected on different sections of four lane and six lane roads in three metropolitan cities in India. Speed data are analyzed to fit the statistical distribution to individual vehicle speed data and all vehicles speed data. It is noted that speed data of individual vehicle generally follows a normal distribution but speed data of all vehicle combined at a section of urban road may or may not follow the normal distribution depending upon the composition of traffic stream. A new term Speed Spread Ratio (SSR) is introduced in this paper which is the ratio of difference in 85th and 50th percentile speed to the difference in 50th and 15th percentile speed. If SSR is unity then speed data are truly normally distributed. It is noted that on six lane urban roads, speed data follow a normal distribution only when SSR is in the range of 0.86 – 1.11. The range of SSR is validated on four lane roads also.

Keywords: Normal distribution, percentile speed, speed spread ratio, traffic volume.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4203
7458 Community-Based Destination Sustainable Development: Case of Cicada Walking Street, Hua Hin, Thailand

Authors: Pongsiri Kingkan

Abstract:

This paper aims to study the role and activities of the participants and the impact of activities created in the local area in order to sustainably develop the local areas. This study applied both qualitative and quantitative approaches presented in descriptive style; the data was collected via survey, observation and in-depth interviews with samples. The results illustrated five sorts of roles of participants of the Cicada Walking-street and four types of creative activities; recreation based, art based, cultural based, and live events. Integration of local characteristics, arts and cultures were presented creatively and interestingly. Participants are various. The roles of the participants found in the Cicada Market are group of the property and area management, entrepreneurs, leisure (entertaining persons), local people, and tourists. The good impacts on local communities are those in terms of economy, environmental friendly and local arts and cultures promoting. On the other hand, the traffic congestion, waste and the increasing of energy consumption are negative impacts from area development.

Keywords: Creative Tourism Activity, Destination Development, Sustainable Development, Walking Street.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1699
7457 A Comparative Study between Discrete Wavelet Transform and Maximal Overlap Discrete Wavelet Transform for Testing Stationarity

Authors: Amel Abdoullah Ahmed Dghais, Mohd Tahir Ismail

Abstract:

In this paper the core objective is to apply discrete wavelet transform and maximal overlap discrete wavelet transform functions namely Haar, Daubechies2, Symmlet4, Coiflet2 and discrete approximation of the Meyer wavelets in non stationary financial time series data from Dow Jones index (DJIA30) of US stock market. The data consists of 2048 daily data of closing index from December 17, 2004 to October 23, 2012. Unit root test affirms that the data is non stationary in the level. A comparison between the results to transform non stationary data to stationary data using aforesaid transforms is given which clearly shows that the decomposition stock market index by discrete wavelet transform is better than maximal overlap discrete wavelet transform for original data.

Keywords: Discrete wavelet transform, maximal overlap discrete wavelet transform, stationarity, autocorrelation function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4684
7456 Comparative Study of Transformed and Concealed Data in Experimental Designs and Analyses

Authors: K. Chinda, P. Luangpaiboon

Abstract:

This paper presents the comparative study of coded data methods for finding the benefit of concealing the natural data which is the mercantile secret. Influential parameters of the number of replicates (rep), treatment effects (τ) and standard deviation (σ) against the efficiency of each transformation method are investigated. The experimental data are generated via computer simulations under the specified condition of the process with the completely randomized design (CRD). Three ways of data transformation consist of Box-Cox, arcsine and logit methods. The difference values of F statistic between coded data and natural data (Fc-Fn) and hypothesis testing results were determined. The experimental results indicate that the Box-Cox results are significantly different from natural data in cases of smaller levels of replicates and seem to be improper when the parameter of minus lambda has been assigned. On the other hand, arcsine and logit transformations are more robust and obviously, provide more precise numerical results. In addition, the alternate ways to select the lambda in the power transformation are also offered to achieve much more appropriate outcomes.

Keywords: Experimental Designs, Box-Cox, Arcsine, Logit Transformations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1582
7455 A First Course in Numerical Methods with “Mathematica“

Authors: Andrei A. Kolyshkin

Abstract:

In the present paper some recommendations for the use of software package “Mathematica" in a basic numerical analysis course are presented. The methods which are covered in the course include solution of systems of linear equations, nonlinear equations and systems of nonlinear equations, numerical integration, interpolation and solution of ordinary differential equations. A set of individual assignments developed for the course covering all the topics is discussed in detail.

Keywords: Numerical methods, "Mathematica", e-learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3625
7454 Design of a Low Cost Motion Data Acquisition Setup for Mechatronic Systems

Authors: Barış Can Yalçın

Abstract:

Motion sensors have been commonly used as a valuable component in mechatronic systems, however, many mechatronic designs and applications that need motion sensors cost enormous amount of money, especially high-tech systems. Design of a software for communication protocol between data acquisition card and motion sensor is another issue that has to be solved. This study presents how to design a low cost motion data acquisition setup consisting of MPU 6050 motion sensor (gyro and accelerometer in 3 axes) and Arduino Mega2560 microcontroller. Design parameters are calibration of the sensor, identification and communication between sensor and data acquisition card, interpretation of data collected by the sensor.

Keywords: Calibration of sensors, data acquisition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4304
7453 Conceptual Multidimensional Model

Authors: Manpreet Singh, Parvinder Singh, Suman

Abstract:

The data is available in abundance in any business organization. It includes the records for finance, maintenance, inventory, progress reports etc. As the time progresses, the data keep on accumulating and the challenge is to extract the information from this data bank. Knowledge discovery from these large and complex databases is the key problem of this era. Data mining and machine learning techniques are needed which can scale to the size of the problems and can be customized to the application of business. For the development of accurate and required information for particular problem, business analyst needs to develop multidimensional models which give the reliable information so that they can take right decision for particular problem. If the multidimensional model does not possess the advance features, the accuracy cannot be expected. The present work involves the development of a Multidimensional data model incorporating advance features. The criterion of computation is based on the data precision and to include slowly change time dimension. The final results are displayed in graphical form.

Keywords: Multidimensional, data precision.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1418
7452 Real Time Approach for Data Placement in Wireless Sensor Networks

Authors: Sanjeev Gupta, Mayank Dave

Abstract:

The issue of real-time and reliable report delivery is extremely important for taking effective decision in a real world mission critical Wireless Sensor Network (WSN) based application. The sensor data behaves differently in many ways from the data in traditional databases. WSNs need a mechanism to register, process queries, and disseminate data. In this paper we propose an architectural framework for data placement and management. We propose a reliable and real time approach for data placement and achieving data integrity using self organized sensor clusters. Instead of storing information in individual cluster heads as suggested in some protocols, in our architecture we suggest storing of information of all clusters within a cell in the corresponding base station. For data dissemination and action in the wireless sensor network we propose to use Action and Relay Stations (ARS). To reduce average energy dissipation of sensor nodes, the data is sent to the nearest ARS rather than base station. We have designed our architecture in such a way so as to achieve greater energy savings, enhanced availability and reliability.

Keywords: Cluster head, data reliability, real time communication, wireless sensor networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
7451 Problems of Innovation Development of Wireless Data Transfer Branch in the Cellular Market of Kazakhstan

Authors: Yessengeldy Kuanyshpayev

Abstract:

Now in some countries of the world the cellular market is on the point of saturation, in others - positive dynamics of development kept on. The reasons for it are also different, but there are united by their general susceptibility to innovation changes, if they are really innovative. If to take as an example the cellular market of Kazakhstan it is defined by the low percent of smart phones at consumers, the low population density, undercapacity of the 3G channel, and absence of universal access to the LTE technology that limits dynamical growth of this branch. These moments are aggravated by failures of starting commercial projects by private companies which prevent to be implemented and widely adopted to a new product among consumers. The object of the research is possible integration of wireless and program technologies at which introduction the idea can regenerate in an innovation. The analysis of existing projects in the market and the possible union of the technologies through a prism of theoretical bases of innovative activity shows that efficiency of the company by development and introduction of innovations is possible only thanks to strict observance of all terms and conditions of the innovative process which main term is profit. Despite that fact that on a global scale the innovativeness issue of companies is very popular, there are no researches about possibility of innovative breaks in the field of wireless access to the Internet in the cellular market of Kazakhstan.

Keywords: Cellular market, commercialization, innovation, the effectiveness of company.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2041
7450 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: A classifier, Algorithms decision tree, knowledge extraction, Support Vector Machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1836
7449 A Software Framework for Predicting Oil-Palm Yield from Climate Data

Authors: Mohd. Noor Md. Sap, A. Majid Awan

Abstract:

Intelligent systems based on machine learning techniques, such as classification, clustering, are gaining wide spread popularity in real world applications. This paper presents work on developing a software system for predicting crop yield, for example oil-palm yield, from climate and plantation data. At the core of our system is a method for unsupervised partitioning of data for finding spatio-temporal patterns in climate data using kernel methods which offer strength to deal with complex data. This work gets inspiration from the notion that a non-linear data transformation into some high dimensional feature space increases the possibility of linear separability of the patterns in the transformed space. Therefore, it simplifies exploration of the associated structure in the data. Kernel methods implicitly perform a non-linear mapping of the input data into a high dimensional feature space by replacing the inner products with an appropriate positive definite function. In this paper we present a robust weighted kernel k-means algorithm incorporating spatial constraints for clustering the data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data, and thus can be used for predicting oil-palm yield by analyzing various factors affecting the yield.

Keywords: Pattern analysis, clustering, kernel methods, spatial data, crop yield

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1937
7448 The Link between Unemployment and Inflation Using Johansen’s Co-Integration Approach and Vector Error Correction Modelling

Authors: Sagaren Pillay

Abstract:

In this paper bi-annual time series data on unemployment rates (from the Labour Force Survey) are expanded to quarterly rates and linked to quarterly unemployment rates (from the Quarterly Labour Force Survey). The resultant linked series and the consumer price index (CPI) series are examined using Johansen’s cointegration approach and vector error correction modeling. The study finds that both the series are integrated of order one and are cointegrated. A statistically significant co-integrating relationship is found to exist between the time series of unemployment rates and the CPI. Given this significant relationship, the study models this relationship using Vector Error Correction Models (VECM), one with a restriction on the deterministic term and the other with no restriction.

A formal statistical confirmation of the existence of a unique linear and lagged relationship between inflation and unemployment for the period between September 2000 and June 2011 is presented. For the given period, the CPI was found to be an unbiased predictor of the unemployment rate. This relationship can be explored further for the development of appropriate forecasting models incorporating other study variables.

Keywords: Forecasting, lagged, linear, relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2510
7447 Development of Manufacturing Simulation Model for Semiconductor Fabrication

Authors: Syahril Ridzuan Ab Rahim, Ibrahim Ahmad, Mohd Azizi Chik, Ahmad Zafir Md. Rejab, and U. Hashim

Abstract:

This research presents the development of simulation modeling for WIP management in semiconductor fabrication. Manufacturing simulation modeling is needed for productivity optimization analysis due to the complex process flows involved more than 35 percent re-entrance processing steps more than 15 times at same equipment. Furthermore, semiconductor fabrication required to produce high product mixed with total processing steps varies from 300 to 800 steps and cycle time between 30 to 70 days. Besides the complexity, expansive wafer cost that potentially impact the company profits margin once miss due date is another motivation to explore options to experiment any analysis using simulation modeling. In this paper, the simulation model is developed using existing commercial software platform AutoSched AP, with customized integration with Manufacturing Execution Systems (MES) and Advanced Productivity Family (APF) for data collections used to configure the model parameters and data source. Model parameters such as processing steps cycle time, equipment performance, handling time, efficiency of operator are collected through this customization. Once the parameters are validated, few customizations are made to ensure the prior model is executed. The accuracy for the simulation model is validated with the actual output per day for all equipments. The comparison analysis from result of the simulation model compared to actual for achieved 95 percent accuracy for 30 days. This model later was used to perform various what if analysis to understand impacts on cycle time and overall output. By using this simulation model, complex manufacturing environment like semiconductor fabrication (fab) now have alternative source of validation for any new requirements impact analysis.

Keywords: Advanced Productivity Family (APF), Complementary Metal Oxide Semiconductor (CMOS), Manufacturing Execution Systems (MES), Work In Progress (WIP).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3158
7446 A Proposal for U-City (Smart City) Service Method Using Real-Time Digital Map

Authors: SangWon Han, MuWook Pyeon, Sujung Moon, DaeKyo Seo

Abstract:

Recently, technologies based on three-dimensional (3D) space information are being developed and quality of life is improving as a result. Research on real-time digital map (RDM) is being conducted now to provide 3D space information. RDM is a service that creates and supplies 3D space information in real time based on location/shape detection. Research subjects on RDM include the construction of 3D space information with matching image data, complementing the weaknesses of image acquisition using multi-source data, and data collection methods using big data. Using RDM will be effective for space analysis using 3D space information in a U-City and for other space information utilization technologies.

Keywords: RDM, multi-source data, big data, U-City.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 777
7445 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1446
7444 Energy Efficient Resource Allocation and Scheduling in Cloud Computing Platform

Authors: Shuen-Tai Wang, Ying-Chuan Chen, Yu-Ching Lin

Abstract:

There has been renewal of interest in the relation between Green IT and cloud computing in recent years. Cloud computing has to be a highly elastic environment which provides stable services to users. The growing use of cloud computing facilities has caused marked energy consumption, putting negative pressure on electricity cost of computing center or data center. Each year more and more network devices, storages and computers are purchased and put to use, but it is not just the number of computers that is driving energy consumption upward. We could foresee that the power consumption of cloud computing facilities will double, triple, or even more in the next decade. This paper aims at resource allocation and scheduling technologies that are short of or have not well developed yet to reduce energy utilization in cloud computing platform. In particular, our approach relies on recalling services dynamically onto appropriate amount of the machines according to user’s requirement and temporarily shutting down the machines after finish in order to conserve energy. We present initial work on integration of resource and power management system that focuses on reducing power consumption such that they suffice for meeting the minimizing quality of service required by the cloud computing platform.

Keywords: Cloud computing, energy utilization, power consumption, resource allocation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1407
7443 Distributed Data-Mining by Probability-Based Patterns

Authors: M. Kargar, F. Gharbalchi

Abstract:

In this paper a new method is suggested for distributed data-mining by the probability patterns. These patterns use decision trees and decision graphs. The patterns are cared to be valid, novel, useful, and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. By using the suggested method we will be able to extract the useful information from massive and multi-relational data bases.

Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1516
7442 Trade Openness and Its Effects on Economic Growth in Selected South Asian Countries: A Panel Data Study

Authors: Samra Bajwa, Muhammad W. Siddiqi

Abstract:

The study investigates the causal link between trade openness and economic growth for four South Asian countries for period 1972-1985 and 1986-2007 to examine the scenario before and after the implementation of SAARC. Panel cointegration and FMOLS techniques are employed for short run and long run estimates. In 1972-85 short run unidirectional causality from GDP to openness is found whereas, in 1986-2007 there exists bi-directional causality between GDP and openness. The long run elasticity magnitude between GDP and openness contains negative sign in 1972-85 which shows that there exists long run negative relationship. While in time period 1986-2007 the elasticity magnitude has positive sign that indicates positive causation between GDP and openness. So it can be concluded that after the implementation of SAARC overall situation of selected countries got better. Also long run coefficient of error term suggests that short term equilibrium adjustments are driven by adjustment back to long run equilibrium.

Keywords: Causality, Economic Growth, Panel Co-integration, SAARC, Trade Openness.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2605
7441 K-Means for Spherical Clusters with Large Variance in Sizes

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.

Keywords: K-Means, Data Clustering, Cluster Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3251
7440 Representing Data without Lost Compression Properties in Time Series: A Review

Authors: Nabilah Filzah Mohd Radzuan, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Uncertain data is believed to be an important issue in building up a prediction model. The main objective in the time series uncertainty analysis is to formulate uncertain data in order to gain knowledge and fit low dimensional model prior to a prediction task. This paper discusses the performance of a number of techniques in dealing with uncertain data specifically those which solve uncertain data condition by minimizing the loss of compression properties.

Keywords: Compression properties, uncertainty, uncertain time series, mining technique, weather prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1586
7439 Are XBRL-based Financial Reports Better than Non-XBRL Reports? A Quality Assessment

Authors: Zhenkun Wang, Simon S. Gao

Abstract:

Using a scoring system, this paper provides a comparative assessment of the quality of data between XBRL formatted financial reports and non-XBRL financial reports. It shows a major improvement in the quality of data of XBRL formatted financial reports. Although XBRL formatted financial reports do not show much advantage in the quality at the beginning, XBRL financial reports lately display a large improvement in the quality of data in almost all aspects. With the improved XBRL web data managing, presentation and analysis applications, XBRL formatted financial reports have a much better accessibility, are more accurate and better in timeliness.

Keywords: Data Quality; Financial Report; Information; XBRL

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2512
7438 Modeling of Random Variable with Digital Probability Hyper Digraph: Data-Oriented Approach

Authors: A. Habibizad Navin, M. Naghian Fesharaki, M. Mirnia, M. Kargar

Abstract:

In this paper we introduce Digital Probability Hyper Digraph for modeling random variable as the hierarchical data-oriented model.

Keywords: Data-Oriented Models, Data Structure, DigitalProbability Hyper Digraph, Random Variable, Statistic andProbability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1233
7437 Wireless Transmission of Big Data Using Novel Secure Algorithm

Authors: K. Thiagarajan, K. Saranya, A. Veeraiah, B. Sudha

Abstract:

This paper presents a novel algorithm for secure, reliable and flexible transmission of big data in two hop wireless networks using cooperative jamming scheme. Two hop wireless networks consist of source, relay and destination nodes. Big data has to transmit from source to relay and from relay to destination by deploying security in physical layer. Cooperative jamming scheme determines transmission of big data in more secure manner by protecting it from eavesdroppers and malicious nodes of unknown location. The novel algorithm that ensures secure and energy balance transmission of big data, includes selection of data transmitting region, segmenting the selected region, determining probability ratio for each node (capture node, non-capture and eavesdropper node) in every segment, evaluating the probability using binary based evaluation. If it is secure transmission resume with the two- hop transmission of big data, otherwise prevent the attackers by cooperative jamming scheme and transmit the data in two-hop transmission.

Keywords: Big data, cooperative jamming, energy balance, physical layer, two-hop transmission, wireless security.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2152
7436 Exploring the Potential of Chatbots in Higher Education: A Preliminary Study

Authors: S. Studente, S. Ellis, S. F. Garivaldis

Abstract:

We report upon a study introducing a chatbot to develop learning communities at a London University, with a largely international student base. The focus of the chatbot was twofold; to ease the transition for students into their first year of university study, and to increase study engagement. Four learning communities were created using the chatbot; level 3 foundation, level 4 undergraduate, level 6 undergraduate and level 7 post-graduate. Students and programme leaders were provided with access to the chat bot via mobile app prior to their study induction and throughout the autumn term of 2019. At the end of the term, data were collected via questionnaires and focus groups with students and teaching staff to allow for identification of benefits and challenges. Findings indicated a positive correlation between study engagement and engagement with peers. Students reported that the chatbot enabled them to obtain support and connect to their programme leader. Both staff and students also made recommendation on how engagement could be further enhanced using the bot in terms of; clearly specified purpose, integration with existing university systems, leading by example and connectivity. Extending upon these recommendations, a second pilot study is planned for September 2020, for which the focus will be upon improving attendance rates, student satisfaction and module pass rates.

Keywords: Chatbot, e-learning, learning communities, student engagement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1583
7435 Study of Efficiency and Capability LZW++ Technique in Data Compression

Authors: Yusof. Mohd Kamir, Mat Deris. Mohd Sufian, Abidin. Ahmad Faisal Amri

Abstract:

The purpose of this paper is to show efficiency and capability LZWµ in data compression. The LZWµ technique is enhancement from existing LZW technique. The modification the existing LZW is needed to produce LZWµ technique. LZW read one by one character at one time. Differ with LZWµ technique, where the LZWµ read three characters at one time. This paper focuses on data compression and tested efficiency and capability LZWµ by different data format such as doc type, pdf type and text type. Several experiments have been done by different types of data format. The results shows LZWµ technique is better compared to existing LZW technique in term of file size.

Keywords: Data Compression, Huffman Encoding, LZW, LZWµ, RLL, Size.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2050