Search results for: Data science
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7710

Search results for: Data science

7440 Data Mining Classification Methods Applied in Drug Design

Authors: Mária Stachová, Lukáš Sobíšek

Abstract:

Data mining incorporates a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with the great potential. They can help people to understand the patterns in certain chunk of information so it is obvious that the data mining tools have a wide area of applications. For example in the theoretical chemistry data mining tools can be used to predict moleculeproperties or improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of thecontribution is to create a classification model, which would be able to deal with a huge data set with high accuracy. For this purpose logistic regression, Bayesian logistic regression and random forest models were built using R software. TheBayesian logistic regression in Latent GOLD software was created as well. These classification methods belong to supervised learning methods. It was necessary to reduce data matrix dimension before construct models and thus the factor analysis (FA) was used. Those models were applied to predict the biological activity of molecules, potential new drug candidates.

Keywords: data mining, classification, drug design, QSAR

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2805
7439 EPR Hiding in Medical Images for Telemedicine

Authors: K. A. Navas, S. Archana Thampy, M. Sasikumar

Abstract:

Medical image data hiding has strict constrains such as high imperceptibility, high capacity and high robustness. Achieving these three requirements simultaneously is highly cumbersome. Some works have been reported in the literature on data hiding, watermarking and stegnography which are suitable for telemedicine applications. None is reliable in all aspects. Electronic Patient Report (EPR) data hiding for telemedicine demand it blind and reversible. This paper proposes a novel approach to blind reversible data hiding based on integer wavelet transform. Experimental results shows that this scheme outperforms the prior arts in terms of zero BER (Bit Error Rate), higher PSNR (Peak Signal to Noise Ratio), and large EPR data embedding capacity with WPSNR (Weighted Peak Signal to Noise Ratio) around 53 dB, compared with the existing reversible data hiding schemes.

Keywords: Biomedical imaging, Data security, Datacommunication, Teleconferencing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2704
7438 A Robust Method for Encrypted Data Hiding Technique Based on Neighborhood Pixels Information

Authors: Ali Shariq Imran, M. Younus Javed, Naveed Sarfraz Khattak

Abstract:

This paper presents a novel method for data hiding based on neighborhood pixels information to calculate the number of bits that can be used for substitution and modified Least Significant Bits technique for data embedding. The modified solution is independent of the nature of the data to be hidden and gives correct results along with un-noticeable image degradation. The technique, to find the number of bits that can be used for data hiding, uses the green component of the image as it is less sensitive to human eye and thus it is totally impossible for human eye to predict whether the image is encrypted or not. The application further encrypts the data using a custom designed algorithm before embedding bits into image for further security. The overall process consists of three main modules namely embedding, encryption and extraction cm.

Keywords: Data hiding, image processing, information security, stagonography.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2307
7437 Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Authors: Yogita, Durga Toshniwal

Abstract:

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.

Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2597
7436 The Effect of Measurement Distribution on System Identification and Detection of Behavior of Nonlinearities of Data

Authors: Mohammad Javad Mollakazemi, Farhad Asadi, Aref Ghafouri

Abstract:

In this paper, we considered and applied parametric modeling for some experimental data of dynamical system. In this study, we investigated the different distribution of output measurement from some dynamical systems. Also, with variance processing in experimental data we obtained the region of nonlinearity in experimental data and then identification of output section is applied in different situation and data distribution. Finally, the effect of the spanning the measurement such as variance to identification and limitation of this approach is explained.

Keywords: Gaussian process, Nonlinearity distribution, Particle filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1675
7435 The Relationship between Competency-Based Learning and Learning Efficiency of Media Communication Students at Suan Sunandha Rajabhat University

Authors: Somtop Keawchuer

Abstract:

This research aims to study (1) the relationship between competency-based learning and learning efficiency of new media communication students at Suan Sunandha University (2) the demographic factor effect on learning efficiency of students at Suan Sunandha University. This research method will use quantitative research; data was collected by questionnaires distributed to students from new media communication in management science faculty of Suan Sunandha Rajabhat University for 1340 sample by purposive sampling method. Data was analyzed by descriptive statistic including percentage, mean, standard deviation and inferential statistic including T-test, ANOVA and Pearson correlation for hypothesis testing. The results showed that the competency-based learning in term of ability to communicate, ability to think and solve the problem, life skills and ability to use technology has a significant relationship with learning efficiency in term of the cognitive domain, psychomotor domain and affective domain at the 0.05 level and which is in harmony with the research hypotheses.

Keywords: Competency-based learning, learning efficiency, new media communication students, Suan Sunandha Rajabhat University.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1133
7434 Exponentially Weighted Simultaneous Estimation of Several Quantiles

Authors: Valeriy Naumov, Olli Martikainen

Abstract:

In this paper we propose new method for simultaneous generating multiple quantiles corresponding to given probability levels from data streams and massive data sets. This method provides a basis for development of single-pass low-storage quantile estimation algorithms, which differ in complexity, storage requirement and accuracy. We demonstrate that such algorithms may perform well even for heavy-tailed data.

Keywords: Quantile estimation, data stream, heavy-taileddistribution, tail index.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1490
7433 Synchronous Courses Attendance in Distance Higher Education: Case Study of a Computer Science Department

Authors: Thierry Eude

Abstract:

The use of videoconferencing platforms adapted to teaching offers students the opportunity to take distance education courses in much the same way as traditional in-class training. The sessions can be recorded and they allow students the option of following the courses synchronously or asynchronously. Three typical profiles can then be distinguished: students who choose to follow the courses synchronously, students who could attend the course in synchronous mode but choose to follow the session off-line, and students who follow the course asynchronously as they cannot attend the course when it is offered because of professional or personal constraints. Our study consists of observing attendance at all distance education courses offered in the synchronous mode by the Computer Science and Software Engineering Department at Laval University during 10 consecutive semesters. The aim is to identify factors that influence students in their choice of attending the distance courses in synchronous mode. It was found that participation tends to be relatively stable over the years for any one semester (fall, winter summer) and is similar from one course to another, although students may be increasingly familiar with the synchronous distance education courses. Average participation is around 28%. There may be deviations, but they concern only a few courses during certain semesters, suggesting that these deviations would only have occurred because of the composition of particular promotions during specific semesters. Furthermore, course schedules have a great influence on the attendance rate. The highest rates are all for courses which are scheduled outside office hours.

Keywords: Attendance, distance undergraduate education in computer science, student behavior, synchronous e-learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1397
7432 Validation Testing for Temporal Neural Networks for RBF Recognition

Authors: Khaled E. A. Negm

Abstract:

A neuron can emit spikes in an irregular time basis and by averaging over a certain time window one would ignore a lot of information. It is known that in the context of fast information processing there is no sufficient time to sample an average firing rate of the spiking neurons. The present work shows that the spiking neurons are capable of computing the radial basis functions by storing the relevant information in the neurons' delays. One of the fundamental findings of the this research also is that when using overlapping receptive fields to encode the data patterns it increases the network-s clustering capacity. The clustering algorithm that is discussed here is interesting from computer science and neuroscience point of view as well as from a perspective.

Keywords: Temporal Neurons, RBF Recognition, Perturbation, On Line Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1446
7431 Enhanced Data Access Control of Cooperative Environment used for DMU Based Design

Authors: Wei Lifan, Zhang Huaiyu, Yang Yunbin, Li Jia

Abstract:

Through the analysis of the process digital design based on digital mockup, the fact indicates that a distributed cooperative supporting environment is the foundation conditions to adopt design approach based on DMU. Data access authorization is concerned firstly because the value and sensitivity of the data for the enterprise. The access control for administrators is often rather weak other than business user. So authors established an enhanced system to avoid the administrators accessing the engineering data by potential approach and without authorization. Thus the data security is improved.

Keywords: access control, DMU, PLM, virtual prototype.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1431
7430 Pattern Recognition Using Feature Based Die-Map Clusteringin the Semiconductor Manufacturing Process

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

Depending on the big data analysis becomes important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of the failure are closely related. The purpose of this study is to analyze pattern affects the final test results using a die map based clustering. Many researches have been conducted using die data from the semiconductor test process. However, analysis has limitation as the test data is less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. This study consists of three phases. In the first phase, die map is created through fail bit data in each sub-area of die. In the second phase, clustering using map data is performed. And the third stage is to find patterns that affect final test result. Finally, the proposed three steps are applied to actual industrial data and experimental results showed the potential field application.

Keywords: Die-Map Clustering, Feature Extraction, Pattern Recognition, Semiconductor Manufacturing Process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3105
7429 Speed Characteristics of Mixed Traffic Flow on Urban Arterials

Authors: Ashish Dhamaniya, Satish Chandra

Abstract:

Speed and traffic volume data are collected on different sections of four lane and six lane roads in three metropolitan cities in India. Speed data are analyzed to fit the statistical distribution to individual vehicle speed data and all vehicles speed data. It is noted that speed data of individual vehicle generally follows a normal distribution but speed data of all vehicle combined at a section of urban road may or may not follow the normal distribution depending upon the composition of traffic stream. A new term Speed Spread Ratio (SSR) is introduced in this paper which is the ratio of difference in 85th and 50th percentile speed to the difference in 50th and 15th percentile speed. If SSR is unity then speed data are truly normally distributed. It is noted that on six lane urban roads, speed data follow a normal distribution only when SSR is in the range of 0.86 – 1.11. The range of SSR is validated on four lane roads also.

Keywords: Normal distribution, percentile speed, speed spread ratio, traffic volume.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4202
7428 A Comparative Study between Discrete Wavelet Transform and Maximal Overlap Discrete Wavelet Transform for Testing Stationarity

Authors: Amel Abdoullah Ahmed Dghais, Mohd Tahir Ismail

Abstract:

In this paper the core objective is to apply discrete wavelet transform and maximal overlap discrete wavelet transform functions namely Haar, Daubechies2, Symmlet4, Coiflet2 and discrete approximation of the Meyer wavelets in non stationary financial time series data from Dow Jones index (DJIA30) of US stock market. The data consists of 2048 daily data of closing index from December 17, 2004 to October 23, 2012. Unit root test affirms that the data is non stationary in the level. A comparison between the results to transform non stationary data to stationary data using aforesaid transforms is given which clearly shows that the decomposition stock market index by discrete wavelet transform is better than maximal overlap discrete wavelet transform for original data.

Keywords: Discrete wavelet transform, maximal overlap discrete wavelet transform, stationarity, autocorrelation function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4681
7427 Comparative Study of Transformed and Concealed Data in Experimental Designs and Analyses

Authors: K. Chinda, P. Luangpaiboon

Abstract:

This paper presents the comparative study of coded data methods for finding the benefit of concealing the natural data which is the mercantile secret. Influential parameters of the number of replicates (rep), treatment effects (τ) and standard deviation (σ) against the efficiency of each transformation method are investigated. The experimental data are generated via computer simulations under the specified condition of the process with the completely randomized design (CRD). Three ways of data transformation consist of Box-Cox, arcsine and logit methods. The difference values of F statistic between coded data and natural data (Fc-Fn) and hypothesis testing results were determined. The experimental results indicate that the Box-Cox results are significantly different from natural data in cases of smaller levels of replicates and seem to be improper when the parameter of minus lambda has been assigned. On the other hand, arcsine and logit transformations are more robust and obviously, provide more precise numerical results. In addition, the alternate ways to select the lambda in the power transformation are also offered to achieve much more appropriate outcomes.

Keywords: Experimental Designs, Box-Cox, Arcsine, Logit Transformations.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1582
7426 Design of a Low Cost Motion Data Acquisition Setup for Mechatronic Systems

Authors: Barış Can Yalçın

Abstract:

Motion sensors have been commonly used as a valuable component in mechatronic systems, however, many mechatronic designs and applications that need motion sensors cost enormous amount of money, especially high-tech systems. Design of a software for communication protocol between data acquisition card and motion sensor is another issue that has to be solved. This study presents how to design a low cost motion data acquisition setup consisting of MPU 6050 motion sensor (gyro and accelerometer in 3 axes) and Arduino Mega2560 microcontroller. Design parameters are calibration of the sensor, identification and communication between sensor and data acquisition card, interpretation of data collected by the sensor.

Keywords: Calibration of sensors, data acquisition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4298
7425 Conceptual Multidimensional Model

Authors: Manpreet Singh, Parvinder Singh, Suman

Abstract:

The data is available in abundance in any business organization. It includes the records for finance, maintenance, inventory, progress reports etc. As the time progresses, the data keep on accumulating and the challenge is to extract the information from this data bank. Knowledge discovery from these large and complex databases is the key problem of this era. Data mining and machine learning techniques are needed which can scale to the size of the problems and can be customized to the application of business. For the development of accurate and required information for particular problem, business analyst needs to develop multidimensional models which give the reliable information so that they can take right decision for particular problem. If the multidimensional model does not possess the advance features, the accuracy cannot be expected. The present work involves the development of a Multidimensional data model incorporating advance features. The criterion of computation is based on the data precision and to include slowly change time dimension. The final results are displayed in graphical form.

Keywords: Multidimensional, data precision.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1417
7424 Real Time Approach for Data Placement in Wireless Sensor Networks

Authors: Sanjeev Gupta, Mayank Dave

Abstract:

The issue of real-time and reliable report delivery is extremely important for taking effective decision in a real world mission critical Wireless Sensor Network (WSN) based application. The sensor data behaves differently in many ways from the data in traditional databases. WSNs need a mechanism to register, process queries, and disseminate data. In this paper we propose an architectural framework for data placement and management. We propose a reliable and real time approach for data placement and achieving data integrity using self organized sensor clusters. Instead of storing information in individual cluster heads as suggested in some protocols, in our architecture we suggest storing of information of all clusters within a cell in the corresponding base station. For data dissemination and action in the wireless sensor network we propose to use Action and Relay Stations (ARS). To reduce average energy dissipation of sensor nodes, the data is sent to the nearest ARS rather than base station. We have designed our architecture in such a way so as to achieve greater energy savings, enhanced availability and reliability.

Keywords: Cluster head, data reliability, real time communication, wireless sensor networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1777
7423 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: A classifier, Algorithms decision tree, knowledge extraction, Support Vector Machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1835
7422 A Software Framework for Predicting Oil-Palm Yield from Climate Data

Authors: Mohd. Noor Md. Sap, A. Majid Awan

Abstract:

Intelligent systems based on machine learning techniques, such as classification, clustering, are gaining wide spread popularity in real world applications. This paper presents work on developing a software system for predicting crop yield, for example oil-palm yield, from climate and plantation data. At the core of our system is a method for unsupervised partitioning of data for finding spatio-temporal patterns in climate data using kernel methods which offer strength to deal with complex data. This work gets inspiration from the notion that a non-linear data transformation into some high dimensional feature space increases the possibility of linear separability of the patterns in the transformed space. Therefore, it simplifies exploration of the associated structure in the data. Kernel methods implicitly perform a non-linear mapping of the input data into a high dimensional feature space by replacing the inner products with an appropriate positive definite function. In this paper we present a robust weighted kernel k-means algorithm incorporating spatial constraints for clustering the data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data, and thus can be used for predicting oil-palm yield by analyzing various factors affecting the yield.

Keywords: Pattern analysis, clustering, kernel methods, spatial data, crop yield

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1937
7421 A Proposal for U-City (Smart City) Service Method Using Real-Time Digital Map

Authors: SangWon Han, MuWook Pyeon, Sujung Moon, DaeKyo Seo

Abstract:

Recently, technologies based on three-dimensional (3D) space information are being developed and quality of life is improving as a result. Research on real-time digital map (RDM) is being conducted now to provide 3D space information. RDM is a service that creates and supplies 3D space information in real time based on location/shape detection. Research subjects on RDM include the construction of 3D space information with matching image data, complementing the weaknesses of image acquisition using multi-source data, and data collection methods using big data. Using RDM will be effective for space analysis using 3D space information in a U-City and for other space information utilization technologies.

Keywords: RDM, multi-source data, big data, U-City.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 776
7420 Technology and Its Social Implications: Myths and Realities in the Interpretation of the Concept

Authors: E. V. Veraszto, J. T. F. Camargo, D. Silva, N. A. Miranda, F. O. Simon, S. F. Amaral, L. V. Freitas

Abstract:

The concept of technology as well as itself has evolved continuously over time, such that, nowadays, this concept is still marked by myths and realities. Even the concept of science is frequently misunderstood as technology. In this way, this paper presents different forms of interpretation of the concept of technology in the course of history, as well as the social and cultural aspects associated with it, through an analysis made by means of insights from sociological studies of science and technology and its multiple relations with society. Through the analysis of contents, the paper presents a classification of how technology is interpreted in the social sphere and search channel efforts to show how a broader understanding can contribute to better interpretations of how scientific and technological development influences the environment in which we operate. The text also presents a particular point of view for the interpretation of the concept from the analysis throughout the whole work.

Keywords: Technology, conceptions of technology, technological myths, definition of technology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1502
7419 Agile Methodology for Modeling and Design of Data Warehouses -AM4DW-

Authors: Nieto Bernal Wilson, Carmona Suarez Edgar

Abstract:

The organizations have structured and unstructured information in different formats, sources, and systems. Part of these come from ERP under OLTP processing that support the information system, however these organizations in OLAP processing level, presented some deficiencies, part of this problematic lies in that does not exist interesting into extract knowledge from their data sources, as also the absence of operational capabilities to tackle with these kind of projects.  Data Warehouse and its applications are considered as non-proprietary tools, which are of great interest to business intelligence, since they are repositories basis for creating models or patterns (behavior of customers, suppliers, products, social networks and genomics) and facilitate corporate decision making and research. The following paper present a structured methodology, simple, inspired from the agile development models as Scrum, XP and AUP. Also the models object relational, spatial data models, and the base line of data modeling under UML and Big data, from this way sought to deliver an agile methodology for the developing of data warehouses, simple and of easy application. The methodology naturally take into account the application of process for the respectively information analysis, visualization and data mining, particularly for patterns generation and derived models from the objects facts structured.

Keywords: Data warehouse, model data, big data, object fact, object relational fact, process developed data warehouse.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1446
7418 Distributed Data-Mining by Probability-Based Patterns

Authors: M. Kargar, F. Gharbalchi

Abstract:

In this paper a new method is suggested for distributed data-mining by the probability patterns. These patterns use decision trees and decision graphs. The patterns are cared to be valid, novel, useful, and understandable. Considering a set of functions, the system reaches to a good pattern or better objectives. By using the suggested method we will be able to extract the useful information from massive and multi-relational data bases.

Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1515
7417 A New Approaches for Seismic Signals Discrimination

Authors: M. Benbrahim, K. Benjelloun, A. Ibenbrahim, M. Kasmi, E. Ardil

Abstract:

The automatic discrimination of seismic signals is an important practical goal for the earth-science observatories due to the large amount of information that they receive continuously. An essential discrimination task is to allocate the incoming signal to a group associated with the kind of physical phenomena producing it. In this paper, we present new techniques for seismic signals classification: local, regional and global discrimination. These techniques were tested on seismic signals from the data base of the National Geophysical Institute of the Centre National pour la Recherche Scientifique et Technique (Morocco) by using the Moroccan software for seismic signals analysis.

Keywords: Seismic signals, local discrimination, regionaldiscrimination, global discrimination, Moroccan software for seismicsignals analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1511
7416 K-Means for Spherical Clusters with Large Variance in Sizes

Authors: A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan

Abstract:

Data clustering is an important data exploration technique with many applications in data mining. The k-means algorithm is well known for its efficiency in clustering large data sets. However, this algorithm is suitable for spherical shaped clusters of similar sizes and densities. The quality of the resulting clusters decreases when the data set contains spherical shaped with large variance in sizes. In this paper, we introduce a competent procedure to overcome this problem. The proposed method is based on shifting the center of the large cluster toward the small cluster, and recomputing the membership of small cluster points, the experimental results reveal that the proposed algorithm produces satisfactory results.

Keywords: K-Means, Data Clustering, Cluster Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3249
7415 Representing Data without Lost Compression Properties in Time Series: A Review

Authors: Nabilah Filzah Mohd Radzuan, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Uncertain data is believed to be an important issue in building up a prediction model. The main objective in the time series uncertainty analysis is to formulate uncertain data in order to gain knowledge and fit low dimensional model prior to a prediction task. This paper discusses the performance of a number of techniques in dealing with uncertain data specifically those which solve uncertain data condition by minimizing the loss of compression properties.

Keywords: Compression properties, uncertainty, uncertain time series, mining technique, weather prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1583
7414 Are XBRL-based Financial Reports Better than Non-XBRL Reports? A Quality Assessment

Authors: Zhenkun Wang, Simon S. Gao

Abstract:

Using a scoring system, this paper provides a comparative assessment of the quality of data between XBRL formatted financial reports and non-XBRL financial reports. It shows a major improvement in the quality of data of XBRL formatted financial reports. Although XBRL formatted financial reports do not show much advantage in the quality at the beginning, XBRL financial reports lately display a large improvement in the quality of data in almost all aspects. With the improved XBRL web data managing, presentation and analysis applications, XBRL formatted financial reports have a much better accessibility, are more accurate and better in timeliness.

Keywords: Data Quality; Financial Report; Information; XBRL

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2512
7413 The Risk Assessment of Nano-particles and Investigation of Their Environmental Impact

Authors: Nader Nabhani, Amir Tofighi

Abstract:

Nanotechnology is the science of creating, using and manipulating objects which have at least one dimension in range of 0.1 to 100 nanometers. In other words, nanotechnology is reconstructing a substance using its individual atoms and arranging them in a way that is desirable for our purpose. The main reason that nanotechnology has been attracting attentions is the unique properties that objects show when they are formed at nano-scale. These differing characteristics that nano-scale materials show compared to their nature-existing form is both useful in creating high quality products and dangerous when being in contact with body or spread in environment. In order to control and lower the risk of such nano-scale particles, the main following three topics should be considered: 1) First of all, these materials would cause long term diseases that may show their effects on body years after being penetrated in human organs and since this science has become recently developed in industrial scale not enough information is available about their hazards on body. 2) The second is that these particles can easily spread out in environment and remain in air, soil or water for very long time, besides their high ability to penetrate body skin and causing new kinds of diseases. 3) The third one is that to protect body and environment against the danger of these particles, the protective barriers must be finer than these small objects and such defenses are hard to accomplish. This paper will review, discuss and assess the risks that human and environment face as this new science develops at a high rate.

Keywords: Nanotechnology, risk assessment, environment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1952
7412 Modeling of Random Variable with Digital Probability Hyper Digraph: Data-Oriented Approach

Authors: A. Habibizad Navin, M. Naghian Fesharaki, M. Mirnia, M. Kargar

Abstract:

In this paper we introduce Digital Probability Hyper Digraph for modeling random variable as the hierarchical data-oriented model.

Keywords: Data-Oriented Models, Data Structure, DigitalProbability Hyper Digraph, Random Variable, Statistic andProbability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1232
7411 Incorporating Lexical-Semantic Knowledge into Convolutional Neural Network Framework for Pediatric Disease Diagnosis

Authors: Xiaocong Liu, Huazhen Wang, Ting He, Xiaozheng Li, Weihan Zhang, Jian Chen

Abstract:

The utilization of electronic medical record (EMR) data to establish the disease diagnosis model has become an important research content of biomedical informatics. Deep learning can automatically extract features from the massive data, which brings about breakthroughs in the study of EMR data. The challenge is that deep learning lacks semantic knowledge, which leads to impracticability in medical science. This research proposes a method of incorporating lexical-semantic knowledge from abundant entities into a convolutional neural network (CNN) framework for pediatric disease diagnosis. Firstly, medical terms are vectorized into Lexical Semantic Vectors (LSV), which are concatenated with the embedded word vectors of word2vec to enrich the feature representation. Secondly, the semantic distribution of medical terms serves as Semantic Decision Guide (SDG) for the optimization of deep learning models. The study evaluates the performance of LSV-SDG-CNN model on four kinds of Chinese EMR datasets. Additionally, CNN, LSV-CNN, and SDG-CNN are designed as baseline models for comparison. The experimental results show that LSV-SDG-CNN model outperforms baseline models on four kinds of Chinese EMR datasets. The best configuration of the model yielded an F1 score of 86.20%. The results clearly demonstrate that CNN has been effectively guided and optimized by lexical-semantic knowledge, and LSV-SDG-CNN model improves the disease classification accuracy with a clear margin.

Keywords: lexical semantics, feature representation, semantic decision, convolutional neural network, electronic medical record

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 527