Search results for: Data availability
6752 Load Balancing in Heterogeneous P2P Systems using Mobile Agents
Authors: Neeraj Nehra, R. B. Patel, V. K. Bhat
Abstract:
Use of the Internet and the World-Wide-Web (WWW) has become widespread in recent years and mobile agent technology has proliferated at an equally rapid rate. In this scenario load balancing becomes important for P2P systems. Beside P2P systems can be highly heterogeneous, i.e., they may consists of peers that range from old desktops to powerful servers connected to internet through high-bandwidth lines. There are various loads balancing policies came into picture. Primitive one is Message Passing Interface (MPI). Its wide availability and portability make it an attractive choice; however the communication requirements are sometimes inefficient when implementing the primitives provided by MPI. In this scenario we use the concept of mobile agent because Mobile agent (MA) based approach have the merits of high flexibility, efficiency, low network traffic, less communication latency as well as highly asynchronous. In this study we present decentralized load balancing scheme using mobile agent technology in which when a node is overloaded, task migrates to less utilized nodes so as to share the workload. However, the decision of which nodes receive migrating task is made in real-time by defining certain load balancing policies. These policies are executed on PMADE (A Platform for Mobile Agent Distribution and Execution) in decentralized manner using JuxtaNet and various load balancing metrics are discussed.Keywords: Mobile Agents, Agent host, Agent Submitter, PMADE.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17436751 Survey on Image Mining Using Genetic Algorithm
Authors: Jyoti Dua
Abstract:
One image is worth more than thousand words. Images if analyzed can reveal useful information. Low level image processing deals with the extraction of specific feature from a single image. Now the question arises: What technique should be used to extract patterns of very large and detailed image database? The answer of the question is: “Image Mining”. Image Mining deals with the extraction of image data relationship, implicit knowledge, and another pattern from the collection of images or image database. It is nothing but the extension of Data Mining. In the following paper, not only we are going to scrutinize the current techniques of image mining but also present a new technique for mining images using Genetic Algorithm.
Keywords: Image Mining, Data Mining, Genetic Algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 24456750 Landscape Visual Classification Using Land use and Contour Data for Tourism and Planning Decision Making in Cameron Highlands District
Authors: Hosni, N., Shinozaki, M.
Abstract:
Cameron Highlands is known for upland tourism area with vast natural wealth, mountainous landscape endowed with rich diverse species as well as people traditions and cultures. With these various resources, CH possesses an interesting visual and panorama that can be offered to the tourist. However this benefit may not be utilized without obtaining the understanding of existing landscape structure and visual. Given a limited data, this paper attempts to classify landscape visual of Cameron Highlands using land use and contour data. Visual points of view were determined from the given tourist attraction points in the CH Local Plan 2003-2015. The result shows landscape visual and structure categories offered in the study area. The result can be used for further analysis to determine the best alternative tourist trails for tourism planning and decision making using readily available data.Keywords: Visibility, landscape visual, urban planning, GIS
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23736749 Sampling of Variables in Discrete-Event Simulation using the Example of Inventory Evolutions in Job-Shop-Systems Based on Deterministic and Non-Deterministic Data
Authors: Bernd Scholz-Reiter, Christian Toonen, Jan Topi Tervo, Dennis Lappe
Abstract:
Time series analysis often requires data that represents the evolution of an observed variable in equidistant time steps. In order to collect this data sampling is applied. While continuous signals may be sampled, analyzed and reconstructed applying Shannon-s sampling theorem, time-discrete signals have to be dealt with differently. In this article we consider the discrete-event simulation (DES) of job-shop-systems and study the effects of different sampling rates on data quality regarding completeness and accuracy of reconstructed inventory evolutions. At this we discuss deterministic as well as non-deterministic behavior of system variables. Error curves are deployed to illustrate and discuss the sampling rate-s impact and to derive recommendations for its wellfounded choice.Keywords: discrete-event simulation, job-shop-system, sampling rate.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18286748 School Design and Energy Efficiency
Authors: B. Su
Abstract:
Auckland has a temperate climate with comfortable warm, dry summers and mild, wet winters. An Auckland school normally does not need air conditioning for cooling during the summer and only need heating during the winter. The space hating energy is the major portion of winter school energy consumption and the winter energy consumption is major portion of annual school energy consumption. School building thermal design should focus on the winter thermal performance for reducing the space heating energy. A number of Auckland schools- design data and energy consumption data are used for this study. This pilot study investigates the relationships between their energy consumption data and school building design data to improve future school design for energy efficiency.Keywords: Building energy efficiency, building thermal performance, school building design, school energy consumption
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18836747 Automatic Visualization Pipeline Formation for Medical Datasets on Grid Computing Environment
Authors: Aboamama Atahar Ahmed, Muhammad Shafie Abd Latiff, Kamalrulnizam Abu Bakar, Zainul AhmadRajion
Abstract:
Distance visualization of large datasets often takes the direction of remote viewing and zooming techniques of stored static images. However, the continuous increase in the size of datasets and visualization operation causes insufficient performance with traditional desktop computers. Additionally, the visualization techniques such as Isosurface depend on the available resources of the running machine and the size of datasets. Moreover, the continuous demand for powerful computing powers and continuous increase in the size of datasets results an urgent need for a grid computing infrastructure. However, some issues arise in current grid such as resources availability at the client machines which are not sufficient enough to process large datasets. On top of that, different output devices and different network bandwidth between the visualization pipeline components often result output suitable for one machine and not suitable for another. In this paper we investigate how the grid services could be used to support remote visualization of large datasets and to break the constraint of physical co-location of the resources by applying the grid computing technologies. We show our grid enabled architecture to visualize large medical datasets (circa 5 million polygons) for remote interactive visualization on modest resources clients.
Keywords: Visualization, Grid computing, Medical datasets, visualization techniques, thin clients, Globus toolkit, VTK.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17516746 A Thought on Exotic Statistical Distributions
Authors: R K Sinha
Abstract:
The statistical distributions are modeled in explaining nature of various types of data sets. Although these distributions are mostly uni-modal, it is quite common to see multiple modes in the observed distribution of the underlying variables, which make the precise modeling unrealistic. The observed data do not exhibit smoothness not necessarily due to randomness, but could also be due to non-randomness resulting in zigzag curves, oscillations, humps etc. The present paper argues that trigonometric functions, which have not been used in probability functions of distributions so far, have the potential to take care of this, if incorporated in the distribution appropriately. A simple distribution (named as, Sinoform Distribution), involving trigonometric functions, is illustrated in the paper with a data set. The importance of trigonometric functions is demonstrated in the paper, which have the characteristics to make statistical distributions exotic. It is possible to have multiple modes, oscillations and zigzag curves in the density, which could be suitable to explain the underlying nature of select data set.Keywords: Exotic Statistical Distributions, Kurtosis, Mixture Distributions, Multi-modal
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16276745 Data and Spatial Analysis for Economy and Education of 28 E.U. Member-States for 2014
Authors: Alexiou Dimitra, Fragkaki Maria
Abstract:
The objective of the paper is the study of geographic, economic and educational variables and their contribution to determine the position of each member-state among the EU-28 countries based on the values of seven variables as given by Eurostat. The Data Analysis methods of Multiple Factorial Correspondence Analysis (MFCA) Principal Component Analysis and Factor Analysis have been used. The cross tabulation tables of data consist of the values of seven variables for the 28 countries for 2014. The data are manipulated using the CHIC Analysis V 1.1 software package. The results of this program using MFCA and Ascending Hierarchical Classification are given in arithmetic and graphical form. For comparison reasons with the same data the Factor procedure of Statistical package IBM SPSS 20 has been used. The numerical and graphical results presented with tables and graphs, demonstrate the agreement between the two methods. The most important result is the study of the relation between the 28 countries and the position of each country in groups or clouds, which are formed according to the values of the corresponding variables.
Keywords: Multiple factorial correspondence analysis, principal component analysis, factor analysis, E.U.-28 countries, statistical package IBM SPSS 20, CHIC Analysis V 1.1 Software, Eurostat.eu statistics.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10856744 Comparison between Higher-Order SVD and Third-order Orthogonal Tensor Product Expansion
Authors: Chiharu Okuma, Jun Murakami, Naoki Yamamoto
Abstract:
In digital signal processing it is important to approximate multi-dimensional data by the method called rank reduction, in which we reduce the rank of multi-dimensional data from higher to lower. For 2-dimennsional data, singular value decomposition (SVD) is one of the most known rank reduction techniques. Additional, outer product expansion expanded from SVD was proposed and implemented for multi-dimensional data, which has been widely applied to image processing and pattern recognition. However, the multi-dimensional outer product expansion has behavior of great computation complex and has not orthogonally between the expansion terms. Therefore we have proposed an alterative method, Third-order Orthogonal Tensor Product Expansion short for 3-OTPE. 3-OTPE uses the power method instead of nonlinear optimization method for decreasing at computing time. At the same time the group of B. D. Lathauwer proposed Higher-Order SVD (HOSVD) that is also developed with SVD extensions for multi-dimensional data. 3-OTPE and HOSVD are similarly on the rank reduction of multi-dimensional data. Using these two methods we can obtain computation results respectively, some ones are the same while some ones are slight different. In this paper, we compare 3-OTPE to HOSVD in accuracy of calculation and computing time of resolution, and clarify the difference between these two methods.Keywords: Singular value decomposition (SVD), higher-order SVD (HOSVD), higher-order tensor, outer product expansion, power method.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15626743 Electron-Impact Excitation of Kr 5s, 5p Levels
Authors: Alla A. Mityureva
Abstract:
The available data on the cross sections of electronimpact excitation of krypton 5s and 5p configuration levels out of the ground state are represented in convenient and compact form. The results are obtained by regression through all known published data related to this process.Keywords: Cross section, electron excitation, krypton, regression
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 10876742 Seamless Flow of Voluminous Data in High Speed Network without Congestion Using Feedback Mechanism
Abstract:
Continuously growing needs for Internet applications that transmit massive amount of data have led to the emergence of high speed network. Data transfer must take place without any congestion and hence feedback parameters must be transferred from the receiver end to the sender end so as to restrict the sending rate in order to avoid congestion. Even though TCP tries to avoid congestion by restricting the sending rate and window size, it never announces the sender about the capacity of the data to be sent and also it reduces the window size by half at the time of congestion therefore resulting in the decrease of throughput, low utilization of the bandwidth and maximum delay. In this paper, XCP protocol is used and feedback parameters are calculated based on arrival rate, service rate, traffic rate and queue size and hence the receiver informs the sender about the throughput, capacity of the data to be sent and window size adjustment, resulting in no drastic decrease in window size, better increase in sending rate because of which there is a continuous flow of data without congestion. Therefore as a result of this, there is a maximum increase in throughput, high utilization of the bandwidth and minimum delay. The result of the proposed work is presented as a graph based on throughput, delay and window size. Thus in this paper, XCP protocol is well illustrated and the various parameters are thoroughly analyzed and adequately presented.Keywords: Bandwidth-Delay Product, Congestion Control, Congestion Window, TCP/IP
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14886741 Eco-Connectivity: Sustainable Practices in Telecom Networks Using Big Data
Authors: Tharunika Sridhar
Abstract:
This paper addresses sustainable eco-connectivity within the telecommunications sector studying its importance to tackle the contemporary challenges and data regulation issues. The paper also investigates the role of Big Data and its integration in this context, specific to telecom industry. One of the major focus areas in this paper is studying and examining the pathways explored, that are state-of-the-art ecological infrastructure solutions and sector-led measures derived from expert analyses and reviews. Additionally, the paper analyses critical factors involving cost-effective route planning, and the development of green telecommunications infrastructure that adds qualitative reasoning to the research idea. Furthermore, the study discusses in detail a potential green roadmap towards sustainability by exploring green routing software, eco-friendly infrastructure and other eco-focused initiatives. The paper is also directed at the special linguistic needs of the telecommunications sector by focusing on targeted select range of telecom environment.
Keywords: Big Data, telecom, sustainable telecom sector, telecom networks.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 856740 Comparison of Irradiance Decomposition and Energy Production Methods in a Solar Photovoltaic System
Authors: Tisciane Perpetuo e Oliveira, Dante Inga Narvaez, Marcelo Gradella Villalva
Abstract:
Installations of solar photovoltaic systems have increased considerably in the last decade. Therefore, it has been noticed that monitoring of meteorological data (solar irradiance, air temperature, wind velocity, etc.) is important to predict the potential of a given geographical area in solar energy production. In this sense, the present work compares two computational tools that are capable of estimating the energy generation of a photovoltaic system through correlation analyzes of solar radiation data: PVsyst software and an algorithm based on the PVlib package implemented in MATLAB. In order to achieve the objective, it was necessary to obtain solar radiation data (measured and from a solarimetric database), analyze the decomposition of global solar irradiance in direct normal and horizontal diffuse components, as well as analyze the modeling of the devices of a photovoltaic system (solar modules and inverters) for energy production calculations. Simulated results were compared with experimental data in order to evaluate the performance of the studied methods. Errors in estimation of energy production were less than 30% for the MATLAB algorithm and less than 20% for the PVsyst software.
Keywords: Energy production, meteorological data, irradiance decomposition, solar photovoltaic system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7676739 Development of Software Complex for Digitalization of Enterprise Activities
Authors: G. T. Balakayeva, K. K. Nurlybayeva, M. B. Zhanuzakov
Abstract:
In the proposed work, we have developed software and designed a software architecture for the implementation of enterprise business processes. The proposed software has a multi-level architecture using a domain-specific tool. The developed architecture is a guarantor of the availability, reliability and security of the system and the implementation of business processes, which are the basis for effective enterprise management. Automating business processes, automating the algorithmic stages of an enterprise, developing optimal algorithms for managing activities, controlling and monitoring, reducing risks and improving results help organizations achieve strategic goals quickly and efficiently. The software described in this article can connect to the corporate information system via two methods: a desktop client and a web client. With an appeal to the application server, the desktop client program connects to the information system on the company's work PCs over a local network. Outside the organization, the user can interact with the information system via a web browser, which acts as a web client and connects to a web server. The developed software consists of several integrated modules that share resources and interact with each other through an API. The following technology stack was used during development: Node js, React js, MongoDB, Ngnix, Cloud Technologies, Python.
Keywords: Algorithms, document processing, automation, integrated modules, software architecture, software design, information system.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2076738 Physicians’ Knowledge and Perception of Gene Profiling in Malaysia
Authors: Farahnaz Amini, Woo Yun Kin, Lazwani Kolandaiveloo
Abstract:
Availability of different genetic tests after completion of Human Genome Project increases the physicians’ responsibility to keep themselves update on the potential implementation of these genetic tests in their daily practice. However, due to numbers of barriers, still many of physicians are not either aware of these tests or are not willing to offer or refer their patients for genetic tests. This study was conducted an anonymous, cross-sectional, mailed-based survey to develop a primary data of Malaysian physicians’ level of knowledge and perception of gene profiling. Questionnaire had 29 questions. Total scores on selected questions were used to assess the level of knowledge. The highest possible score was 11. Descriptive statistics, one way ANOVA and chi-squared test was used for statistical analysis. Sixty three completed questionnaires were returned by 27 general practitioners (GPs) and 36 medical specialists. Responders’ age ranges from 24 to 55 years old (mean 30.2 ± 6.4). About 40% of the participants rated themselves as having poor level of knowledge in genetics in general whilst 60% believed that they have fair level of knowledge; however, almost half (46%) of the respondents felt that they were not knowledgeable about available genetic tests. A majority (94%) of the responders were not aware of any lab or company which is offering gene profiling services in Malaysia. Only 4% of participants were aware of using gene profiling for detection of dosage of some drugs. Respondents perceived greater utility of gene profiling for breast cancer (38%) compared to the colorectal familial cancer (3%). The score of knowledge ranged from 2 to 8 (mean 4.38 ± 1.67). Non- significant differences between score of knowledge of GPs and specialists were observed, with score of 4.19 and 4.58 respectively. There was no significant association between any demographic factors and level of knowledge. However, those who graduated between years 2001 to 2005 had higher level of knowledge. Overall, 83% of participants showed relatively high level of perception on value of gene profiling to detect patient’s risk of disease. However, low perception was observed for both statements of using gene profiling for general population in order to alter their lifestyle (25%) as well as having the full sequence of a patient genome for the purpose of determining a patient’s best match for treatment (18%). The lack of clinical guidelines, limited provider knowledge and awareness, lack of time and resources to educate patients, lack of evidence-based clinical information and cost of tests were the most barriers of ordering gene profiling mentioned by physicians. In conclusion Malaysian physicians who participate in this study had mediocre level of knowledge and awareness in gene profiling. The low exposure to the genetic questions and problems might be a key predictor of lack of awareness and knowledge on available genetic tests. Educational and training workshop might be useful in helping Malaysian physicians incorporate genetic profiling into practice for eligible patients.Keywords: Gene Profiling, Knowledge, Malaysia, Physician.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19546737 Cloud Computing Cryptography "State-of-the-Art"
Authors: Omer K. Jasim, Safia Abbas, El-Sayed M. El-Horbaty, Abdel-Badeeh M. Salem
Abstract:
Cloud computing technology is very useful in present day to day life, it uses the internet and the central remote servers to provide and maintain data as well as applications. Such applications in turn can be used by the end users via the cloud communications without any installation. Moreover, the end users’ data files can be accessed and manipulated from any other computer using the internet services. Despite the flexibility of data and application accessing and usage that cloud computing environments provide, there are many questions still coming up on how to gain a trusted environment that protect data and applications in clouds from hackers and intruders. This paper surveys the “keys generation and management” mechanism and encryption/decryption algorithms used in cloud computing environments, we proposed new security architecture for cloud computing environment that considers the various security gaps as much as possible. A new cryptographic environment that implements quantum mechanics in order to gain more trusted with less computation cloud communications is given.
Keywords: Cloud Computing, Cloud Encryption Model, Quantum Key Distribution.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 40946736 Deep iCrawl: An Intelligent Vision-Based Deep Web Crawler
Authors: R.Anita, V.Ganga Bharani, N.Nityanandam, Pradeep Kumar Sahoo
Abstract:
The explosive growth of World Wide Web has posed a challenging problem in extracting relevant data. Traditional web crawlers focus only on the surface web while the deep web keeps expanding behind the scene. Deep web pages are created dynamically as a result of queries posed to specific web databases. The structure of the deep web pages makes it impossible for traditional web crawlers to access deep web contents. This paper, Deep iCrawl, gives a novel and vision-based approach for extracting data from the deep web. Deep iCrawl splits the process into two phases. The first phase includes Query analysis and Query translation and the second covers vision-based extraction of data from the dynamically created deep web pages. There are several established approaches for the extraction of deep web pages but the proposed method aims at overcoming the inherent limitations of the former. This paper also aims at comparing the data items and presenting them in the required order.Keywords: Crawler, Deep web, Web Database
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21566735 Searchable Encryption in Cloud Storage
Authors: Ren-Junn Hwang, Chung-Chien Lu, Jain-Shing Wu
Abstract:
Cloud outsource storage is one of important services in cloud computing. Cloud users upload data to cloud servers to reduce the cost of managing data and maintaining hardware and software. To ensure data confidentiality, users can encrypt their files before uploading them to a cloud system. However, retrieving the target file from the encrypted files exactly is difficult for cloud server. This study proposes a protocol for performing multikeyword searches for encrypted cloud data by applying k-nearest neighbor technology. The protocol ranks the relevance scores of encrypted files and keywords, and prevents cloud servers from learning search keywords submitted by a cloud user. To reduce the costs of file transfer communication, the cloud server returns encrypted files in order of relevance. Moreover, when a cloud user inputs an incorrect keyword and the number of wrong alphabet does not exceed a given threshold; the user still can retrieve the target files from cloud server. In addition, the proposed scheme satisfies security requirements for outsourced data storage.
Keywords: Fault-tolerance search, multi-keywords search, outsource storage, ranked search, searchable encryption.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 30816734 Service Quality vs. Customer Satisfaction: Perspectives of Visitors to a Public University Library
Authors: Norazah Mohd Suki, Norbayah Mohd Suki
Abstract:
This study proposes a conceptual model and empirically tests the relationships between customers and librarians (i.e. tangibles, responsiveness, assurance, reliability and empathy) with a dependent variable (customer satisfaction) regarding library services. The SERVQUAL instrument was administered to 100 respondents which comprises of staff and students at a public higher learning institution in the Federal Territory of Labuan, Malaysia. They were public university library users. Results revealed that all service quality dimensions tested were significant and influenced customer satisfaction of visitors to a public university library. Assurance is the most important factor that influences customer satisfaction with the services rendered by the librarian. It is imperative for the library management to take note that the top five service attributes that gained greatest attention from library visitors- perspective includes employee willingness to help customers, availability of customer representatives online for response to queries, library staff actively and promptly provide services, signs in the building are clear and library staff are friendly and courteous. This study provides valuable results concerning the determinants of the service quality and customer satisfaction of public university library services from the users' perspective.Keywords: Service Quality, Customer Satisfaction, SERVQUAL Model, Multiple Regression Analysis
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 49086733 A Modified AES Based Algorithm for Image Encryption
Authors: M. Zeghid, M. Machhout, L. Khriji, A. Baganne, R. Tourki
Abstract:
With the fast evolution of digital data exchange, security information becomes much important in data storage and transmission. Due to the increasing use of images in industrial process, it is essential to protect the confidential image data from unauthorized access. In this paper, we analyze the Advanced Encryption Standard (AES), and we add a key stream generator (A5/1, W7) to AES to ensure improving the encryption performance; mainly for images characterised by reduced entropy. The implementation of both techniques has been realized for experimental purposes. Detailed results in terms of security analysis and implementation are given. Comparative study with traditional encryption algorithms is shown the superiority of the modified algorithm.Keywords: Cryptography, Encryption, Advanced EncryptionStandard (AES), ECB mode, statistical analysis, key streamgenerator.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 50596732 Incremental Mining of Shocking Association Patterns
Authors: Eiad Yafi, Ahmed Sultan Al-Hegami, M. A. Alam, Ranjit Biswas
Abstract:
Association rules are an important problem in data mining. Massively increasing volume of data in real life databases has motivated researchers to design novel and incremental algorithms for association rules mining. In this paper, we propose an incremental association rules mining algorithm that integrates shocking interestingness criterion during the process of building the model. A new interesting measure called shocking measure is introduced. One of the main features of the proposed approach is to capture the user background knowledge, which is monotonically augmented. The incremental model that reflects the changing data and the user beliefs is attractive in order to make the over all KDD process more effective and efficient. We implemented the proposed approach and experiment it with some public datasets and found the results quite promising.Keywords: Knowledge discovery in databases (KDD), Data mining, Incremental Association rules, Domain knowledge, Interestingness, Shocking rules (SHR).
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18676731 Zero Inflated Strict Arcsine Regression Model
Authors: Y. N. Phang, E. F. Loh
Abstract:
Zero inflated strict arcsine model is a newly developed model which is found to be appropriate in modeling overdispersed count data. In this study, we extend zero inflated strict arcsine model to zero inflated strict arcsine regression model by taking into consideration the extra variability caused by extra zeros and covariates in count data. Maximum likelihood estimation method is used in estimating the parameters for this zero inflated strict arcsine regression model.Keywords: Overdispersed count data, maximum likelihood estimation, simulated annealing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17556730 Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network
Authors: Seyoung Kim, Jeongmin Kim, Kwang Ryel Ryu
Abstract:
A database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning from this data the models that can predict future traffic speed would be beneficial for the applications such as the car navigation system, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as predictive models is that it does not require any explicit model building. Instead, k-NN takes a long time to make a prediction because it needs to search for the k-nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched for without a significant sacrifice of prediction accuracy. The rationale behind this is that we had a better look at only the recent data because the traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features which are the current and past traffic speeds of the target link and the neighbor links in its up/down-stream. The performances of these models are compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.Keywords: Big data, k-NN, machine learning, traffic speed prediction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 13766729 GA Based Optimal Feature Extraction Method for Functional Data Classification
Authors: Jun Wan, Zehua Chen, Yingwu Chen, Zhidong Bai
Abstract:
Classification is an interesting problem in functional data analysis (FDA), because many science and application problems end up with classification problems, such as recognition, prediction, control, decision making, management, etc. As the high dimension and high correlation in functional data (FD), it is a key problem to extract features from FD whereas keeping its global characters, which relates to the classification efficiency and precision to heavens. In this paper, a novel automatic method which combined Genetic Algorithm (GA) and classification algorithm to extract classification features is proposed. In this method, the optimal features and classification model are approached via evolutional study step by step. It is proved by theory analysis and experiment test that this method has advantages in improving classification efficiency, precision and robustness whereas using less features and the dimension of extracted classification features can be controlled.Keywords: Classification, functional data, feature extraction, genetic algorithm, wavelet.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15556728 An Experimental Study of a Self-Supervised Classifier Ensemble
Authors: Neamat El Gayar
Abstract:
Learning using labeled and unlabelled data has received considerable amount of attention in the machine learning community due its potential in reducing the need for expensive labeled data. In this work we present a new method for combining labeled and unlabeled data based on classifier ensembles. The model we propose assumes each classifier in the ensemble observes the input using different set of features. Classifiers are initially trained using some labeled samples. The trained classifiers learn further through labeling the unknown patterns using a teaching signals that is generated using the decision of the classifier ensemble, i.e. the classifiers self-supervise each other. Experiments on a set of object images are presented. Our experiments investigate different classifier models, different fusing techniques, different training sizes and different input features. Experimental results reveal that the proposed self-supervised ensemble learning approach reduces classification error over the single classifier and the traditional ensemble classifier approachs.Keywords: Multiple Classifier Systems, classifier ensembles, learning using labeled and unlabelled data, K-nearest neighbor classifier, Bayes classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16446727 Delay Analysis of Sampled-Data Systems in Hard RTOS
Authors: A. M. Azad, M. Alam, C. M. Hussain
Abstract:
In this paper, we have presented the effect of varying time-delays on performance and stability in the single-channel multirate sampled-data system in hard real-time (RT-Linux) environment. The sampling task require response time that might exceed the capacity of RT-Linux. So a straight implementation with RT-Linux is not feasible, because of the latency of the systems and hence, sampling period should be less to handle this task. The best sampling rate is chosen for the sampled-data system, which is the slowest rate meets all performance requirements. RT-Linux is consistent with its specifications and the resolution of the real-time is considered 0.01 seconds to achieve an efficient result. The test results of our laboratory experiment shows that the multi-rate control technique in hard real-time operating system (RTOS) can improve the stability problem caused by the random access delays and asynchronization.Keywords: Multi-rate, PID, RT-Linux, Sampled-data, Servo.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 14446726 A Study of the Adaptive Reuse for School Land Use Strategy: An Application of the Analytic Network Process and Big Data
Authors: Wann-Ming Wey
Abstract:
In today's popularity and progress of information technology, the big data set and its analysis are no longer a major conundrum. Now, we could not only use the relevant big data to analysis and emulate the possible status of urban development in the near future, but also provide more comprehensive and reasonable policy implementation basis for government units or decision-makers via the analysis and emulation results as mentioned above. In this research, we set Taipei City as the research scope, and use the relevant big data variables (e.g., population, facility utilization and related social policy ratings) and Analytic Network Process (ANP) approach to implement in-depth research and discussion for the possible reduction of land use in primary and secondary schools of Taipei City. In addition to enhance the prosperous urban activities for the urban public facility utilization, the final results of this research could help improve the efficiency of urban land use in the future. Furthermore, the assessment model and research framework established in this research also provide a good reference for schools or other public facilities land use and adaptive reuse strategies in the future.
Keywords: Adaptive reuse, analytic network process, big data, land use strategy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 9216725 A Review and Comparative Analysis on Cluster Ensemble Methods
Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha
Abstract:
Clustering is an unsupervised learning technique for aggregating data objects into meaningful classes so that intra cluster similarity is maximized and inter cluster similarity is minimized in data mining. However, no single clustering algorithm proves to be the most effective in producing the best result. As a result, a new challenging technique known as the cluster ensemble approach has blossomed in order to determine the solution to this problem. For the cluster analysis issue, this new technique is a successful approach. The cluster ensemble's main goal is to combine similar clustering solutions in a way that achieves the precision while also improving the quality of individual data clustering. Because of the massive and rapid creation of new approaches in the field of data mining, the ongoing interest in inventing novel algorithms necessitates a thorough examination of current techniques and future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working process, and standard accuracy and error rates. As a result, the society of clustering practitioners will benefit from this exploratory and clear research, which will aid in determining the most appropriate solution to the problem at hand.
Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8226724 Simultaneous Clustering and Feature Selection Method for Gene Expression Data
Authors: T. Chandrasekhar, K. Thangavel, E. N. Sathishkumar
Abstract:
Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.
Keywords: Clustering, Feature selection, Gene expression data, Quick reduct.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 19676723 Segmentation Free Nastalique Urdu OCR
Authors: Sobia T. Javed, Sarmad Hussain, Ameera Maqbool, Samia Asloob, Sehrish Jamil, Huma Moin
Abstract:
The electronically available Urdu data is in image form which is very difficult to process. Printed Urdu data is the root cause of problem. So for the rapid progress of Urdu language we need an OCR systems, which can help us to make Urdu data available for the common person. Research has been carried out for years to automata Arabic and Urdu script. But the biggest hurdle in the development of Urdu OCR is the challenge to recognize Nastalique Script which is taken as standard for writing Urdu language. Nastalique script is written diagonally with no fixed baseline which makes the script somewhat complex. Overlap is present not only in characters but in the ligatures as well. This paper proposes a method which allows successful recognition of Nastalique Script.Keywords: HMM, Image processing, Optical CharacterRecognition, Urdu OCR.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2159