Search results for: Missing Data Techniques.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9101

8891 Computer Aided Diagnosis of Polycystic Kidney Disease Using ANN

Authors: Anjan Babu G, Sumana G, Rajasekhar M

Abstract:

Many inherited and non-hereditary disorders contribute to the development of renal cystic diseases. Polycystic kidney disease (PKD) is a disorder in which clusters of cysts filled with water-like fluid develop within the kidneys. PKD is responsible for 5-10% of end-stage renal failure treated by dialysis or transplantation. New experimental models and the application of molecular biology techniques have provided new insights into the pathogenesis of PKD. Researchers are showing keen interest in developing automated systems that apply computer-aided techniques to the diagnosis of diseases. In this paper, a multilayered feed-forward neural network with one hidden layer is constructed, trained, and tested using the back-propagation learning rule for the diagnosis of PKD, based on physical symptoms and urinalysis test results collected from individual patients. The data collected from 50 patients are used to train and test the network. Of these samples, 75% are used for training and the remaining 25% for testing. The trained network is then applied to new samples, and its output classifies the patient as normal or abnormal.

Keywords: Dialysis, Hereditary, Transplantation, Polycystic, Pathogenesis.

Downloads: 1957
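The 75%/25% split described in the abstract above can be illustrated with a minimal sketch. The function name, the seed, and the placeholder patient IDs are assumptions for illustration, not details from the paper. Note that 75% of 50 is 37.5, so the exact counts the authors used are not stated; this sketch simply truncates to 37 training samples.

```python
import random


def split_train_test(samples, train_fraction=0.75, seed=0):
    """Shuffle the samples and split them into training and testing lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)  # truncates 37.5 to 37
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 50 patient records.
patients = list(range(50))
train, test = split_train_test(patients)
print(len(train), len(test))
```

Shuffling before the cut avoids any ordering effect in how the records were collected.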
8890 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting each user's precise preferences. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have a high likelihood of purchasing products in each product group. Then, this study combines the results of each predictor using multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses market basket analysis to extract association rules for co-purchased products. Finally, through these two steps, the system selects customers who have a high likelihood of purchasing products in each product group and recommends suitable products from the same or different product groups to them. We test the usability of the proposed system by using a prototype and real-world transaction and profile data. In addition, we survey user satisfaction with the recommended product list from the proposed system and with randomly selected product lists. The results also show that the proposed system may be useful in a real-world online shopping store.

Keywords: Product recommender system, Ensemble technique, Association rules, Decision tree, Artificial neural networks.

Downloads: 4177
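The second step in the abstract above, extracting association rules for co-purchased products, can be sketched as follows. This is a minimal pairwise illustration, not the authors' implementation; the baskets, thresholds, and function name are assumptions.

```python
from collections import Counter
from itertools import combinations


def association_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical transactions for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = association_rules(baskets)
for rule in rules:
    print(rule)
```

A rule such as butter -> bread with confidence 1.0 means every basket containing butter also contained bread.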
8889 Determining Cluster Boundaries Using Particle Swarm Optimization

Authors: Anurag Sharma, Christian W. Omlin

Abstract:

The self-organizing map (SOM) is a well-known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. The PSO algorithm uses the U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and to the k-means algorithm.

Keywords: Particle swarm optimization, self-organizing maps, clustering, data mining.

Downloads: 1674
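The abstract above applies a generic PSO algorithm to the U-matrix of a SOM. The paper's actual objective function is not given here, so the following minimal sketch shows only the standard PSO update loop on a placeholder objective (the sphere function). All names and parameter values are illustrative assumptions.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_val.__getitem__)
    global_best, global_val = personal_best[g][:], personal_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to own best, pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                # Clamp the new position to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], val
                if val < global_val:
                    global_best, global_val = positions[i][:], val
    return global_best, global_val


def sphere(x):
    """Placeholder objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, value)
```

To move toward the setting in the abstract, the placeholder objective would be replaced by a score computed from U-matrix values along a candidate boundary; that scoring is not specified in the abstract and is left out here.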
8888 Transcutaneous Inductive Powering Links Based on ASK Modulation Techniques

Authors: S. M. Abbas, M. A. Hannan, S. A. Samad, A. Hussain

Abstract:

This paper presents a modified, efficient inductive powering link based on an ASK modulator and a proposed efficient class-E power amplifier. The design covers the external part, which is located outside the body and transfers power and data to implanted devices such as implanted microsystems that stimulate and monitor nerves and muscles. The system operates at a low-band frequency of 10 MHz in the industrial, scientific, and medical (ISM) band to avoid tissue heating. For the external part, the modulation index is 11.1% and the modulation rate is 7.2%, with a data rate of 1 Mbit/s assuming Tbit = 1 µs. The system has been designed using 0.35-μm fabricated CMOS technology. The mathematical model is given, and the design is simulated using the OrCAD PSpice 16.2 software tool; for real-time simulation, the electronic workbench MULTISIM 11 has been used.

Keywords: Implanted devices, ASK techniques, Class-E power amplifier, Inductive powering and low-frequency ISM band.

Downloads: 2330
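The relation between the bit period and the data rate quoted in the abstract above (Tbit = 1 µs, 1 Mbit/s) can be checked with a short sketch. The modulation-index formula used here, (A_high - A_low) / (A_high + A_low), is a commonly used definition and is an assumption, since the abstract does not state which definition the authors use. The amplitudes 1.0 and 0.8 are hypothetical values chosen only because they give roughly 11.1%.

```python
def bit_rate(bit_period_s):
    """Bit rate in bit/s for a given bit period in seconds."""
    return 1.0 / bit_period_s


def modulation_index(a_high, a_low):
    """Assumed ASK definition: (A_high - A_low) / (A_high + A_low)."""
    return (a_high - a_low) / (a_high + a_low)


rate = bit_rate(1e-6)               # Tbit = 1 microsecond
index = modulation_index(1.0, 0.8)  # hypothetical carrier amplitudes
print(rate, round(index, 3))
```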
8887 Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns

Authors: Haider A Ramadhan, Khalil Shihab

Abstract:

Clustering techniques have been used by many intelligent software agents to group similar access patterns of Web users into high-level themes that express users' intentions and interests. However, such techniques have mostly focused on one salient feature of the Web document visited by the user, namely the extracted keywords. The major aim of these techniques is to come up with an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we focus on both keyword and similarity thresholds to generate more concentrated themes, and hence build a sounder model of user behavior. The purpose of this paper is twofold: to use distance-based clustering methods to recognize overall themes from the proxy log file, and to suggest efficient cut-off levels for the keyword and similarity thresholds that tend to produce better clusters with sharper focus and efficient size.

Keywords: Data mining, knowledge discovery, clustering, data analysis, Web log analysis, theme based searching.

Downloads: 1409
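The interplay of the two thresholds in the abstract above can be sketched as follows: a keyword threshold limits how many keywords represent each visited page, and a similarity threshold decides whether a page joins an existing theme. The abstract does not name the distance measure or the clustering procedure, so Jaccard similarity and a single-pass grouping are assumptions, as are the sample pages.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_threshold(docs, keyword_limit=3, similarity_threshold=0.5):
    """Group pages whose top keywords are similar enough to a theme's first page."""
    clusters = []  # each entry: (representative keywords, member page ids)
    for doc_id, keywords in docs.items():
        top = keywords[:keyword_limit]  # keyword threshold
        for representative, members in clusters:
            if jaccard(representative, top) >= similarity_threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((top, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical pages with keywords already ranked by weight.
pages = {
    "p1": ["python", "pandas", "csv", "tutorial"],
    "p2": ["pandas", "python", "csv", "excel"],
    "p3": ["football", "scores", "league"],
}
themes = cluster_by_threshold(pages)
print(themes)
```

Raising `keyword_limit` or lowering `similarity_threshold` yields fewer, broader themes; the opposite yields more, narrower ones.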
8886 A New Version of Annotation Method with a XML-based Knowledge Base

Authors: Mohammad Yasrebi, Somayeh Khosravi

Abstract:

Machine-understandable data, when strongly interlinked, constitutes the basis for the Semantic Web. Annotating web documents is one of the major techniques for creating metadata on the Web. Annotating websites defines the contained data in a form that is suitable for interpretation by machines. In this paper, we present an approach, improved over the previous one [1], for annotating the texts of websites based on a knowledge base.

Keywords: Knowledge base, ontology, semantic annotation, XML.

Downloads: 1526
8885 Multidimensional Visualization Tools for Analysis of Expression Data

Authors: Urska Cvek, Marjan Trutschl, Randolph Stone II, Zanobia Syed, John L. Clifford, Anita L. Sabichi

Abstract:

Expression data analysis is based mostly on the statistical approaches that are indispensable for the study of biological systems. Large amounts of multidimensional data resulting from the high-throughput technologies are not completely served by biostatistical techniques and are usually complemented with visual, knowledge discovery and other computational tools. In many cases, in biological systems we only speculate on the processes that are causing the changes, and it is the visual explorative analysis of data during which a hypothesis is formed. We would like to show the usability of multidimensional visualization tools and promote their use in life sciences. We survey and show some of the multidimensional visualization tools in the process of data exploration, such as parallel coordinates and radviz and we extend them by combining them with the self-organizing map algorithm. We use a time course data set of transitional cell carcinoma of the bladder in our examples. Analysis of data with these tools has the potential to uncover additional relationships and non-trivial structures.

Keywords: microarrays, visualization, parallel coordinates, radviz, self-organizing maps.

Downloads: 2469
8884 A Comparative Study of Virus Detection Techniques

Authors: Sulaiman Al Amro, Ali Alkhalifah

Abstract:

The growing number of computer viruses and the detection of zero-day malware have long been a concern for security researchers. Existing antivirus products (AVs) rely on detecting virus signatures, which does not provide a full solution to the problems associated with these viruses. The use of logic formulae to model the behaviour of viruses is one of the most encouraging recent developments in virus research, and it provides alternatives to classic virus detection methods. In this paper, we present a comparative study of different virus detection techniques. The paper sets out the advantages and drawbacks of each technique and discusses which technique is more effective at detecting computer viruses.

Keywords: Computer viruses, virus detection, signature-based, behaviour-based, heuristic-based.

Downloads: 4545
8883 Combined Beamforming and Channel Estimation in WCDMA Communication Systems

Authors: Nermin A. Mohamed, Mohamed F. Madkour

Abstract:

We address the problem of joint beamforming and multipath channel parameter estimation in Wideband Code Division Multiple Access (WCDMA) communication systems that employ Multiple-Access Interference (MAI) suppression techniques in the uplink (from mobile to base station). Most of the existing schemes rely on time-multiplexing a training sequence with the user data. In WCDMA, the channel parameters can also be estimated from a code-multiplexed common pilot channel (CPICH), which could be corrupted by strong interference, resulting in a bad estimate. In this paper, we present new methods that combine interference suppression with channel estimation when using multiple receiving antennas, by means of adaptive signal processing techniques. Computer simulation is used to compare the proposed methods with the existing conventional estimation techniques.

Keywords: Adaptive arrays, channel estimation, interference cancellation, wideband code division multiple access (WCDMA).

Downloads: 2273
8882 Cross Project Software Fault Prediction at Design Phase

Authors: Pradeep Singh, Shrish Verma

Abstract:

Software fault prediction models are created by using the source code, processed metrics from the same or a previous version of the code, and related fault data. Some companies do not store and keep track of all the artifacts required for software fault prediction. To construct a fault prediction model for such a company, training data from other projects is one potential solution. The earlier a fault is predicted, the less it costs to correct. The training data consists of metrics data and related fault data at the function/module level. This paper investigates fault prediction at an early stage using cross-project data, focusing on design metrics. In this study, empirical analysis is carried out to validate design metrics for cross-project fault prediction. The machine learning technique used for evaluation is Naïve Bayes. The design phase metrics of other projects can be used as an initial guideline for projects where no previous fault data is available. We analyze seven datasets from the NASA Metrics Data Program, which offer design as well as code metrics. Overall, the results of cross-project learning are comparable to those of within-company learning.

Keywords: Software Metrics, Fault prediction, Cross project, Within project.

Downloads: 2477
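The cross-project idea in the abstract above, training a Naïve Bayes model on one project's design metrics and applying it to another project, can be illustrated with a minimal sketch. The Gaussian variant, the two metric columns, and all numeric values are assumptions for illustration; they are not taken from the NASA datasets or from the paper.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature mean/variance for each label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior for one module."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical source project: two design metrics per module, 1 = faulty.
source_metrics = [[2, 1], [3, 2], [4, 1], [15, 9], [18, 11], [20, 10]]
source_faults = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(source_metrics, source_faults)

# Hypothetical target project with no fault history of its own.
target_metrics = [[3, 1], [17, 10]]
predictions = [predict(model, row) for row in target_metrics]
print(predictions)
```

The target project contributes no labels at all, which is the situation the abstract describes for companies that keep no fault records.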
8881 Fuzzy Ideology based Long Term Load Forecasting

Authors: Jagadish H. Pujar

Abstract:

Fuzzy load forecasting plays a paramount role in the operation and management of power systems. Accurate estimation of future power demands for various lead times facilitates the task of generating power reliably and economically. The forecasting of future loads for a relatively large lead time (months to a few years), known as long term load forecasting, is studied here. Among the various techniques used in forecasting load, artificial intelligence techniques provide greater accuracy than conventional techniques. Fuzzy logic, a very robust artificial intelligence technique, is described in this paper for forecasting load on a long term basis. The paper gives a general algorithm to forecast long term load. The algorithm is an extension of the short term load forecasting method to long term load forecasting and concentrates not only on the forecast values of load but also on the errors incorporated into the forecast. Hence, by correcting the errors in the forecast, forecasts with very high accuracy have been achieved. The algorithm is demonstrated with the help of data collected for the residential sector (LT2 (a) type load: domestic consumers). Load is determined for three consecutive years (from April-06 to March-09) in order to demonstrate the efficiency of the algorithm and to forecast for the next two years (from April-09 to March-11).

Keywords: Fuzzy Logic Control (FLC), Data Dependent Factors (DDF), Model Dependent Factors (MDF), Statistical Error (SE), Short Term Load Forecasting (STLF), Miscellaneous Error (ME).

Downloads: 2421
8880 Image Retrieval: Techniques, Challenge, and Trend

Authors: Hui Hui Wang, Dzulkifli Mohamad, N.A Ismail

Abstract:

This paper discusses the evolution of retrieval techniques, focusing on the development, challenges, and trends of image retrieval. It highlights both the issues already addressed and those still outstanding. The explosive growth of image data creates a need for research and development in image retrieval. Image retrieval research is moving from keywords to low-level features and on to semantic features. The drive towards semantic features stems from the problems with keywords, which can be very subjective and time-consuming, while low-level features cannot always describe high-level concepts in the users' minds.

Keywords: content based image retrieval, keyword based image retrieval, semantic gap, semantic image retrieval.

Downloads: 2482
8879 Cluster Algorithm for Genetic Diversity

Authors: Manpreet Singh, Keerat Kaur, Bhavdeep Singh

Abstract:

As hardware technology advances, the cost of storing data is decreasing. Thus there is an urgent need for new techniques and tools that can intelligently and automatically assist us in transforming this data into useful knowledge. Different data mining techniques have been developed that are helpful for handling these large databases [7]. Data mining is also finding a role in the field of biotechnology. Pedigree means the associated ancestry of a crop variety. Genetic diversity is the variation in the genetic composition of individuals within or among species. Genetic diversity depends upon the pedigree information of the varieties. Parents at lower hierarchic levels carry more weight in predicting genetic diversity than those at upper hierarchic levels; the weight decreases as the level increases. For crossbreeding, the two varieties should be as genetically diverse as possible so that the useful characters of both can be incorporated into the newly developed variety. This paper discusses the searching and analysis of different possible pairs of varieties, selected on the basis of morphological characters, climatic conditions, and nutrients, so as to obtain the optimal pair that can produce the required crossbreed variety. An algorithm was developed to determine the genetic diversity between the selected wheat varieties. A cluster analysis technique is used for retrieving the results.

Keywords: Genetic diversity, pedigree, nutrients.

Downloads: 1757
8878 From Industry 4.0 to Agriculture 4.0: A Framework to Manage Product Data in Agri-Food Supply Chain for Voluntary Traceability

Authors: Angelo Corallo, Maria Elena Latino, Marta Menegoli

Abstract:

The agri-food value chain involves various stakeholders with different roles. All of them abide by national and international rules and leverage marketing strategies to advance their products. Food products and related processing phases carry with them a large amount of data that is often not used to inform the final customer. Some data, if fittingly identified and used, can benefit the single company and/or the whole supply chain by creating a match between marketing techniques and voluntary traceability strategies. Moreover, the world has recently seen a change in buying models: customers pay close attention to wellbeing and food quality. Food citizenship and food democracy were born, leveraging transparency, sustainability, and food information needs. Internet of Things (IoT) and Analytics, some of the innovative technologies of Industry 4.0, have a significant impact on the market and will act as a main thrust towards a genuine ‘4.0 change’ for agriculture. However, realizing a traceability system is not simple because of the complexity of the agri-food supply chain, the many actors involved, different business models, environmental variations impacting products and/or processes, and extraordinary climate changes. In order to support companies involved in a traceability path, a Framework to Manage Product Data in Agri-Food Supply Chain for Voluntary Traceability was conceived, starting from business model analysis and the related business processes. Studying each process task and leveraging modeling techniques leads to identifying the information held by different actors along the agri-food supply chain. IoT technologies for data collection and Analytics techniques for data processing supply information useful for increasing intra-company efficiency and competitiveness in the market.
All the information recovered can be presented through IT solutions and mobile applications, making it accessible to the company, the entire supply chain, and the consumer with a view to guaranteeing transparency and quality.

Keywords: Agriculture 4.0, agri-food supply chain, Industry 4.0, voluntary traceability.

Downloads: 2279
8877 Heuristic Optimization Techniques for Network Reconfiguration in Distribution System

Authors: A. Charlangsut, N. Rugthaicharoencheep, S. Auchariyamet

Abstract:

Network reconfiguration is an operation to modify the network topology. The implementation of network reconfiguration has many advantages such as loss minimization, increasing system security and others. In this paper, two topics about the network reconfiguration in distribution system are briefly described. The first topic summarizes its impacts while the second explains some heuristic optimization techniques for solving the network reconfiguration problem.

Keywords: Network Reconfiguration, Optimization Techniques, Distribution System

Downloads: 2712
8876 Hippocampus Segmentation using a Local Prior Model on its Boundary

Authors: Dimitrios Zarpalas, Anastasios Zafeiropoulos, Petros Daras, Nicos Maglaveras

Abstract:

Segmentation techniques based on Active Contour Models have benefited strongly from the use of prior information during their evolution. Shape prior information is captured from a training set and is introduced in the optimization procedure to restrict the evolution to allowable shapes. In this way, the evolution converges onto regions even with weak boundaries. Although significant effort has been devoted to different ways of capturing and analyzing prior information, very little thought has been devoted to the way of combining image information with prior information. This paper focuses on a more natural way of incorporating the prior information in the level set framework. As a proof of concept, the method is applied to hippocampus segmentation in T1-MR images. Hippocampus segmentation is a very challenging task, due to the multivariate surrounding region and the missing boundary with the neighboring amygdala, whose intensities are identical. The proposed method mimics the way humans segment and thus improves segmentation accuracy.

Keywords: Medical imaging & processing, Brain MRI segmentation, hippocampus segmentation, hippocampus-amygdala missing boundary, weak boundary segmentation, region based segmentation, prior information, local weighting scheme in level sets, spatial distribution of labels, gradient distribution on boundary.

Downloads: 1703
8875 An Approach for Reducing the Computational Complexity of LAMSTAR Intrusion Detection System using Principal Component Analysis

Authors: V. Venkatachalam, S. Selvan

Abstract:

The security of computer networks plays a strategic role in modern computer systems. Intrusion Detection Systems (IDS) act as the 'second line of defense' placed inside a protected network, looking for known or potential threats in network traffic and/or audit data recorded by hosts. We developed an Intrusion Detection System using the LAMSTAR neural network to learn patterns of normal and intrusive activities and to classify observed system activities, and compared the performance of the LAMSTAR IDS with other classification techniques using 5 classes of KDDCup99 data. The LAMSTAR IDS gives better performance at the cost of high computational complexity, training time, and testing time when compared to other classification techniques (Binary Tree classifier, RBF classifier, Gaussian Mixture classifier). We further reduced the computational complexity of the LAMSTAR IDS by reducing the dimension of the data using principal component analysis, which in turn reduces the training and testing time with almost the same performance.

Keywords: Binary Tree Classifier, Gaussian Mixture, Intrusion Detection System, LAMSTAR, Radial Basis Function.

Downloads: 1697
8874 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study

Authors: Faisal Aburub, Wael Hadi

Abstract:

Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships, or to assign unseen objects to one of the predefined groups. In this paper, we investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN), and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed the other algorithms in terms of the classification accuracy, precision, and F1 evaluation measures, using datasets of groundwater areas collected from the Jordanian Ministry of Water and Irrigation.

Keywords: Classification, data mining, evaluation measures, groundwater.

Downloads: 2541
8873 Comparison of Inter Cell Interference Coordination Approaches

Authors: Selma Sbit, Mohamed Bechir Dadi, Belgacem Chibani Rhaimi

Abstract:

This work compares various techniques used to mitigate Inter-Cell Interference (ICI) in Long Term Evolution (LTE) and LTE-Advanced systems by evaluating the performance of each one. In mobile communication networks, systems are limited by ICI, caused in particular by the deployment of small cells within conventional cell layouts. Therefore, various mitigation techniques have been proposed, namely Inter-Cell Interference Coordination (ICIC) techniques, enhanced Inter-Cell Interference Coordination (eICIC) techniques, and Coordinated Multi-Point transmission and reception (CoMP). This paper presents a comparative study of these strategies. It can be concluded that CoMP techniques improve SINR and system capacity compared to ICIC and eICIC. In the simulations, the SINR value reaches 15 dB at a distance of 0.5 km between the user equipment and the serving base station with CoMP, whereas it does not exceed 12 dB and 9 dB for the eICIC and ICIC approaches respectively.

Keywords: 4th generation, interference, coordination, ICIC.

Downloads: 960
8872 Modeling of Reinforcement in Concrete Beams Using Machine Learning Tools

Authors: Yogesh Aggarwal

Abstract:

The paper discusses the results obtained in predicting reinforcement in a singly reinforced beam using Neural Net (NN), Support Vector Machines (SVM-s), and Tree Based Models. A major advantage of SVM-s over NN is that they minimize a bound on the generalization error of the model rather than a bound on the mean square error over the data set, as is done in NN. The Tree Based approach divides the problem into a small number of sub-problems to reach a conclusion. A set of data was generated for different beam parameters, with the reinforcement calculated using the limit state method, for the creation and validation of the models. The results from this study suggest a remarkably good performance of the tree based and SVM-s models. Further, this study found that these two techniques work well and even better than Neural Network methods. A comparison of predicted values with actual values suggests a very good correlation coefficient with all four techniques.

Keywords: Linear Regression, M5 Model Tree, Neural Network, Support Vector Machines.

Downloads: 1992
8871 Analysing the Elementary Science and Technology Coursebook and Student Workbook in Terms of Constructivism

Authors: Nil Duban

Abstract:

The curriculum of the primary school science course in Turkey was redesigned on the basis of constructivism in the 2005-2006 academic year. In this context, the name of the course was changed to "Science and Technology", and the content, course books, and student workbooks for this course were redesigned in light of constructivism. The aim of this study is to determine whether the Science and Technology course books and student workbooks for the 5th grade of primary school are consistent with constructivism, by evaluating them in terms of its fundamental principles. In this study, the documentation technique (i.e. document analysis), a qualitative research method, is applied; samples are selected by criterion sampling, a purposeful sampling technique. When the Science and Technology course book and workbook for the 5th grade in primary education are examined, it is seen that the two books complement each other in certain areas. Consequently, it can be claimed that, in spite of some inadequate and missing points in the course book and workbook of the primary school Science and Technology course for 5th grade students, an attempt has been made to design these books according to the principles of constructivism. To overcome the inadequacies in the books, it can be suggested that they be redesigned. In addition, so that the technology dimension of the course is not ignored, activities that encourage the students to prepare projects using the technology cycle should be included.

Keywords: Constructivism, coursebooks, science and technology education.

Downloads: 1907
8870 Predictive Analysis for Big Data: Extension of Classification and Regression Trees Algorithm

Authors: Ameur Abdelkader, Abed Bouarfa Hafida

Abstract:

Since its inception, predictive analysis has revolutionized the IT industry through its robustness and decision-making facilities. It involves the application of a set of data processing techniques and algorithms in order to create predictive models. Its principle is based on finding relationships between explanatory variables and the predicted variables. Past occurrences are exploited to predict and to derive the unknown outcome. With the advent of big data, many studies have suggested the use of predictive analytics in order to process and analyze big data. Nevertheless, they have been curbed by the limits of classical methods of predictive analysis when faced with a large amount of data. In fact, because of its volume, its nature (semi-structured or unstructured), and its variety, big data cannot be analyzed efficiently via classical methods of predictive analysis. The authors attribute this weakness to the fact that predictive analysis algorithms do not allow the parallelization and distribution of calculation. In this paper, we propose to extend the predictive analysis algorithm Classification And Regression Trees (CART) in order to adapt it for big data analysis. The major changes to this algorithm are presented, and then a version of the extended algorithm is defined in order to make it applicable to a huge quantity of data.

Keywords: Predictive analysis, big data, predictive analysis algorithms, CART algorithm.

Downloads: 1016
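For context on the CART algorithm that the abstract above proposes to extend, the following minimal sketch shows the classic (non-distributed) split search on a single numeric feature using Gini impurity. It does not reproduce the authors' extension, which the abstract does not detail; the data and function names are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split unless one lowers the impurity
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical feature values and class labels.
values = [1, 2, 3, 10, 11, 12]
labels = ["a", "a", "a", "b", "b", "b"]
split = best_split(values, labels)
print(split)
```

Each candidate threshold is evaluated independently of the others, which is one reason split search is a natural target when parallelizing tree construction.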
8869 Techniques with Statistics for Web Page Watermarking

Authors: Mohamed Lahcen BenSaad, Sun XingMing

Abstract:

Information hiding, especially watermarking, is a promising technique for the protection of intellectual property rights. This technology is well advanced for multimedia, but the same has not been done for text. Web pages, like other documents, need protection against piracy. In this paper, some techniques are proposed to show how to hide information in web pages using features of the markup language used to describe these pages. Most of the techniques proposed here use white space, or variations the language allows in representing elements, to hide information. Experiments on a very small page and analysis of five thousand web pages show that these techniques have a wide bandwidth available for information hiding, and that they might form a solid base for developing a robust algorithm for web page watermarking.

Keywords: Digital Watermarking, Information Hiding, Markup Language, Text watermarking, Software Watermarking.

Downloads 1750
8868 Dynamic Data Partition Algorithm for a Parallel H.264 Encoder

Authors: Juntae Kim, Jaeyoung Park, Kyoungkun Lee, Jong Tae Kim

Abstract:

The H.264/AVC standard is a highly efficient video codec providing high-quality video at low bit rates. Because it employs advanced techniques, its computational complexity has increased. This complexity is the major problem in implementing a real-time encoder and decoder. Parallelism is one approach to this problem, and it can be implemented on a multi-core system. We analyze macroblock-level parallelism, which ensures the same bit rate with high processor concurrency. In order to reduce the encoding time, dynamic data partition based on macroblock region is proposed. The data partition has advantages in load balancing and data communication overhead. Using the data partition, the encoder obtains more than 3.59x speed-up on a four-processor system. This work can be applied to other multimedia processing applications.

Keywords: H.264/AVC, video coding, thread-level parallelism, OpenMP, multimedia

Downloads 1760
8867 Energy Efficient In-Network Data Processing in Sensor Networks

Authors: Prakash G L, Thejaswini M, S H Manjula, K R Venugopal, L M Patnaik

Abstract:

A sensor network consists of densely deployed sensor nodes. Energy optimization is one of the most important aspects of sensor application design. Data acquisition and aggregation techniques for processing data in-network should be energy efficient. Due to the cross-layer design and the resource-limited, noisy nature of Wireless Sensor Networks (WSNs), it is challenging to study the performance of these systems in a realistic setting. In this paper, we propose optimizing queries through data aggregation and data redundancy to reduce energy consumption without requiring all sensed data, and using the directed diffusion communication paradigm to achieve power savings, robust communication and in-network data processing. To estimate the per-node power consumption, the POWERTossim mica2 energy model is used, which provides scalable and accurate results. The performance analysis shows that the proposed methods outperform the existing methods in terms of energy consumption in wireless sensor networks.
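
The idea of aggregating in-network rather than shipping every reading to the sink can be sketched as follows: each node merges its own reading with the partial aggregates of its children and forwards a single (sum, count) pair. The routing tree, node names and readings are hypothetical, and the sketch does not model directed diffusion or the energy model used in the paper.

```python
def merge(a, b):
    """Combine two partial aggregates of the form (sum, count)."""
    return (a[0] + b[0], a[1] + b[1])

def aggregate(tree, readings, node):
    """Return the partial aggregate that `node` would forward upstream."""
    partial = (readings[node], 1)
    for child in tree.get(node, []):
        partial = merge(partial, aggregate(tree, readings, child))
    return partial

# Hypothetical routing tree (parent -> children) and sensed values.
tree = {"sink": ["a", "b"], "a": ["c"]}
readings = {"sink": 20.0, "a": 22.0, "b": 24.0, "c": 26.0}

total, count = aggregate(tree, readings, "sink")
average = total / count
```

Each link carries one pair regardless of how many nodes sit below it, which is where the saving in transmitted data comes from.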

Keywords: Data Aggregation, Directed Diffusion, Partial Aggregation, Packet Merging, Query Plan.

Downloads 1790
8866 Masquerade and “What Comes Behind Six Is More Than Seven”: Thoughts on Art History and Visual Culture Research Methods

Authors: Osa D Egonwa

Abstract:

In the 21st century, the disciplinary boundaries of past centuries that we often create through mainstream art historical classification, techniques and sources may have been eroded by visual culture, which seems to provide a more inclusive umbrella for the new ways artists go about the creative process and its resultant commodities. Over the past four decades, artists in Africa have resorted to new materials, techniques and themes, which have affected the ways we research these artists and their art. Frontline artists such as El Anatsui, Yinka Shonibare and Erasmus Onyishi are demonstrating that any material is suitable for artistic expression. Most of the time, these materials come with their own techniques/effects and visual syntax: a combination of materials compounds techniques, formal aesthetic indexes, halo effects, and iconography. This tends to challenge the categories we lean on to view, think and talk about them. It renders our mainstream art historical research methods inadequate, thus suggesting new discursive concepts, terms and theories. This paper proposes the Africanist eclectic methods derived from the dual framework of Masquerade Theory and What Comes Behind Six is More Than Seven. It shares thoughts/research on art historical methods, terminological re-alignments on classification/source data, presentational format and interpretation arising from the emergent trends in our subject. The outcome provides useful tools to mediate new thoughts and experiences in recent African art and visual culture.

Keywords: Art Historical Methods, Classifications, Concepts, Re-alignment.

Downloads 584
8865 MCOKE: Multi-Cluster Overlapping K-Means Extension Algorithm

Authors: Said Baadel, Fadi Thabtah, Joan Lu

Abstract:

Clustering involves the partitioning of n objects into k clusters. Many clustering algorithms use hard-partitioning techniques where each object is assigned to one cluster. In this paper, we propose an overlapping algorithm, MCOKE, which allows objects to belong to one or more clusters. The algorithm differs from fuzzy clustering techniques because objects that overlap are assigned a membership value of 1 (one) as opposed to a fuzzy membership degree. It also differs from other overlapping algorithms that require a similarity threshold to be defined a priori, which can be difficult for novice users to determine.
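
To make the contrast with fuzzy membership concrete, the sketch below produces a binary membership matrix in which an object can carry a 1 for several clusters. The threshold rule is an assumption for illustration (the largest distance between any object and its nearest centroid, so that no user-supplied threshold is needed); the abstract does not state how MCOKE derives its threshold. The centroids are taken as given, as if from a prior k-means run.

```python
import math

def overlapping_membership(points, centroids):
    """Binary membership matrix: 1 if an object lies within the derived threshold."""
    dist = [[math.dist(p, c) for c in centroids] for p in points]
    # Hard assignment first: index of the nearest centroid for each object.
    nearest = [min(range(len(centroids)), key=row.__getitem__) for row in dist]
    # Assumed threshold: the largest object-to-own-centroid distance.
    max_dist = max(row[j] for row, j in zip(dist, nearest))
    return [[1 if d <= max_dist else 0 for d in row] for row in dist]

# Hypothetical one-dimensional layout embedded in 2-D points.
points = [(0, 0), (2, 0), (3, 0), (6, 0)]
centroids = [(1, 0), (5, 0)]
membership = overlapping_membership(points, centroids)
```

In this toy layout the object at (3, 0) is equidistant from both centroids and receives membership 1 in both clusters, while the others stay in a single cluster.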

Keywords: Data mining, k-means, MCOKE, overlapping.

Downloads 2681
8864 XML Schema Automatic Matching Solution

Authors: Huynh Quyet Thang, Vo Sy Nam

Abstract:

Schema matching plays a key role in many different applications, such as schema integration, data integration, data warehousing, data transformation, E-commerce, peer-to-peer data management, ontology matching and integration, semantic Web, semantic query processing, etc. Manual matching is expensive and error-prone, so it is important to develop techniques to automate the schema matching process. In this paper, we present a solution to the XML schema automated matching problem which produces semantic mappings between corresponding schema elements of given source and target schemas. This solution contributes to solving the XML schema automated matching problem more comprehensively and efficiently. Our solution is based on combining linguistic similarity, data type compatibility and structural similarity of XML schema elements. After describing our solution, we present experimental results that demonstrate the effectiveness of this approach.
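
One simple way to combine the three signals named in the abstract is a weighted sum, sketched below. The weights, the type-compatibility table, the use of `difflib.SequenceMatcher` for name similarity and the Jaccard overlap of child names for structure are all assumptions chosen for illustration; the paper's actual measures and combination rule are not given in the abstract.

```python
from difflib import SequenceMatcher

# Hypothetical compatibility scores for non-identical XSD type pairs.
TYPE_COMPAT = {
    frozenset(("xs:int", "xs:integer")): 0.9,
    frozenset(("xs:string", "xs:token")): 0.8,
}

def linguistic(a, b):
    """Case-insensitive string similarity of two element names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def type_compat(a, b):
    """1.0 for identical types, a table lookup otherwise, 0.0 if unknown."""
    return 1.0 if a == b else TYPE_COMPAT.get(frozenset((a, b)), 0.0)

def structural(children_a, children_b):
    """Jaccard overlap of child element names; two leaves count as identical."""
    a, b = set(children_a), set(children_b)
    return 1.0 if not a and not b else len(a & b) / len(a | b)

def similarity(e1, e2, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of the three scores; the weights are illustrative only."""
    w_name, w_type, w_struct = weights
    return (w_name * linguistic(e1["name"], e2["name"])
            + w_type * type_compat(e1["type"], e2["type"])
            + w_struct * structural(e1["children"], e2["children"]))

def best_match(source, targets):
    """Target element with the highest combined similarity to the source."""
    return max(targets, key=lambda t: similarity(source, t))

# Hypothetical schema elements.
source = {"name": "CustomerName", "type": "xs:string", "children": []}
targets = [
    {"name": "custName", "type": "xs:string", "children": []},
    {"name": "orderDate", "type": "xs:date", "children": []},
]
match = best_match(source, targets)
```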

Keywords: XML Schema, Schema Matching, Semantic Matching, Automatic XML Schema Matching.

Downloads 1785
8863 Analysis of Diverse Cluster Ensemble Techniques

Authors: S. Sarumathi, N. Shanthi, P. Ranjetha

Abstract:

Data mining is the procedure of discovering interesting patterns in huge amounts of data. To access the data faster, the most supportive process needed is clustering. Clustering is the process of identifying similarity between data according to the characteristics present in the data and grouping associated data objects into clusters. A cluster ensemble is a technique that combines various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming to consolidate the outcomes from a collection of individual clusterings. The performance of clustering ensembles is mainly affected by two principal factors: diversity and quality. This paper presents an overview of different cluster ensemble algorithms, along with the methods used in several cluster ensemble papers to improve diversity and quality, gives a comparative analysis of the different cluster ensembles, and summarizes various cluster ensemble methods. This analysis should be useful to clustering experts and help in deciding on the most appropriate method for the problem at hand.
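
A small sketch of how several clusterings can be consolidated: a co-association matrix records the fraction of runs in which two objects share a cluster, and objects linked by a co-association above a threshold are grouped together. This is a deliberately simple consensus function for illustration, not CSPA, HGPA or MCLA as surveyed in the paper; the 0.5 threshold and the toy labelings are assumptions.

```python
def co_association(labelings):
    """Fraction of runs in which each pair of objects falls in the same cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]

def consensus(labelings, threshold=0.5):
    """Group objects connected by co-association values above the threshold."""
    matrix = co_association(labelings)
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first walk over strongly co-associated objects
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Hypothetical outputs of three clustering runs over four objects.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
final = consensus(runs)
```

Because the matrix depends only on whether two objects share a label, the arbitrary label numbering of each individual run does not matter.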

Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.

Downloads 1793
8862 Complex Energy Signal Model for Digital Human Fingerprint Matching

Authors: Jason Zalev, Reza Sedaghat

Abstract:

This paper describes a complex energy signal model that is isomorphic with digital human fingerprint images. By using signal models, the problem of fingerprint matching is transformed into the signal processing problem of finding a correlation between two complex signals that differ by phase-rotation and time-scaling. A technique for minutiae matching is proposed that is independent of image translation, rotation and linear-scaling, and is resistant to missing minutiae. The method was tested using random data points. The results show that, for matching prints, the scaling and rotation angles are closely estimated and a stronger match will have a higher correlation.
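
The reason complex numbers suit this problem can be shown with a short sketch: if minutiae are written as complex points, rotation and scaling become multiplication by a single complex factor and translation becomes addition. The code below recovers that factor by least squares, assuming the point correspondences are already known; this is a simplification made for illustration and is not the correlation-based method of the paper, which does not require known correspondences. All names and data are hypothetical.

```python
import cmath
import math

def estimate_similarity(src, dst):
    """Least-squares scale and rotation angle mapping src points onto dst points."""
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    # Centring both sets removes the translation component.
    num = sum((z - src_mean).conjugate() * (w - dst_mean)
              for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    factor = num / den
    return abs(factor), cmath.phase(factor)

# Hypothetical minutiae as complex points x + iy.
minutiae = [1 + 1j, 2 - 1j, -1 + 0.5j, 2j]
true_factor = cmath.rect(2.0, math.pi / 6)   # scale 2, rotation 30 degrees
shift = 3 - 4j
transformed = [true_factor * z + shift for z in minutiae]

scale, angle = estimate_similarity(minutiae, transformed)
```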

Keywords: Affine Invariant, Fingerprint Recognition, Matching, Minutiae.

Downloads 1269