Search results for: dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 358

Search results for: dataset

58 Improving Activity Recognition Classification of Repetitious Beginner Swimming Using a 2-Step Peak/Valley Segmentation Method with Smoothing and Resampling for Machine Learning

Authors: Larry Powell, Seth Polsley, Drew Casey, Tracy Hammond

Abstract:

Human activity recognition (HAR) systems have shown positive performance when recognizing repetitive activities like walking, running, and sleeping. Water-based activities are a reasonably new area for activity recognition. However, water-based activity recognition has largely focused on supporting the elite and competitive swimming population, which already has amazing coordination and proper form. Beginner swimmers are not perfect, and activity recognition needs to support the individual motions to help beginners. Activity recognition algorithms are traditionally built around short segments of timed sensor data. Using a time window input can cause performance issues in the machine learning model. The window’s size can be too small or large, requiring careful tuning and precise data segmentation. In this work, we present a method that uses a time window as the initial segmentation, then separates the data based on the change in the sensor value. Our system uses a multi-phase segmentation method that pulls all peaks and valleys for each axis of an accelerometer placed on the swimmer’s lower back. This results in high recognition performance using leave-one-subject-out validation on our study with 20 beginner swimmers, with our model optimized from our final dataset resulting in an F-Score of 0.95.

Keywords: Time window, peak/valley segmentation, feature extraction, beginner swimming, activity recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 129
57 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Keywords: Consensus, curse of correlation, imbalanced classification, rank-based chain-mode ensemble.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 670
56 Fast Approximate Bayesian Contextual Cold Start Learning (FAB-COST)

Authors: Jack R. McKenzie, Peter A. Appleby, Thomas House, Neil Walton

Abstract:

Cold-start is a notoriously difficult problem which can occur in recommendation systems, and arises when there is insufficient information to draw inferences for users or items. To address this challenge, a contextual bandit algorithm – the Fast Approximate Bayesian Contextual Cold Start Learning algorithm (FAB-COST) – is proposed, which is designed to provide improved accuracy compared to the traditionally used Laplace approximation in the logistic contextual bandit, while controlling both algorithmic complexity and computational cost. To this end, FAB-COST uses a combination of two moment projection variational methods: Expectation Propagation (EP), which performs well at the cold start, but becomes slow as the amount of data increases; and Assumed Density Filtering (ADF), which has slower growth of computational cost with data size but requires more data to obtain an acceptable level of accuracy. By switching from EP to ADF when the dataset becomes large, it is able to exploit their complementary strengths. The empirical justification for FAB-COST is presented, and systematically compared to other approaches on simulated data. In a benchmark against the Laplace approximation on real data consisting of over 670, 000 impressions from autotrader.co.uk, FAB-COST demonstrates at one point increase of over 16% in user clicks. On the basis of these results, it is argued that FAB-COST is likely to be an attractive approach to cold-start recommendation systems in a variety of contexts.

Keywords: Cold-start, expectation propagation, multi-armed bandits, Thompson sampling, variational inference.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 487
55 Satellite Data Classification Accuracy Assessment Based from Reference Dataset

Authors: Mohd Hasmadi Ismail, Kamaruzaman Jusoff

Abstract:

In order to develop forest management strategies in tropical forest in Malaysia, surveying the forest resources and monitoring the forest area affected by logging activities is essential. There are tremendous effort has been done in classification of land cover related to forest resource management in this country as it is a priority in all aspects of forest mapping using remote sensing and related technology such as GIS. In fact classification process is a compulsory step in any remote sensing research. Therefore, the main objective of this paper is to assess classification accuracy of classified forest map on Landsat TM data from difference number of reference data (200 and 388 reference data). This comparison was made through observation (200 reference data), and interpretation and observation approaches (388 reference data). Five land cover classes namely primary forest, logged over forest, water bodies, bare land and agricultural crop/mixed horticultural can be identified by the differences in spectral wavelength. Result showed that an overall accuracy from 200 reference data was 83.5 % (kappa value 0.7502459; kappa variance 0.002871), which was considered acceptable or good for optical data. However, when 200 reference data was increased to 388 in the confusion matrix, the accuracy slightly improved from 83.5% to 89.17%, with Kappa statistic increased from 0.7502459 to 0.8026135, respectively. The accuracy in this classification suggested that this strategy for the selection of training area, interpretation approaches and number of reference data used were importance to perform better classification result.

Keywords: Image Classification, Reference Data, Accuracy Assessment, Kappa Statistic, Forest Land Cover

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3090
54 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: Remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2009
53 Enhanced Planar Pattern Tracking for an Outdoor Augmented Reality System

Authors: L. Yu, W. K. Li, S. K. Ong, A. Y. C. Nee

Abstract:

In this paper, a scalable augmented reality framework for handheld devices is presented. The presented framework is enabled by using a server-client data communication structure, in which the search for tracking targets among a database of images is performed on the server-side while pixel-wise 3D tracking is performed on the client-side, which, in this case, is a handheld mobile device. Image search on the server-side adopts a residual-enhanced image descriptors representation that gives the framework a scalability property. The tracking algorithm on the client-side is based on a gravity-aligned feature descriptor which takes the advantage of a sensor-equipped mobile device and an optimized intensity-based image alignment approach that ensures the accuracy of 3D tracking. Automatic content streaming is achieved by using a key-frame selection algorithm, client working phase monitoring and standardized rules for content communication between the server and client. The recognition accuracy test performed on a standard dataset shows that the method adopted in the presented framework outperforms the Bag-of-Words (BoW) method that has been used in some of the previous systems. Experimental test conducted on a set of video sequences indicated the real-time performance of the tracking system with a frame rate at 15-30 frames per second. The presented framework is exposed to be functional in practical situations with a demonstration application on a campus walk-around.

Keywords: Augmented reality framework, server-client model, vision-based tracking, image search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1083
52 Retrieval of User Specific Images Using Semantic Signatures

Authors: K. Venkateswari, U. K. Balaji Saravanan, K. Thangaraj, K. V. Deepana

Abstract:

Image search engines rely on the surrounding textual keywords for the retrieval of images. It is a tedious work for the search engines like Google and Bing to interpret the user’s search intention and to provide the desired results. The recent researches also state that the Google image search engines do not work well on all the images. Consequently, this leads to the emergence of efficient image retrieval technique, which interprets the user’s search intention and shows the desired results. In order to accomplish this task, an efficient image re-ranking framework is required. Sequentially, to provide best image retrieval, the new image re-ranking framework is experimented in this paper. The implemented new image re-ranking framework provides best image retrieval from the image dataset by making use of re-ranking of retrieved images that is based on the user’s desired images. This is experimented in two sections. One is offline section and other is online section. In offline section, the reranking framework studies differently (reference classes or Semantic Spaces) for diverse user query keywords. The semantic signatures get generated by combining the textual and visual features of the images. In the online section, images are re-ranked by comparing the semantic signatures that are obtained from the reference classes with the user specified image query keywords. This re-ranking methodology will increases the retrieval image efficiency and the result will be effective to the user.

Keywords: CBIR, Image Re-ranking, Image Retrieval, Semantic Signature, Semantic Space.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1897
51 Multi-Temporal Mapping of Built-up Areas Using Daytime and Nighttime Satellite Images Based on Google Earth Engine Platform

Authors: S. Hutasavi, D. Chen

Abstract:

The built-up area is a significant proxy to measure regional economic growth and reflects the Gross Provincial Product (GPP). However, an up-to-date and reliable database of built-up areas is not always available, especially in developing countries. The cloud-based geospatial analysis platform such as Google Earth Engine (GEE) provides an opportunity with accessibility and computational power for those countries to generate the built-up data. Therefore, this study aims to extract the built-up areas in Eastern Economic Corridor (EEC), Thailand using day and nighttime satellite imagery based on GEE facilities. The normalized indices were generated from Landsat 8 surface reflectance dataset, including Normalized Difference Built-up Index (NDBI), Built-up Index (BUI), and Modified Built-up Index (MBUI). These indices were applied to identify built-up areas in EEC. The result shows that MBUI performs better than BUI and NDBI, with the highest accuracy of 0.85 and Kappa of 0.82. Moreover, the overall accuracy of classification was improved from 79% to 90%, and error of total built-up area was decreased from 29% to 0.7%, after night-time light data from the Visible and Infrared Imaging Suite (VIIRS) Day Night Band (DNB). The results suggest that MBUI with night-time light imagery is appropriate for built-up area extraction and be utilize for further study of socioeconomic impacts of regional development policy over the EEC region.

Keywords: Built-up area extraction, Google earth engine, adaptive thresholding method, rapid mapping.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 535
50 RV-YOLOX: Object Detection on Inland Waterways Based on Optimized YOLOX through Fusion of Vision and 3+1D Millimeter Wave Radar

Authors: Zixian Zhang, Shanliang Yao, Zile Huang, Zhaodong Wu, Xiaohui Zhu, Yong Yue, Jieming Ma

Abstract:

Unmanned Surface Vehicles (USVs) hold significant value for their capacity to undertake hazardous and labor-intensive operations over aquatic environments. Object detection tasks are significant in these applications. Nonetheless, the efficacy of USVs in object detection is impeded by several intrinsic challenges, including the intricate dispersal of obstacles, reflections emanating from coastal structures, and the presence of fog over water surfaces, among others. To address these problems, this paper provides a fusion method for USVs to effectively detect objects in the inland surface environment, utilizing vision sensors and 3+1D Millimeter-wave radar. The MMW radar is a complementary tool to vision sensors, offering reliable environmental data. This approach involves the conversion of the radar’s 3D point cloud into a 2D radar pseudo-image, thereby standardizing the format for radar and vision data by leveraging a point transformer. Furthermore, this paper proposes the development of a multi-source object detection network, named RV-YOLOX, which leverages radar-vision integration specifically tailored for inland waterway environments. The performance is evaluated on our self-recording waterways dataset. Compared with the YOLOX network, our fusion network significantly improves detection accuracy, especially for objects with bad light conditions.

Keywords: Inland waterways, object detection, YOLO, sensor fusion, self-attention, deep learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 105
49 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.

Keywords: Visual search, deep learning, convolutional neural network, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 774
48 Bayes Net Classifiers for Prediction of Renal Graft Status and Survival Period

Authors: Jiakai Li, Gursel Serpen, Steven Selman, Matt Franchetti, Mike Riesen, Cynthia Schneider

Abstract:

This paper presents the development of a Bayesian belief network classifier for prediction of graft status and survival period in renal transplantation using the patient profile information prior to the transplantation. The objective was to explore feasibility of developing a decision making tool for identifying the most suitable recipient among the candidate pool members. The dataset was compiled from the University of Toledo Medical Center Hospital patients as reported to the United Network Organ Sharing, and had 1228 patient records for the period covering 1987 through 2009. The Bayes net classifiers were developed using the Weka machine learning software workbench. Two separate classifiers were induced from the data set, one to predict the status of the graft as either failed or living, and a second classifier to predict the graft survival period. The classifier for graft status prediction performed very well with a prediction accuracy of 97.8% and true positive values of 0.967 and 0.988 for the living and failed classes, respectively. The second classifier to predict the graft survival period yielded a prediction accuracy of 68.2% and a true positive rate of 0.85 for the class representing those instances with kidneys failing during the first year following transplantation. Simulation results indicated that it is feasible to develop a successful Bayesian belief network classifier for prediction of graft status, but not the graft survival period, using the information in UNOS database.

Keywords: Bayesian network classifier, renal transplantation, graft survival period, United Network for Organ Sharing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2060
47 Intelligent Assistive Methods for Diagnosis of Rheumatoid Arthritis Using Histogram Smoothing and Feature Extraction of Bone Images

Authors: SP. Chokkalingam, K. Komathy

Abstract:

Advances in the field of image processing envision a new era of evaluation techniques and application of procedures in various different fields. One such field being considered is the biomedical field for prognosis as well as diagnosis of diseases. This plethora of methods though provides a wide range of options to select from, it also proves confusion in selecting the apt process and also in finding which one is more suitable. Our objective is to use a series of techniques on bone scans, so as to detect the occurrence of rheumatoid arthritis (RA) as accurately as possible. Amongst other techniques existing in the field our proposed system tends to be more effective as it depends on new methodologies that have been proved to be better and more consistent than others. Computer aided diagnosis will provide more accurate and infallible rate of consistency that will help to improve the efficiency of the system. The image first undergoes histogram smoothing and specification, morphing operation, boundary detection by edge following algorithm and finally image subtraction to determine the presence of rheumatoid arthritis in a more efficient and effective way. Using preprocessing noises are removed from images and using segmentation, region of interest is found and Histogram smoothing is applied for a specific portion of the images. Gray level co-occurrence matrix (GLCM) features like Mean, Median, Energy, Correlation, Bone Mineral Density (BMD) and etc. After finding all the features it stores in the database. This dataset is trained with inflamed and noninflamed values and with the help of neural network all the new images are checked properly for their status and Rough set is implemented for further reduction.

Keywords: Computer Aided Diagnosis, Edge Detection, Histogram Smoothing, Rheumatoid Arthritis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2432
46 Applying Case-Based Reasoning in Supporting Strategy Decisions

Authors: S. M. Seyedhosseini, A. Makui, M. Ghadami

Abstract:

Globalization and therefore increasing tight competition among companies, have resulted to increase the importance of making well-timed decision. Devising and employing effective strategies, that are flexible and adaptive to changing market, stand a greater chance of being effective in the long-term. In other side, a clear focus on managing the entire product lifecycle has emerged as critical areas for investment. Therefore, applying wellorganized tools to employ past experience in new case, helps to make proper and managerial decisions. Case based reasoning (CBR) is based on a means of solving a new problem by using or adapting solutions to old problems. In this paper, an adapted CBR model with k-nearest neighbor (K-NN) is employed to provide suggestions for better decision making which are adopted for a given product in the middle of life phase. The set of solutions are weighted by CBR in the principle of group decision making. Wrapper approach of genetic algorithm is employed to generate optimal feature subsets. The dataset of the department store, including various products which are collected among two years, have been used. K-fold approach is used to evaluate the classification accuracy rate. Empirical results are compared with classical case based reasoning algorithm which has no special process for feature selection, CBR-PCA algorithm based on filter approach feature selection, and Artificial Neural Network. The results indicate that the predictive performance of the model, compare with two CBR algorithms, in specific case is more effective.

Keywords: Case based reasoning, Genetic algorithm, Groupdecision making, Product management.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2133
45 Power Production Performance of Different Wave Energy Converters in the Southwestern Black Sea

Authors: Ajab G. Majidi, Bilal Bingölbali, Adem Akpınar

Abstract:

This study aims to investigate the amount of energy (economic wave energy potential) that can be obtained from the existing wave energy converters in the high wave energy potential region of the Black Sea in terms of wave energy potential and their performance at different depths in the region. The data needed for this purpose were obtained using the calibrated nested layered SWAN wave modeling program version 41.01AB, which was forced with Climate Forecast System Reanalysis (CFSR) winds from 1979 to 2009. The wave dataset at a time interval of 2 hours was accumulated for a sub-grid domain for around Karaburun beach in Arnavutkoy, a district of Istanbul city. The annual sea state characteristic matrices for the five different depths along with a vertical line to the coastline were calculated for 31 years. According to the power matrices of different wave energy converter systems and characteristic matrices for each possible installation depth, the probability distribution tables of the specified mean wave period or wave energy period and significant wave height were calculated. Then, by using the relationship between these distribution tables, according to the present wave climate, the energy that the wave energy converter systems at each depth can produce was determined. Thus, the economically feasible potential of the relevant coastal zone was revealed, and the effect of different depths on energy converter systems is presented. The Oceantic at 50, 75 and 100 m depths and Oyster at 5 and 25 m depths presents the best performance. In the 31-year long period 1998 the most and 1989 is the least dynamic year.

Keywords: Annual power production, Black Sea, efficiency, power production performance, wave energy converter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 587
44 NANCY: Combining Adversarial Networks with Cycle-Consistency for Robust Multi-Modal Image Registration

Authors: Mirjana Ruppel, Rajendra Persad, Amit Bahl, Sanja Dogramadzi, Chris Melhuish, Lyndon Smith

Abstract:

Multimodal image registration is a profoundly complex task which is why deep learning has been used widely to address it in recent years. However, two main challenges remain: Firstly, the lack of ground truth data calls for an unsupervised learning approach, which leads to the second challenge of defining a feasible loss function that can compare two images of different modalities to judge their level of alignment. To avoid this issue altogether we implement a generative adversarial network consisting of two registration networks GAB, GBA and two discrimination networks DA, DB connected by spatial transformation layers. GAB learns to generate a deformation field which registers an image of the modality B to an image of the modality A. To do that, it uses the feedback of the discriminator DB which is learning to judge the quality of alignment of the registered image B. GBA and DA learn a mapping from modality A to modality B. Additionally, a cycle-consistency loss is implemented. For this, both registration networks are employed twice, therefore resulting in images ˆA, ˆB which were registered to ˜B, ˜A which were registered to the initial image pair A, B. Thus the resulting and initial images of the same modality can be easily compared. A dataset of liver CT and MRI was used to evaluate the quality of our approach and to compare it against learning and non-learning based registration algorithms. Our approach leads to dice scores of up to 0.80 ± 0.01 and is therefore comparable to and slightly more successful than algorithms like SimpleElastix and VoxelMorph.

Keywords: Multimodal image registration, GAN, cycle consistency, deep learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 738
43 Crashworthiness Optimization of an Automotive Front Bumper in Composite Material

Authors: S. Boria

Abstract:

In the last years, the crashworthiness of an automotive body structure can be improved, since the beginning of the design stage, thanks to the development of specific optimization tools. It is well known how the finite element codes can help the designer to investigate the crashing performance of structures under dynamic impact. Therefore, by coupling nonlinear mathematical programming procedure and statistical techniques with FE simulations, it is possible to optimize the design with reduced number of analytical evaluations. In engineering applications, many optimization methods which are based on statistical techniques and utilize estimated models, called meta-models, are quickly spreading. A meta-model is an approximation of a detailed simulation model based on a dataset of input, identified by the design of experiments (DOE); the number of simulations needed to build it depends on the number of variables. Among the various types of meta-modeling techniques, Kriging method seems to be excellent in accuracy, robustness and efficiency compared to other ones when applied to crashworthiness optimization. Therefore the application of such meta-model was used in this work, in order to improve the structural optimization of a bumper for a racing car in composite material subjected to frontal impact. The specific energy absorption represents the objective function to maximize and the geometrical parameters subjected to some design constraints are the design variables. LS-DYNA codes were interfaced with LS-OPT tool in order to find the optimized solution, through the use of a domain reduction strategy. With the use of the Kriging meta-model the crashworthiness characteristic of the composite bumper was improved.

Keywords: Composite material, crashworthiness, finite element analysis, optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1075
42 Automatic Classification of Lung Diseases from CT Images

Authors: Abobaker Mohammed Qasem Farhan, Shangming Yang, Mohammed Al-Nehari

Abstract:

Pneumonia is a kind of lung disease that creates congestion in the chest. Such pneumonic conditions lead to loss of life due to the severity of high congestion. Pneumonic lung disease is caused by viral pneumonia, bacterial pneumonia, or COVID-19 induced pneumonia. The early prediction and classification of such lung diseases help reduce the mortality rate. We propose the automatic Computer-Aided Diagnosis (CAD) system in this paper using the deep learning approach. The proposed CAD system takes input from raw computerized tomography (CT) scans of the patient's chest and automatically predicts disease classification. We designed the Hybrid Deep Learning Algorithm (HDLA) to improve accuracy and reduce processing requirements. The raw CT scans are pre-processed first to enhance their quality for further analysis. We then applied a hybrid model that consists of automatic feature extraction and classification. We propose the robust 2D Convolutional Neural Network (CNN) model to extract the automatic features from the pre-processed CT image. This CNN model assures feature learning with extremely effective 1D feature extraction for each input CT image. The outcome of the 2D CNN model is then normalized using the Min-Max technique. The second step of the proposed hybrid model is related to training and classification using different classifiers. The simulation outcomes using the publicly available dataset prove the robustness and efficiency of the proposed model compared to state-of-art algorithms.

Keywords: CT scans, COVID-19, deep learning, image processing, pneumonia, lung disease.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 478
41 A Robust and Efficient Segmentation Method Applied for Cardiac Left Ventricle with Abnormal Shapes

Authors: Peifei Zhu, Zisheng Li, Yasuki Kakishita, Mayumi Suzuki, Tomoaki Chono

Abstract:

Segmentation of left ventricle (LV) from cardiac ultrasound images provides a quantitative functional analysis of the heart to diagnose disease. Active Shape Model (ASM) is widely used for LV segmentation, but it suffers from the drawback that initialization of the shape model is not sufficiently close to the target, especially when dealing with abnormal shapes in disease. In this work, a two-step framework is improved to achieve a fast and efficient LV segmentation. First, a robust and efficient detection based on Hough forest localizes cardiac feature points. Such feature points are used to predict the initial fitting of the LV shape model. Second, ASM is applied to further fit the LV shape model to the cardiac ultrasound image. With the robust initialization, ASM is able to achieve more accurate segmentation. The performance of the proposed method is evaluated on a dataset of 810 cardiac ultrasound images that are mostly abnormal shapes. This proposed method is compared with several combinations of ASM and existing initialization methods. Our experiment results demonstrate that accuracy of the proposed method for feature point detection for initialization was 40% higher than the existing methods. Moreover, the proposed method significantly reduces the number of necessary ASM fitting loops and thus speeds up the whole segmentation process. Therefore, the proposed method is able to achieve more accurate and efficient segmentation results and is applicable to unusual shapes of heart with cardiac diseases, such as left atrial enlargement.

Keywords: Hough forest, active shape model, segmentation, cardiac left ventricle.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1450
40 Comparative Study Using Weka for Red Blood Cells Classification

Authors: Jameela Ali Alkrimi, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifying the RBCs as normal or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithms tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital - Malaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively.

Keywords: K-Nearest Neighbors, Neural Network, Radial Basis Function, Red blood cells, Support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2936
39 Mapping of Alteration Zones in Mineral Rich Belt of South-East Rajasthan Using Remote Sensing Techniques

Authors: Mrinmoy Dhara, Vivek K. Sengar, Shovan L. Chattoraj, Soumiya Bhattacharjee

Abstract:

Remote sensing techniques have emerged as an asset for various geological studies. Satellite images obtained by different sensors contain plenty of information related to the terrain. Digital image processing further helps in customized ways for the prospecting of minerals. In this study, an attempt has been made to map the hydrothermally altered zones using multispectral and hyperspectral datasets of South East Rajasthan. Advanced Space-borne Thermal Emission and Reflection Radiometer (ASTER) and Hyperion (Level1R) dataset have been processed to generate different Band Ratio Composites (BRCs). For this study, ASTER derived BRCs were generated to delineate the alteration zones, gossans, abundant clays and host rocks. ASTER and Hyperion images were further processed to extract mineral end members and classified mineral maps have been produced using Spectral Angle Mapper (SAM) method. Results were validated with the geological map of the area which shows positive agreement with the image processing outputs. Thus, this study concludes that the band ratios and image processing in combination play significant role in demarcation of alteration zones which may provide pathfinders for mineral prospecting studies.

Keywords: Advanced space-borne thermal emission and reflection radiometer, ASTER, Hyperion, Band ratios, Alteration zones, spectral angle mapper.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1430
38 Effects of Different Meteorological Variables on Reference Evapotranspiration Modeling: Application of Principal Component Analysis

Authors: Akinola Ikudayisi, Josiah Adeyemo

Abstract:

The correct estimation of reference evapotranspiration (ETₒ) is required for effective irrigation water resources planning and management. However, there are some variables that must be considered while estimating and modeling ETₒ. This study therefore determines the multivariate analysis of correlated variables involved in the estimation and modeling of ETₒ at Vaalharts irrigation scheme (VIS) in South Africa using Principal Component Analysis (PCA) technique. Weather and meteorological data between 1994 and 2014 were obtained both from South African Weather Service (SAWS) and Agricultural Research Council (ARC) in South Africa for this study. Average monthly data of minimum and maximum temperature (°C), rainfall (mm), relative humidity (%), and wind speed (m/s) were the inputs to the PCA-based model, while ETₒ is the output. PCA technique was adopted to extract the most important information from the dataset and also to analyze the relationship between the five variables and ETₒ. This is to determine the most significant variables affecting ETₒ estimation at VIS. From the model performances, two principal components with a variance of 82.7% were retained after the eigenvector extraction. The results of the two principal components were compared and the model output shows that minimum temperature, maximum temperature and windspeed are the most important variables in ETₒ estimation and modeling at VIS. In order words, ETₒ increases with temperature and windspeed. Other variables such as rainfall and relative humidity are less important and cannot be used to provide enough information about ETₒ estimation at VIS. The outcome of this study has helped to reduce input variable dimensionality from five to the three most significant variables in ETₒ modelling at VIS, South Africa.

Keywords: Irrigation, principal component analysis, reference evapotranspiration, Vaalharts.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1017
37 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.

Keywords: Machine learning, Imbalanced data, Data mining, Big data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1063
36 Automated Natural Hazard Zonation System with Internet-SMS Warning: Distributed GIS for Sustainable Societies Creating Schema & Interface for Mapping & Communication

Authors: Devanjan Bhattacharya, Jitka Komarkova

Abstract:

The research describes the implementation of a novel and stand-alone system for dynamic hazard warning. The system uses all existing infrastructure already in place like mobile networks, a laptop/PC and the small installation software. The geospatial dataset are the maps of a region which are again frugal. Hence there is no need to invest and it reaches everyone with a mobile. A novel architecture of hazard assessment and warning introduced where major technologies in ICT interfaced to give a unique WebGIS based dynamic real time geohazard warning communication system. A never before architecture introduced for integrating WebGIS with telecommunication technology. Existing technologies interfaced in a novel architectural design to address a neglected domain in a way never done before – through dynamically updatable WebGIS based warning communication. The work publishes new architecture and novelty in addressing hazard warning techniques in sustainable way and user friendly manner. Coupling of hazard zonation and hazard warning procedures into a single system has been shown. Generalized architecture for deciphering a range of geo-hazards has been developed. Hence the developmental work presented here can be summarized as the development of internet-SMS based automated geo-hazard warning communication system; integrating a warning communication system with a hazard evaluation system; interfacing different open-source technologies towards design and development of a warning system; modularization of different technologies towards development of a warning communication system; automated data creation, transformation and dissemination over different interfaces. The architecture of the developed warning system has been functionally automated as well as generalized enough that can be used for any hazard and setup requirement has been kept to a minimum.

Keywords: Geospatial, web-based GIS, geohazard, warning system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1748
35 Identification of Spam Keywords Using Hierarchical Category in C2C E-commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like ebay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C E-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C E-commerce.

Keywords: Spam Keyword, E-commerce, keyword features, spam filtering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2460
34 Reducing CO2 Emission Using EDA and Weighted Sum Model in Smart Parking System

Authors: Rahman Ali, Muhammad Sajjad, Farkhund Iqbal, Muhammad Sadiq Hassan Zada, Mohammed Hussain

Abstract:

Emission of Carbon Dioxide (CO2) has adversely affected the environment. One of the major sources of CO2 emission is transportation. In the last few decades, the increase in mobility of people using vehicles has enormously increased the emission of CO2 in the environment. To reduce CO2 emission, sustainable transportation system is required in which smart parking is one of the important measures that need to be established. To contribute to the issue of reducing the amount of CO2 emission, this research proposes a smart parking system. A cloud-based solution is provided to the drivers which automatically searches and recommends the most preferred parking slots. To determine preferences of the parking areas, this methodology exploits a number of unique parking features which ultimately results in the selection of a parking that leads to minimum level of CO2 emission from the current position of the vehicle. To realize the methodology, a scenario-based implementation is considered. During the implementation, a mobile application with GPS signals, vehicles with a number of vehicle features and a list of parking areas with parking features are used by sorting, multi-level filtering, exploratory data analysis (EDA, Analytical Hierarchy Process (AHP)) and weighted sum model (WSM) to rank the parking areas and recommend the drivers with top-k most preferred parking areas. In the EDA process, “2020testcar-2020-03-03”, a freely available dataset is used to estimate CO2 emission of a particular vehicle. To evaluate the system, results of the proposed system are compared with the conventional approach, which reveal that the proposed methodology supersedes the conventional one in reducing the emission of CO2 into the atmosphere.

Keywords: CO2 emission, IoT, EDA, Weighted Sum Model, WSM, regression, smart parking system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 678
33 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences

Authors: C. Xavier Mendieta, J. J McArthur

Abstract:

Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.

Keywords: Building archetypes, data analysis, energy benchmarks, GHG emissions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 974
32 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years, which has made it more important to understand customers’ needs in this strong market. For those who are looking to turn over their service providers, understanding their needs is especially important. Predictive churn is now a mandatory requirement for retaining customers in the telecommunications industry. Machine learning can be used to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 332
31 Review of the Road Crash Data Availability in Iraq

Authors: Abeer K. Jameel, Harry Evdorides

Abstract:

Iraq is a middle income country where the road safety issue is considered one of the leading causes of deaths. To control the road risk issue, the Iraqi Ministry of Planning, General Statistical Organization started to organise a collection system of traffic accidents data with details related to their causes and severity. These data are published as an annual report. In this paper, a review of the available crash data in Iraq will be presented. The available data represent the rate of accidents in aggregated level and classified according to their types, road users’ details, and crash severity, type of vehicles, causes and number of causalities. The review is according to the types of models used in road safety studies and research, and according to the required road safety data in the road constructions tasks. The available data are also compared with the road safety dataset published in the United Kingdom as an example of developed country. It is concluded that the data in Iraq are suitable for descriptive and exploratory models, aggregated level comparison analysis, and evaluation and monitoring the progress of the overall traffic safety performance. However, important traffic safety studies require disaggregated level of data and details related to the factors of the likelihood of traffic crashes. Some studies require spatial geographic details such as the location of the accidents which is essential in ranking the roads according to their level of safety, and name the most dangerous roads in Iraq which requires tactic plan to control this issue. Global Road safety agencies interested in solve this problem in low and middle-income countries have designed road safety assessment methodologies which are basing on the road attributes data only. Therefore, in this research it is recommended to use one of these methodologies.

Keywords: Data availability, Iraq, road safety.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 859
30 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection

Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi

Abstract:

It is obvious in this present time, internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life so easy for human beings, through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication hugely accepted among individuals and organizations globally. But over decades the security integrity of this platform has been challenged with malicious activities like Phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed substantially but one outperformed the other. The dataset used for these models is obtained from Kaggle online data repository, while three classifiers: decision tree, Naïve Bayes, and Logistic regression are ensemble (Bagging) respectively. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, hybrid, filter-wrapper, phishing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 114
29 Grassland Phenology in Different Eco-Geographic Regions over the Tibetan Plateau

Authors: Jiahua Zhang, Qing Chang, Fengmei Yao

Abstract:

Studying on the response of vegetation phenology to climate change at different temporal and spatial scales is important for understanding and predicting future terrestrial ecosystem dynamics and the adaptation of ecosystems to global change. In this study, the Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) dataset and climate data were used to analyze the dynamics of grassland phenology as well as their correlation with climatic factors in different eco-geographic regions and elevation units across the Tibetan Plateau. The results showed that during 2003–2012, the start of the grassland greening season (SOS) appeared later while the end of the growing season (EOS) appeared earlier following the plateau’s precipitation and heat gradients from southeast to northwest. The multi-year mean value of SOS showed differences between various eco-geographic regions and was significantly impacted by average elevation and regional average precipitation during spring. Regional mean differences for EOS were mainly regulated by mean temperature during autumn. Changes in trends of SOS in the central and eastern eco-geographic regions were coupled to the mean temperature during spring, advancing by about 7d/°C. However, in the two southwestern eco-geographic regions, SOS was delayed significantly due to the impact of spring precipitation. The results also showed that the SOS occurred later with increasing elevation, as expected, with a delay rate of 0.66 d/100m. For 2003–2012, SOS showed an advancing trend in low-elevation areas, but a delayed trend in high-elevation areas, while EOS was delayed in low-elevation areas, but advanced in high-elevation areas. Grassland SOS and EOS changes may be influenced by a variety of other environmental factors in each eco-geographic region.

Keywords: Grassland, phenology, MODIS, eco-geographic regions, elevation, climatic factors, Tibetan Plateau.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2781