Search results for: encrypted traffic classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3320

Search results for: encrypted traffic classification

2120 Fatigue Life Estimation of Tubular Joints - A Comparative Study

Authors: Jeron Maheswaran, Sudath C. Siriwardane

Abstract:

In fatigue analysis, the structural detail of tubular joint has taken great attention among engineers. The DNV-RP-C203 is covering this topic quite well for simple and clear joint cases. For complex joint and geometry, where joint classification isn’t available and limitation on validity range of non-dimensional geometric parameters, the challenges become a fact among engineers. The classification of joint is important to carry out through the fatigue analysis. These joint configurations are identified by the connectivity and the load distribution of tubular joints. To overcome these problems to some extent, this paper compare the fatigue life of tubular joints in offshore jacket according to the stress concentration factors (SCF) in DNV-RP-C203 and finite element method employed Abaqus/CAE. The paper presents the geometric details, material properties and considered load history of the jacket structure. Describe the global structural analysis and identification of critical tubular joints for fatigue life estimation. Hence fatigue life is determined based on the guidelines provided by design codes. Fatigue analysis of tubular joints is conducted using finite element employed Abaqus/CAE [4] as next major step. Finally, obtained SCFs and fatigue lives are compared and their significances are discussed.

Keywords: fatigue life, stress-concentration factor, finite element analysis, offshore jacket structure

Procedia PDF Downloads 448
2119 Investigating the Causes of Human Error-Induced Incidents in the Maintenance Operations of Petrochemical Industry by Using Human Factors Analysis and Classification System

Authors: Omid Kalatpour, Mohammadreza Ajdari

Abstract:

This article studied the possible causes of human error-induced incidents in the petrochemical industry maintenance activities by using Human Factors Analysis and Classification System (HFACS). The purpose of the study was anticipating and identifying these causes and proposing corrective and preventive actions. Maintenance department in a petrochemical company was selected for research. A checklist of human error-induced incidents was developed based on four HFACS main levels and nineteen sub-groups. Hierarchical task analysis (HTA) technique was used to identify maintenance activities and tasks. The main causes of possible incidents were identified by checklist and recorded. Corrective and preventive actions were defined depending on priority. Analyzing the worksheets of 444 activities in four levels of HFACS showed 37.6% of the causes were at the level of unsafe actions, 27.5% at the level of unsafe supervision, 20.9% at the level of preconditions for unsafe acts and 14% of the causes were at the level of organizational effects. The HFACS sub-groups showed errors (24.36%) inadequate supervision (14.89%) and violations (13.26%) with the most frequency. According to findings of this study, increasing the training effectiveness of operators and supervision improvement respectively are the most important measures in decreasing the human error-induced incidents in petrochemical industry maintenance.

Keywords: human error, petrochemical industry, maintenance, HFACS

Procedia PDF Downloads 235
2118 Acoustical Comfort in Major Highway in Birnin Kebbi, Kebbi State-Nigeria

Authors: Muhammad Naziru Yahaya, Mustapha Bashir Ayinde

Abstract:

Noise has been recognized as a major source of pollution in many urban and semi-urban settlements. Noise pollution causes by vehicular movement in urban cities has reaches an alarming proportion due to continuous increases in vehicles and industrialization. This research aim to determine the geo-physical characteristics of the study area and to determine the level of noise generation and volume intensity in areas where noise levels are high within the metropolis and compare with NESREA and WHO standards. This study identified the various sources of noise, compared noise levels in various parts of the study area with recommended standards and determined the geo-physical characteristic of noise generated. A sound level meter Gm 1352, was used for the noise measurements. The study showed that the noise pollution levels measured in minimum noise level of 63.75 dBA and average maximum of 95.175 dBA, at some locations in Birnin Kebbi metropolis the noise level have exceeded the standard limits set by the World Health Organization (WHO), Federal Environment Protection Agency (FEPA). Results revealed that there was a considerable increase in noise pollution in First Bank roundabout and Haliru Abdu roundabout, attribute to high numbers of vehicular movement and road congestion within Birnin Kebbi. The study therefore concluded that there should be an enforcement and adherence to the regulation regarding noise pollution limit. The minimum average day noise level recorded was 67.225 dBA, and average maximum of 96.6 dBA is an indication that the noise level of Birnin Kebbi metropolis was highly unsatisfactory. Based on this, it is suggested that taking adequate measures and following the laid-down recommendations will reduce traffic noise to the barest minimum.

Keywords: decibel, noise level, pollution, sound level, traffic, highway

Procedia PDF Downloads 74
2117 The Prediction of Reflection Noise and Its Reduction by Shaped Noise Barriers

Authors: I. L. Kim, J. Y. Lee, A. K. Tekile

Abstract:

In consequence of the very high urbanization rate of Korea, the number of traffic noise damages in areas congested with population and facilities is steadily increasing. The current environmental noise levels data in major cities of the country show that the noise levels exceed the standards set for both day and night times. This research was about comparative analysis in search for optimal soundproof panel shape and design factor that can minimize sound reflection noise. In addition to the normal flat-type panel shape, the reflection noise reduction of swelling-type, combined swelling and curved-type, and screen-type were evaluated. The noise source model Nord 2000, which often provides abundant information compared to models for the similar purpose, was used in the study to determine the overall noise level. Based on vehicle categorization in Korea, the noise levels for varying frequency from different heights of the sound source (directivity heights of Harmonize model) have been calculated for simulation. Each simulation has been made using the ray-tracing method. The noise level has also been calculated using the noise prediction program called SoundPlan 7.2, for comparison. The noise level prediction was made at 15m (R1), 30 m (R2) and at middle of the road, 2m (R3) receiving the point. By designing the noise barriers by shape and running the prediction program by inserting the noise source on the 2nd lane to the noise barrier side, among the 6 lanes considered, the reflection noise slightly decreased or increased in all noise barriers. At R1, especially in the cases of the screen-type noise barriers, there was no reduction effect predicted in all conditions. However, the swelling-type showed a decrease of 0.7~1.2 dB at R1, performing the best reduction effect among the tested noise barriers. Compared to other forms of noise barriers, the swelling-type was thought to be the most suitable for reducing the reflection noise; however, since a slight increase was predicted at R2, further research based on a more sophisticated categorization of related design factors is necessary. Moreover, as swellings are difficult to produce and the size of the modules are smaller than other panels, it is challenging to install swelling-type noise barriers. If these problems are solved, its applicable region will not be limited to other types of noise barriers. Hence, when a swelling-type noise barrier is installed at a downtown region where the amount of traffic is increasing every day, it will both secure visibility through the transparent walls and diminish any noise pollution due to the reflection. Moreover, when decorated with shapes and design, noise barriers will achieve a visual attraction than a flat-type one and thus will alleviate any psychological hardships related to noise, other than the unique physical soundproofing functions of the soundproof panels.

Keywords: reflection noise, shaped noise barriers, sound proof panel, traffic noise

Procedia PDF Downloads 504
2116 Improving Cheon-Kim-Kim-Song (CKKS) Performance with Vector Computation and GPU Acceleration

Authors: Smaran Manchala

Abstract:

Homomorphic Encryption (HE) enables computations on encrypted data without requiring decryption, mitigating data vulnerability during processing. Usable Fully Homomorphic Encryption (FHE) could revolutionize secure data operations across cloud computing, AI training, and healthcare, providing both privacy and functionality, however, the computational inefficiency of schemes like Cheon-Kim-Kim-Song (CKKS) hinders their widespread practical use. This study focuses on optimizing CKKS for faster matrix operations through the implementation of vector computation parallelization and GPU acceleration. The variable effects of vector parallelization on GPUs were explored, recognizing that while parallelization typically accelerates operations, it could introduce overhead that results in slower runtimes, especially in smaller, less computationally demanding operations. To assess performance, two neural network models, MLPN and CNN—were tested on the MNIST dataset using both ARM and x86-64 architectures, with CNN chosen for its higher computational demands. Each test was repeated 1,000 times, and outliers were removed via Z-score analysis to measure the effect of vector parallelization on CKKS performance. Model accuracy was also evaluated under CKKS encryption to ensure optimizations did not compromise results. According to the results of the trail runs, applying vector parallelization had a 2.63X efficiency increase overall with a 1.83X performance increase for x86-64 over ARM architecture. Overall, these results suggest that the application of vector parallelization in tandem with GPU acceleration significantly improves the efficiency of CKKS even while accounting for vector parallelization overhead, providing impact in future zero trust operations.

Keywords: CKKS scheme, runtime efficiency, fully homomorphic encryption (FHE), GPU acceleration, vector parallelization

Procedia PDF Downloads 3
2115 Housing Prices and Travel Costs: Insights from Origin-Destination Demand Estimation in Taiwan’s Science Parks

Authors: Kai-Wei Ji, Dung-Ying Lin

Abstract:

This study investigates the impact of transportation on housing prices in regions surrounding Taiwan's science parks. As these parks evolve into crucial economic and population growth centers, they attract an increasing number of residents and workers, significantly influencing local housing markets. This demographic shift raises important questions about the role of transportation in shaping real estate values. Our research examines four major science parks in Taiwan, providing a comparative analysis of how transportation conditions and population dynamics interact to affect housing price premiums. We employ an origin-destination (OD) matrix derived from pervasive traffic data to model travel patterns and their effects on real estate values. The methodology utilizes a bi-level framework: a genetic algorithm optimizes OD demand estimation at the upper level, while a user equilibrium (UE) model simulates traffic flow at the lower level. This approach enables a nuanced exploration of how population growth impacts transportation conditions and housing price premiums. By analyzing the interplay between travel costs based on OD demand estimation and housing prices, we offer valuable insights for urban planners and policymakers. These findings are crucial for informed decision-making in rapidly developing areas, where understanding the relationship between mobility and real estate values is essential for sustainable urban development.

Keywords: demand estimation, genetic algorithm, housing price, transportation

Procedia PDF Downloads 12
2114 DeepNIC a Method to Transform Each Tabular Variable into an Independant Image Analyzable by Basic CNNs

Authors: Nguyen J. M., Lucas G., Ruan S., Digonnet H., Antonioli D.

Abstract:

Introduction: Deep Learning (DL) is a very powerful tool for analyzing image data. But for tabular data, it cannot compete with machine learning methods like XGBoost. The research question becomes: can tabular data be transformed into images that can be analyzed by simple CNNs (Convolutional Neuron Networks)? Will DL be the absolute tool for data classification? All current solutions consist in repositioning the variables in a 2x2 matrix using their correlation proximity. In doing so, it obtains an image whose pixels are the variables. We implement a technology, DeepNIC, that offers the possibility of obtaining an image for each variable, which can be analyzed by simple CNNs. Material and method: The 'ROP' (Regression OPtimized) model is a binary and atypical decision tree whose nodes are managed by a new artificial neuron, the Neurop. By positioning an artificial neuron in each node of the decision trees, it is possible to make an adjustment on a theoretically infinite number of variables at each node. From this new decision tree whose nodes are artificial neurons, we created the concept of a 'Random Forest of Perfect Trees' (RFPT), which disobeys Breiman's concepts by assembling very large numbers of small trees with no classification errors. From the results of the RFPT, we developed a family of 10 statistical information criteria, Nguyen Information Criterion (NICs), which evaluates in 3 dimensions the predictive quality of a variable: Performance, Complexity and Multiplicity of solution. A NIC is a probability that can be transformed into a grey level. The value of a NIC depends essentially on 2 super parameters used in Neurops. By varying these 2 super parameters, we obtain a 2x2 matrix of probabilities for each NIC. We can combine these 10 NICs with the functions AND, OR, and XOR. The total number of combinations is greater than 100,000. In total, we obtain for each variable an image of at least 1166x1167 pixels. The intensity of the pixels is proportional to the probability of the associated NIC. The color depends on the associated NIC. This image actually contains considerable information about the ability of the variable to make the prediction of Y, depending on the presence or absence of other variables. A basic CNNs model was trained for supervised classification. Results: The first results are impressive. Using the GSE22513 public data (Omic data set of markers of Taxane Sensitivity in Breast Cancer), DEEPNic outperformed other statistical methods, including XGBoost. We still need to generalize the comparison on several databases. Conclusion: The ability to transform any tabular variable into an image offers the possibility of merging image and tabular information in the same format. This opens up great perspectives in the analysis of metadata.

Keywords: tabular data, CNNs, NICs, DeepNICs, random forest of perfect trees, classification

Procedia PDF Downloads 119
2113 A Mechanical Diagnosis Method Based on Vibration Fault Signal down-Sampling and the Improved One-Dimensional Convolutional Neural Network

Authors: Bowei Yuan, Shi Li, Liuyang Song, Huaqing Wang, Lingli Cui

Abstract:

Convolutional neural networks (CNN) have received extensive attention in the field of fault diagnosis. Many fault diagnosis methods use CNN for fault type identification. However, when the amount of raw data collected by sensors is massive, the neural network needs to perform a time-consuming classification task. In this paper, a mechanical fault diagnosis method based on vibration signal down-sampling and the improved one-dimensional convolutional neural network is proposed. Through the robust principal component analysis, the low-rank feature matrix of a large amount of raw data can be separated, and then down-sampling is realized to reduce the subsequent calculation amount. In the improved one-dimensional CNN, a smaller convolution kernel is used to reduce the number of parameters and computational complexity, and regularization is introduced before the fully connected layer to prevent overfitting. In addition, the multi-connected layers can better generalize classification results without cumbersome parameter adjustments. The effectiveness of the method is verified by monitoring the signal of the centrifugal pump test bench, and the average test accuracy is above 98%. When compared with the traditional deep belief network (DBN) and support vector machine (SVM) methods, this method has better performance.

Keywords: fault diagnosis, vibration signal down-sampling, 1D-CNN

Procedia PDF Downloads 127
2112 Quantitative Structure–Activity Relationship Analysis of Some Benzimidazole Derivatives by Linear Multivariate Method

Authors: Strahinja Z. Kovačević, Lidija R. Jevrić, Sanja O. Podunavac Kuzmanović

Abstract:

The relationship between antibacterial activity of eighteen different substituted benzimidazole derivatives and their molecular characteristics was studied using chemometric QSAR (Quantitative Structure–Activity Relationships) approach. QSAR analysis has been carried out on inhibitory activity towards Staphylococcus aureus, by using molecular descriptors, as well as minimal inhibitory activity (MIC). Molecular descriptors were calculated from the optimized structures. Principal component analysis (PCA) followed by hierarchical cluster analysis (HCA) and multiple linear regression (MLR) was performed in order to select molecular descriptors that best describe the antibacterial behavior of the compounds investigated, and to determine the similarities between molecules. The HCA grouped the molecules in separated clusters which have the similar inhibitory activity. PCA showed very similar classification of molecules as the HCA, and displayed which descriptors contribute to that classification. MLR equations, that represent MIC as a function of the in silico molecular descriptors were established. The statistical significance of the estimated models was confirmed by standard statistical measures and cross-validation parameters (SD = 0.0816, F = 46.27, R = 0.9791, R2CV = 0.8266, R2adj = 0.9379, PRESS = 0.1116). These parameters indicate the possibility of application of the established chemometric models in prediction of the antibacterial behaviour of studied derivatives and structurally very similar compounds.

Keywords: antibacterial, benzimidazole, molecular descriptors, QSAR

Procedia PDF Downloads 360
2111 Trafficking of Women in Assam: The Untold Violation of Women's Human Rights

Authors: Mridula Devi

Abstract:

Trafficking of women is a slur on human dignity and a shameful act to human civilization and development. Trafficking of women is one of worst brazen abuses which violate the women’s human rights. In India, more particularly in Assam, human trafficking and infringement of human rights of individual includes mainly the women and girl child of the State. Trafficking in North East region of India, more particularly in Assam occurs in two different ways – one is the internal trafficking of women and girl child from conflict affected rural areas of Assam for domestic work and prostitution. Secondly, there is trafficking of women to other south-East Asiatic countries like Bangladesh, Bhutan, Bangkok, Myanmar (Burma) for various purposes such as drug trafficking, labor, bar girl and prostitution.Historically, trafficking in human beings is associated with slavery and bonded or forced labor. Since the period of Roman Civilization, there was the practice of traffic in persons in the form of slave trade among the nations. With the rise of new imperialism, slavery had become an integral part of the colonial system of European Countries. With time, it almost became synonymous with prostitution or commercial sexual exploitation. Finally, the United Nation adopted the Convention for the Suppression of the Traffic in Persons and of the Prostitution of others, 1949 by the G.A.Res.No.-317(iv). The Convention totally denounces the traffic in persons for the purpose of prostitution. However, it is important to note that, now a days trafficking is not confined to commercial sexual exploitation of women and children alone. It has myriad forms and the number of victims has been steadily on the rise over the past few decades. In Assam, it takes place through and for marriage, sexual exploitation, begging, organ trading, militancy conflicts, drug padding and smuggling, labour, adoption, entertainment, and sports. In this paper, empirical methodology has been used. The study is based on primary and secondary sources. Data’s are collected from different books, publications, newspaper, journals etc. For empirical analysis, some random samples are collected and systematized for better result. India suffers from the ignominy of being one of the biggest hubs of women trafficking in the world. Over the years, Assam: the north east part of India has been bearing the brunt of the rapidly rising evil of trafficking of women which threaten the life, dignity and human rights of women. Though different laws are adopted at international and national level to restore trafficking, still the menace of trafficking of women in Assam is not decreased, rather it increased. This causes a serious violation of women’s human right in Assam. Human trafficking or women’s trafficking is a serious crime against society. To curb this in Assam it is required to take some effective and dedicated measure at state level as well as national and international level.

Keywords: Assam, human trafficking, sexual exploitation, India

Procedia PDF Downloads 512
2110 Monitor Vehicle Speed Using Internet of Things Based Wireless Sensor Network System

Authors: Akber Oumer Abdurezak

Abstract:

Road traffic accident is a major problem in Ethiopia, resulting in the deaths of many people and potential injuries and crash every year and loss of properties. According to the Federal Transport Authority, one of the main causes of traffic accident and crash in Ethiopia is over speeding. Implementation of different technologies is used to monitor the speed of vehicles in order to minimize accidents and crashes. This research aimed at designing a speed monitoring system to monitor the speed of travelling vehicles and movements, reporting illegal speeds or overspeeding vehicles to the concerned bodies. The implementation of the system is through a wireless sensor network. The proposed system can sense and detect the movement of vehicles, process, and analysis the data obtained from the sensor and the cloud system. The data is sent to the central controlling server. The system contains accelerometer and gyroscope sensors to sense and collect the data of the vehicle. Arduino to process the data and Global System for Mobile Communication (GSM) module for communication purposes to send the data to the concerned body. When the speed of the vehicle exceeds the allowable speed limit, the system sends a message to database as “over speeding”. Both accelerometer and gyroscope sensors are used to collect acceleration data. The acceleration data then convert to speed, and the corresponding speed is checked with the speed limit, and those above the speed limit are reported to the concerned authorities to avoid frequent accidents. The proposed system decreases the occurrence of accidents and crashes due to overspeeding and can be used as an eye opener for the implementation of other intelligent transport system technologies. This system can also integrate with other technologies like GPS and Google Maps to obtain better output.

Keywords: accelerometer, IOT, GSM, gyroscope

Procedia PDF Downloads 74
2109 Feature Selection Approach for the Classification of Hydraulic Leakages in Hydraulic Final Inspection using Machine Learning

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Manufacturing companies are facing global competition and enormous cost pressure. The use of machine learning applications can help reduce production costs and create added value. Predictive quality enables the securing of product quality through data-supported predictions using machine learning models as a basis for decisions on test results. Furthermore, machine learning methods are able to process large amounts of data, deal with unfavourable row-column ratios and detect dependencies between the covariates and the given target as well as assess the multidimensional influence of all input variables on the target. Real production data are often subject to highly fluctuating boundary conditions and unbalanced data sets. Changes in production data manifest themselves in trends, systematic shifts, and seasonal effects. Thus, Machine learning applications require intensive pre-processing and feature selection. Data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets. Within the used real data set of Bosch hydraulic valves, the comparability of the same production conditions in the production of hydraulic valves within certain time periods can be identified by applying the concept drift method. Furthermore, a classification model is developed to evaluate the feature importance in different subsets within the identified time periods. By selecting comparable and stable features, the number of features used can be significantly reduced without a strong decrease in predictive power. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. In this research, the ada boosting classifier is used to predict the leakage of hydraulic valves based on geometric gauge blocks from machining, mating data from the assembly, and hydraulic measurement data from end-of-line testing. In addition, the most suitable methods are selected and accurate quality predictions are achieved.

Keywords: classification, achine learning, predictive quality, feature selection

Procedia PDF Downloads 158
2108 On Board Measurement of Real Exhaust Emission of Light-Duty Vehicles in Algeria

Authors: R. Kerbachi, S. Chikhi, M. Boughedaoui

Abstract:

The study presents an analysis of the Algerian vehicle fleet and resultant emissions. The emission measurement of air pollutants emitted by road transportation (CO, THC, NOX and CO2) was conducted on 17 light duty vehicles in real traffic. This sample is representative of the Algerian light vehicles in terms of fuel quality (gasoline, diesel and liquefied petroleum gas) and the technology quality (injection system and emission control). The experimental measurement methodology of unit emission of vehicles in real traffic situation is based on the use of the mini-Constant Volume Sampler for gas sampling and a set of gas analyzers for CO2, CO, NOx and THC, with an instrumentation to measure kinematics, gas temperature and pressure. The apparatus is also equipped with data logging instrument and data transfer. The results were compared with the database of the European light vehicles (Artemis). It was shown that the technological injection liquefied petroleum gas (LPG) has significant impact on air pollutants emission. Therefore, with the exception of nitrogen oxide compounds, uncatalyzed LPG vehicles are more effective in reducing emissions unit of air pollutants compared to uncatalyzed gasoline vehicles. LPG performance seems to be lower under real driving conditions than expected on chassis dynamometer. On the other hand, the results show that uncatalyzed gasoline vehicles emit high levels of carbon monoxide, and nitrogen oxides. Overall, and in the absence of standards in Algeria, unit emissions are much higher than Euro 3. The enforcement of pollutant emission standard in developing countries is an important step towards introducing cleaner technology and reducing vehicular emissions.

Keywords: on-board measurements of unit emissions of CO, HC, NOx and CO2, light vehicles, mini-CVS, LPG-fuel, artemis, Algeria

Procedia PDF Downloads 273
2107 Roughness Discrimination Using Bioinspired Tactile Sensors

Authors: Zhengkun Yi

Abstract:

Surface texture discrimination using artificial tactile sensors has attracted increasing attentions in the past decade as it can endow technical and robot systems with a key missing ability. However, as a major component of texture, roughness has rarely been explored. This paper presents an approach for tactile surface roughness discrimination, which includes two parts: (1) design and fabrication of a bioinspired artificial fingertip, and (2) tactile signal processing for tactile surface roughness discrimination. The bioinspired fingertip is comprised of two polydimethylsiloxane (PDMS) layers, a polymethyl methacrylate (PMMA) bar, and two perpendicular polyvinylidene difluoride (PVDF) film sensors. This artificial fingertip mimics human fingertips in three aspects: (1) Elastic properties of epidermis and dermis in human skin are replicated by the two PDMS layers with different stiffness, (2) The PMMA bar serves the role analogous to that of a bone, and (3) PVDF film sensors emulate Meissner’s corpuscles in terms of both location and response to the vibratory stimuli. Various extracted features and classification algorithms including support vector machines (SVM) and k-nearest neighbors (kNN) are examined for tactile surface roughness discrimination. Eight standard rough surfaces with roughness values (Ra) of 50 μm, 25 μm, 12.5 μm, 6.3 μm 3.2 μm, 1.6 μm, 0.8 μm, and 0.4 μm are explored. The highest classification accuracy of (82.6 ± 10.8) % can be achieved using solely one PVDF film sensor with kNN (k = 9) classifier and the standard deviation feature.

Keywords: bioinspired fingertip, classifier, feature extraction, roughness discrimination

Procedia PDF Downloads 305
2106 A Business-to-Business Collaboration System That Promotes Data Utilization While Encrypting Information on the Blockchain

Authors: Hiroaki Nasu, Ryota Miyamoto, Yuta Kodera, Yasuyuki Nogami

Abstract:

To promote Industry 4.0 and Society 5.0 and so on, it is important to connect and share data so that every member can trust it. Blockchain (BC) technology is currently attracting attention as the most advanced tool and has been used in the financial field and so on. However, the data collaboration using BC has not progressed sufficiently among companies on the supply chain of manufacturing industry that handle sensitive data such as product quality, manufacturing conditions, etc. There are two main reasons why data utilization is not sufficiently advanced in the industrial supply chain. The first reason is that manufacturing information is top secret and a source for companies to generate profits. It is difficult to disclose data even between companies with transactions in the supply chain. In the blockchain mechanism such as Bitcoin using PKI (Public Key Infrastructure), in order to confirm the identity of the company that has sent the data, the plaintext must be shared between the companies. Another reason is that the merits (scenarios) of collaboration data between companies are not specifically specified in the industrial supply chain. For these problems this paper proposes a Business to Business (B2B) collaboration system using homomorphic encryption and BC technique. Using the proposed system, each company on the supply chain can exchange confidential information on encrypted data and utilize the data for their own business. In addition, this paper considers a scenario focusing on quality data, which was difficult to collaborate because it is a top secret. In this scenario, we show a implementation scheme and a benefit of concrete data collaboration by proposing a comparison protocol that can grasp the change in quality while hiding the numerical value of quality data.

Keywords: business to business data collaboration, industrial supply chain, blockchain, homomorphic encryption

Procedia PDF Downloads 132
2105 Preliminary Evaluation of Decommissioning Wastes for the First Commercial Nuclear Power Reactor in South Korea

Authors: Kyomin Lee, Joohee Kim, Sangho Kang

Abstract:

The commercial nuclear power reactor in South Korea, Kori Unit 1, which was a 587 MWe pressurized water reactor that started operation since 1978, was permanently shut down in June 2017 without an additional operating license extension. The Kori 1 Unit is scheduled to become the nuclear power unit to enter the decommissioning phase. In this study, the preliminary evaluation of the decommissioning wastes for the Kori Unit 1 was performed based on the following series of process: firstly, the plant inventory is investigated based on various documents (i.e., equipment/ component list, construction records, general arrangement drawings). Secondly, the radiological conditions of systems, structures and components (SSCs) are established to estimate the amount of radioactive waste by waste classification. Third, the waste management strategies for Kori Unit 1 including waste packaging are established. Forth, selection of the proper decontamination and dismantling (D&D) technologies is made considering the various factors. Finally, the amount of decommissioning waste by classification for Kori 1 is estimated using the DeCAT program, which was developed by KEPCO-E&C for a decommissioning cost estimation. The preliminary evaluation results have shown that the expected amounts of decommissioning wastes were less than about 2% and 8% of the total wastes generated (i.e., sum of clean wastes and radwastes) before/after waste processing, respectively, and it was found that the majority of contaminated material was carbon or alloy steel and stainless steel. In addition, within the range of availability of information, the results of the evaluation were compared with the results from the various decommissioning experiences data or international/national decommissioning study. The comparison results have shown that the radioactive waste amount from Kori Unit 1 decommissioning were much less than those from the plants decommissioned in U.S. and were comparable to those from the plants in Europe. This result comes from the difference of disposal cost and clearance criteria (i.e., free release level) between U.S. and non-U.S. The preliminary evaluation performed using the methodology established in this study will be useful as a important information in establishing the decommissioning planning for the decommissioning schedule and waste management strategy establishment including the transportation, packaging, handling, and disposal of radioactive wastes.

Keywords: characterization, classification, decommissioning, decontamination and dismantling, Kori 1, radioactive waste

Procedia PDF Downloads 206
2104 Myanmar Character Recognition Using Eight Direction Chain Code Frequency Features

Authors: Kyi Pyar Zaw, Zin Mar Kyu

Abstract:

Character recognition is the process of converting a text image file into editable and searchable text file. Feature Extraction is the heart of any character recognition system. The character recognition rate may be low or high depending on the extracted features. In the proposed paper, 25 features for one character are used in character recognition. Basically, there are three steps of character recognition such as character segmentation, feature extraction and classification. In segmentation step, horizontal cropping method is used for line segmentation and vertical cropping method is used for character segmentation. In the Feature extraction step, features are extracted in two ways. The first way is that the 8 features are extracted from the entire input character using eight direction chain code frequency extraction. The second way is that the input character is divided into 16 blocks. For each block, although 8 feature values are obtained through eight-direction chain code frequency extraction method, we define the sum of these 8 feature values as a feature for one block. Therefore, 16 features are extracted from that 16 blocks in the second way. We use the number of holes feature to cluster the similar characters. We can recognize the almost Myanmar common characters with various font sizes by using these features. All these 25 features are used in both training part and testing part. In the classification step, the characters are classified by matching the all features of input character with already trained features of characters.

Keywords: chain code frequency, character recognition, feature extraction, features matching, segmentation

Procedia PDF Downloads 313
2103 Partial Least Square Regression for High-Dimentional and High-Correlated Data

Authors: Mohammed Abdullah Alshahrani

Abstract:

The research focuses on investigating the use of partial least squares (PLS) methodology for addressing challenges associated with high-dimensional correlated data. Recent technological advancements have led to experiments producing data characterized by a large number of variables compared to observations, with substantial inter-variable correlations. Such data patterns are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths, and in genomics, where thousands of genomic regions' copy number alterations (CNA) are recorded from cancer patients. PLS serves as a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research presents an alternative approach to addressing the interpretation challenge of predictor weights associated with PLS. Sparse estimation of predictor weights is employed using a penalty function combining a lasso penalty for sparsity and a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares regression (OLS), the standard method, performs inadequately with high-dimensional and highly correlated data. Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.

Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data

Procedia PDF Downloads 47
2102 Classification of Multiple Cancer Types with Deep Convolutional Neural Network

Authors: Nan Deng, Zhenqiu Liu

Abstract:

Thousands of patients with metastatic tumors were diagnosed with cancers of unknown primary sites each year. The inability to identify the primary cancer site may lead to inappropriate treatment and unexpected prognosis. Nowadays, a large amount of genomics and transcriptomics cancer data has been generated by next-generation sequencing (NGS) technologies, and The Cancer Genome Atlas (TCGA) database has accrued thousands of human cancer tumors and healthy controls, which provides an abundance of resource to differentiate cancer types. Meanwhile, deep convolutional neural networks (CNNs) have shown high accuracy on classification among a large number of image object categories. Here, we utilize 25 cancer primary tumors and 3 normal tissues from TCGA and convert their RNA-Seq gene expression profiling to color images; train, validate and test a CNN classifier directly from these images. The performance result shows that our CNN classifier can archive >80% test accuracy on most of the tumors and normal tissues. Since the gene expression pattern of distant metastases is similar to their primary tumors, the CNN classifier may provide a potential computational strategy on identifying the unknown primary origin of metastatic cancer in order to plan appropriate treatment for patients.

Keywords: bioinformatics, cancer, convolutional neural network, deep leaning, gene expression pattern

Procedia PDF Downloads 298
2101 Public-Private Partnership for Critical Infrastructure Resilience

Authors: Anjula Negi, D. T. V. Raghu Ramaswamy, Rajneesh Sareen

Abstract:

Road infrastructure is emphatically one of the top most critical infrastructure to the Indian economy. Road network in the country of around 3.3 million km is the second largest in the world. Nationwide statistics released by Ministry of Road, Transport and Highways reveal that every minute an accident happens and one death every 3.7 minutes. This reported scale in terms of safety is a matter of grave concern, and economically represents a national loss of 3% to the GDP. Union Budget 2016-17 has allocated USD 12 billion annually for development and strengthening of roads, an increase of 56% from last year. Thus, highlighting the importance of roads as critical infrastructure. National highway alone represent only 1.7% of the total road linkages, however, carry over 40% of traffic. Further, trends analysed from 2002 -2011 on national highways, indicate that in less than a decade, a 22 % increase in accidents have been reported, but, 68% increase in death fatalities. Paramount inference is that accident severity has increased with time. Over these years many measures to increase road safety, lessening damage to physical assets, reducing vulnerabilities leading to a build-up for resilient road infrastructure have been taken. In the context of national highway development program, policy makers proposed implementation of around 20 % of such road length on PPP mode. These roads were taken up on high-density traffic considerations and for qualitative implementation. In order to understand resilience impacts and safety parameters, enshrined in various PPP concession agreements executed with the private sector partners, such highway specific projects would be appraised. This research paper would attempt to assess such safety measures taken and the possible reasons behind an increase in accident severity through these PPP case study projects. Delving further on safety features to understand policy measures adopted in these cases and an introspection on reasons of severity, whether an outcome of increased speeds, faulty road design and geometrics, driver negligence, or due to lack of discipline in following lane traffic with increased speed. Assessment exercise would study these aspects hitherto to PPP and post PPP project structures, based on literature review and opinion surveys with sectoral experts. On the way forward, it is understood that the Ministry of Road, Transport and Highway’s estimate for strengthening the national highway network is USD 77 billion within next five years. The outcome of this paper would provide an understanding of resilience measures adopted, possible options for accessible and safe road network and its expansion to policy makers for possible policy initiatives and funding allocation in securing critical infrastructure.

Keywords: national highways, policy, PPP, safety

Procedia PDF Downloads 256
2100 Semantic Differences between Bug Labeling of Different Repositories via Machine Learning

Authors: Pooja Khanal, Huaming Zhang

Abstract:

Labeling of issues/bugs, also known as bug classification, plays a vital role in software engineering. Some known labels/classes of bugs are 'User Interface', 'Security', and 'API'. Most of the time, when a reporter reports a bug, they try to assign some predefined label to it. Those issues are reported for a project, and each project is a repository in GitHub/GitLab, which contains multiple issues. There are many software project repositories -ranging from individual projects to commercial projects. The labels assigned for different repositories may be dependent on various factors like human instinct, generalization of labels, label assignment policy followed by the reporter, etc. While the reporter of the issue may instinctively give that issue a label, another person reporting the same issue may label it differently. This way, it is not known mathematically if a label in one repository is similar or different to the label in another repository. Hence, the primary goal of this research is to find the semantic differences between bug labeling of different repositories via machine learning. Independent optimal classifiers for individual repositories are built first using the text features from the reported issues. The optimal classifiers may include a combination of multiple classifiers stacked together. Then, those classifiers are used to cross-test other repositories which leads the result to be deduced mathematically. The produce of this ongoing research includes a formalized open-source GitHub issues database that is used to deduce the similarity of the labels pertaining to the different repositories.

Keywords: bug classification, bug labels, GitHub issues, semantic differences

Procedia PDF Downloads 193
2099 Communication Infrastructure Required for a Driver Behaviour Monitoring System, ‘SiaMOTO’ IT Platform

Authors: Dogaru-Ulieru Valentin, Sălișteanu Ioan Corneliu, Ardeleanu Mihăiță Nicolae, Broscăreanu Ștefan, Sălișteanu Bogdan, Mihai Mihail

Abstract:

The SiaMOTO system is a communications and data processing platform for vehicle traffic. The human factor is the most important factor in the generation of this data, as the driver is the one who dictates the trajectory of the vehicle. Like any trajectory, specific parameters refer to position, speed and acceleration. Constant knowledge of these parameters allows complex analyses. Roadways allow many vehicles to travel through their confined space, and the overlapping trajectories of several vehicles increase the likelihood of collision events, known as road accidents. Any such event has causes that lead to its occurrence, so the conditions for its occurrence are known. The human factor is predominant in deciding the trajectory parameters of the vehicle on the road, so monitoring it by knowing the events reported by the DiaMOTO device over time, will generate a guide to target any potentially high-risk driving behavior and reward those who control the driving phenomenon well. In this paper, we have focused on detailing the communication infrastructure of the DiaMOTO device with the traffic data collection server, the infrastructure through which the database that will be used for complex AI/DLM analysis is built. The central element of this description is the data string in CODEC-8 format sent by the DiaMOTO device to the SiaMOTO collection server database. The data presented are specific to a functional infrastructure implemented in an experimental model stage, by installing on a number of 50 vehicles DiaMOTO unique code devices, integrating ADAS and GPS functions, through which vehicle trajectories can be monitored 24 hours a day.

Keywords: DiaMOTO, Codec-8, ADAS, GPS, driver monitoring

Procedia PDF Downloads 73
2098 Object-Scene: Deep Convolutional Representation for Scene Classification

Authors: Yanjun Chen, Chuanping Hu, Jie Shao, Lin Mei, Chongyang Zhang

Abstract:

Traditional image classification is based on encoding scheme (e.g. Fisher Vector, Vector of Locally Aggregated Descriptor) with low-level image features (e.g. SIFT, HoG). Compared to these low-level local features, deep convolutional features obtained at the mid-level layer of convolutional neural networks (CNN) have richer information but lack of geometric invariance. For scene classification, there are scattered objects with different size, category, layout, number and so on. It is crucial to find the distinctive objects in scene as well as their co-occurrence relationship. In this paper, we propose a method to take advantage of both deep convolutional features and the traditional encoding scheme while taking object-centric and scene-centric information into consideration. First, to exploit the object-centric and scene-centric information, two CNNs that trained on ImageNet and Places dataset separately are used as the pre-trained models to extract deep convolutional features at multiple scales. This produces dense local activations. By analyzing the performance of different CNNs at multiple scales, it is found that each CNN works better in different scale ranges. A scale-wise CNN adaption is reasonable since objects in scene are at its own specific scale. Second, a fisher kernel is applied to aggregate a global representation at each scale and then to merge into a single vector by using a post-processing method called scale-wise normalization. The essence of Fisher Vector lies on the accumulation of the first and second order differences. Hence, the scale-wise normalization followed by average pooling would balance the influence of each scale since different amount of features are extracted. Third, the Fisher vector representation based on the deep convolutional features is followed by a linear Supported Vector Machine, which is a simple yet efficient way to classify the scene categories. Experimental results show that the scale-specific feature extraction and normalization with CNNs trained on object-centric and scene-centric datasets can boost the results from 74.03% up to 79.43% on MIT Indoor67 when only two scales are used (compared to results at single scale). The result is comparable to state-of-art performance which proves that the representation can be applied to other visual recognition tasks.

Keywords: deep convolutional features, Fisher Vector, multiple scales, scale-specific normalization

Procedia PDF Downloads 327
2097 Artificial Intelligence and Governance in Relevance to Satellites in Space

Authors: Anwesha Pathak

Abstract:

With the increasing number of satellites and space debris, space traffic management (STM) becomes crucial. AI can aid in STM by predicting and preventing potential collisions, optimizing satellite trajectories, and managing orbital slots. Governance frameworks need to address the integration of AI algorithms in STM to ensure safe and sustainable satellite activities. AI and governance play significant roles in the context of satellite activities in space. Artificial intelligence (AI) technologies, such as machine learning and computer vision, can be utilized to process vast amounts of data received from satellites. AI algorithms can analyse satellite imagery, detect patterns, and extract valuable information for applications like weather forecasting, urban planning, agriculture, disaster management, and environmental monitoring. AI can assist in automating and optimizing satellite operations. Autonomous decision-making systems can be developed using AI to handle routine tasks like orbit control, collision avoidance, and antenna pointing. These systems can improve efficiency, reduce human error, and enable real-time responsiveness in satellite operations. AI technologies can be leveraged to enhance the security of satellite systems. AI algorithms can analyze satellite telemetry data to detect anomalies, identify potential cyber threats, and mitigate vulnerabilities. Governance frameworks should encompass regulations and standards for securing satellite systems against cyberattacks and ensuring data privacy. AI can optimize resource allocation and utilization in satellite constellations. By analyzing user demands, traffic patterns, and satellite performance data, AI algorithms can dynamically adjust the deployment and routing of satellites to maximize coverage and minimize latency. Governance frameworks need to address fair and efficient resource allocation among satellite operators to avoid monopolistic practices. Satellite activities involve multiple countries and organizations. Governance frameworks should encourage international cooperation, information sharing, and standardization to address common challenges, ensure interoperability, and prevent conflicts. AI can facilitate cross-border collaborations by providing data analytics and decision support tools for shared satellite missions and data sharing initiatives. AI and governance are critical aspects of satellite activities in space. They enable efficient and secure operations, ensure responsible and ethical use of AI technologies, and promote international cooperation for the benefit of all stakeholders involved in the satellite industry.

Keywords: satellite, space debris, traffic, threats, cyber security.

Procedia PDF Downloads 68
2096 Machine Learning Approach for Automating Electronic Component Error Classification and Detection

Authors: Monica Racha, Siva Chandrasekaran, Alex Stojcevski

Abstract:

The engineering programs focus on promoting students' personal and professional development by ensuring that students acquire technical and professional competencies during four-year studies. The traditional engineering laboratory provides an opportunity for students to "practice by doing," and laboratory facilities aid them in obtaining insight and understanding of their discipline. Due to rapid technological advancements and the current COVID-19 outbreak, the traditional labs were transforming into virtual learning environments. Aim: To better understand the limitations of the physical laboratory, this research study aims to use a Machine Learning (ML) algorithm that interfaces with the Augmented Reality HoloLens and predicts the image behavior to classify and detect the electronic components. The automated electronic components error classification and detection automatically detect and classify the position of all components on a breadboard by using the ML algorithm. This research will assist first-year undergraduate engineering students in conducting laboratory practices without any supervision. With the help of HoloLens, and ML algorithm, students will reduce component placement error on a breadboard and increase the efficiency of simple laboratory practices virtually. Method: The images of breadboards, resistors, capacitors, transistors, and other electrical components will be collected using HoloLens 2 and stored in a database. The collected image dataset will then be used for training a machine learning model. The raw images will be cleaned, processed, and labeled to facilitate further analysis of components error classification and detection. For instance, when students conduct laboratory experiments, the HoloLens captures images of students placing different components on a breadboard. The images are forwarded to the server for detection in the background. A hybrid Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) algorithm will be used to train the dataset for object recognition and classification. The convolution layer extracts image features, which are then classified using Support Vector Machine (SVM). By adequately labeling the training data and classifying, the model will predict, categorize, and assess students in placing components correctly. As a result, the data acquired through HoloLens includes images of students assembling electronic components. It constantly checks to see if students appropriately position components in the breadboard and connect the components to function. When students misplace any components, the HoloLens predicts the error before the user places the components in the incorrect proportion and fosters students to correct their mistakes. This hybrid Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) algorithm automating electronic component error classification and detection approach eliminates component connection problems and minimizes the risk of component damage. Conclusion: These augmented reality smart glasses powered by machine learning provide a wide range of benefits to supervisors, professionals, and students. It helps customize the learning experience, which is particularly beneficial in large classes with limited time. It determines the accuracy with which machine learning algorithms can forecast whether students are making the correct decisions and completing their laboratory tasks.

Keywords: augmented reality, machine learning, object recognition, virtual laboratories

Procedia PDF Downloads 132
2095 A Psychophysiological Evaluation of an Effective Recognition Technique Using Interactive Dynamic Virtual Environments

Authors: Mohammadhossein Moghimi, Robert Stone, Pia Rotshtein

Abstract:

Recording psychological and physiological correlates of human performance within virtual environments and interpreting their impacts on human engagement, ‘immersion’ and related emotional or ‘effective’ states is both academically and technologically challenging. By exposing participants to an effective, real-time (game-like) virtual environment, designed and evaluated in an earlier study, a psychophysiological database containing the EEG, GSR and Heart Rate of 30 male and female gamers, exposed to 10 games, was constructed. Some 174 features were subsequently identified and extracted from a number of windows, with 28 different timing lengths (e.g. 2, 3, 5, etc. seconds). After reducing the number of features to 30, using a feature selection technique, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) methods were subsequently employed for the classification process. The classifiers categorised the psychophysiological database into four effective clusters (defined based on a 3-dimensional space – valence, arousal and dominance) and eight emotion labels (relaxed, content, happy, excited, angry, afraid, sad, and bored). The KNN and SVM classifiers achieved average cross-validation accuracies of 97.01% (±1.3%) and 92.84% (±3.67%), respectively. However, no significant differences were found in the classification process based on effective clusters or emotion labels.

Keywords: virtual reality, effective computing, effective VR, emotion-based effective physiological database

Procedia PDF Downloads 230
2094 Hacking's 'Between Goffman and Foucault': A Theoretical Frame for Criminology

Authors: Tomás Speziale

Abstract:

This paper aims to analyse how Ian Hacking states the theoretical basis of his research on the classification of people. Although all his early philosophical education had been based in Foucault, it is also true that Erving Goffman’s perspective provided him with epistemological and methodological tools for understanding face-to-face relationships. Hence, all his works must be thought of as social science texts that combine the research on how the individuals are constituted ‘top-down’ (as in Foucault), with the inquiry into how people renegotiate ‘bottom-up’ the classifications about them. Thus, Hacking´s proposal constitutes a middle ground between the French Philosopher and the American Sociologist. Placing himself between both authors allows Hacking to build a frame that is expected to adjust to Social Sciences’ main particularity: the fact that they study interactive kinds. These are kinds of people, which imply that those who are classified can change in certain ways that prompt the need for changing previous classifications themselves. It is all about the interaction between the labelling of people and the people who are classified. Consequently, understanding the way in which Hacking uses Foucault’s and Goffman’s theories is essential to fully comprehend the social dynamic between individuals and concepts, what Bert Hansen had called dialectical realism. His theoretical proposal, therefore, is not only valuable because it combines diverse perspectives, but also because it constitutes an utterly original and relevant framework for Sociological theory and particularly for Criminology.

Keywords: classification of people, Foucault's archaeology, Goffman's interpersonal sociology, interactive kinds

Procedia PDF Downloads 339
2093 Technologic Information about Photovoltaic Applied in Urban Residences

Authors: Stephanie Fabris Russo, Daiane Costa Guimarães, Jonas Pedro Fabris, Maria Emilia Camargo, Suzana Leitão Russo, José Augusto Andrade Filho

Abstract:

Among renewable energy sources, solar energy is the one that has stood out. Solar radiation can be used as a thermal energy source and can also be converted into electricity by means of effects on certain materials, such as thermoelectric and photovoltaic panels. These panels are often used to generate energy in homes, buildings, arenas, etc., and have low pollution emissions. Thus, a technological prospecting was performed to find patents related to the use of photovoltaic plates in urban residences. The patent search was based on ESPACENET, associating the keywords photovoltaic and home, where we found 136 patent documents in the period of 1994-2015 in the fields title and abstract. Note that the years 2009, 2010, 2011, 2012, 2013 and 2014 had the highest number of applicants, with respectively, 11, 13, 23, 29, 15 and 21. Regarding the country that deposited about this technology, it is clear that China leads with 67 patent deposits, followed by Japan with 38 patents applications. It is important to note that most depositors, 50% are companies, 44% are individual inventors and only 6% are universities. On the International Patent classification (IPC) codes, we noted that the most present classification in results was H02J3/38, which represents provisions in parallel to feed a single network by two or more generators, converters or transformers. Among all categories, there is the H session, which means Electricity, with 70% of the patents.

Keywords: photovoltaic, urban residences, technology forecasting, prospecting

Procedia PDF Downloads 294
2092 Movie Genre Preference Prediction Using Machine Learning for Customer-Based Information

Authors: Haifeng Wang, Haili Zhang

Abstract:

Most movie recommendation systems have been developed for customers to find items of interest. This work introduces a predictive model usable by small and medium-sized enterprises (SMEs) who are in need of a data-based and analytical approach to stock proper movies for local audiences and retain more customers. We used classification models to extract features from thousands of customers’ demographic, behavioral and social information to predict their movie genre preference. In the implementation, a Gaussian kernel support vector machine (SVM) classification model and a logistic regression model were established to extract features from sample data and their test error-in-sample were compared. Comparison of error-out-sample was also made under different Vapnik–Chervonenkis (VC) dimensions in the machine learning algorithm to find and prevent overfitting. Gaussian kernel SVM prediction model can correctly predict movie genre preferences in 85% of positive cases. The accuracy of the algorithm increased to 93% with a smaller VC dimension and less overfitting. These findings advance our understanding of how to use machine learning approach to predict customers’ preferences with a small data set and design prediction tools for these enterprises.

Keywords: computational social science, movie preference, machine learning, SVM

Procedia PDF Downloads 254
2091 An Improved Parallel Algorithm of Decision Tree

Authors: Jiameng Wang, Yunfei Yin, Xiyu Deng

Abstract:

Parallel optimization is one of the important research topics of data mining at this stage. Taking Classification and Regression Tree (CART) parallelization as an example, this paper proposes a parallel data mining algorithm based on SSP-OGini-PCCP. Aiming at the problem of choosing the best CART segmentation point, this paper designs an S-SP model without data association; and in order to calculate the Gini index efficiently, a parallel OGini calculation method is designed. In addition, in order to improve the efficiency of the pruning algorithm, a synchronous PCCP pruning strategy is proposed in this paper. In this paper, the optimal segmentation calculation, Gini index calculation, and pruning algorithm are studied in depth. These are important components of parallel data mining. By constructing a distributed cluster simulation system based on SPARK, data mining methods based on SSP-OGini-PCCP are tested. Experimental results show that this method can increase the search efficiency of the best segmentation point by an average of 89%, increase the search efficiency of the Gini segmentation index by 3853%, and increase the pruning efficiency by 146% on average; and as the size of the data set increases, the performance of the algorithm remains stable, which meets the requirements of contemporary massive data processing.

Keywords: classification, Gini index, parallel data mining, pruning ahead

Procedia PDF Downloads 117