Search results for: traffic data
25005 Cloud Design for Storing Large Amount of Data
Authors: M. Strémy, P. Závacký, P. Cuninka, M. Juhás
Abstract:
Main goal of this paper is to introduce our design of private cloud for storing large amount of data, especially pictures, and to provide good technological backend for data analysis based on parallel processing and business intelligence. We have tested hypervisors, cloud management tools, storage for storing all data and Hadoop to provide data analysis on unstructured data. Providing high availability, virtual network management, logical separation of projects and also rapid deployment of physical servers to our environment was also needed.Keywords: cloud, glusterfs, hadoop, juju, kvm, maas, openstack, virtualization
Procedia PDF Downloads 35325004 Assessment of Vehicular Emission and Its Impact on Urban Air Quality
Authors: Syed Imran Hussain Shah
Abstract:
Air pollution rapidly impacts the Earth's climate and environmental quality, causing public health nuisances and cardio-pulmonary illnesses. Air pollution is a global issue, and all population groups in all the regions in the developed and developing parts of the world were affected by it. The promise of a reduction in deaths and diseases as per SDG No. 3 is an international commitment towards sustainable development. In that context, assessing and evaluating the ambient air quality is paramount. This article estimates the air pollution released by the vehicles on roads of Lahore, a mega city having 13.98 million populations. A survey was conducted on different fuel stations to determine the estimated fuel pumped to different types of vehicles from different fuel stations. The number of fuel stations in Lahore is around 350. Another survey was also conducted to interview the drivers to know the per-litre fuel consumption of other vehicles. Therefore, a survey was conducted on 189 fuel stations and 400 drivers using a combination of random sampling and convenience sampling methods. The sampling was done in a manner to cover all areas of the city including central commercial hubs, modern housing societies, industrial zones, main highways, old traditional population centres, etc. Mathematical equations were also used to estimate the emissions from different modes of vehicles. Due to the increase in population, the number of vehicles is increasing, and consequently, traffic emissions were rising at a higher level. Motorcycles, auto rickshaws, motor cars, and vans were the main contributors to Carbon dioxide and vehicular emissions in the air. It has been observed that vehicles that use petrol fuel produce more Carbon dioxide emissions in the air. Buses and trucks were the main contributors to NOx in the air due to the use of diesel fuel. Whereas vans, buses, and trucks produce the maximum amount of SO2. PM10 and PM2.5 were mainly produced by motorcycles and motorcycle two-stroke rickshaws. Auto rickshaws and motor cars mainly produce benzene emissions. This study may act as a major tool for traffic and vehicle policy decisions to promote better fuel quality and more fuel-efficient vehicles to reduce emissions.Keywords: particulate matter, nitrogen dioxide, climate change, pollution control
Procedia PDF Downloads 1325003 Behavioral Mapping and Post-Occupancy Evaluation of Meeting-Point Design in an International Airport
Authors: Meng-Cong Zheng, Yu-Sheng Chen
Abstract:
The meeting behavior is a pervasive kind of interaction, which often occurs between the passenger and the shuttle. However, the meeting point set up at the Taoyuan International Airport is too far from the entry-exit, often causing passengers to stop searching near the entry-exit. When the number of people waiting for the rush hour increases, it often results in chaos in the waiting area. This study tried to find out what is the key factor to promote the rapid finding of each other between the passengers and the pick-ups. Then we implemented several design proposals to improve the meeting behavior of passengers and pick-ups based on behavior mapping and post-occupancy evaluation to enhance their meeting efficiency in unfamiliar environments. The research base is the reception hall of the second terminal of Taoyuan International Airport. Behavioral observation and mapping are implemented on the entry of inbound passengers into the welcome space, including the crowd distribution of the people who rely on the separation wall in the waiting area, the behavior of meeting and the interaction between the inbound passengers and the pick-ups. Then we redesign the space planning and signage design based on post-occupancy evaluation to verify the effectiveness of space plan and signage design. This study found that passengers ignore existing meeting-point designs which are placed on distant pillars at both ends. The position of the screen affects the area where the receiver is stranded, causing the pick-ups to block the passenger's moving line. The pick-ups prefer to wait where it is easy to watch incoming passengers and where it is closest to the mode of transport they take when leaving. Large visitors tend to gather next to landmarks, and smaller groups have a wide waiting area in the lobby. The location of the meeting point chosen by the pick-ups is related to the view of the incoming passenger. Finally, this study proposes an improved design of the meeting point, setting the traffic information in it, so that most passengers can see the traffic information when they enter the country. At the same time, we also redesigned the pick-ups desk to improve the efficiency of passenger meeting.Keywords: meeting point design, post-occupancy evaluation, behavioral mapping, international airport
Procedia PDF Downloads 13925002 Estimation of Missing Values in Aggregate Level Spatial Data
Authors: Amitha Puranik, V. S. Binu, Seena Biju
Abstract:
Missing data is a common problem in spatial analysis especially at the aggregate level. Missing can either occur in covariate or in response variable or in both in a given location. Many missing data techniques are available to estimate the missing data values but not all of these methods can be applied on spatial data since the data are autocorrelated. Hence there is a need to develop a method that estimates the missing values in both response variable and covariates in spatial data by taking account of the spatial autocorrelation. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) Spatial autocorrelation of the response variable (b) Spatial autocorrelation of covariates and (c) Correlation between covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly account for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates and between a response variable and covariates. The precise estimation of the missing data points in spatial data will result in an increased precision of the estimated effects of independent variables on the response variable in spatial regression analysis.Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis
Procedia PDF Downloads 38225001 Anatomical and Histological Analysis of Salpinx and Ovary in Anatolian Wild Goat (Capra aegagrus aegagrus)
Authors: Gulseren Kirbas, Mushap Kuru, Buket Bakir, Ebru Karadag Sari
Abstract:
Capra (mountain goat) is a genus comprising nine species. The domestic goat (C. aegagrus hircus) is a subspecies of the wild goat that is domesticated. This study aimed to determine the anatomical structure of the salpinx and ovary of the Anatolian wild goat (C. aegagrus aegagrus). Animals that were taken to the Kafkas University Wildlife Rescue and Rehabilitation Center, Kars, Turkey, because of various reasons, such as traffic accidents and firearm injuries, were used in this study. The salpinges and ovaries of four wild goats of similar ages, which could not be rescued by the Center despite all interventions, were dissected. Measurements were taken from the right-left salpinx and ovary using digital calipers. The weights of each ovary and salpinx were measured using a precision scale (min: 0.0001 g − max: 220 g, code: XB220A; Precisa, Swiss). The histological structure of the tissues was examined after weighing the organs. The tissue samples were fixed in 10% formaldehyde for 24 h. Then a routine procedure was applied, and the tissues were embedded in paraffin. Mallory’s modified triple staining was used to demonstrate the general structure of the salpinx. The salpinx was found to consist of three different regions (infundibulum, ampulla, and isthmus). These regions consisted of tunica mucosa, tunica muscularis, and tunica serosa. The prismatic epithelial cells were observed in the lamina epithelialis of tunica mucosa in every region, but the prismatic fimbrae cells occurred most in the infundibulum. The ampulla was distinguished by its many mucosal folds. It was the longest region of the salpinx and was joined to the isthmus via the ampullary–isthmus junction. Isthmus was the caudal end of the salpinx joined to the uterus and had the thickest tunica muscularis compared with the other regions. The mean length of the ovary was 13.22 ± 1.27 mm, width was 8.46 ± 0.88 mm, the thickness was 5.67 ± 0.79 mm, and weight was 0.59 ± 0.17 g. The average length of the salpinx was 58.11 ± 14.02 mm, width was 0.80 ± 0.22 mm, the thickness was 0.41 ± 0.01 mm, and weight was 0.30 ± 0.08 g. In conclusion, the Anatolian wild goat, which is included in wildlife diversity in Turkey, has been disappearing due to illegal and uncontrolled hunting as well as traffic accidents in recent years. These findings are believed to contribute to the literature.Keywords: Anatolian wild goat, anatomy, ovary, salpinx
Procedia PDF Downloads 22425000 Association Rules Mining and NOSQL Oriented Document in Big Data
Authors: Sarra Senhadji, Imene Benzeguimi, Zohra Yagoub
Abstract:
Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.Keywords: Apriori, Association rules mining, Big Data, Data Mining, Hadoop, MapReduce, MongoDB, NoSQL
Procedia PDF Downloads 16224999 Immunization-Data-Quality in Public Health Facilities in the Pastoralist Communities: A Comparative Study Evidence from Afar and Somali Regional States, Ethiopia
Authors: Melaku Tsehay
Abstract:
The Consortium of Christian Relief and Development Associations (CCRDA), and the CORE Group Polio Partners (CGPP) Secretariat have been working with Global Alliance for Vac-cines and Immunization (GAVI) to improve the immunization data quality in Afar and Somali Regional States. The main aim of this study was to compare the quality of immunization data before and after the above interventions in health facilities in the pastoralist communities in Ethiopia. To this end, a comparative-cross-sectional study was conducted on 51 health facilities. The baseline data was collected in May 2019, while the end line data in August 2021. The WHO data quality self-assessment tool (DQS) was used to collect data. A significant improvment was seen in the accuracy of the pentavalent vaccine (PT)1 (p = 0.012) data at the health posts (HP), while PT3 (p = 0.010), and Measles (p = 0.020) at the health centers (HC). Besides, a highly sig-nificant improvment was observed in the accuracy of tetanus toxoid (TT)2 data at HP (p < 0.001). The level of over- or under-reporting was found to be < 8%, at the HP, and < 10% at the HC for PT3. The data completeness was also increased from 72.09% to 88.89% at the HC. Nearly 74% of the health facilities timely reported their respective immunization data, which is much better than the baseline (7.1%) (p < 0.001). These findings may provide some hints for the policies and pro-grams targetting on improving immunization data qaulity in the pastoralist communities.Keywords: data quality, immunization, verification factor, pastoralist region
Procedia PDF Downloads 12424998 Metropolis-Hastings Sampling Approach for High Dimensional Testing Methods of Autonomous Vehicles
Authors: Nacer Eddine Chelbi, Ayet Bagane, Annie Saleh, Claude Sauvageau, Denis Gingras
Abstract:
As recently stated by National Highway Traffic Safety Administration (NHTSA), to demonstrate the expected performance of a highly automated vehicles system, test approaches should include a combination of simulation, test track, and on-road testing. In this paper, we propose a new validation method for autonomous vehicles involving on-road tests (Field Operational Tests), test track (Test Matrix) and simulation (Worst Case Scenarios). We concentrate our discussion on the simulation aspects, in particular, we extend recent work based on Importance Sampling by using a Metropolis-Hasting algorithm (MHS) to sample collected data from the Safety Pilot Model Deployment (SPMD) in lane-change scenarios. Our proposed MH sampling method will be compared to the Importance Sampling method, which does not perform well in high-dimensional problems. The importance of this study is to obtain a sampler that could be applied to high dimensional simulation problems in order to reduce and optimize the number of test scenarios that are necessary for validation and certification of autonomous vehicles.Keywords: automated driving, autonomous emergency braking (AEB), autonomous vehicles, certification, evaluation, importance sampling, metropolis-hastings sampling, tests
Procedia PDF Downloads 28924997 Identifying Critical Success Factors for Data Quality Management through a Delphi Study
Authors: Maria Paula Santos, Ana Lucas
Abstract:
Organizations support their operations and decision making on the data they have at their disposal, so the quality of these data is remarkably important and Data Quality (DQ) is currently a relevant issue, the literature being unanimous in pointing out that poor DQ can result in large costs for organizations. The literature review identified and described 24 Critical Success Factors (CSF) for Data Quality Management (DQM) that were presented to a panel of experts, who ordered them according to their degree of importance, using the Delphi method with the Q-sort technique, based on an online questionnaire. The study shows that the five most important CSF for DQM are: definition of appropriate policies and standards, control of inputs, definition of a strategic plan for DQ, organizational culture focused on quality of the data and obtaining top management commitment and support.Keywords: critical success factors, data quality, data quality management, Delphi, Q-Sort
Procedia PDF Downloads 21724996 A Fuzzy Inference System for Predicting Air Traffic Demand Based on Socioeconomic Drivers
Authors: Nur Mohammad Ali, Md. Shafiqul Alam, Jayanta Bhusan Deb, Nowrin Sharmin
Abstract:
The past ten years have seen significant expansion in the aviation sector, which during the previous five years has steadily pushed emerging countries closer to economic independence. It is crucial to accurately forecast the potential demand for air travel to make long-term financial plans. To forecast market demand for low-cost passenger carriers, this study suggests working with low-cost airlines, airports, consultancies, and governmental institutions' strategic planning divisions. The study aims to develop an artificial intelligence-based methods, notably fuzzy inference systems (FIS), to determine the most accurate forecasting technique for domestic low-cost carrier demand in Bangladesh. To give end users real-world applications, the study includes nine variables, two sub-FIS, and one final Mamdani Fuzzy Inference System utilizing a graphical user interface (GUI) made with the app designer tool. The evaluation criteria used in this inquiry included mean square error (MSE), accuracy, precision, sensitivity, and specificity. The effectiveness of the developed air passenger demand prediction FIS is assessed using 240 data sets, and the accuracy, precision, sensitivity, specificity, and MSE values are 90.83%, 91.09%, 90.77%, and 2.09%, respectively.Keywords: aviation industry, fuzzy inference system, membership function, graphical user interference
Procedia PDF Downloads 7224995 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine
Authors: Djamila Benhaddouche, Abdelkader Benyettou
Abstract:
In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction
Procedia PDF Downloads 55924994 Analysis of Different Classification Techniques Using WEKA for Diabetic Disease
Authors: Usama Ahmed
Abstract:
Data mining is the process of analyze data which are used to predict helpful information. It is the field of research which solve various type of problem. In data mining, classification is an important technique to classify different kind of data. Diabetes is most common disease. This paper implements different classification technique using Waikato Environment for Knowledge Analysis (WEKA) on diabetes dataset and find which algorithm is suitable for working. The best classification algorithm based on diabetic data is Naïve Bayes. The accuracy of Naïve Bayes is 76.31% and take 0.06 seconds to build the model.Keywords: data mining, classification, diabetes, WEKA
Procedia PDF Downloads 14724993 Integrative Analysis of Urban Transportation Network and Land Use Using GIS: A Case Study of Siddipet City
Authors: P. Priya Madhuri, J. Kamini, S. C. Jayanthi
Abstract:
Assessment of land use and transportation networks is essential for sustainable urban growth, urban planning, efficient public transportation systems, and reducing traffic congestion. The study focuses on land use, population density, and their correlation with the road network for future development. The scope of the study covers inventory and assessment of the road network dataset (line) at the city, zonal, or ward level, which is extracted from very high-resolution satellite data (spatial resolution < 0.5 m) at 1:4000 map scale and ground truth verification. Road network assessment is carried out by computing various indices that measure road coverage and connectivity. In this study, an assessment of the road network is carried out for the study region at the municipal and ward levels. In order to identify gaps, road coverage and connectivity were associated with urban land use, built-up area, and population density in the study area. Ward-wise road connectivity and coverage maps have been prepared. To assess the relationship between road network metrics, correlation analysis is applied. The study's conclusions are extremely beneficial for effective road network planning and detecting gaps in the road network at the ward level in association with urban land use, existing built-up, and population.Keywords: road connectivity, road coverage, road network, urban land use, transportation analysis
Procedia PDF Downloads 3324992 Comprehensive Study of Data Science
Authors: Asifa Amara, Prachi Singh, Kanishka, Debargho Pathak, Akshat Kumar, Jayakumar Eravelly
Abstract:
Today's generation is totally dependent on technology that uses data as its fuel. The present study is all about innovations and developments in data science and gives an idea about how efficiently to use the data provided. This study will help to understand the core concepts of data science. The concept of artificial intelligence was introduced by Alan Turing in which the main principle was to create an artificial system that can run independently of human-given programs and can function with the help of analyzing data to understand the requirements of the users. Data science comprises business understanding, analyzing data, ethical concerns, understanding programming languages, various fields and sources of data, skills, etc. The usage of data science has evolved over the years. In this review article, we have covered a part of data science, i.e., machine learning. Machine learning uses data science for its work. Machines learn through their experience, which helps them to do any work more efficiently. This article includes a comparative study image between human understanding and machine understanding, advantages, applications, and real-time examples of machine learning. Data science is an important game changer in the life of human beings. Since the advent of data science, we have found its benefits and how it leads to a better understanding of people, and how it cherishes individual needs. It has improved business strategies, services provided by them, forecasting, the ability to attend sustainable developments, etc. This study also focuses on a better understanding of data science which will help us to create a better world.Keywords: data science, machine learning, data analytics, artificial intelligence
Procedia PDF Downloads 8224991 Challenges of New Technologies in the Field of Criminal Law: The Protection of the Right to Privacy in the Spanish Penal Code
Authors: Deborah Garcia-Magna
Abstract:
The use of new technologies has become widespread in the last decade, giving rise to various risks associated with the transfer of personal data and the publication of sensitive material on social media. There are already several supranational instruments that seek to protect the citizens involved in this growing traffic of personal information and, especially, the most vulnerable people, such as minors, who are also the ones who make the most intense use of these new means of communication. In this sense, the configuration of the concept of privacy as a legal right has necessarily been influenced by these new social uses and supranational instruments. The researcher considers correct the decision to introduce sexting as a new criminal behaviour in the Penal Code in 2015, but questions the concrete manner in which it has been made. To this end, an updated review of the various options that our legal system already offered is made, assessing whether these legal options adequately addressed the new social needs and guidelines from jurisprudence and other supranational instruments. Some important issues emerge as to whether the principles of fragmentarity and subsidiarity may be violated since the new article 197.7 of the Spanish Penal Code could refer to very varied behaviours and protect not only particularly vulnerable persons. In this sense, the research focuses on issues such as the concept of 'seriousness' of the infringement of privacy, the possible reckless conduct of the victim, who hang over its own private material to third parties, the affection to other legal rights such as freedom and sexual indemnity, the possible problems of concurrent offences, etc.Keywords: criminal law reform, ECHR jurisprudence, right to privacy, sexting
Procedia PDF Downloads 19424990 Application of Artificial Neural Network Technique for Diagnosing Asthma
Authors: Azadeh Bashiri
Abstract:
Introduction: Lack of proper diagnosis and inadequate treatment of asthma leads to physical and financial complications. This study aimed to use data mining techniques and creating a neural network intelligent system for diagnosis of asthma. Methods: The study population is the patients who had visited one of the Lung Clinics in Tehran. Data were analyzed using the SPSS statistical tool and the chi-square Pearson's coefficient was the basis of decision making for data ranking. The considered neural network is trained using back propagation learning technique. Results: According to the analysis performed by means of SPSS to select the top factors, 13 effective factors were selected, in different performances, data was mixed in various forms, so the different models were made for training the data and testing networks and in all different modes, the network was able to predict correctly 100% of all cases. Conclusion: Using data mining methods before the design structure of system, aimed to reduce the data dimension and the optimum choice of the data, will lead to a more accurate system. Therefore, considering the data mining approaches due to the nature of medical data is necessary.Keywords: asthma, data mining, Artificial Neural Network, intelligent system
Procedia PDF Downloads 27324989 Interpreting Privacy Harms from a Non-Economic Perspective
Authors: Christopher Muhawe, Masooda Bashir
Abstract:
With increased Internet Communication Technology(ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has been in tandem with an increase in data misuse and data breach. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in the United States courts for the failure of proof of direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance negates the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical harm perspective negates the fact that data insecurity may result into harms which run counter the functions of privacy in our lives. The promotion of liberty, selfhood, autonomy, promotion of human social relations and the furtherance of the existence of a free society. There is no economic value that can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.Keywords: data breach and misuse, economic harms, privacy harms, psychological harms
Procedia PDF Downloads 19524988 Signature Bridge Design for the Port of Montreal
Authors: Juan Manuel Macia
Abstract:
The Montreal Port Authority (MPA) wanted to build a new road link via Souligny Avenue to increase the fluidity of goods transported by truck in the Viau Street area of Montreal and to mitigate the current traffic problems on Notre-Dame Street. With the purpose of having a better integration and acceptance of this project with the neighboring residential surroundings, this project needed to include an architectural integration, bringing some artistic components to the bridge design along with some landscaping components. The MPA is required primarily to provide direct truck access to Port of Montreal with a direct connection to the future Assomption Boulevard planned by the City of Montreal and, thus, direct access to Souligny Avenue. The MPA also required other key aspects to be considered for the proposal and development of the project, such as the layout of road and rail configurations, the reconstruction of underground structures, the relocation of power lines, the installation of lighting systems, the traffic signage and communication systems improvement, the construction of new access ramps, the pavement reconstruction and a summary assessment of the structural capacity of an existing service tunnel. The identification of the various possible scenarios began by identifying all the constraints related to the numerous infrastructures located in the area of the future link between the port and the future extension of Souligny Avenue, involving interaction with several disciplines and technical specialties. Several viaduct- and tunnel-type geometries were studied to link the port road to the right-of-way north of Notre-Dame Street and to improve traffic flow at the railway corridor. The proposed design took into account the existing access points to Port of Montreal, the built environment of the MPA site, the provincial and municipal rights-of-way, and the future Notre-Dame Street layout planned by the City of Montreal. These considerations required the installation of an engineering structure with a span of over 60 m to free up a corridor for the future urban fabric of Notre-Dame Street. The best option for crossing this span length was identified by the design and construction of a curved bridge over Notre-Dame Street, which is essentially a structure with a deck formed by a reinforced concrete slab on steel box girders with a single span of 63.5m. The foundation units were defined as pier-cap type abutments on drilled shafts to bedrock with rock sockets, with MSE-type walls at the approaches. The configuration of a single-span curved structure posed significant design and construction challenges, considering the major constraints of the project site, a design for durability approach, and the need to guarantee optimum performance over a 75-year service life in accordance with the client's needs and the recommendations and requirements defined by the standards used for the project. These aspects and the need to include architectural and artistic components in this project made it possible to design, build, and integrate a signature infrastructure project with a sustainable approach, from which the MPA, the commuters, and the city of Montreal and its residents will benefit.Keywords: curved bridge, steel box girder, medium span, simply supported, industrial and urban environment, architectural integration, design for durability
Procedia PDF Downloads 7024987 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course
Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu
Abstract:
This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN
Procedia PDF Downloads 4424986 Data Access, AI Intensity, and Scale Advantages
Authors: Chuping Lo
Abstract:
This paper presents a simple model demonstrating that ceteris paribus countries with lower barriers to accessing global data tend to earn higher incomes than other countries. Therefore, large countries that inherently have greater data resources tend to have higher incomes than smaller countries, such that the former may be more hesitant than the latter to liberalize cross-border data flows to maintain this advantage. Furthermore, countries with higher artificial intelligence (AI) intensity in production technologies tend to benefit more from economies of scale in data aggregation, leading to higher income and more trade as they are better able to utilize global data.Keywords: digital intensity, digital divide, international trade, scale of economics
Procedia PDF Downloads 6824985 Secured Transmission and Reserving Space in Images Before Encryption to Embed Data
Authors: G. R. Navaneesh, E. Nagarajan, C. H. Rajam Raju
Abstract:
Nowadays the multimedia data are used to store some secure information. All previous methods allocate a space in image for data embedding purpose after encryption. In this paper, we propose a novel method by reserving space in image with a boundary surrounded before encryption with a traditional RDH algorithm, which makes it easy for the data hider to reversibly embed data in the encrypted images. The proposed method can achieve real time performance, that is, data extraction and image recovery are free of any error. A secure transmission process is also discussed in this paper, which improves the efficiency by ten times compared to other processes as discussed.Keywords: secure communication, reserving room before encryption, least significant bits, image encryption, reversible data hiding
Procedia PDF Downloads 41224984 Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
Abstract:
DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).Keywords: biometrics, genetic data, identity verification, k nearest neighbor
Procedia PDF Downloads 25824983 A Review on Intelligent Systems for Geoscience
Authors: R Palson Kennedy, P.Kiran Sai
Abstract:
This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and geosciences. This article presents a review from the data life cycle perspective to meet that need. Numerous facets of geosciences present unique difficulties for the study of intelligent systems. Geosciences data is notoriously difficult to analyze since it is frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second section contains key themes and sharing experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science
Procedia PDF Downloads 13524982 Data Quality as a Pillar of Data-Driven Organizations: Exploring the Benefits of Data Mesh
Authors: Marc Bachelet, Abhijit Kumar Chatterjee, José Manuel Avila
Abstract:
Data quality is a key component of any data-driven organization. Without data quality, organizations cannot effectively make data-driven decisions, which often leads to poor business performance. Therefore, it is important for an organization to ensure that the data they use is of high quality. This is where the concept of data mesh comes in. Data mesh is an organizational and architectural decentralized approach to data management that can help organizations improve the quality of data. The concept of data mesh was first introduced in 2020. Its purpose is to decentralize data ownership, making it easier for domain experts to manage the data. This can help organizations improve data quality by reducing the reliance on centralized data teams and allowing domain experts to take charge of their data. This paper intends to discuss how a set of elements, including data mesh, are tools capable of increasing data quality. One of the key benefits of data mesh is improved metadata management. In a traditional data architecture, metadata management is typically centralized, which can lead to data silos and poor data quality. With data mesh, metadata is managed in a decentralized manner, ensuring accurate and up-to-date metadata, thereby improving data quality. Another benefit of data mesh is the clarification of roles and responsibilities. In a traditional data architecture, data teams are responsible for managing all aspects of data, which can lead to confusion and ambiguity in responsibilities. With data mesh, domain experts are responsible for managing their own data, which can help provide clarity in roles and responsibilities and improve data quality. Additionally, data mesh can also contribute to a new form of organization that is more agile and adaptable. By decentralizing data ownership, organizations can respond more quickly to changes in their business environment, which in turn can help improve overall performance by allowing better insights into business as an effect of better reports and visualization tools. Monitoring and analytics are also important aspects of data quality. With data mesh, monitoring, and analytics are decentralized, allowing domain experts to monitor and analyze their own data. This will help in identifying and addressing data quality problems in quick time, leading to improved data quality. Data culture is another major aspect of data quality. With data mesh, domain experts are encouraged to take ownership of their data, which can help create a data-driven culture within the organization. This can lead to improved data quality and better business outcomes. Finally, the paper explores the contribution of AI in the coming years. AI can help enhance data quality by automating many data-related tasks, like data cleaning and data validation. By integrating AI into data mesh, organizations can further enhance the quality of their data. The concepts mentioned above are illustrated by AEKIDEN experience feedback. AEKIDEN is an international data-driven consultancy that has successfully implemented a data mesh approach. By sharing their experience, AEKIDEN can help other organizations understand the benefits and challenges of implementing data mesh and improving data quality.Keywords: data culture, data-driven organization, data mesh, data quality for business success
Procedia PDF Downloads 13524981 Big Data Analysis with RHadoop
Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim
Abstract:
It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop
Procedia PDF Downloads 43724980 A Mutually Exclusive Task Generation Method Based on Data Augmentation
Authors: Haojie Wang, Xun Li, Rui Yin
Abstract:
In order to solve the memorization overfitting in the meta-learning MAML algorithm, a method of generating mutually exclusive tasks based on data augmentation is proposed. This method generates a mutex task by corresponding one feature of the data to multiple labels, so that the generated mutex task is inconsistent with the data distribution in the initial dataset. Because generating mutex tasks for all data will produce a large number of invalid data and, in the worst case, lead to exponential growth of computation, this paper also proposes a key data extraction method, that only extracts part of the data to generate the mutex task. The experiments show that the method of generating mutually exclusive tasks can effectively solve the memorization overfitting in the meta-learning MAML algorithm.Keywords: data augmentation, mutex task generation, meta-learning, text classification.
Procedia PDF Downloads 9424979 Comparison of Noise Emissions in the Interior of Passenger Cars
Authors: Martin Kendra, Tomas Skrucany, Jaroslav Masek
Abstract:
The noise is one of the negative elements influencing the human health. This article is due to the measurement of noise emitted by road vehicle and its parts during the operation. Measurement was done in the interior of common passenger cars with a digital sound meter. The results compare the noise value in different cars with different body shape, which influences the driver’s health. Transport has considerable ecological effects, many of them detrimental to environmental sustainability. Roads and traffic exert a variety of direct and mostly detrimental effects on nature.Keywords: driver, noise measurement, passenger road vehicle, road transport
Procedia PDF Downloads 44924978 Efficient Positioning of Data Aggregation Point for Wireless Sensor Network
Authors: Sifat Rahman Ahona, Rifat Tasnim, Naima Hassan
Abstract:
Data aggregation is a helpful technique for reducing the data communication overhead in wireless sensor network. One of the important tasks of data aggregation is positioning of the aggregator points. There are a lot of works done on data aggregation. But, efficient positioning of the aggregators points is not focused so much. In this paper, authors are focusing on the positioning or the placement of the aggregation points in wireless sensor network. Authors proposed an algorithm to select the aggregators positions for a scenario where aggregator nodes are more powerful than sensor nodes.Keywords: aggregation point, data communication, data aggregation, wireless sensor network
Procedia PDF Downloads 15824977 Destination Port Detection For Vessels: An Analytic Tool For Optimizing Port Authorities Resources
Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin
Abstract:
Port authorities have many challenges in congested ports to allocate their resources to provide a safe and secure loading/ unloading procedure for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages AIS messages to monitor vessel’s movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely, Reference Route of Trajectory (RRoT) to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring Automatic Identification System (AIS) messages. Our RRoT method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measure to identify the destination of a vessel using their recent movement. We evaluated five different similarity measures such as Discrete Fr´echet Distance (DFD), Dynamic Time Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an fmeasure of 99.08% using Dynamic Time Warping (DTW) similarity measure.Keywords: spatial temporal data mining, trajectory mining, trajectory similarity, resource optimization
Procedia PDF Downloads 12124976 Spatial Econometric Approaches for Count Data: An Overview and New Directions
Authors: Paula Simões, Isabel Natário
Abstract:
This paper reviews a number of theoretical aspects for implementing an explicit spatial perspective in econometrics for modelling non-continuous data, in general, and count data, in particular. It provides an overview of the several spatial econometric approaches that are available to model data that are collected with reference to location in space, from the classical spatial econometrics approaches to the recent developments on spatial econometrics to model count data, in a Bayesian hierarchical setting. Considerable attention is paid to the inferential framework, necessary for structural consistent spatial econometric count models, incorporating spatial lag autocorrelation, to the corresponding estimation and testing procedures for different assumptions, to the constrains and implications embedded in the various specifications in the literature. This review combines insights from the classical spatial econometrics literature as well as from hierarchical modeling and analysis of spatial data, in order to look for new possible directions on the processing of count data, in a spatial hierarchical Bayesian econometric context.Keywords: spatial data analysis, spatial econometrics, Bayesian hierarchical models, count data
Procedia PDF Downloads 594