Search results for: big data markets
24737 Analyzing Tools and Techniques for Classification In Educational Data Mining: A Survey
Authors: D. I. George Amalarethinam, A. Emima
Abstract:
Educational Data Mining (EDM) is one of the newest topics to emerge in recent years, and it is concerned with developing methods for analyzing various types of data gathered from educational settings. EDM methods and techniques, together with machine learning algorithms, are used to extract meaningful and usable information from huge databases. For scientists and researchers, realistic applications of machine learning in EDM offer new frontiers and present new problems. One of the most important research areas in EDM is predicting student success. Prediction algorithms and techniques must be developed to forecast students' performance, which helps tutors and institutions raise the level of student performance. This paper examines various classification techniques in prediction methods and data mining tools used in EDM.
Keywords: classification technique, data mining, EDM methods, prediction methods
Procedia PDF Downloads 117
24736 Improving Security in Healthcare Applications Using Federated Learning System With Blockchain Technology
Authors: Aofan Liu, Qianqian Tan, Burra Venkata Durga Kumar
Abstract:
Data security is of the utmost importance in healthcare, as sensitive patient information is constantly exchanged and analyzed by many different parties. Federated learning, which enables data to be evaluated locally on devices rather than being transferred to a central server, has emerged as a potential solution for protecting the privacy of user information. However, federated learning alone might not be adequate to protect against data breaches and unauthorized access. In this context, blockchain technology could give the system extra protection. This study proposes a distributed federated learning system built on blockchain technology in order to enhance security in healthcare. This makes it possible for a wide variety of healthcare providers to work together on data analysis without raising concerns about the confidentiality of the data. The technical aspects of the system, including the design and implementation of distributed learning algorithms, consensus mechanisms, and smart contracts, are also investigated. The proposed technique is a workable alternative that addresses concerns about the security of healthcare data while also fostering collaborative research and the interchange of data.
Keywords: data privacy, distributed system, federated learning, machine learning
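The abstract does not say how local models are combined, so the following is only a minimal sketch of one widely used aggregation step (sample-weighted federated averaging), not the authors' system. The function name, the parameter dictionaries, and the client sample counts are illustrative assumptions; the blockchain, consensus, and smart-contract parts are not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each client sends (parameters, local sample count).
ClientUpdate = Tuple[Dict[str, float], int]


def federated_average(updates: List[ClientUpdate]) -> Dict[str, float]:
    """Average client parameters, weighting each client by its sample count."""
    total = sum(count for _, count in updates)
    if total <= 0:
        raise ValueError("at least one client must contribute samples")
    keys = updates[0][0].keys()
    return {
        key: sum(params[key] * count for params, count in updates) / total
        for key in keys
    }


# Two hypothetical hospitals that trained locally on 100 and 300 records.
clients = [({"w": 1.0, "b": 0.0}, 100), ({"w": 3.0, "b": 2.0}, 300)]
global_model = federated_average(clients)
print(global_model)
```

Only the parameters and counts leave each site in this sketch; the raw records stay local, which is the property the abstract relies on.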
Procedia PDF Downloads 134
24735 Relationship of Entrepreneurial Ecosystem Factors and Entrepreneurial Cognition: An Exploratory Study Applied to Regional and Metropolitan Ecosystems in New South Wales, Australia
Authors: Sumedha Weerasekara, Morgan Miles, Mark Morrison, Branka Krivokapic-Skoko
Abstract:
This paper is aimed at exploring the interrelationships among entrepreneurial ecosystem factors and entrepreneurial cognition in regional and metropolitan ecosystems. Entrepreneurial ecosystem factors examined include: culture, infrastructure, access to finance, informal networks, support services, access to universities, and the depth and breadth of the talent pool. Using a multivariate approach we explore the impact of these ecosystem factors or elements on entrepreneurial cognition. In doing so, the existing body of knowledge from the literature on entrepreneurial ecosystem and cognition have been blended to explore the relationship between entrepreneurial ecosystem factors and cognition in a way not hitherto investigated. The concept of the entrepreneurial ecosystem has received increased attention as governments, universities and communities have started to recognize the potential of integrated policies, structures, programs and processes that foster entrepreneurship activities by supporting innovation, productivity and employment growth. The notion of entrepreneurial ecosystems has evolved and grown with the advancement of theoretical research and empirical studies. Importance of incorporating external factors like culture, political environment, and the economic environment within a single framework will enhance the capacity of examining the whole systems functionality to better understand the interaction of the entrepreneurial actors and factors within a single framework. The literature on clusters underplays the role of entrepreneurs and entrepreneurial management in creating and co-creating organizations, markets, and supporting ecosystems. Entrepreneurs are only one actor following a limited set of roles and dependent upon many other factors to thrive. 
As a consequence, entrepreneurs and relevant authorities should be aware of the other actors and factors with which they engage and rely, and make strategic choices to achieve both self and also collective objectives. The study uses stratified random sampling method to collect survey data from 12 different regions in regional and metropolitan regions of NSW, Australia. A questionnaire was administered online among 512 Small and medium enterprise owners operating their business in selected 12 regions in NSW, Australia. Data were analyzed using descriptive analyzing techniques and partial least squares - structural equation modeling. The findings show that even though there is a significant relationship between each and every entrepreneurial ecosystem factors, there is a weak relationship between most entrepreneurial ecosystem factors and entrepreneurial cognition. In the metropolitan context, the availability of finance and informal networks have the largest impact on entrepreneurial cognition while culture, infrastructure, and support services having the smallest impact and the talent pool and universities having a moderate impact on entrepreneurial cognition. Interestingly, in a regional context, culture, availability of finance, and the talent pool have the highest impact on entrepreneurial cognition, while informal networks having the smallest impact and the remaining factors – infrastructure, universities, and support services have a moderate impact on entrepreneurial cognition. These findings suggest the need for a location-specific strategy for supporting the development of entrepreneurial cognition.Keywords: academic achievement, colour response card, feedback
Procedia PDF Downloads 143
24734 A Concept of Data Mining with XML Document
Authors: Akshay Agrawal, Anand K. Srivastava
Abstract:
The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semi-structured datasets. The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi structured data. In recent years due to the importance of managing these resources and extracting knowledge from them, lots of methods have been proposed in order to represent and cluster them in different ways.Keywords: XML, similarity measure, clustering, cluster quality, semantic clustering
Procedia PDF Downloads 384
24733 Speed-Up Data Transmission by Using Bluetooth Module on Gas Sensor Node of Arduino Board
Authors: Hiesik Kim, YongBeum Kim
Abstract:
Internet of Things (IoT) applications are widely deployed and spreading worldwide, so local wireless data transmission techniques must be developed to increase transmission speed. Bluetooth is a wireless data communication technique defined by the Bluetooth Special Interest Group (SIG); it uses the 2.4 GHz frequency range and exploits frequency hopping to avoid collisions with other devices. For the experiment, equipment that transmits measured data was built using an Arduino as open-source hardware, a gas sensor, and a Bluetooth module, and an algorithm for controlling transmission speed is demonstrated. The experiment also involved developing an Android application that receives the measured data, and the results show that the transmission speed can be controlled. In the future, the communication algorithm needs to be improved because a few errors occur when data are sent or received.
Keywords: Arduino, Bluetooth, gas sensor, internet of things, transmission speed
Procedia PDF Downloads 483
24732 Evaluating the Total Costs of a Ransomware-Resilient Architecture for Healthcare Systems
Authors: Sreejith Gopinath, Aspen Olmsted
Abstract:
This paper is based on our previous work that proposed a risk-transference-based architecture for healthcare systems to store sensitive data outside the system boundary, rendering the system unattractive to would-be bad actors. This architecture also allows a compromised system to be abandoned and a new system instance spun up in place to ensure business continuity without paying a ransom or engaging with a bad actor. This paper delves into the details of various attacks we simulated against the prototype system. In the paper, we discuss at length the time and computational costs associated with storing and retrieving data in the prototype system, abandoning a compromised system, and setting up a new instance with existing data. Lastly, we simulate some analytical workloads over the data stored in our specialized data storage system and discuss the time and computational costs associated with running analytics over data in a specialized storage system outside the system boundary. In summary, this paper discusses the total costs of data storage, access, and analytics incurred with the proposed architecture.Keywords: cybersecurity, healthcare, ransomware, resilience, risk transference
Procedia PDF Downloads 133
24731 Money Laundering and Financing of Terrorism
Authors: Covadonga Mallada Fernández
Abstract:
Economic development and the globalization of international markets have created a favourable atmosphere for the emergence of new forms of crime such as money laundering and the financing of terrorism, which may destabilize and damage economic systems. In particular, money laundering has acquired great importance since the 11S (11 September 2001) attacks, which has led, on the one hand, to the establishment and development of preventive measures and, on the other hand, to a progressive hardening of penal measures. Since then, the regulations imposed to fight money laundering have also been viewed as key components in the fight against terrorist financing. Terrorism was at first a “national” crime connected with the internal problems of a State (for instance, the RAF in Germany or ETA in Spain), but in the last 20 years it has become an international problem connected with the defence and security of States. Accordingly, the new strategic concept for the defence and security of NATO includes a comprehensive list of security threats to the Alliance, such as terrorism, international instability, money laundering, and attacks in cyberspace, among others. With this new concept, money laundering and terrorism have become a priority in national defence. In this work, we analyze the methods used to combat these new threats to national security. We study the preventive legislation against money laundering and the financing of terrorism, the UIF that exchange information between States, and hawala banking.
Keywords: control of financial flows, money laundering, terrorism, financing of terrorism
Procedia PDF Downloads 454
24730 Exploring the Capabilities of Sentinel-1A and Sentinel-2A Data for Landslide Mapping
Authors: Ismayanti Magfirah, Sartohadi Junun, Samodra Guruh
Abstract:
Landslides are one of the most frequent and devastating natural disasters in Indonesia. Many studies have been conducted on this phenomenon. However, landslide inventory mapping has received little attention. The natural conditions (dense forest areas) and limited human and economic resources are among the major obstacles to building a landslide inventory in Indonesia. Considering the importance of landslide inventory data in susceptibility, hazard, and risk analysis, it is essential to generate a landslide inventory based on available resources. To achieve this, the first step is to identify the locations of landslides. The availability of Sentinel-1A and Sentinel-2A data gives new insights into land monitoring. Free access, high spatial resolution, and a short revisit time make these data some of the most widely used open-source data in landslide mapping. Sentinel-1A and Sentinel-2A data have been used broadly for landslide detection and land use/land cover mapping. This study aims to generate a landslide map by integrating Sentinel-1A and Sentinel-2A data using a change detection method. The result will be validated by field investigation to produce a preliminary landslide inventory of the study area.
Keywords: change detection method, landslide inventory mapping, Sentinel-1A, Sentinel-2A
Procedia PDF Downloads 171
24729 A DEA Model in a Multi-Objective Optimization with Fuzzy Environment
Authors: Michael Gidey Gebru
Abstract:
Most DEA models operate in a static environment with input and output parameters given by deterministic data. However, because of the ambiguity brought on by shifting market conditions, input and output data are not always gathered precisely in real-world scenarios. Fuzzy numbers can be used to address this kind of ambiguity in input and output data. Therefore, this work aims to extend crisp DEA to DEA in a fuzzy environment. In this study, the input and output data are regarded as triangular fuzzy numbers. The DEA model in a fuzzy environment is then solved using a multi-objective method to gauge the efficiency of the Decision Making Units. Finally, the developed DEA model is illustrated with an application to real data from 50 educational institutions.
Keywords: efficiency, DEA, fuzzy, decision making units, higher education institutions
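To make the notion of a triangular fuzzy number concrete, here is a small sketch of its alpha-cut, the interval of values whose membership is at least alpha. This is a standard textbook construction rather than the multi-objective method of the paper, and the names `Triangular` and `alpha_cut` and the sample value are illustrative assumptions.

```python
from typing import Tuple

# A triangular fuzzy number written as (lower, modal, upper).
Triangular = Tuple[float, float, float]


def alpha_cut(number: Triangular, alpha: float) -> Tuple[float, float]:
    """Return the closed interval of values with membership >= alpha."""
    lower, modal, upper = number
    if not lower <= modal <= upper:
        raise ValueError("expected lower <= modal <= upper")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    left = lower + alpha * (modal - lower)
    right = upper - alpha * (upper - modal)
    return left, right


# Hypothetical fuzzy input: "about 4, certainly between 2 and 8".
fuzzy_input = (2.0, 4.0, 8.0)
print(alpha_cut(fuzzy_input, 0.0))  # full support
print(alpha_cut(fuzzy_input, 0.5))  # narrower interval
print(alpha_cut(fuzzy_input, 1.0))  # collapses to the modal value
```

Evaluating a crisp model at the interval end points for several alpha levels is one common way to carry such data through an efficiency model; whether the paper does this is not stated in the abstract.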
Procedia PDF Downloads 53
24728 Foreign Real Estate Investment and the Australian Residential Property Market: A Study on Chinese Investors
Authors: Peng Yew Wong
Abstract:
House prices in the Australian capital cities were at record levels following the 2008 Global Financial Crisis (GFC), and many believed that foreign investors, especially Chinese investors, were the main reason for the escalation of house prices in those cities. This research conducted cross-border semi-structured interviews in Shanghai, China, to uncover historical evidence and emerging trends supporting the existence of a significant relationship between overseas investors and the performance of residential housing markets in Australia after the 2008 GFC. Some unique investment strategies of private investors from China that emphasised non-capitalist factors such as early education were identified, along with insights into the significant Chinese government policies that have incentivised cross-border investment from China. It is believed that this understanding will assist policy makers in effectively managing the overheated Australian residential property market without compromising the steady flow of foreign real estate investment (FREI).
Keywords: Australian housing market, residential property, foreign real estate investment, education, China investor
Procedia PDF Downloads 292
24727 Data-Driven Decision Making: Justification of Not Leaving Class without It
Authors: Denise Hexom, Judith Menoher
Abstract:
Teachers and administrators across America are being asked to use data and hard evidence to inform practice as they begin the task of implementing Common Core State Standards. Yet the courses they are taking in schools of education are not preparing teachers or principals to understand the data-driven decision making (DDDM) process or to utilize data in a more sophisticated fashion. DDDM has been around for quite some time; however, it has only recently become systematically and consistently applied in the field of education. This paper discusses the theoretical framework of DDDM; empirical evidence supporting the effectiveness of DDDM; a process a department in a school of education has utilized to implement DDDM; and recommendations to other schools of education that attempt to implement DDDM in their decision-making processes and in their students’ coursework.
Keywords: data-driven decision making, institute of higher education, special education, continuous improvement
Procedia PDF Downloads 387
24726 Quantile Coherence Analysis: Application to Precipitation Data
Authors: Yaeji Lim, Hee-Seok Oh
Abstract:
Coherence analysis measures the linear time-invariant relationship between two data sets and has been studied in various fields such as signal processing, engineering, and medical science. However, classical coherence analysis tends to be sensitive to outliers and focuses only on the mean relationship. In this paper, we generalize the cross periodogram to the quantile cross periodogram, which provides a richer description of the inter-relationship between two data sets. This is a general version of the Laplace cross periodogram. We prove its asymptotic distribution under a long-range process and compare it with ordinary coherence through numerical examples. We also present a real data example to confirm the usefulness of quantile coherence analysis.
Keywords: coherence, cross periodogram, spectrum, quantile
Procedia PDF Downloads 390
24725 Conception of a Predictive Maintenance System for Forest Harvesters from Multiple Data Sources
Authors: Lazlo Fauth, Andreas Ligocki
Abstract:
For cost-effective use of harvesters, expensive repairs and unplanned downtimes must be reduced as far as possible. The predictive detection of failing systems and the calculation of intelligent service intervals, necessary to avoid these factors, require in-depth knowledge of the machines' behavior. Such know-how needs permanent monitoring of the machine state from different technical perspectives. In this paper, three approaches will be presented as they are currently pursued in the publicly funded project PreForst at Ostfalia University of Applied Sciences. These include the intelligent linking of workshop and service data, sensors on the harvester, and a special online hydraulic oil condition monitoring system. Furthermore the paper shows potentials as well as challenges for the use of these data in the conception of a predictive maintenance system.Keywords: predictive maintenance, condition monitoring, forest harvesting, forest engineering, oil data, hydraulic data
Procedia PDF Downloads 145
24724 Sampled-Data Control for Fuel Cell Systems
Authors: H. Y. Jung, Ju H. Park, S. M. Lee
Abstract:
A sampled-data controller is presented for solid oxide fuel cell systems that are expressed by a sector-bounded nonlinear model. Sector-bounded nonlinear systems consist of a linear dynamical system in feedback connection with a nonlinearity satisfying certain sector-type constraints. The sampled-data control scheme is very useful because it can be implemented on a digital controller, and increasing research effort has been devoted to sampled-data control systems with the development of modern high-speed computers. The proposed control law is obtained by solving a convex problem subject to several linear matrix inequalities. Simulation results are given to show the effectiveness of the proposed design method.
Keywords: sampled-data control, fuel cell, linear matrix inequalities, nonlinear control
Procedia PDF Downloads 565
24723 Real Estate Rigidities: The Effect of Cash Transactions and the Impact of Demonetisation on Them
Authors: Dishant Shahi, Aradhya Shandilya, Nand Kumar
Abstract:
We study here the impact of the black component referred to as X component in the text on Real estate transactions. The X component involved not only acts as friction in transaction but also leads to dysfunctionality in the capital market of real estate. The effect of the component is presented by using a model of economy which seeks resemblance with that of India involving property deals. The rigidities which hinder smooth transactions in property or land deals are depicted and their impact on the economy as a whole has been modelled. The effect of subprime crisis (2007) on Indian housing capital market and the role which the X component played during it, is also included in one of the sections. In the entire text, we have utilised 4 Quadrant graphs to study supply and demand causalities involved in commercial real estate. At the end we have included the impact of demonetisation as a move to counter the problem of overvaluation in the property assets arising due to the X component. The case of Demonetisation which has been the latest move by the Indian Government to control huge amount of black money in circulation has been included along with its impact on the housing and rent as well as the capital market.Keywords: X-component, 4Q graph, real estate, capital markets, demonetisation, consumer sentiments
Procedia PDF Downloads 364
24722 How Western Donors Allocate Official Development Assistance: New Evidence From a Natural Language Processing Approach
Authors: Daniel Benson, Yundan Gong, Hannah Kirk
Abstract:
Advances in natural language processing techniques have increased data processing speeds and reduced the need for the cumbersome manual processing that is often required when preparing data from multilateral organizations for specific purposes. Using named entity recognition (NER) modeling and the Organisation for Economic Co-operation and Development (OECD) Creditor Reporting System database, we present the first geotagged dataset of OECD donor Official Development Assistance (ODA) projects on a global, subnational basis. The resulting data contain 52,086 ODA projects geocoded to subnational locations across 115 countries, worth a combined $87.9bn. This represents the first global OECD donor ODA project database with geocoded projects. We use these new data to revisit old questions of how ‘well’ donors allocate ODA to the developing world. This understanding is imperative for policymakers seeking to improve ODA effectiveness.
Keywords: international aid, geocoding, subnational data, natural language processing, machine learning
Procedia PDF Downloads 79
24721 Compressed Suffix Arrays to Self-Indexes Based on Partitioned Elias-Fano
Abstract:
A practical and simple self-indexing data structure, the Partitioned Elias-Fano (PEF) Compressed Suffix Array (CSA), is built in linear time for the CSA based on PEF indexes. The PEF-CSA is compared with two classical compressed indexing methods, the Ferragina and Manzini implementation (FMI) and Sad-CSA, on files of different types and sizes from Pizza & Chili. The PEF-CSA performs better on the existing data in terms of compression ratio and count and locate times, except for evenly distributed data such as protein data. The experiments show that the distribution of φ matters more for the compression ratio than the alphabet size does. An unevenly distributed φ yields better compression, and the larger the number of hits, the longer the count and locate times.
Keywords: compressed suffix array, self-indexing, partitioned Elias-Fano, PEF-CSA
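For readers unfamiliar with the underlying encoding, the sketch below shows plain (unpartitioned) Elias-Fano coding of a sorted integer sequence. It is not the PEF-CSA of the paper: partitioning, suffix-array construction, and the count and locate queries are all omitted, and the function names and the list-of-bits representation are illustrative assumptions chosen for readability rather than space efficiency.

```python
from typing import List, Tuple


def ef_encode(values: List[int], universe: int) -> Tuple[int, List[int], List[int]]:
    """Split each value into low bits (stored verbatim) and high bits (unary)."""
    n = len(values)
    if n == 0:
        raise ValueError("sequence must not be empty")
    if any(b < a for a, b in zip(values, values[1:])):
        raise ValueError("sequence must be non-decreasing")
    if values[0] < 0 or values[-1] >= universe:
        raise ValueError("values must lie in [0, universe)")
    # Number of low bits: floor(log2(universe / n)), never negative.
    low_bits = max(0, (universe // n).bit_length() - 1)
    mask = (1 << low_bits) - 1
    lows = [v & mask for v in values]
    # The i-th element sets bit (high part + i), so the number of zeros
    # before the i-th one equals that element's high part.
    high = [0] * (n + (universe >> low_bits) + 1)
    for i, v in enumerate(values):
        high[(v >> low_bits) + i] = 1
    return low_bits, lows, high


def ef_decode(low_bits: int, lows: List[int], high: List[int]) -> List[int]:
    """Rebuild the sequence by counting zeros ahead of each one bit."""
    out: List[int] = []
    zeros = 0
    for bit in high:
        if bit:
            out.append((zeros << low_bits) | lows[len(out)])
        else:
            zeros += 1
    return out


sequence = [2, 3, 5, 7, 11, 13, 24]
encoded = ef_encode(sequence, universe=32)
print(encoded[0], ef_decode(*encoded))
```

The partitioned variant named in the abstract splits the sequence into chunks and encodes each chunk separately, which is where the sensitivity to how the values are distributed comes from.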
Procedia PDF Downloads 252
24720 Data, Digital Identity and Antitrust Law: An Exploratory Study of Facebook’s Novi Digital Wallet
Authors: Wanjiku Karanja
Abstract:
Facebook has monopoly power in the social networking market. It has grown and entrenched its monopoly power through the capture of its users’ data value chains. However, antitrust law’s consumer welfare roots have prevented it from effectively addressing the role of data capture in Facebook’s market dominance. These regulatory blind spots are augmented in Facebook’s proposed Diem cryptocurrency project and its Novi digital wallet. Novi, which is Diem’s digital identity component, will enable Facebook to collect an unprecedented volume of consumer data. Consequently, Novi has seismic implications for internet identity, as the network effects of Facebook’s large user base could establish it as the de facto internet identity layer. Moreover, the large tracts of data Facebook will collect through Novi will further entrench Facebook's market power. As such, the attendant lock-in effects of this project will be very difficult to reverse. Urgent regulatory action is therefore required to prevent this expansion of Facebook’s data resources and monopoly power. This research thus highlights the importance of data capture to competition and market health in the social networking industry. It utilizes interviews with key experts to empirically interrogate the impact of Facebook’s data capture and control of its users’ data value chains on its market power. This inquiry is contextualized against Novi’s expansive effect on Facebook’s data value chains. It thus addresses the novel antitrust issues arising at the nexus of Facebook’s monopoly power and the privacy of its users’ data. It also explores the impact of platform design principles, specifically data interoperability and data portability, in mitigating Facebook’s anti-competitive practices. As such, this study finds that Facebook is a powerful monopoly that dominates the social media industry to the detriment of potential competitors. 
Facebook derives its power from its size, annexure of the consumer data value chain, and control of its users’ social graphs. Additionally, the platform design principles of data interoperability and data portability are not a panacea to restoring competition in the social networking market. Their success depends on the establishment of robust technical standards and regulatory frameworks.Keywords: antitrust law, data protection law, data portability, data interoperability, digital identity, Facebook
Procedia PDF Downloads 123
24719 Recommendations for Data Quality Filtering of Opportunistic Species Occurrence Data
Authors: Camille Van Eupen, Dirk Maes, Marc Herremans, Kristijn R. R. Swinnen, Ben Somers, Stijn Luca
Abstract:
In ecology, species distribution models are commonly implemented to study species-environment relationships. These models increasingly rely on opportunistic citizen science data when high-quality species records collected through standardized recording protocols are unavailable. While these opportunistic data are abundant, uncertainty is usually high, e.g., due to observer effects or a lack of metadata. Data quality filtering is often used to reduce these types of uncertainty in an attempt to increase the value of studies relying on opportunistic data. However, filtering should not be performed blindly. In this study, recommendations are built for data quality filtering of opportunistic species occurrence data that are used as input for species distribution models. Using an extensive database of 5.7 million citizen science records from 255 species in Flanders, the impact on model performance was quantified by applying three data quality filters, and these results were linked to species traits. More specifically, presence records were filtered based on record attributes that provide information on the observation process or post-entry data validation, and changes in the area under the receiver operating characteristic (AUC), sensitivity, and specificity were analyzed using the Maxent algorithm with and without filtering. Controlling for sample size enabled us to study the combined impact of data quality filtering, i.e., the simultaneous impact of an increase in data quality and a decrease in sample size. Further, the variation among species in their response to data quality filtering was explored by clustering species based on four traits often related to data quality: commonness, popularity, difficulty, and body size. 
Findings show that model performance is affected by i) the quality of the filtered data, ii) the proportional reduction in sample size caused by filtering and the remaining absolute sample size, and iii) a species ‘quality profile’, resulting from a species classification based on the four traits related to data quality. The findings resulted in recommendations on when and how to filter volunteer generated and opportunistically collected data. This study confirms that correctly processed citizen science data can make a valuable contribution to ecological research and species conservation.Keywords: citizen science, data quality filtering, species distribution models, trait profiles
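The three evaluation measures named in the abstract can be illustrated on a toy example. The sketch below uses the usual definitions of sensitivity, specificity, and AUC for binary labels and model scores; it does not run Maxent, and the function names, labels, scores, and threshold are made-up assumptions for illustration only.

```python
from typing import List, Tuple


def sensitivity_specificity(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Fraction of presences and of absences classified correctly."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random presence outscores a random absence."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = presence record, 0 = absence or background point.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(sens, spec, auc(labels, scores))
```

Comparing these numbers for a model fitted with and without a quality filter, at matched sample sizes, is the kind of contrast the study describes.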
Procedia PDF Downloads 203
24718 Data Quality Enhancement with String Length Distribution
Authors: Qi Xiu, Hiromu Hota, Yohsuke Ishii, Takuya Oda
Abstract:
Recently, the volume of collectable manufacturing data has been increasing rapidly. At the same time, mega recalls are becoming a serious social problem. Under such circumstances, there is an increasing need to prevent mega recalls through defect analysis, such as root cause analysis and anomaly detection, using manufacturing data. However, classifying strings in manufacturing data by traditional methods takes too long to meet the requirement of quick defect analysis. Therefore, we present the String Length Distribution Classification (SLDC) method to classify strings correctly in a short time. This method learns character features, especially the string length distribution, from fields such as Product ID and Machine ID in the BOM and asset list. By applying the proposed method to strings in actual manufacturing data, we verified that the string classification time can be reduced by 80%. As a result, we estimate that the requirement of quick defect analysis can be fulfilled.
Keywords: string classification, data quality, feature selection, probability distribution, string length
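The abstract gives no algorithmic detail for SLDC, so the following is only a guess at the general idea: learn a length histogram per field and assign a new string to the field whose histogram makes its length most likely. The function names, the add-one smoothing, the `max_len` bound, and the sample IDs are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def fit_length_model(examples: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count how often each string length occurs for every field label."""
    return {label: Counter(len(s) for s in strings) for label, strings in examples.items()}


def classify(model: Dict[str, Counter], text: str, max_len: int = 64) -> str:
    """Pick the label whose smoothed length distribution best fits the string."""

    def log_prob(counts: Counter) -> float:
        total = sum(counts.values())
        # Add-one smoothing so unseen lengths keep a small non-zero probability.
        return math.log((counts[len(text)] + 1) / (total + max_len))

    return max(model, key=lambda label: log_prob(model[label]))


# Hypothetical training strings; real BOM and asset-list fields will differ.
training = {
    "product_id": ["PRD0000001", "PRD0000002", "PRD0000417"],
    "machine_id": ["MC-001", "MC-002", "MC-117"],
}
model = fit_length_model(training)
print(classify(model, "PRD0009999"), classify(model, "MC-555"))
```

Length alone is cheap to compute, which fits the speed claim in the abstract, but fields with overlapping lengths would need the other character features the authors mention.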
Procedia PDF Downloads 318
24717 Temporally Coherent 3D Animation Reconstruction from RGB-D Video Data
Authors: Salam Khalifa, Naveed Ahmed
Abstract:
We present a new method to reconstruct a temporally coherent 3D animation from single- or multi-view RGB-D video data using unbiased feature point sampling. Given RGB-D video data in the form of a 3D point cloud sequence, our method first extracts feature points using both color and depth information. In the subsequent steps, these feature points are used to match two 3D point clouds in consecutive frames independently of their resolution. Our new dynamic alignment method, based on motion vectors, then fully reconstructs a spatio-temporally coherent 3D animation. We perform extensive quantitative validation using novel error functions to analyze the results. We show that despite the limiting factors of temporal and spatial noise associated with RGB-D data, it is possible to extract temporal coherence and faithfully reconstruct a temporally coherent 3D animation from RGB-D video data.
Keywords: 3D video, 3D animation, RGB-D video, temporally coherent 3D animation
Procedia PDF Downloads 373
24716 Determining Abnomal Behaviors in UAV Robots for Trajectory Control in Teleoperation
Authors: Kiwon Yeom
Abstract:
Change points are abrupt variations in a data sequence. Detection of change points is useful in modeling, analyzing, and predicting time series in application areas such as robotics and teleoperation. In this paper, a change point is defined as a discontinuity in one of the derivatives of the sequence. This paper presents a reliable method for detecting discontinuities within three-dimensional trajectory data. The problem of determining one or more discontinuities is considered for regular and irregular trajectory data from teleoperation. We examine the geometric detection algorithm and illustrate the use of the method on real data examples.
Keywords: change point, discontinuity, teleoperation, abrupt variation
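The definition of a change point as a discontinuity in a derivative can be illustrated with finite differences. The sketch below flags samples where the first derivative (velocity) of a uniformly sampled 3D trajectory jumps by more than a threshold. It is a simple stand-in, not the geometric detection algorithm of the paper, and the function name, threshold, and synthetic path are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def velocity_jumps(points: List[Point], threshold: float, dt: float = 1.0) -> List[int]:
    """Return sample indices where consecutive velocity estimates differ sharply."""
    # Forward differences approximate the first derivative between samples.
    velocities = [
        tuple((b - a) / dt for a, b in zip(p, q)) for p, q in zip(points, points[1:])
    ]
    changes = []
    for i in range(1, len(velocities)):
        # velocities[i - 1] arrives at sample i and velocities[i] leaves it.
        if math.dist(velocities[i], velocities[i - 1]) > threshold:
            changes.append(i)
    return changes


# Synthetic path: straight along x, then a sharp turn along y at sample 4.
path = [(float(t), 0.0, 0.0) for t in range(5)] + [(4.0, float(t), 0.0) for t in range(1, 5)]
print(velocity_jumps(path, threshold=0.5))
```

Applying the same differencing to the velocity list would test the second derivative instead; with noisy teleoperation data the threshold would need to be tuned or the signal smoothed first.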
Procedia PDF Downloads 167
24715 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs
Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro
Abstract:
This work presents a statistical methodology for measuring and finding constructs in Latent Semantic Analysis. The approach combines the strengths of Factor Analysis for binary data with interpretations drawn from Item Response Theory. More precisely, we propose first reducing dimensionality using Principal Component Analysis on the linguistic data and then producing axes of groups obtained from a clustering analysis of the semantic data. This approach allows the user to give meaning to previous clusters and to find the real latent structure present in the data. The methodology is applied to a set of real semantic data, with impressive results in coherence, speed, and precision.
Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression
Procedia PDF Downloads 443
24714 Analysis of Production Forecasting in Unconventional Gas Resources Development Using Machine Learning and Data-Driven Approach
Authors: Dongkwon Han, Sangho Kim, Sunil Kwon
Abstract:
Unconventional gas resources have dramatically changed the future energy landscape. Unlike conventional gas resources, the key challenge in unconventional gas has been the need for advanced production-forecasting approaches, owing to the uncertainty and complexity of fluid flow. In this study, an artificial neural network (ANN) model that integrates machine learning and a data-driven approach was developed to predict shale gas productivity. A database of 129 wells in the Eagle Ford shale basin was used for training and testing the ANN model. The input data related to hydraulic fracturing, well completion, and shale gas productivity were selected, and the output is cumulative production. The performance of the ANN models using all data sets, clustering, and variable importance (VI) was compared in terms of the mean absolute percentage error (MAPE). The MAPE was 44.22% for the ANN model using all data sets; 10.08% (cluster 1), 5.26% (cluster 2), and 6.35% (cluster 3) for the clustering models; and 32.23% (ANN VI) and 23.19% (SVM VI) for the VI models. The results showed that the pre-trained ANN model provides more accurate results than the ANN model using all data sets.
Keywords: unconventional gas, artificial neural network, machine learning, clustering, variables importance
Procedia PDF Downloads 196
24713 Prevalence and Molecular Characterization of Extended-Spectrum–β Lactamase and Carbapenemase-Producing Enterobacterales from Tunisian Seafood
Authors: Mehdi Soula, Yosra Mani, Estelle Saras, Antoine Drapeau, Raoudha Grami, Mahjoub Aouni, Jean-Yves Madec, Marisa Haenni, Wejdene Mansour
Abstract:
Multi-resistance to antibiotics in Gram-negative bacilli, and particularly in Enterobacteriaceae, has become frequent in hospitals in Tunisia. However, data on antibiotic-resistant bacteria in aquatic products are scarce. The aims of this study are to estimate the proportion of ESBL- and carbapenemase-producing Enterobacterales in seafood (clams and fish) in Tunisia and to molecularly characterize the collected isolates. Two types of seafood were sampled in unrelated markets in four different regions of Tunisia (641 pieces of farmed fish and 1075 Mediterranean clams divided into 215 pools of 5 pieces each). Once purchased, all samples were incubated in tubes containing peptone salt broth for 24 to 48 h at 37°C. After incubation, overnight cultures were isolated on selective MacConkey agar plates supplemented with either imipenem or cefotaxime, identified using API20E test strips (bioMérieux, Marcy-l’Étoile, France), and confirmed by MALDI-TOF MS. Antimicrobial susceptibility was determined by the disk diffusion method on Mueller-Hinton agar plates, and results were interpreted according to CA-SFM 2021. ESBL-producing Enterobacterales were detected using the Double Disc Synergy Test (DDST). Carbapenem resistance was detected using an ertapenem disk and confirmed using the ROSCO KPC/MBL and OXA-48 Confirm Kit (ROSCO Diagnostica, Taastrup, Denmark). DNA was extracted using a NucleoSpin Microbial DNA extraction kit (Macherey-Nagel, Hoerdt, France), according to the manufacturer’s instructions. Resistance genes were determined using the CGE online tools. The replicon content and plasmid formula were identified from the WGS data using PlasmidFinder 2.0.1 and pMLST 2.0. From farmed fish, nine ESBL-producing strains (9/641, 1.4%) were isolated, which were identified as E. coli (n=6) and K. pneumoniae (n=3). Among the 215 pools of 5 clams analyzed, 18 ESBL-producing isolates were identified, including 14 E. coli and 4 K. pneumoniae. A low isolation rate of ESBL-producing Enterobacterales (1.6%, 18/1075) was detected in clam pools. In fish, the ESBL phenotype was due to the presence of the blaCTX-M-15 gene in all nine isolates, but no carbapenemase gene was identified. In clams, the predominant ESBL gene was blaCTX-M-1 (n=6/18). blaCPE (NDM1, OXA48) was detected in only 3 isolates, all K. pneumoniae. Replicon typing of the strains carrying ESBL and carbapenemase genes revealed that the predominant plasmid type carrying ESBL genes was IncF (42.3%, n=11/26). In all, our results suggest that seafood can be a reservoir of multi-drug resistant bacteria, most probably of human origin but also driven by antibiotic selection pressure. Our findings raise concerns that seafood bought for consumption may serve as a potential reservoir of AMR genes and pose a serious threat to public health.
Keywords: BLSE, carbapenemase, enterobacterales, tunisian seafood
Procedia PDF Downloads 109
24712 Procedure Model for Data-Driven Decision Support Regarding the Integration of Renewable Energies into Industrial Energy Management
Authors: M. Graus, K. Westhoff, X. Xu
Abstract:
Climate change is causing change in all aspects of society. While the expansion of renewable energies proceeds, general studies about the potential of demand side management have not convinced industry to reinforce smart grid considerations in its operational business. In this article, a procedure model for case-specific, data-driven decision support for industrial energy management, based on a holistic data analytics approach, is presented. The model is applied to the strategic decision problem of integrating renewable energies into industrial energy management. This question arises from considerations of changing the electricity contract model from a standard rate to volatile energy prices that follow the energy spot market, which is increasingly affected by renewable energies. The procedure model corresponds to a data analytics process consisting of data model, analysis, simulation, and optimization steps. This procedure will help to quantify the potential of sustainable production concepts based on the data from a factory. The model is validated with data from a printer, used as an analogy to a simple production machine. The overall goal is to establish smart grid principles for industry via the transformation from knowledge-driven to data-driven decisions within manufacturing companies.
Keywords: data analytics, green production, industrial energy management, optimization, renewable energies, simulation
Procedia PDF Downloads 435
24711 Dissimilarity-Based Coloring for Symbolic and Multivariate Data Visualization
Authors: K. Umbleja, M. Ichino, H. Yaguchi
Abstract:
In this paper, we propose a coloring method for multivariate data visualization using parallel coordinates, based on dissimilarity and tree structure information gathered during hierarchical clustering. The proposed method is an extension of proximity-based coloring, which suffers from a few undesired side effects if the hierarchical tree structure is not a balanced tree. We describe the algorithm for assigning colors based on dissimilarity information, show the application of the proposed method to three commonly used datasets, and compare the results with proximity-based coloring. We found our proposed method to be especially beneficial for symbolic data visualization, where many individual objects have already been aggregated into a single symbolic object.
Keywords: data visualization, dissimilarity-based coloring, proximity-based coloring, symbolic data
Procedia PDF Downloads 170
24710 The Impact of Data Science on Geography: A Review
Authors: Roberto Machado
Abstract:
We conducted a systematic review using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses methodology, analyzing 2,996 studies and synthesizing 41 of them to explore the evolution of data science and its integration into geography. By employing optimization algorithms, we accelerated the review process, significantly enhancing the efficiency and precision of literature selection. Our findings indicate that data science has developed over five decades, facing challenges such as the diversified integration of data and the need for advanced statistical and computational skills. In geography, the integration of data science underscores the importance of interdisciplinary collaboration and methodological innovation. Techniques like large-scale spatial data analysis and predictive algorithms show promise in natural disaster management and transportation route optimization, enabling faster and more effective responses. These advancements highlight the transformative potential of data science in geography, providing tools and methodologies to address complex spatial problems. The relevance of this study lies in the use of optimization algorithms in systematic reviews and the demonstrated need for deeper integration of data science into geography. Key contributions include identifying specific challenges in combining diverse spatial data and the necessity for advanced computational skills. Examples of connections between these two fields encompass significant improvements in natural disaster management and transportation efficiency, promoting more effective and sustainable environmental solutions with a positive societal impact.
Keywords: data science, geography, systematic review, optimization algorithms, supervised learning
Procedia PDF Downloads 30
24709 Developing Structured Sizing Systems for Manufacturing Ready-Made Garments of Indian Females Using Decision Tree-Based Data Mining
Authors: Hina Kausher, Sangita Srivastava
Abstract:
In India, there is a lack of a standard, systematic sizing approach for producing ready-made garments. Garment manufacturing companies use their own size tables, created by modifying international sizing charts for ready-made garments. The purpose of this study is to tabulate anthropometric data that cover the variety of figure proportions in both height and girth. 3,000 data records were collected through an anthropometric survey of females between the ages of 16 and 80 years from some states of India, in order to produce a sizing system suitable for clothing manufacture and retailing. These data are used for the statistical analysis of body measurements and the formulation of sizing systems and body measurement tables. Factor analysis is used to filter the control body dimensions from a large number of variables. Decision tree-based data mining is used to cluster the data. The standard and structured sizing system can facilitate pattern grading and garment production. Moreover, it can exceed buying ratios and upgrade size allocations to retail segments.
Keywords: anthropometric data, data mining, decision tree, garments manufacturing, sizing systems, ready-made garments
Procedia PDF Downloads 134
24708 A Framework on Data and Remote Sensing for Humanitarian Logistics
Authors: Vishnu Nagendra, Marten Van Der Veen, Stefania Giodini
Abstract:
Effective humanitarian logistics operations are a cornerstone of successful disaster relief operations. However, to be effective, they need to be demand driven and supported by adequate data for prioritization. Without this data, operations are carried out in an ad hoc manner and eventually become chaotic. The current availability of geospatial data helps in creating models for predictive damage and vulnerability assessment, which can be of great advantage to logisticians in understanding the nature and extent of the disaster damage. This translates into actionable information on the demand for relief goods, the state of the transport infrastructure, and subsequently the priority areas for relief delivery. However, due to the unpredictable nature of disasters, the accuracy of the models needs improvement, which can be achieved using remote sensing data from UAVs (Unmanned Aerial Vehicles) or satellite imagery, which in turn come with certain limitations. This research addresses the need for a framework to combine data from different sources to support humanitarian logistics operations and prediction models. The focus is on developing a workflow to combine data from satellites and UAVs after a disaster strikes. A three-step approach is followed. First, the data requirements for logistics activities are made explicit by carrying out semi-structured interviews with on-field logistics workers. Second, the limitations of current data collection tools are analyzed to develop workaround solutions by following a systems design approach. Third, the data requirements and the developed workaround solutions are fitted together into a coherent workflow. The outcome of this research will provide a new method for logisticians to gain immediate access to accurate and reliable data to support data-driven decision making.
Keywords: unmanned aerial vehicles, damage prediction models, remote sensing, data driven decision making
Procedia PDF Downloads 379