Search results for: data mining analytics
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25010

24170 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a major challenge for bioinformaticians because of the complexity of applying statistical and machine learning techniques. The challenge doubles when the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on a microarray data set is feature selection, which finds the genes that most affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
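To make the two steps concrete, here is a minimal Python sketch that fills each gene's missing values with that gene's observed mean and then ranks genes by a simple two-class filter score (absolute difference of class means). Both choices are illustrative assumptions, not the authors' technique, which couples imputation with feature selection; the function names and the toy matrix are hypothetical.

```python
from statistics import mean

def impute_with_gene_means(matrix):
    """Replace None entries in each gene column with that gene's observed mean.

    `matrix` is a list of samples (rows); each row holds one value per gene.
    """
    n_genes = len(matrix[0])
    gene_means = []
    for j in range(n_genes):
        observed = [row[j] for row in matrix if row[j] is not None]
        # A gene with no observed values falls back to 0.0 (an assumption).
        gene_means.append(mean(observed) if observed else 0.0)
    return [[gene_means[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]

def rank_genes(matrix, labels):
    """Rank gene indices by |mean(class 1) - mean(class 0)|, most discriminative first."""
    scores = []
    for j in range(len(matrix[0])):
        pos = [row[j] for row, y in zip(matrix, labels) if y == 1]
        neg = [row[j] for row, y in zip(matrix, labels) if y == 0]
        scores.append((abs(mean(pos) - mean(neg)), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Hypothetical toy data: 4 samples x 3 genes; None marks a missing measurement.
data = [[1.0, 5.0, None], [2.0, None, 3.0], [8.0, 5.5, 3.2], [9.0, 6.0, None]]
labels = [0, 0, 1, 1]
filled = impute_with_gene_means(data)
print(rank_genes(filled, labels))  # [0, 1, 2]
```

Mean imputation ignores class labels and correlations between genes, so it is only a baseline; an imputation method that works during feature selection, as the paper proposes, would replace `impute_with_gene_means`.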

Keywords: DNA microarray, feature selection, missing data, bioinformatics

24169 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are used in various applications such as agriculture, health monitoring, air and water pollution monitoring, and traffic monitoring and control, and hence play a vital role in the growth of big data. However, sensors collect redundant data, so aggregating and filtering sensor data is essential to designing an efficient big data framework. Current research does not focus on aggregating and filtering data at multiple layers of a sensor-based big data framework. This paper therefore introduces (i) a three-layer data aggregation framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer, at the sensors. Simulation results show that PDDA outperforms existing tree- and cluster-based data aggregation schemes in terms of overall network energy consumption and end-to-end data transmission delay.
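The abstract does not describe PDDA's internals, so the following is only a hypothetical sketch of the general idea of priority-based aggregation at a sensor: readings above an alarm threshold are forwarded immediately, near-duplicate readings are filtered locally, and the remaining readings are averaged into batches before transmission. The class name, thresholds, and batching rule are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SensorNode:
    redundancy_eps: float   # readings within this distance of the last kept value are dropped
    alarm_threshold: float  # readings at or above this are treated as high priority
    batch_size: int = 4     # low-priority readings are averaged in batches of this size
    _last_kept: Optional[float] = field(default=None, init=False)
    _batch: List[float] = field(default_factory=list, init=False)

    def ingest(self, value: float) -> List[Tuple[str, float]]:
        """Return the (priority, payload) messages this reading triggers."""
        if value >= self.alarm_threshold:
            self._last_kept = value
            return [("high", value)]  # urgent data bypasses aggregation
        if self._last_kept is not None and abs(value - self._last_kept) < self.redundancy_eps:
            return []  # redundant reading, filtered at the sensor
        self._last_kept = value
        self._batch.append(value)
        if len(self._batch) >= self.batch_size:
            avg = sum(self._batch) / len(self._batch)
            self._batch.clear()
            return [("low", avg)]  # one aggregated message instead of several
        return []

node = SensorNode(redundancy_eps=0.5, alarm_threshold=50.0, batch_size=2)
sent = [msg for r in [20.0, 20.1, 21.0, 55.0, 22.0] for msg in node.ingest(r)]
print(sent)  # [('low', 20.5), ('high', 55.0)]
```

Fewer transmitted messages is the mechanism by which such a scheme could reduce energy use, while the high-priority path keeps delay low for urgent readings.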

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

24168 Advancement of Computer Science Research in Nigeria: A Bibliometric Analysis of the Past Three Decades

Authors: Temidayo O. Omotehinwa, David O. Oyewola, Friday J. Agbo

Abstract:

This study aims to provide a clear perspective on the development landscape of Computer Science research in Nigeria. To this end, a bibliometric analysis of 4,333 bibliographic records of Computer Science research in Nigeria over the last 31 years (1991-2021) was carried out. The bibliographic data were extracted from the Scopus database and analyzed using VOSviewer and the bibliometrix R package through the biblioshiny web interface. The findings revealed that Computer Science research in Nigeria has a growth rate of 24.19%. The most developed and well-studied research areas in the Computer Science field in Nigeria are machine learning, data mining, and deep learning. The social structure analysis revealed a need for improved international collaboration; the few collaborations that do exist are largely influenced by geographic proximity. The funding analysis showed that Computer Science research in Nigeria is under-funded. The findings of this study will be useful for researchers conducting Computer Science-related research. Experts can gain insights into how to develop a strategic framework that will advance the field in a more impactful manner. Government agencies and policymakers can also use the outcome of this research to develop strategies for improved funding of Computer Science research.
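A growth rate like the one reported here is commonly computed as a compound annual growth rate from the first- and last-year publication counts; we assume that definition in this minimal sketch, which may differ in detail from what bibliometrix reports. The counts in the example are hypothetical, not the study's data.

```python
def annual_growth_rate(first_year_count: int, last_year_count: int, n_years: int) -> float:
    """Compound annual growth rate (%) between the first and last year of a period.

    n_years counts both endpoints, e.g. 1991-2021 gives n_years = 31,
    so growth is compounded over n_years - 1 = 30 intervals.
    """
    if first_year_count <= 0 or n_years < 2:
        raise ValueError("need a positive starting count and at least two years")
    ratio = last_year_count / first_year_count
    return (ratio ** (1 / (n_years - 1)) - 1) * 100

# Hypothetical counts: 10 papers in year 1 and 40 papers in year 3.
print(annual_growth_rate(10, 40, 3))  # 100.0
```

Because only the endpoints enter the formula, a single unusually strong or weak first or last year can shift the reported rate noticeably.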

Keywords: bibliometric analysis, biblioshiny, computer science, Nigeria, science mapping

24167 Uncertainty Quantification of Corrosion Anomaly Length of Oil and Gas Steel Pipelines Based on Inline Inspection and Field Data

Authors: Tammeen Siraj, Wenxing Zhou, Terry Huang, Mohammad Al-Amin

Abstract:

The high-resolution inline inspection (ILI) tool is used extensively in the pipeline industry to identify, locate, and measure metal-loss corrosion anomalies on buried oil and gas steel pipelines. Corrosion anomalies may occur singly (i.e., individual anomalies) or as clusters (i.e., a colony of corrosion anomalies). Although ILI technology has advanced immensely, the sizes of corrosion anomalies reported by ILI tools are subject to measurement errors due to limitations of the tools and associated sizing algorithms, and to the detection threshold of the tools (i.e., the minimum detectable feature dimension). Quantifying the measurement error in ILI data is crucial for corrosion management and for developing maintenance strategies that satisfy safety and economic constraints. The measurement error associated with the length of corrosion anomalies (in the longitudinal direction of the pipeline) has scarcely been reported in the literature and is investigated in the present study. Limitations in the ILI tool and the clustering process can sometimes cause clustering error, defined as the error introduced during the clustering process by including or excluding a single anomaly or a group of anomalies in or from a cluster. Clustering error has been found to be one of the biggest contributing factors to the relatively high uncertainty associated with ILI-reported anomaly length. As such, this study focuses on developing a consistent and comprehensive framework to quantify the measurement errors in ILI-reported anomaly length by comparing ILI data with the corresponding field measurements for individual and clustered corrosion anomalies. The analysis is based on ILI and field measurement data for a set of anomalies collected from two segments of a buried natural gas pipeline currently in service in Alberta, Canada.
Data analyses showed that the measurement error associated with the ILI-reported length of anomalies without clustering error, denoted as Type I anomalies, is markedly less than that for anomalies with clustering error, denoted as Type II anomalies. A methodology employing data mining techniques is further proposed to classify Type I and Type II anomalies based on the ILI-reported corrosion anomaly information.
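One common way to characterise ILI sizing error, assumed here rather than taken from the paper, is a linear measurement-error model, field = a + b × ILI + e, fitted by ordinary least squares, with the residual standard deviation summarising the scatter. Fitting it separately to Type I and Type II anomalies would let the two error levels be compared. The function name and data below are hypothetical.

```python
from statistics import mean

def fit_measurement_error(ili, field):
    """Fit field = a + b * ili + e by ordinary least squares.

    Returns (a, b, sd_e): intercept, slope, and residual standard deviation.
    """
    n = len(ili)
    if n != len(field) or n < 3:
        raise ValueError("need at least three paired measurements")
    x_bar, y_bar = mean(ili), mean(field)
    sxx = sum((x - x_bar) ** 2 for x in ili)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ili, field))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(ili, field)]
    # n - 2 degrees of freedom because two parameters (a, b) were estimated.
    sd_e = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return a, b, sd_e

# Hypothetical lengths (mm): field values are ILI values plus a constant 2 mm.
a, b, sd = fit_measurement_error([10, 20, 30, 40], [12, 22, 32, 42])
print(a, b, sd)  # 2.0 1.0 0.0
```

In this framing, a markedly larger `sd_e` for Type II anomalies than for Type I anomalies would correspond to the finding stated above.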

Keywords: clustered corrosion anomaly, corrosion anomaly assessment, corrosion anomaly length, individual corrosion anomaly, metal-loss corrosion, oil and gas steel pipeline

24166 Blue Economy and Marine Mining

Authors: Fani Sakellariadou

Abstract:

The Blue Economy includes all marine-based and marine-related activities, spanning established, emerging, and not-yet-existing ocean-based industries. Seabed mining is an emerging marine-based activity whose operations depend particularly on cutting-edge science and technology. The 21st century will face a resource crisis as a consequence of world population growth and rising standards of living. The natural capital stored in the global ocean is decisive to its ability to provide a wide range of sustainable ecosystem services. Seabed mineral deposits have been identified as having high potential for critical elements and base metals, which play a crucial role in the fast evolution of green technologies. The major categories of marine mineral deposits are deep-sea deposits, including cobalt-rich ferromanganese crusts, polymetallic nodules, phosphorites, and deep-sea muds, as well as shallow-water deposits, including marine placers. Seabed mining operations may take place within the continental shelf areas of nation-states. In international waters, the International Seabed Authority (ISA) has entered into 15-year contracts for deep-seabed exploration with 21 contractors. These contracts are for polymetallic nodules (18 contracts), polymetallic sulfides (7 contracts), and cobalt-rich ferromanganese crusts (5 contracts). Exploration areas are located in the Clarion-Clipperton Zone, the Indian Ocean, the Mid-Atlantic Ridge, the South Atlantic Ocean, and the Pacific Ocean. Potential environmental impacts of deep-sea mining include habitat alteration, sediment disturbance, plume discharge, release of toxic compounds, light and noise generation, and air emissions.
These impacts could cause burial and smothering of benthic species, health problems for marine species, biodiversity loss, reduced photosynthesis, behavioral change and masking of acoustic communication in mammals and fish, bioaccumulation of heavy metals up the food web, decreased dissolved oxygen content, and climate change. An important concern related to deep-sea mining is our knowledge gap regarding deep-sea biological communities. The ecological consequences for the remote, unique, fragile, and little-understood deep-sea ecosystems and their inhabitants are still largely unknown. The blue economy conceptualizes oceans as development spaces that supply socio-economic benefits for current and future generations while also protecting, supporting, and restoring biodiversity and ecological productivity. In that sense, holistic management should be applied, and the impacts of marine mining on ecosystem services, including provisioning, regulating, supporting, and cultural services, should be assessed. Because of the variety in environmental parameters, the range of sea depths, the diversity in the characteristics of marine species, and the possible proximity to other existing maritime industries, the impact of marine mining on the ability of ecosystems to support people and nature can vary widely. In conclusion, using the untapped potential of the global ocean demands a responsible and sustainable attitude. Moreover, we need to change our lifestyle and move beyond the philosophy of single use. Living in a throw-away society based on a linear approach to resource consumption, humans are putting too much pressure on the natural environment. By applying modern, sustainable, and eco-friendly approaches according to the principle of the circular economy, substantial natural resource savings will be achieved. Acknowledgement: This work is part of the MAREE project, financially supported by Division VI of IUPAC.
This work has been partly supported by the University of Piraeus Research Center.

Keywords: blue economy, deep-sea mining, ecosystem services, environmental impacts

24165 A Schema of Building an Efficient Quality Gate throughout the Software Development with Tools

Authors: Le Chen

Abstract:

This paper presents an efficient tool platform scheme to ensure quality protection throughout the software development process. The main principle is to manage the information of the requirements, design, development, testing, and operation and maintenance processes with proper tools, and to set up quality standards for each process. Through the tools' display and summary of quality standards, the quality standards can be visualized and made ready for policy decisions; this is called a Quality Gate in this paper. In addition, the tools are integrated to exchange and relate information, which greatly improves operational efficiency. The feasibility of the scheme is verified by practical application in development projects, and further improvements to the overall information display and data mining are proposed.
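As a rough illustration of what a per-phase Quality Gate check might look like once tool data has been collected, here is a minimal sketch. The phase names, metric names, and thresholds are hypothetical placeholders, not the paper's standards or tooling.

```python
# Hypothetical gate definitions: phase -> {metric: (comparison, threshold)}.
GATES = {
    "development": {"unit_test_coverage": (">=", 0.80), "open_critical_bugs": ("<=", 0)},
    "testing": {"test_pass_rate": (">=", 0.95)},
}

def evaluate_gate(phase, metrics):
    """Return (passed, failures) for one phase's quality gate."""
    failures = []
    for name, (op, threshold) in GATES[phase].items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")  # a gate cannot pass on absent data
        elif op == ">=" and value < threshold:
            failures.append(f"{name}: {value} < {threshold}")
        elif op == "<=" and value > threshold:
            failures.append(f"{name}: {value} > {threshold}")
    return (not failures, failures)

print(evaluate_gate("development", {"unit_test_coverage": 0.72, "open_critical_bugs": 0}))
# (False, ['unit_test_coverage: 0.72 < 0.8'])
```

Keeping the thresholds in data rather than code means each phase's standards can be displayed and revised without touching the evaluation logic.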

Keywords: efficiency, quality gate, software process, tools

24164 Analyzing Factors Impacting COVID-19 Vaccination Rates

Authors: Dongseok Cho, Mitchell Driedger, Sera Han, Noman Khan, Mohammed Elmorsy, Mohamad El-Hajj

Abstract:

Since the approval of COVID-19 vaccines in late 2020, vaccination rates have varied around the globe. Access to vaccine supply, mandatory vaccination policies, and vaccine hesitancy contribute to these rates. This study used COVID-19 vaccination data from Our World in Data and the Multilateral Leaders Task Force on COVID-19 to create two COVID-19 vaccination indices. The first, the Vaccine Utilization Index (VUI), measures how effectively each country has used its vaccine supply to fully (two-dose) vaccinate its population. The second, the Vaccination Acceleration Index (VAI), evaluates how efficiently each country vaccinated its population within its first 150 days. Pearson correlations were computed between these indices and country indicators obtained from the World Bank. The correlations show that countries with stronger health indicators, such as lower mortality rates, lower age dependency ratios, and higher immunization rates for other diseases, have higher VUI and VAI scores than countries with weaker indicators. VAI scores are also positively correlated with governance and economic indicators, such as regulatory quality, control of corruption, and GDP per capita. As represented by the VUI, proper use of the COVID-19 vaccine supply is observed in countries that display excellence in health practices. A country's motivation to accelerate its vaccination rate within the first 150 days of vaccinating, as represented by the VAI, was largely a product of its governing body's effectiveness and economic status, as well as overall excellence in health practices.
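For readers unfamiliar with the statistic, here is a minimal sketch of the Pearson correlation coefficient between an index and a country indicator. The index and GDP values are hypothetical, not the study's data. Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical values for five countries (not the study's data).
vai = [0.2, 0.4, 0.5, 0.7, 0.9]
gdp_per_capita = [2_000, 6_000, 9_000, 15_000, 30_000]
print(round(pearson_r(vai, gdp_per_capita), 3))
```

Pearson's r captures only linear association and says nothing about causation, which is worth keeping in mind when reading the governance and economic correlations above.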

Keywords: data mining, Pearson correlation, COVID-19, vaccination rates and hesitancy

24163 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excess of information on the Internet has led to an information overload problem. To support effective information retrieval in the face of this problem, document clustering in text mining has become a popular research topic. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases. They are therefore impractical for real-world document clustering, which requires special handling of high dimensionality and high volume. We use FIHC (Frequent Itemset-based Hierarchical Clustering), a hierarchical clustering method developed for document clustering, whose intuition is that each cluster shares some common words. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to address the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of plain frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
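The difference between frequent and closed frequent itemsets can be shown on a toy corpus. This brute-force sketch treats each document as a set of words, finds every itemset contained in at least `min_support` documents, and keeps only those with no proper superset of equal support (the closed ones). It illustrates the definition only; practical miners use dedicated algorithms, and this is not the paper's FIHC implementation. The documents are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Brute-force all word sets contained in at least min_support documents."""
    vocab = sorted(set().union(*docs))
    result = {}
    for k in range(1, len(vocab) + 1):
        found = False
        for combo in combinations(vocab, k):
            items = frozenset(combo)
            support = sum(1 for d in docs if items <= d)
            if support >= min_support:
                result[items] = support
                found = True
        if not found:
            break  # Apriori property: no larger itemset can be frequent either
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < t and sup == tsup for t, tsup in frequent.items())}

docs = [{"data", "mining", "cluster"}, {"data", "mining"},
        {"data", "mining", "tree"}, {"cluster", "tree"}]
closed = closed_itemsets(frequent_itemsets(docs, min_support=2))
print(sorted((sorted(s), n) for s, n in closed.items()))
# [(['cluster'], 2), (['data', 'mining'], 3), (['tree'], 2)]
```

Here {data} and {mining} are frequent but not closed, because {data, mining} has the same support; the closed sets carry the same support information with fewer candidates, which is the efficiency argument made above.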

Keywords: FIHC, document clustering, ontology, closed frequent itemset

24162 Leveraging Advanced Technologies and Data to Eliminate Abandoned, Lost, or Otherwise Discarded Fishing Gear and Derelict Fishing Gear

Authors: Grant Bifolchi

Abstract:

As global environmental problems continue to have highly adverse effects, finding long-term, sustainable solutions to combat ecological distress is of growing concern. Ghost Gear, also known as abandoned, lost or otherwise discarded fishing gear (ALDFG) and derelict fishing gear (DFG), represents one of the greatest threats to the world's oceans, posing a significant hazard to human health, livelihoods, and global food security. According to the UN Food and Agriculture Organization (FAO), abandoned, lost and discarded fishing gear represents approximately 10% of marine debris by volume. Around the world, many governments and governmental and non-profit organizations are doing their best to manage the reporting and retrieval of nets, lines, ropes, traps, floats and more from their respective bodies of water. However, limitations in these organizations' ability to manage files and documents about the problem effectively further complicate matters. In Ghost Gear monitoring and management, organizations face additional complexities: data ingest, industry regulations and standards, gaining actionable insights into the location, security, and management of data, and the application of enforcement in the face of disparate data all place massive strain on organizations struggling to save the planet from the dangers of Ghost Gear. In this 90-minute educational session, globally recognized Ghost Gear technology expert Grant Bifolchi CET, BBA, BCom, will provide real-world insight into how governments currently manage Ghost Gear and the technology that can accelerate success in combatting ALDFG and DFG. In this session, attendees will learn how to:
• Identify specific technologies to solve the ingest and management of Ghost Gear data categories, including type, geo-location, size, ownership, regional assignment, collection and disposal.
• Provide enhanced access to authorities, fisheries, independent fishing vessels, individuals, etc., while securely controlling confidential and privileged data to globally recognized standards.
• Create and maintain processing accuracy to effectively track ALDFG/DFG reporting progress, including acknowledging receipt of a report and sharing it with all pertinent stakeholders to ensure approvals are secured.
• Enable and use Business Intelligence (BI) and Analytics to store and analyze data to optimize organizational performance, maintain anytime visibility of report status, user accountability, scheduling, and management, and foster governmental transparency.
• Maintain Compliance Reporting through highly defined, detailed and automated reports, enabling all stakeholders to share critical insights with internal colleagues, regulatory agencies, and national and international partners.

Keywords: ghost gear, ALDFG, DFG, abandoned, lost or otherwise discarded fishing gear, data, technology

24161 The Implementation of Corporate Social Responsibility to Contribute the Isolated District and the Drop behind District to Overcome the Poverty, Study Cases: PT. Kaltim Prima Coal (KPC) Sanggata, East Borneo, Indonesia

Authors: Sri Suryaningsum

Abstract:

The 'Best Practice Model' achievement awarded by the government for the successful implementation of the corporate social responsibility (CSR) program at PT. Kaltim Prima Coal, which operates in the isolated district of Sanggata, could serve as a reference for other companies seeking to improve social welfare in their surrounding areas, especially companies operating in isolated areas of Indonesia. Kaltim Prima Coal's role is to act as a catalyst in development, pushing for the independence of the districts surrounding its mining operation, from the village level to the regency level. These programs are set out in seven CSR program fields and are carried out by stakeholders: the village government, the sub-district government, the regency, and citizens. One of the best programs implemented at PT. Kaltim Prima Coal concerns resettlement, which was completed based on the Asian Development Bank Resettlement Best Practice and the International Finance Corporation Resettlement Action Plan. This program contributed resettlement residences to develop the isolated and neglected district.

Keywords: CSR, isolated, neglected, poverty, mining industry

24160 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches

Authors: Aya Salama

Abstract:

Digital Twin is an emerging research topic that has attracted researchers over the last decade. It is used in many fields, such as smart manufacturing and smart healthcare, because it saves time and money. It is usually related to other technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the Human Digital Twin (HDT) in particular is still a novel idea whose feasibility has yet to be proven. HDT extends the idea of the Digital Twin to human beings, who, as living beings, differ from inanimate physical entities. The goal of this research was to create a human digital twin that automates real-time human replies by simulating human behavior. For this purpose, clustering, supervised classification, topic extraction, and sentiment analysis were studied. The feasibility of the HDT for generating personal replies on social messaging applications was demonstrated in this work. The overall accuracy of the proposed approach was 63%, a very promising result that can open the way for researchers to expand the idea of HDT. This was achieved by using Random Forest for clustering the question database and matching new questions. K-nearest neighbor was also applied for sentiment analysis.
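The k-nearest-neighbor sentiment step can be sketched with bag-of-words vectors and cosine similarity: a new message takes the majority label of its k most similar labeled messages. The feature representation, the value of k, and the labeled examples are assumptions; the abstract does not specify how the authors built their features.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_sentiment(text, labeled, k=3):
    """Majority label among the k labeled messages most similar to `text`."""
    query = bow(text)
    scored = sorted(((cosine(query, bow(t)), label) for t, label in labeled), reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled messages.
labeled = [("great job thanks", "positive"), ("thanks a lot great", "positive"),
           ("this is terrible", "negative"), ("terrible awful service", "negative"),
           ("okay fine", "neutral")]
print(knn_sentiment("thanks great work", labeled))  # positive
```

Ties in similarity are broken by label order here, which is arbitrary; a real system would also need stop-word handling and a larger labeled set.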

Keywords: human digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification, clustering

24159 Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method

Authors: Dalin Si, Azizan Aziz, Bertrand Lasternas

Abstract:

To encourage building owners to purchase electricity on the wholesale market and reduce building peak demand, this study develops models that predict day-ahead hourly electricity consumption and demand using an artificial neural network (ANN) and a support vector machine (SVM). All prediction models are built in Python with the Scikit-learn and PyBrain libraries. The input data for both consumption and demand prediction are time stamp, outdoor dry-bulb temperature, relative humidity, air handling unit (AHU) supply air temperature, and solar radiation. Because solar radiation is unavailable a day ahead, it is predicted first, and this estimate is then used as an input to predict consumption and demand. Models to predict consumption and demand are trained with both SVM and ANN, separately for cooling or heating and for weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It achieves a coefficient of variation of the root mean square error (CVRMSE) of 15.50% to 20.03% for consumption prediction and 22.89% to 32.42% for demand prediction. To conclude, the presented models have the potential to help building owners purchase electricity on the wholesale market, but they are not robust when used in demand response control.
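CVRMSE is the root mean square error expressed as a percentage of the mean observed value, so lower is better. A minimal sketch follows; the optional `p` term (subtracting the number of model parameters from the denominator, as some building-energy guidelines do) defaults to 0 here, which is an assumption about the convention used, and the data are hypothetical.

```python
from math import sqrt

def cv_rmse(actual, predicted, p=0):
    """Coefficient of variation of the RMSE, in percent.

    p is an optional parameter-count adjustment to the denominator (n - p).
    """
    n = len(actual)
    if n != len(predicted) or n - p <= 0:
        raise ValueError("need equal-length series with n > p")
    rmse = sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / (n - p))
    return 100.0 * rmse / (sum(actual) / n)

# Hypothetical hourly consumption (kWh): every prediction is off by 10 kWh.
print(cv_rmse([100, 100, 100, 100], [90, 110, 90, 110]))  # 10.0
```

Because demand peaks are sharper than hourly consumption, a larger error relative to the mean is plausible for demand, consistent with the higher demand CVRMSE range above.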

Keywords: building energy prediction, data mining, demand response, electricity market

24158 Powering Connections: Synergizing Sales and Marketing for Electronics Engineering with Web Development.

Authors: Muhammad Awais Kiani, Abdul Basit Kiani, Maryam Kiani

Abstract:

This study explores the dynamic relationship between sales, marketing, and web development within the electronics engineering industry. It is important because digital platforms have the power to connect with customers, which increases brand visibility and drives sales. It highlights the need for collaboration between sales and marketing teams, as well as the integration of web development strategies to create seamless user experiences and effective lead generation. It also emphasizes the role of data analytics and customer insights in optimizing sales and marketing efforts in the ever-evolving landscape of electronics engineering. Sales and marketing play a crucial role in driving business growth, and in today's digital landscape, web development has become an integral part of these strategies. Web development enables businesses to create visually appealing and user-friendly websites that effectively showcase their products or services. It allows for the integration of e-commerce functionality, enabling seamless online transactions. Furthermore, web development helps businesses optimize their online presence through search engine optimization (SEO) techniques, social media integration, and content management systems. This abstract highlights the symbiotic relationship between sales and marketing in the electronics industry and web development, emphasizing the importance of a strong online presence in achieving business success.

Keywords: electronics industry, web development, sales, marketing

24157 Business Intelligent to a Decision Support Tool for Green Entrepreneurship: Meso and Macro Regions

Authors: Anishur Rahman, Maria Areias, Diogo Simões, Ana Figeuiredo, Filipa Figueiredo, João Nunes

Abstract:

The circular economy (CE) has gained increased awareness among academics, businesses, and decision-makers as it stimulates resource circularity in production and consumption systems. A large body of research has explored the principles of CE, but scant attention has been paid to analysing how CE is evaluated, agreed upon, and enforced using economic metabolism data and a business intelligence framework. Economic metabolism involves the ongoing exchange of materials and energy within and across socio-economic systems and requires the assessment of vast amounts of data to provide quantitative analysis related to effective resource management. The present work focuses on regional flows in a pilot region in Portugal. By addressing this gap, this study aims to promote eco-innovation and sustainability in the regions of the Intermunicipal Communities Região de Coimbra, Viseu Dão Lafões and Beiras e Serra da Estrela, using these data to find precise synergies in terms of material flows and to give companies a competitive advantage in the form of valuable waste destinations, access to new resources and new markets, cost reduction, and risk-sharing benefits. Our work places emphasis on applying artificial intelligence (AI) and, more specifically, on implementing state-of-the-art deep learning algorithms, contributing to the construction of a business intelligence approach. With the emergence of new approaches generally grouped under the headings of AI and machine learning (ML), the methods for statistical analysis of complex and uncertain production systems are facing significant changes. Therefore, various definitions of AI and its differences from traditional statistics are presented; ML is introduced to identify its place in data science and its differences from topics such as big data analytics; and production problems that can be addressed using AI and ML are identified.
A lifecycle-based approach is then taken to analyse the use of different methods in each phase, in order to identify the most useful technologies and the unifying attributes of AI in manufacturing. Most macroeconomic metabolism models are directed mainly at the context of large metropolises, neglecting rural territories, so within this project a dynamic decision support model coupled with artificial intelligence tools and information platforms will be developed, focused on the reality of these transition zones between rural and urban areas. Thus, a real decision support tool is under development, which will go beyond the scientific developments carried out to date and will make it possible to overcome limitations related to the availability and reliability of data.

Keywords: circular economy, artificial intelligence, economic metabolisms, machine learning

24156 The Investigation of Enzymatic Activity in the Soils Under the Impact of Metallurgical Industrial Activity in Lori Marz, Armenia

Authors: T. H. Derdzyan, K. A. Ghazaryan, G. A. Gevorgyan

Abstract:

Beta-glucosidase, chitinase, leucine-aminopeptidase, acid phosphomonoesterase and acetate-esterase enzyme activities in the soils under the impact of metallurgical industrial activity in Lori marz (district) were investigated. The results of the study showed that the activities of the investigated enzymes in the soils decreased with increasing distance from the Shamlugh copper mine, the Chochkan tailings storage facility and the ore transportation road. Statistical analysis revealed that the activities of the enzymes were significantly and positively correlated with each other across the observation sites, which indicated that enzyme activities were affected by the same anthropogenic factor. The investigations showed that the soils were polluted with heavy metals (Cu, Pb, As, Co, Ni, Zn) due to copper mining activity in this territory. Pearson correlation analysis revealed a significant negative correlation between the degree of heavy metal pollution (Nemerow integrated pollution index) and soil enzyme activity. All of this indicated that copper mining activity in this territory, by causing heavy metal pollution of the soils, resulted in the inhibition of the activities of these enzymes, which act as biological catalysts that decompose organic materials and facilitate the cycling of nutrients.
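The Nemerow integrated pollution index combines single-factor indices P_i = C_i / S_i (measured concentration over a reference value) as the square root of the mean of the squared average and the squared maximum, so the worst pollutant weighs heavily. The concentrations and reference values below are hypothetical placeholders, not the study's measurements or any official standard.

```python
from math import sqrt

def nemerow_index(concentrations, reference):
    """Nemerow integrated pollution index: sqrt((P_avg^2 + P_max^2) / 2)."""
    ratios = [concentrations[m] / reference[m] for m in concentrations]  # single-factor P_i
    p_avg = sum(ratios) / len(ratios)
    p_max = max(ratios)
    return sqrt((p_avg ** 2 + p_max ** 2) / 2)

# Hypothetical soil sample (mg/kg) and reference values.
sample = {"Cu": 200.0, "Zn": 100.0}
reference = {"Cu": 100.0, "Zn": 100.0}
print(round(nemerow_index(sample, reference), 4))  # 1.7678
```

The per-site index values computed this way could then be correlated with enzyme activities using a Pearson coefficient, as the study describes.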

Keywords: Armenia, metallurgical industrial activity, heavy metal pollution, soil enzyme activity

24155 Aviation versus Aerospace: A Differential Analysis of Workforce Jobs via Text Mining

Authors: Sarah Werner, Michael J. Pritchard

Abstract:

From pilots to engineers, the range of skills developed within the aerospace industry is exceptionally broad. Employers often struggle to find the right mixture of qualified skills to meet their organizational demands. This search for qualified talent is further complicated by the industrial delineation between two key areas: aviation and aerospace. In a broad sense, the aerospace industry overlaps with the aviation industry; in turn, the aviation industry is a smaller segment within the broader definition of the aerospace industry. Furthermore, it could be argued that, in practice, there is little distinction between these two sectors. However, through our unstructured text analysis of over 6,000 captured job listings, our team found a clear delineation between aviation-related jobs and aerospace-related jobs. Using natural language processing techniques, our research identifies an integrated workforce skill pattern that clearly separates these two sectors. While the aviation sector has largely maintained its need for pilots, mechanics, and associated support personnel, the staffing needs of the aerospace industry are increasingly driven by integrative engineering needs. This is leading many aerospace-based organizations towards the acquisition of 'system level' staffing requirements. This research helps to better align higher educational institutions with the current industrial staffing complexities within the broader aerospace sector.
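One simple way to surface terms that separate two groups of job listings is to compare, for each word, the share of listings in each group that contain it, using a smoothed log ratio. This is a generic text-mining technique assumed for illustration, not the authors' NLP pipeline; the listings are hypothetical.

```python
from collections import Counter
from math import log

def doc_freq(listings):
    """Number of listings in which each word appears at least once."""
    df = Counter()
    for text in listings:
        df.update(set(text.lower().split()))
    return df

def distinctive_terms(group_a, group_b, top=3, alpha=1.0):
    """Terms most over-represented in group_a vs. group_b (smoothed log ratio of document rates)."""
    df_a, df_b = doc_freq(group_a), doc_freq(group_b)
    vocab = set(df_a) | set(df_b)

    def score(term):
        rate_a = (df_a[term] + alpha) / (len(group_a) + 2 * alpha)
        rate_b = (df_b[term] + alpha) / (len(group_b) + 2 * alpha)
        return log(rate_a / rate_b)

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(vocab, key=lambda t: (-score(t), t))[:top]

aviation = ["pilot license required", "aircraft mechanic license", "pilot flight hours"]
aerospace = ["systems engineer integration", "systems integration testing",
             "engineer propulsion design"]
print(distinctive_terms(aviation, aerospace))  # ['license', 'pilot', 'aircraft']
```

The smoothing constant `alpha` keeps words absent from one group from producing infinite scores; swapping the two arguments gives the terms that characterise the other sector.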

Keywords: aerospace industry, job demand, text mining, workforce development

24154 Development of a Technology Assessment Model by Patents and Customers' Review Data

Authors: Kisik Song, Sungjoo Lee

Abstract:

Recent years have seen an increasing number of patent disputes due to excessive competition in the global market and shorter technology life-cycles, which has increased the risk of investing in technology development. While many global companies have started developing methodologies to identify promising technologies and assess them for decision-making, the existing methodologies still have some limitations. In particular, post hoc assessments of new technologies, to determine whether the suggested technologies turned out to be promising, are not being performed. For example, in existing quantitative patent analysis, a patent's citation information has served as an important metric for quality assessment, but this analysis cannot be applied to recently registered patents because such information accumulates over time. Therefore, we propose a new technology assessment model that can replace citation information and positively affect technological development, based on post hoc analysis of patents for promising technologies. Additionally, we collect customer reviews on a target technology to extract keywords that reflect customers' needs, and we determine how many of these keywords are covered by the new technology. Finally, we construct a portfolio (based on a technology assessment from patent information) and a customer-based marketability assessment (based on review data), and we use them to visualize the characteristics of the new technologies.
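The coverage step, counting how many customer-need keywords a new technology addresses, can be sketched as a simple set overlap between the keywords and the words of a technology description. Keyword extraction from reviews is assumed to have been done already, and exact word matching is a simplifying assumption; the function name and example text are hypothetical.

```python
def keyword_coverage(need_keywords, technology_text):
    """Fraction of need keywords found as whole words in the technology text, plus the matches."""
    needs = {k.lower() for k in need_keywords}
    if not needs:
        return 0.0, []
    tokens = set(technology_text.lower().split())
    covered = needs & tokens  # exact token match: "lightweight" does not match "weight"
    return len(covered) / len(needs), sorted(covered)

print(keyword_coverage(["battery", "weight", "noise", "price"],
                       "A lightweight battery module with low noise"))
# (0.5, ['battery', 'noise'])
```

The missed "weight"/"lightweight" pair shows why stemming or synonym matching would likely be needed in practice.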

Keywords: technology assessment, patents, citation information, opinion mining

Procedia PDF Downloads 449
24153 Applying ARIMA Data Mining Techniques to ERP to Generate Sales Demand Forecasting: A Case Study

Authors: Ghaleb Y. Abbasi, Israa Abu Rumman

Abstract:

This paper models the sales history of five products of a medical supply company in Jordan, archived from 2012 to 2015 and aggregated into monthly bins. Consistent patterns extracted from the sales demand history in the Enterprise Resource Planning (ERP) system were used to generate sales demand forecasts with a statistical time series technique called Auto Regressive Integrated Moving Average (ARIMA). ARIMA was used to model and estimate realistic sales demand patterns, forecast future demand, and select the best model for each of the five products. The analysis showed inventory overstocking under the current replenishment system.
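The abstract names ARIMA but does not report the fitted orders. As a minimal sketch, assuming an ARIMA(1,1,0) model (one order of differencing plus an AR(1) term on the differences, fitted by ordinary least squares rather than maximum likelihood), the fit-and-forecast cycle can be written in plain Python. The function names are hypothetical and this is not the authors' R code.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) by conditional least squares: z_t = c + phi * z_{t-1} + e_t,
    where z_t = y_t - y_{t-1} are the first differences of the series."""
    diffs = [b - a for a, b in zip(series, series[1:])]  # d = 1 differencing
    x, y = diffs[:-1], diffs[1:]                          # lagged and current differences
    n = len(x)
    if n < 2:
        raise ValueError("need at least 4 observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0                       # AR(1) coefficient
    c = mean_y - phi * mean_x                             # intercept (drift)
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast differences recursively, then integrate them back into levels."""
    c, phi = fit_arima_110(series)
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff      # predicted next difference
        level += diff              # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical monthly sales (units), not the company's data.
print(forecast_arima_110([0, 1, 3, 6, 10], 2))  # [15.0, 21.0]
```

In practice, order selection and maximum-likelihood fitting would normally be delegated to a library such as R's forecast::auto.arima or the ARIMA class in statsmodels; the sketch only shows how differencing and back-integration turn an AR model on differences into forecasts of levels.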

Keywords: ARIMA models, sales demand forecasting, time series, R code

Procedia PDF Downloads 371
24152 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore the re3data.org registry to identify its research data repository registration workflow. A further objective is to chart the present development of research data repositories in India. The study first examines the framework and schema design of the re3data.org registry and then explores the status of Indian research data repositories in it. Research data repositories are gaining wider relevance with the spread of e-research. The re3data.org registry is a good tool for users and researchers to identify appropriate research data repositories for their research requirements. In the Indian context, a compatible National Research Data Policy is urgently needed to boost the management of research data. A registry of research data repositories is a crucial tool for discovering specific information in a specific domain. Moreover, research data repositories in India have not previously been studied. This study discusses both the re3data.org registry and the status of Indian research data repositories.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 311
24151 A Case Study of Ontology-Based Sentiment Analysis for Fan Pages

Authors: C. -L. Huang, J. -H. Ho

Abstract:

Social media has become increasingly important in daily life. Many enterprises promote their services and products to fans via social media. The positive or negative sentiment of fan feedback is very important for enterprises seeking to improve their products, services, and promotion activities. The purpose of this paper is to understand the sentiment of fans' responses by analyzing the responses posted by fans on Facebook. The entities and aspects of fans' responses were analyzed based on a predefined ontology. The ontology for cell phone sentiment analysis consists of the following aspect categories at the top level: overall, shape, hardware, brand, price, and service. Each category consists of several sub-categories. All aspects in a fan's response were identified based on the ontology, and their corresponding sentiment terms were found using a lexicon-based approach. The sentiment scores for the aspects of fan responses were obtained by aggregating the sentiment terms in the responses, and the number of 'likes' was also weighted in the score calculation. Three well-known cell phone fan pages on Facebook were selected as demonstration cases to evaluate the performance of the proposed methodology, and judgments by several domain experts were collected for performance comparison. The proposed approach performed as well as human judgment in terms of precision, recall, and F1-measure.
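A minimal sketch of ontology-driven, lexicon-based aspect scoring follows. The ontology terms, the lexicon, the rule that every aspect mentioned in a response receives the whole response's polarity, and the log-damped 'like' weight are all illustrative assumptions; the abstract does not specify the authors' scoring rule.

```python
import math
import re

# Hypothetical subset of the ontology: top-level aspect categories from the
# abstract, each mapped to a few illustrative aspect terms (not the authors' ontology).
ONTOLOGY = {
    "hardware": {"battery", "screen", "camera"},
    "price": {"price", "cost"},
    "service": {"service", "support"},
}

# Hypothetical sentiment lexicon: term -> polarity.
LEXICON = {"great": 1.0, "good": 1.0, "love": 1.0,
           "bad": -1.0, "poor": -1.0, "expensive": -1.0}


def aspect_scores(responses):
    """responses: list of (text, like_count). Returns a summed score per aspect category."""
    scores = {category: 0.0 for category in ONTOLOGY}
    for text, likes in responses:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Assumed weighting: more 'likes' count more, damped by a logarithm.
        weight = 1.0 + math.log1p(likes)
        polarity = sum(LEXICON.get(token, 0.0) for token in tokens)
        # Simplification: each aspect mentioned gets the whole response's polarity.
        for category, terms in ONTOLOGY.items():
            if terms.intersection(tokens):
                scores[category] += weight * polarity
    return scores


print(aspect_scores([("Great battery and screen", 0), ("The price is too expensive", 3)]))
```

A real system would attach each sentiment term to its nearest aspect term rather than to the whole response; the sketch keeps the whole-response rule only for brevity.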

Keywords: opinion mining, ontology, sentiment analysis, text mining

Procedia PDF Downloads 220
24150 Effect of Heavy Metals on the Life History Trait of Heterocephalobellus sp. and Cephalobus sp. (Nematode: Cephalobidae) Collected from a Small-Scale Mining Site, Davao de Oro, Philippines

Authors: Alissa Jane S. Mondejar, Florifern C. Paglinawan, Nanette Hope N. Sumaya, Joey Genevieve T. Martinez, Mylah Villacorte-Tabelin

Abstract:

Mining is associated with increased heavy metal levels in the environment, and heavy metal contamination disrupts the activities of soil fauna, such as nematodes, changing the function of the soil ecosystem. Previous studies found that nematode community composition and diversity indices were strongly affected by heavy metals (e.g., Pb, Cu, and Zn). In this study, the influence of heavy metals on nematode survivability and reproduction was investigated. The life history traits of the free-living nematodes Heterocephalobellus sp. and Cephalobus sp. (Rhabditida: Cephalobidae) were assessed using the hanging drop technique, which is often used in life history trait experiments. The nematodes were exposed to different temperatures (20°C, 25°C, and 30°C) in different groups (control and heavy metal exposed) and fed the same bacterial density of 1×10⁹ Escherichia coli cells ml⁻¹ for 30 days. Results showed that increasing temperature and exposure to heavy metals had a significant influence on the survivability and egg production of both species. When exposed to 20°C, Heterocephalobellus sp. and Cephalobus sp. survived longer and produced few eggs, which did not hatch. For Heterocephalobellus sp., the values of the net reproductive rate (R₀), fecundity (mₓ, which equals the total fertility rate, TFR), generation times (G₀, G₁, and Gₕ), and population doubling time (PDT) were higher in the control group. However, a lower rate of natural increase (rₘ) was observed, since generation times were longer. Meanwhile, for Cephalobus sp., the net reproductive rate (R₀) was higher in the exposed group, whereas fecundity (mₓ, equal to the TFR), G₀, G₁, Gₕ, and PDT were higher in the control group. Again, a lower rate of natural increase (rₘ) was observed, since generation times were longer.
In conclusion, temperature and exposure to heavy metals had a negative influence on the life history of the nematodes; however, further experiments are warranted.
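The life history parameters above can be computed from age-specific survivorship lₓ and fecundity mₓ. This sketch assumes the standard cohort definitions: R₀ = Σ lₓmₓ, TFR = Σ mₓ, mean generation time T = Σ x·lₓmₓ / R₀, rₘ as the root of the Euler-Lotka equation Σ e^(-rₘx)·lₓmₓ = 1, and PDT = ln 2 / rₘ. The abstract does not define G₀, G₁, and Gₕ, so only T is shown. Because rₘ ≈ ln R₀ / T, longer generation times lower rₘ even when R₀ is higher, which matches the pattern reported above.

```python
import math


def life_table(ages, lx, mx, iterations=200):
    """Cohort life-table parameters; ages must be positive (e.g. days)."""
    r0 = sum(l * m for l, m in zip(lx, mx))            # net reproductive rate R0
    if r0 <= 0:
        raise ValueError("no reproduction recorded")
    tfr = sum(mx)                                       # total fertility rate
    gen_time = sum(x * l * m for x, l, m in zip(ages, lx, mx)) / r0  # mean generation time T

    def euler_lotka(r):
        # sum(exp(-r x) lx mx) - 1 is decreasing in r, so bisection finds its root
        return sum(math.exp(-r * x) * l * m for x, l, m in zip(ages, lx, mx)) - 1.0

    lo, hi = -1.0, 1.0
    while euler_lotka(lo) < 0:   # widen the bracket until it contains the root
        lo *= 2
    while euler_lotka(hi) > 0:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if euler_lotka(mid) > 0:
            lo = mid
        else:
            hi = mid
    rm = (lo + hi) / 2                                  # intrinsic rate of natural increase
    pdt = math.log(2) / rm if rm > 0 else math.inf      # population doubling time
    return {"R0": r0, "TFR": tfr, "T": gen_time, "rm": rm, "PDT": pdt}


# Hypothetical cohort (age in days, survivorship, eggs per female), not the study's data.
params = life_table([3, 4, 5], [0.9, 0.8, 0.6], [2.0, 5.0, 3.0])
print({k: round(v, 3) for k, v in params.items()})
```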

Keywords: artisanal and small-scale gold mining (ASGM), hanging drop method, heavy metals, life history trait

Procedia PDF Downloads 75
24149 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processing of big data on transportation ridership (e.g., smartcard data) and traffic operations (e.g., traffic detector data), which requires substantial computational power, is incontrovertible in Intelligent Transportation Systems. Cloud computing is now an important topic and a popular information technology solution for data processing. It enables users to process enormous amounts of data without owning the required computing power. It may therefore also be a good choice for transportation big data processing. This paper examines how cloud computing can enhance transportation big data processing by contrasting its advantages and disadvantages and discussing its features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 449
24148 An Analysis System for Integrating High-Throughput Transcript Abundance Data with Metabolic Pathways in Green Algae

Authors: Han-Qin Zheng, Yi-Fan Chiang-Hsieh, Chia-Hung Chien, Wen-Chi Chang

Abstract:

As the most important non-vascular plants, algae have many research applications, owing to their high species diversity, their use as biofuel sources, their adsorption of heavy metals and, after processing, their use as health supplements. With the increasing availability of next-generation sequencing (NGS) data for algal genomes and transcriptomes, an integrated resource for retrieving gene expression data and metabolic pathways is essential for functional analysis and systems biology in algae. However, current resources display gene expression profiles and biological pathways separately, making it impossible to search them directly to identify cellular response mechanisms. Therefore, this work develops a novel AlgaePath database to efficiently retrieve gene expression profiles under various conditions across numerous metabolic pathways. AlgaePath, a web-based database, integrates gene information, biological pathways, and NGS datasets for Chlamydomonas reinhardtii and Neodesmus sp. UTEX 2219-4. Users can identify gene expression profiles and pathway information through five query pages (i.e., Gene Search, Pathway Search, Differentially Expressed Genes (DEGs) Search, Gene Group Analysis, and Co-Expression Analysis). The gene expression data of 45 samples for C. reinhardtii and 4 samples for Neodesmus sp. UTEX 2219-4 can be viewed directly on pathway maps. Genes that are differentially expressed between two conditions can be identified in Folds Search. Furthermore, the Gene Group Analysis of AlgaePath includes pathway enrichment analysis and makes it easy to compare the gene expression profiles of functionally related genes on a map. Finally, Co-Expression Analysis provides the co-expressed transcripts of a target gene. The analysis results provide a valuable reference for designing further experiments and elucidating critical mechanisms from high-throughput data.
Beyond serving as an effective interface for clarifying transcript response mechanisms in different metabolic pathways under various conditions, AlgaePath is also a data mining system for identifying critical mechanisms from high-throughput sequencing data.
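The abstract does not state which similarity measure Co-Expression Analysis uses. Assuming Pearson correlation between expression profiles, a common choice, ranking the co-expressed transcripts of a target gene could look like the sketch below; the gene names, profiles, and the 0.8 threshold are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b) if norm_a and norm_b else 0.0


def co_expressed(target, profiles, threshold=0.8):
    """Genes whose profile correlates with the target's at or above threshold, best first."""
    reference = profiles[target]
    hits = [(gene, pearson(reference, profile))
            for gene, profile in profiles.items() if gene != target]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


# Hypothetical expression values across four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],   # same shape as geneA
    "geneC": [4, 3, 2, 1],   # anti-correlated
    "geneD": [4, 1, 3, 2],
}
print(co_expressed("geneA", profiles))
```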

Keywords: next-generation sequencing (NGS), algae, transcriptome, metabolic pathway, co-expression

Procedia PDF Downloads 394
24147 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In this paper, we describe the security capabilities of data collection. Data are collected by probes located in the near and distant surroundings of the company. Because of numerous obstacles (e.g., forests, hills, and urban areas), data collection is realized in several ways: via wireless communication, LAN networks, and the GSM network, and in certain areas by vehicles. To ensure a connection to the server, most of the probes can communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure that the required data are collected, algorithms must be proposed that allow the probes to select a suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 349
24146 Classification of Forest Types Using Remote Sensing and Self-Organizing Maps

Authors: Wanderson Goncalves e Goncalves, José Alberto Silva de Sá

Abstract:

Human actions threaten the balance and conservation of the Amazon forest, so environmental monitoring services play an important role in preserving and maintaining this environment. This study classified forest types using data from a forest inventory provided by the 'Florestal e da Biodiversidade do Estado do Pará' (IDEFLOR-BIO), covering approximately 600,000 hectares between the municipalities of Santarém, Juruti and Aveiro, in the state of Pará, Brazil; bands 3, 4 and 5 of a TM-Landsat satellite image; and Self-Organizing Maps. Information from the satellite images was extracted using QGIS 2.8.1 Wien and used as the database for training the neural network. The midpoint of each forest inventory sample was linked to the images, and the Digital Numbers of the corresponding pixels were then extracted to compose the database used to train and test the classifier. The neural network was trained to classify two forest types in the Mamuru Arapiuns glebes of Pará State: Rain Forest of Lowland Emerging Canopy (Dbe) and Rain Forest of Lowland Emerging Canopy plus Open with palm trees (Dbe + Abp). The training data set contained 400 examples (200 per class) and the test data set contained 100 examples (50 per class), for a total of 500 examples. The classifier was implemented in Orange Data Mining 2.7 and evaluated using confusion matrix indicators. The results were considered satisfactory, with an overall accuracy of 89%, a Kappa coefficient of 0.78, and an F1 score of 0.88.
The classifier's performance was also evaluated with a ROC (receiver operating characteristic) plot, which gave results close to ideal, indicating a very good classifier and demonstrating the potential of this methodology for ecosystem services, particularly in anthropogenic areas of the Amazon.
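The reported indicators can be derived from a 2×2 confusion matrix. With the balanced test set described (50 examples per class), the chance agreement is p_e = 0.5 regardless of how the predictions are distributed, so κ = (0.89 - 0.5) / 0.5 = 0.78, which is consistent with the reported accuracy and Kappa. The matrix in the usage line is hypothetical, chosen only to give 89% accuracy; it is not the authors' matrix.

```python
def binary_metrics(tp, fn, fp, tn):
    """Overall accuracy, Cohen's kappa and F1 (positive class) from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n                       # observed agreement p_o
    # Chance agreement p_e from the actual (row) and predicted (column) marginals.
    p_e = ((tp + fn) / n) * ((tp + fp) / n) + ((fp + tn) / n) * ((fn + tn) / n)
    kappa = (accuracy - p_e) / (1 - p_e)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, kappa, f1


# Hypothetical matrix for a 50/50 test set (not the study's results).
acc, kappa, f1 = binary_metrics(tp=44, fn=6, fp=5, tn=45)
print(f"accuracy={acc:.2f} kappa={kappa:.2f} F1={f1:.2f}")  # accuracy=0.89 kappa=0.78 F1=0.89
```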

Keywords: artificial neural network, computational intelligence, pattern recognition, unsupervised learning

Procedia PDF Downloads 350
24145 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The high demand for image annotation and archiving on the web has led researchers to develop many algorithms for this application domain. Existing IMC techniques have two drawbacks: they do not take into account the description of the elementary characteristics of the image, nor the correlation between labels. In this paper, we present an algorithm (MIML-HOGLPP) that handles both limitations simultaneously. The algorithm uses the histogram of oriented gradients (HOG) as its feature descriptor and applies the Label Priority Power-set as its multi-label transformation to address label correlation. Experiments show that MIML-HOGLPP outperforms two existing techniques on some of the evaluation metrics.
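A power-set transformation handles label correlation by treating each distinct combination of labels as one class, so a standard single-label classifier can be trained on the result. The sketch below shows the plain label powerset; the priority ordering in the authors' Label Priority Power-set is not described in the abstract and is not reproduced here. The annotations are hypothetical.

```python
def label_powerset(label_sets):
    """Plain label powerset: each distinct set of labels becomes one class id.

    Returns the encoded class ids and a decoder from class id back to the label set.
    """
    class_of = {}   # frozenset of labels -> class id
    encoded = []
    for labels in label_sets:
        key = frozenset(labels)  # label order does not matter
        if key not in class_of:
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    decode = {cid: set(key) for key, cid in class_of.items()}
    return encoded, decode


# Hypothetical image annotations.
y, decode = label_powerset([{"sky", "sea"}, {"sea", "sky"}, {"tree"}, {"sky"}])
print(y)  # [0, 0, 1, 2]
```

A classifier trained on these class ids (for example on HOG feature vectors) predicts one id per image, and the decoder turns it back into a label set, so co-occurring labels are learned jointly.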

Keywords: data mining, information retrieval system, multi-label, problem transformation, histogram of gradients

Procedia PDF Downloads 361
24144 Toxic Metal and Radiological Risk Assessment of Soil, Water and Vegetables around a Gold Mine Turned Residential Area in Mokuro Area of Ile-Ife, Osun State, Nigeria: Implications for Human Health

Authors: Grace O. Akinlade, Danjuma D. Maza, Oluwakemi O. Olawolu, Delight O. Babalola, John A. O. Oyekunle, Joshua O. Ojo

Abstract:

The Mokuro area of Ile-Ife, South West Nigeria, was well known for gold mining about twenty years ago. However, the site has since been reclaimed and converted into a residential area without any environmental risk assessment of the impact of the mine tailings on the environment. Soil, water, and plant samples were collected from 4 different locations around the mine-turned-residential area. Soil samples were pulverized and sieved into finer particles, while the plant samples were dried and pulverized. All samples were digested and analyzed for As, Pb, Cd, and Zn using atomic absorption spectroscopy (AAS), and the hazard index (HI) was then calculated for the metals from the results. For the radiological analysis, the soil and plant samples were air-dried, pulverized, and weighed, then packed into properly sealed containers to prevent radon gas leakage and stored for 28 days to attain secular equilibrium. The activity concentrations of ⁴⁰K, ²³⁸U, and ²³²Th in the samples were measured using a cesium iodide (CsI) spectrometer and URSA software. The AAS analysis showed that the concentrations of As, Pb, and Cd (toxic metals) and Zn (an essential trace metal) were below permissible limits in the plant and soil samples, while the water samples exceeded permissible limits. The calculated hazard indices (HI) are >1 for water and <1 for plants and soil. Gamma spectrometry showed activity concentrations above the recommended limits in all soil and plant samples collected from the area; only the water samples had activity concentrations below the recommended limit. Consequently, the absorbed dose, annual effective dose, and excess lifetime cancer risk are above the recommended safe limits for all samples except the water samples.
In conclusion, all the samples collected from the area are either contaminated with toxic metals or pose radiological hazards to consumers. A further detailed study is therefore recommended so that the residents can be advised appropriately.
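The abstract does not give its exposure parameters or dose formulas. The sketch below assumes the commonly used chronic daily intake and hazard quotient approach (CDI = C·IR·EF·ED / (BW·AT), HQ = CDI / RfD, HI = Σ HQ) and the UNSCEAR (2000) outdoor external gamma dose coefficients (0.462, 0.604, and 0.0417 nGy/h per Bq/kg for ²³⁸U, ²³²Th, and ⁴⁰K), with 0.2 outdoor occupancy, 0.7 Sv/Gy, a 70-year lifetime, and a 0.05 per Sv risk factor. All inputs in the usage lines are placeholders, not the study's measurements or authoritative reference doses.

```python
def hazard_index(conc, intake_rate, rfd, body_weight=70.0, ef=365, ed=30):
    """Non-carcinogenic hazard index HI = sum(HQ), with HQ = CDI / RfD.

    conc: metal -> concentration (e.g. mg/L for water); intake_rate: L/day;
    rfd: metal -> oral reference dose (mg/kg/day). All values are user-supplied.
    """
    averaging_time = ed * 365                       # days, for non-carcinogens
    hq = {}
    for metal, c in conc.items():
        cdi = c * intake_rate * ef * ed / (body_weight * averaging_time)  # mg/kg/day
        hq[metal] = cdi / rfd[metal]
    return sum(hq.values()), hq


def radiological_indices(c_u, c_th, c_k, occupancy=0.2, sv_per_gy=0.7,
                         lifetime_years=70, risk_per_sv=0.05):
    """Outdoor external gamma indices from activity concentrations in Bq/kg."""
    dose_rate = 0.462 * c_u + 0.604 * c_th + 0.0417 * c_k    # absorbed dose rate, nGy/h
    aede = dose_rate * 8760 * occupancy * sv_per_gy * 1e-6   # annual effective dose, mSv/y
    elcr = aede * 1e-3 * lifetime_years * risk_per_sv        # excess lifetime cancer risk
    return dose_rate, aede, elcr


# Placeholder inputs only.
hi, hq = hazard_index({"As": 0.01, "Pb": 0.02}, intake_rate=2.0,
                      rfd={"As": 3e-4, "Pb": 3.5e-3})
print(f"HI = {hi:.2f} ({'above' if hi > 1 else 'below'} 1)")
print(radiological_indices(c_u=10.0, c_th=10.0, c_k=100.0))
```

An HI above 1, as reported for the water samples, means the summed estimated intake exceeds the summed reference doses and adverse effects cannot be ruled out.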

Keywords: toxic metals, gamma spectrometry, Ile-Ife, radiological hazards, gold mining

Procedia PDF Downloads 37
24143 From Ride-Hailing App to Diversified and Sustainable Platform Business Model

Authors: Ridwan Dewayanto Rusli

Abstract:

We show how prisoner's dilemma-type competition problems can be mitigated through rapid platform diversification and ecosystem expansion. We analyze Gojek, a ride-hailing company in Southeast Asia whose network grew within a few years to more than 170 million users, comprising consumers, partner drivers, merchants, and complementors, and which has already achieved higher contribution margins than its ride-hailing peers Uber and Lyft. Its ecosystem integrates ride-hailing, food delivery and logistics, merchant solutions, e-commerce, marketplace and advertising, payments, and fintech offerings. The company continues to grow its network of complementors and app developers, expanding content and gaining critical mass in consumer data analytics and advertising. We compare the company's growth and diversification trajectory with those of its main international rivals and peers. The company's rapid growth and future potential are analyzed using the following frameworks: Cusumano's (2012) Staying Power and Six Principles, the Delta Model of Hax and Wilde (2003) and Hax (2010), and Santos' (2016) home-market advantages. The recently announced multi-billion-dollar merger with one of Southeast Asia's largest e-commerce companies lends additional support to these arguments.

Keywords: ride-hailing, prisoner's dilemma, platform and ecosystem strategy, digital applications, diversification, home market advantages, e-commerce

Procedia PDF Downloads 85
24142 Optimization of Manufacturing Process Parameters: An Empirical Study from Taiwan's Tech Companies

Authors: Chao-Ton Su, Li-Fei Chen

Abstract:

Parameter design is crucial to improving the uniformity of a product or process. In the product design stage, parameter design aims to determine the optimal settings for the parameters of each element in the system, thereby minimizing the functional deviations of the product. In the process design stage, parameter design aims to determine the operating settings of the manufacturing processes so that non-uniformity in manufacturing can be minimized. Parameter design, which seeks to minimize the influence of noise on the manufacturing system, plays an important role in high-tech companies. Taiwan has many well-known high-tech companies that play key roles in the global economy. Quality remains the most important factor enabling these companies to sustain their competitive advantage. In Taiwan, however, many high-tech companies face various quality problems. A common challenge relates to root causes and defect patterns: in the R&D stage, root causes are often unknown, and defect patterns are difficult to classify. Additionally, data collection is not easy, and even when high-volume data can be collected, data interpretation is difficult. To overcome these challenges, high-tech companies in Taiwan use more advanced quality improvement tools. In addition to traditional statistical methods and quality tools, the new trend is the application of powerful tools such as neural networks, fuzzy theory, data mining, industrial engineering, operations research, and innovation skills. In this study, several examples of optimizing the parameter settings of manufacturing processes in Taiwan’s tech companies are presented to illustrate the effectiveness of the proposed approach. Finally, traditional experimental design and the proposed approach are compared for process optimization.
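The abstract does not name a specific robustness criterion. As a point of reference for the traditional experimental-design baseline it mentions, the sketch below uses Taguchi-style signal-to-noise (S/N) ratios, a classical way of scoring parameter settings by both mean and spread under noise; this is a swapped-in illustration, not the authors' proposed approach, and the settings and defect rates are hypothetical.

```python
import math


def sn_smaller_is_better(ys):
    """S/N = -10 log10(mean(y^2)); for responses such as defect rates."""
    return -10 * math.log10(sum(y * y for y in ys) / len(ys))


def sn_larger_is_better(ys):
    """S/N = -10 log10(mean(1/y^2)); for responses such as strength or yield."""
    return -10 * math.log10(sum(1 / (y * y) for y in ys) / len(ys))


def sn_nominal_is_best(ys):
    """S/N = 10 log10(mean^2 / s^2); for responses that should hit a target with low spread."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return 10 * math.log10(mean * mean / var)


def best_setting(runs, sn):
    """Pick the parameter setting whose replicated responses give the highest S/N."""
    return max(runs, key=lambda setting: sn(runs[setting]))


# Hypothetical defect rates (%) per setting, each replicated under two noise conditions.
runs = {"A1-B1": [2.0, 3.0], "A1-B2": [1.0, 4.0], "A2-B1": [2.5, 2.5]}
print(best_setting(runs, sn_smaller_is_better))  # A2-B1
```

A1-B2 and A2-B1 have the same mean defect rate, but the S/N ratio prefers A2-B1 because it is less sensitive to the noise conditions, which is the goal of parameter design described above.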

Keywords: quality engineering, parameter design, neural network, genetic algorithm, experimental design

Procedia PDF Downloads 131
24141 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by more precisely reflecting users' preferences. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict which customers have a high likelihood of purchasing products in each product group, and then combines the results of these predictors using multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses market basket analysis to extract association rules for co-purchased products. Finally, based on these two steps, the system selects customers with a high likelihood of purchasing products in each product group and recommends suitable products from the same or different product groups to them. We test the usability of the proposed system using a prototype and real-world transaction and profile data. In addition, we surveyed user satisfaction with the product list recommended by the proposed system and with randomly selected product lists. The results also show that the proposed system may be useful in a real-world online shopping store.
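The market basket step can be illustrated with support and confidence for item pairs. This is a brute-force pairwise sketch rather than a full Apriori implementation; the thresholds and baskets are hypothetical, and the ensemble step (bagging or bumping of logistic regression, decision trees, and neural networks) is not shown.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Rules lhs -> rhs over item pairs, as (lhs, rhs, support, confidence)."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))                 # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each co-purchased pair once
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                         # share of baskets containing both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]   # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical transactions.
baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["bread"], ["milk", "eggs"]]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Customers flagged by the first-step predictors could then be offered the right-hand-side products of rules whose left-hand side they have already purchased.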

Keywords: product recommender system, ensemble technique, association rules, decision tree, artificial neural networks

Procedia PDF Downloads 280