Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 25162

Search results for: data warehouses

25132 The Occurrence of Clavibacter michiganensis subsp. sepedonicus on Potato in South Sulawesi, Indonesia

Authors: Baharuddin Patandjengi, A. Pabborong, T. Kuswinanti

Abstract:

Bacterial ring rot caused by a gram-positive Coryneform bacterium Corynebacterium michiganensis subsp. sepedonicus is an important disease on potato crops in the world. The disease still belongs to an A1 quarantine pathogen in Indonesia, although it was found in West Java since 2013. The objective of this study was to know the presence of bacterial ring rot in four potato district areas in South Sulawesi. Infected samples were collected from potato fields and storage warehouses in Enrekang, Gowa, Jeneponto and Bantaeng districts. Potato tuber samples were cut and observed their vasiculer vessels and the bacterial ooze was used for isolation on Nutrient Agar and Nutrient Broth–Yeast extract medium. Bacterial isolates were then morphologically and physiologically characterized. A patogenicity test on eggplant and molecular characterization using PCR with specific primer for Cms (50F and Cms 50 R) was revealed for further identification. The results showed that Cms has become widespread in four districts of South Sulawesi. The bacterial ringrot disease incidence in these districts was reached above 30 %. All of 14 bacterial isolates that identified before using standard methods of EPPO, showed DNA band in size of 224 bp in PCR test, which indicated positively belong to C. michiganensis subsp. sepedonicus.

Keywords: bacterial ring rot, clavibacter michiganensis pv. sepedonicus, PCR, potato

Procedia PDF Downloads 334

25131 Processing Big Data: An Approach Using Feature Selection

Authors: Nikat Parveen, M. Ananthi

Abstract:

Big data is one of the emerging technology, which collects the data from various sensors and those data will be used in many fields. Data retrieval is one of the major issue where there is a need to extract the exact data as per the need. In this paper, large amount of data set is processed by using the feature selection. Feature selection helps to choose the data which are actually needed to process and execute the task. The key value is the one which helps to point out exact data available in the storage space. Here the available data is streamed and R-Center is proposed to achieve this task.

Keywords: big data, key value, feature selection, retrieval, performance

Procedia PDF Downloads 341

25130 Optimization of Lubricant Distribution with Alternative Coordinates and Number of Warehouses Considering Truck Capacity and Time Windows

Authors: Taufik Rizkiandi, Teuku Yuri M. Zagloel, Andri Dwi Setiawan

Abstract:

Distribution and growth in the transportation and warehousing business sector decreased by 15,04%. There was a decrease in Gross Domestic Product (GDP) contribution level from rank 7 of 4,41% in 2019 to 3,81% in rank 8 in 2020. A decline in the transportation and warehousing business sector contributes to GDP, resulting in oil and gas companies implementing an efficient supply chain strategy to ensure the availability of goods, especially lubricants. Fluctuating demand for lubricants and warehouse service time limits are essential things that are taken into account in determining an efficient route. Add depots points as a solution so that demand for lubricants is fulfilled (not stock out). However, adding a depot will increase operating costs and storage costs. Therefore, it is necessary to optimize the addition of depots using the Capacitated Vehicle Routing Problem with Time Windows (CVRPTW). This research case study was conducted at an oil and gas company that produces lubricants from 2019 to 2021. The study results obtained the optimal route and the addition of a depot with a minimum additional cost. The total cost remains efficient with the addition of a depot when compared to one depot from Jakarta.

Keywords: CVRPTW, optimal route, depot, tabu search algorithm

Procedia PDF Downloads 136

25129 Efficiency of Lavandula angustifolia Mill and Zataria multiflora Boiss essential oils on nutritional indices of Tribolium confusum Jacquelin du Val (Col.: Tenebrionidae)

Authors: Karim Saeidi

Abstract:

One of the most important pests in the warehouses is the flour beetle, Tribolium confusum Jacquelin du Val (Col.: Tenebrionidae). Regarding the high degree of damage of stored product pests and dangerous effects of the chemical control using plant extracts and their components are some of the best approaches to control these pests. Antifeedant activity of plant extracts from Lavandula angustifolia Mill and Zataria multiflora Boiss using hydro-distillation were tested against the flour beetle, Tribolium confusum Jacquelin du Val. The nutritional indices: relative growth rate (RGR), relative consumption rate (RCR), the efficiency of conversion of ingested food (ECI), and feeding deterrence index (FDI) were measured for adult insects. Treatments were evaluated using a flour disk bioassay in the dark; at 25±1ᵒC and 60±5% R. H. Concentrations of 0, 0.1, 0.5, 0.75, 1, 1.5, and 2 μl/disk were prepared from each essential oil. After 72 h, nutritional indices were calculated. L. angustifolia oils were more effective than Z. multiflora oils by significantly decreasing the RGR, RCR, and ECI. Feeding deterrence index (FDI) of L. angustifolia essential oil was increased significantly as essential oil concentration increased. The essential oil of L. angustifolia was more effective on FDI than Z. multiflora in some concentration.

Keywords: essential oil, nutritional indices, Tribolium confusum

Procedia PDF Downloads 399

25128 Applications of Big Data in Education

Authors: Faisal Kalota

Abstract:

Big Data and analytics have gained a huge momentum in recent years. Big Data feeds into the field of Learning Analytics (LA) that may allow academic institutions to better understand the learners’ needs and proactively address them. Hence, it is important to have an understanding of Big Data and its applications. The purpose of this descriptive paper is to provide an overview of Big Data, the technologies used in Big Data, and some of the applications of Big Data in education. Additionally, it discusses some of the concerns related to Big Data and current research trends. While Big Data can provide big benefits, it is important that institutions understand their own needs, infrastructure, resources, and limitation before jumping on the Big Data bandwagon.

Keywords: big data, learning analytics, analytics, big data in education, Hadoop

Procedia PDF Downloads 426

25127 Evaluation of the Effectiveness of Barriers for the Control of Rats in Rice Plantation Field

Authors: Melina, Jumardi Jumardi, Erwin Erwin, Sri Nuraminah, Andi Nasruddin

Abstract:

The rice field rat (Rattus argentiventer Robinson and Kloss) is a pest causing the greatest yield loss of rice plants, especially in lowland agroecosystems with intensive cropping patterns (2-3 plantings per year). Field mice damage rice plants at all stages of growth, from seedling to harvest, even in storage warehouses. Severe damage with yield loss of up to 100% occurs if rats attack rice at the generative stage because the plants are no longer able to recover by forming new tillers. Farmers mainly use rodenticides in the form of poisoned baits or as fumigants, which are applied to rat burrow holes. This practice is generally less effective because mice are able to avoid the poison or become resistant after several exposures to it. In addition, excessive use of rodenticides can have negative impacts on the environment and non-target organisms. For this reason, this research was conducted to evaluate the effectiveness of fences as an environmentally friendly mechanical control method in reducing rice yield losses due to rat attacks. This study used a factorial randomized block design. The first factor was the fence material, namely galvanized zinc plate and plastic. The second factor was the height of the fence, namely 25, 50, 75, and 100 cm from the ground level. Each treatment combination was repeated five times. Data shows that zinc fences with a height of 75 and 100 cm are able to provide full protection to plants from rat infestations throughout the planting season. However, zinc fences with a height of 25 and 50 cm failed to prevent rat attacks. Plastic fences with a height of 25 and 50 cm failed to prevent rat attacks during the planting season, whereas 75 and 100 cm were able to prevent rat attacks until all the crops outside of the fence had been eaten by rats. The rat managed to get into the fence by biting the plastic fence close to the ground. Thus, the research results show that fences made of zinc plate with a height of at least 75 cm from the ground surface are effective in preventing plant damage caused by rats. To our knowledge, this research is the first to quantify the effectiveness of fences as a control of field rodents.

Keywords: rice field rat, Rattus argentiventer, fence, rice

Procedia PDF Downloads 40

25126 Storage Method for Parts from End of Life Vehicles' Dismantling Process According to Sustainable Development Requirements: Polish Case Study

Authors: M. Kosacka, I. Kudelska

Abstract:

Vehicle is one of the most influential and complex product worldwide, which affects people’s life, state of the environment and condition of the economy (all aspects of sustainable development concept) during each stage of lifecycle. With the increase of vehicles’ number, there is growing potential for management of End of Life Vehicle (ELV), which is hazardous waste. From one point of view, the ELV should be managed to ensure risk elimination, but from another point, it should be treated as a source of valuable materials and spare parts. In order to obtain materials and spare parts, there are established recycling networks, which are an example of sustainable policy realization at the national level. The basic object in the polish recycling network is dismantling facility. The output material streams in dismantling stations include waste, which very often generate costs and spare parts, that have the biggest potential for revenues creation. Both outputs are stored into warehouses, according to the law. In accordance to the revenue creation and sustainability potential, it has been placed a strong emphasis on storage process. We present the concept of storage method, which takes into account the specific of the dismantling facility in order to support decision-making process with regard to the principles of sustainable development. The method was developed on the basis of case study of one of the greatest dismantling facility in Poland.

Keywords: dismantling, end of life vehicles, sustainability, storage

Procedia PDF Downloads 270

25125 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of typical cleaning process, puts forward a well scalability and versatility data cleaning framework, in view of data with attribute dependency relation, designs several of violation data discovery algorithms by formal formula, which can obtain inconsistent data to all target columns with condition attribute dependent no matter data is structured (SQL) or unstructured (NoSQL), and gives 6 data cleaning methods based on these algorithms.

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 564

25124 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity

Authors: Hoda A. Abdel Hafez

Abstract:

Mining big data represents a big challenge nowadays. Many types of research are concerned with mining massive amounts of data and big data streams. Mining big data faces a lot of challenges including scalability, speed, heterogeneity, accuracy, provenance and privacy. In telecommunication industry, mining big data is like a mining for gold; it represents a big opportunity and maximizing the revenue streams in this industry. This paper discusses the characteristics of big data (volume, variety, velocity and veracity), data mining techniques and tools for handling very large data sets, mining big data in telecommunication and the benefits and opportunities gained from them.

Keywords: mining big data, big data, machine learning, telecommunication

Procedia PDF Downloads 410

25123 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its unique syntax and structure. The objective is to explore various techniques and methodologies for validating comparison and integration between JSON data to XML and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 140

25122 Using Machine Learning Techniques to Extract Useful Information from Dark Data

Authors: Nigar Hussain

Abstract:

It is a subset of big data. Dark data means those data in which we fail to use for future decisions. There are many issues in existing work, but some need powerful tools for utilizing dark data. It needs sufficient techniques to deal with dark data. That enables users to exploit their excellence, adaptability, speed, less time utilization, execution, and accessibility. Another issue is the way to utilize dark data to extract helpful information to settle on better choices. In this paper, we proposed upgrade strategies to remove the dark side from dark data. Using a supervised model and machine learning techniques, we utilized dark data and achieved an F1 score of 89.48%.

Keywords: big data, dark data, machine learning, heatmap, random forest

Procedia PDF Downloads 29

25121 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 393

25120 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays considering human involved in increasing data development some methods such as data mining to extract science are unavoidable. One of the discussions of data mining is inherent distribution of the data usually the bases creating or receiving such data belong to corporate or non-corporate persons and do not give their information freely to others. Yet there is no guarantee to enable someone to mine special data without entering in the owner’s privacy. Sending data and then gathering them by each vertical or horizontal software depends on the type of their preserving type and also executed to improve data privacy. In this study it was attempted to compare comprehensively preserving data methods; also general methods such as random data, coding and strong and weak points of each one are examined.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 525

25119 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR) will come into force on 25 May 2018 which will create a new legal framework for the protection of personal data in the European Union. Article 20 of GDPR introduces a right to data portability. This right allows for data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating transferring personal data between IT environments (e.g.: applications), will also facilitate changing the provider of services (e.g. changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 473

25118 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most acceptable means of communication where we can quickly exchange our feelings and thoughts. Quite often, people can communicate orally but cannot interact or work with computers or devices. It’s easy and quick to give speech commands than typing commands to computers. In the same way, it’s easy listening to audio played from a device than extract output from computers or devices. Especially with Robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline is an integrated version of Automatic speech recognition, a Natural language model for text understanding, object detection, and text-to-speech modules. There are many Deep Learning models for each type of the modules mentioned above, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware and maximizes performance, and accelerates application development. A speech command is given as input that has information about target objects to be detected and start and end times to extract the required interval from the video. Speech is converted to text using the Automatic speech recognition QuartzNet model. The summary is extracted from text using a natural language model Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, essential frames from the video are extracted, and the You Only Look Once (YOLO) object detection model detects You Only Look Once (YOLO) objects on these extracted frames. Frame numbers that have target objects (specified objects in the speech command) are saved as text. Finally, this text (frame numbers) is converted to speech using text to speech model and will be played from the device. This project is developed for 80 You Only Look Once (YOLO) labels, and the user can extract frames based on only one or two target labels. This pipeline can be extended for more than two target labels easily by making appropriate changes in the object detection module. This project is developed for four different speech command formats by including sample examples in the prompt used by Generative Pre-Trained Transformer-3 (GPT-3) model. Based on user preference, one can come up with a new speech command format by including some examples of the respective format in the prompt used by the Generative Pre-Trained Transformer-3 (GPT-3) model. This pipeline can be used in many projects like human-machine interface, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that one can give speech commands and output is played from the device.

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 80

25117 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does Data. Utilizing massive data sets enable companies to get competitive advantages over their adversaries. Out of many area of Big Data usage, logistics has significance role in both commercial sector and military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 641

25116 Prioritizing Temporary Shelter Areas for Disaster Affected People Using Hybrid Decision Support Model

Authors: Ashish Trivedi, Amol Singh

Abstract:

In the recent years, the magnitude and frequency of disasters have increased at an alarming rate. Every year, more than 400 natural disasters affect global population. A large-scale disaster leads to destruction or damage to houses, thereby rendering a notable number of residents homeless. Since humanitarian response and recovery process takes considerable time, temporary establishments are arranged in order to provide shelter to affected population. These shelter areas are vital for an effective humanitarian relief; therefore, they must be strategically planned. Choosing the locations of temporary shelter areas for accommodating homeless people is critical to the quality of humanitarian assistance provided after a large-scale emergency. There has been extensive research on the facility location problem both in theory and in application. In order to deliver sufficient relief aid within a relatively short timeframe, humanitarian relief organisations pre-position warehouses at strategic locations. However, such approaches have received limited attention from the perspective of providing shelters to disaster-affected people. In present research work, this aspect of humanitarian logistics is considered. The present work proposes a hybrid decision support model to determine relative preference of potential shelter locations by assessing them based on key subjective criteria. Initially, the factors that are kept in mind while locating potential areas for establishing temporary shelters are identified by reviewing extant literature and through consultation from a panel of disaster management experts. In order to determine relative importance of individual criteria by taking into account subjectivity of judgements, a hybrid approach of fuzzy sets and Analytic Hierarchy Process (AHP) was adopted. Further, Technique for order preference by similarity to ideal solution (TOPSIS) was applied on an illustrative data set to evaluate potential locations for establishing temporary shelter areas for homeless people in a disaster scenario. The contribution of this work is to propose a range of possible shelter locations for a humanitarian relief organization, using a robust multi criteria decision support framework.

Keywords: AHP, disaster preparedness, fuzzy set theory, humanitarian logistics, TOPSIS, temporary shelters

Procedia PDF Downloads 202

25115 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, more various information can be extracted. In addition, development and dissemination of boards such as Arduino and Raspberry Pie have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, there are many difficulties in using the board to collect data, and there are many difficulties in using it when the user is not a computer programmer, or when using it for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 378

25114 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provision of insights to users, and for good governance. Government (Big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage, hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationship in the government (big) data ecosystem. We also discuss our research findings. We did not find too much published research articles about the government (big) data ecosystem, including its definition and classification of actors and their roles. Therefore, we lent ideas for the government (big) data ecosystem from numerous areas that include scientific research data, humanitarian data, open government data, industry data, in the literature.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 162

25113 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, veracity and comes from a variety of sources is usually generated in all sectors including the government sector. Globally public administrations are pursuing (big) data as new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can be led to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of government (big) data ecosystem. The essential aspects of government (big) data ecosystem include definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on government (big) data ecosystem and therefore focused our research on various relevant areas like humanitarian data, open government data, scientific research data, industry data, etc.

Keywords: applications of big data, big data, big data types. big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 229

25112 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, about 1% of generated data are ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which is a subset of data analytics. The developed framework is designed to assist data analysts with little experience, in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 168

25111 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In our present world, we are generating a lot of data and we, need a specific device to store all these data. Generally, we store data in pen drives, hard drives, etc. Sometimes we may loss the data due to the corruption of devices. To overcome all these issues, we implemented a cloud space for storing the data, and it provides more security to the data. We can access the data with just using the internet from anywhere in the world. We implemented all these with the java using Net beans IDE. Once user uploads the data, he does not have any rights to change the data. Users uploaded files are stored in the cloud with the file name as system time and the directory will be created with some random words. Cloud accepts the data only if the size of the file is less than 2MB.

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 206

25110 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business Intelligence is a methodology that exploits the data to produce information and knowledge systematically, business intelligence can support the decision-making process. Some methods in business intelligence are data warehouse and data mining. A data warehouse can store historical data from transactional data. For data modelling in data warehouse, we apply dimensional modelling by Kimball. While data mining is used to extracting patterns from the data and get insight from the data. Data mining has many techniques, one of which is segmentation. For profiling of telecommunication customer, we use customer segmentation according to customer’s usage of services, customer invoice and customer payment. Customers can be grouped according to their characteristics and can be identified the profitable customers. We apply K-Means Clustering Algorithm for segmentation. The input variable for that algorithm we use RFM (Recency, Frequency and Monetary) model. All process in data mining, we use tools IBM SPSS modeller.

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 484

25109 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge, which faces the bioinformaticians due to the complication of using statistical and machine learning techniques. The challenge will be doubled if the microarray data sets contain missing data, which happens regularly because these techniques cannot deal with missing data. One of the most important data analysis process on the microarray data set is feature selection. This process finds the most important genes that affect certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 574

25108 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are being used in various applications such as agriculture, health monitoring, air and water pollution monitoring, traffic monitoring and control and hence, play the vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensors data are significantly important to design an efficient big data framework. Current researches do not focus on aggregating and filtering data at multiple layers of sensor-based big data framework. Thus, this paper introduces (i) three layers data aggregation and framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer at sensors. Simulation results show that the PDDA outperforms existing tree and cluster-based data aggregation scheme in terms of overall network energy consumptions and end-to-end data transmission delay.

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

Procedia PDF Downloads 346

25107 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have within a short period of time. Consequently this fast increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and its current trends. This paper presents a complete discussion on state-of-the-art big data technologies based on group and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendor, several open research challenges and the chances brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, it community, industry, big data

Procedia PDF Downloads 194

25106 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of the multiplied data growth, many computer science tools have been developed to process and analyze these Big Data. High-performance computing architectures have been designed to meet the treatment needs of Big Data (view transaction processing standpoint, strategic, and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend of high-performance computing architectures especially what has a relation with Analytics and Data Mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 520

25105 Financial Burden of Occupational Slip and Fall Incidences in Taiwan

Authors: Kai Way Li, Lang Gan

Abstract:

Slip &Fall are common in Taiwan. They could result in injuries and even fatalities. Official statistics indicate that more than 15% of all occupational incidences were slip/fall related. All the workers in Taiwan are required by the law to join the worker’s insurance program administered by the Bureau of Labor Insurance (BLI). The BLI is a government agency under the supervision of the Ministry of Labor. Workers claim with the BLI for insurance compensations when they suffer fatalities or injuries at work. Injuries statistics based on worker’s compensation claims were rarely studied. The objective of this study was to quantify the injury statistics and financial cost due to slip-fall incidences based on the BLI compensation records. Compensation records in the BLI during 2007 to 2013 were retrieved. All the original application forms, approval opinions, results for worker’s compensations were in hardcopy and were stored in the BLI warehouses. Xerox copies of the claims, excluding the personal information of the applicants (or the victim if passed away), were obtained. The content in the filing forms were coded in an Excel worksheet for further analyses. Descriptive statistics were performed to analyze the data. There were a total of 35,024 claims including 82 deaths, 878 disabilities, and 34,064 injuries/illnesses which were slip/fall related. It was found that the average losses for the death cases were 40 months. The total dollar amount for these cases paid was 86,913,195 NTD. For the disability cases, the average losses were 367.36 days. The total dollar amount for these cases paid was almost 2.6 times of those for the death cases (233,324,004 NTD). For the injury/illness cases, the average losses for the illness cases were 58.78 days. The total dollar amount for these cases paid was approximately 13 times of those of the death cases (1134,850,821 NTD). For the applicants/victims, 52.3% were males. There were more males than females for the deaths, disability, and injury/illness cases. Most (57.8%) of the female victims were between 45 to 59 years old. Most of the male victims (62.6%) were, on the other hand, between 25 to 39 years old. Most of the victims were in manufacturing industry (26.41%), next the construction industry (22.20%), and next the retail industry (13.69%). For the fatality cases, head injury was the main problem for immediate or eventual death (74.4%). For the disability case, foot (17.46%) and knee (9.05%) injuries were the leading problems. The compensation claims other than fatality and disability were mainly associated with injuries of the foot (18%), hand (12.87%), knee (10.42%), back (8.83%), and shoulder (6.77%). The slip/fall cases studied indicate that the ratios among the death, disability, and injury/illness counts were 1:10:415. The ratios of dollar amount paid by the BLI for the three categories were 1:2.6:13. Such results indicate the significance of slip-fall incidences resulting in different severity. Such information should be incorporated in to slip-fall prevention program in industry.

Keywords: epidemiology, slip and fall, social burden, workers’ compensation

Procedia PDF Downloads 323

25104 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore re3dat.org registry to identify research data repositories registration workflow process. Further objective is to depict a graph for present development of research data repositories in India. Preliminarily with an approach to understand re3data.org registry framework and schema design then further proceed to explore the status of research data repositories of India in re3data.org registry. Research data repositories are getting wider relevance due to e-research concepts. Now available registry re3data.org is a good tool for users and researchers to identify appropriate research data repositories as per their research requirements. In Indian environment, a compatible National Research Data Policy is the need of the time to boost the management of research data. Registry for Research Data Repositories is a crucial tool to discover specific information in specific domain. Also, Research Data Repositories in India have not been studied. Re3data.org registry and status of Indian research data repositories both discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 325

25103 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processed big data of transportation ridership (eg., smartcard data) and traffic operation (e.g., traffic detectors data) which requires a lot of computational power is incontrovertible in Intelligent Transportation Systems. Nowadays cloud computing is one of the important subjects and popular information technology solution for data processing. It enables users to process enormous measure of data without having their own particular computing power. Thus, it can also be a good selection for transportation big data processing as well. This paper intends to examine how the cloud computing can enhance transportation big data process with contrasting its advantages and disadvantages, and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 469