Search results for: large amounts of data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 29262

29202 Identification of Coauthors in Scientific Database

Authors: Thiago M. R Dias, Gray F. Moita

Abstract:

The analysis of scientific collaboration networks has contributed significantly to understanding how collaboration between researchers works and how the scientific production of researchers or research groups evolves. However, identifying collaborations in large scientific databases is not a trivial task, given the high computational cost of the methods commonly used. This paper proposes a method for identifying collaborations in a large database of researchers' curricula. The proposed method has a low computational cost and produces satisfactory results, making it an interesting alternative for modeling and characterizing large scientific collaboration networks.

Keywords: extraction, data integration, information retrieval, scientific collaboration

Procedia PDF Downloads 363
29201 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

Given user demand and the growth of large volumes of free data, storage solutions face growing challenges in protecting, storing, and retrieving data. The day may not be far off when storage companies and organizations refuse to store our valuable data, or charge a large fee for its storage and protection. At the same time, environmental conditions make it challenging to establish and maintain new data warehouses and data centers without adding to global warming threats. The challenge of small data is behind us; the challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also focus on the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 516
29200 Real-Time Big-Data Warehouse a Next-Generation Enterprise Data Warehouse and Analysis Framework

Authors: Abbas Raza Ali

Abstract:

Big Data technology is gradually becoming a pressing need for large enterprises. These enterprises generate massive amounts of offline and streaming data in both structured and unstructured formats on a daily basis. Extracting useful insights from such large-scale datasets is challenging, and technology constraints sometimes make it difficult to manage more than a few months of transactional data history. This paper presents a framework to efficiently manage massively large and complex datasets. The framework has been tested on a communication service provider producing massively large, complex streaming data in binary format. The communication industry is bound by regulators to maintain the history of subscribers’ call records, where every action of a subscriber generates a record. Also, managing and analyzing transactional data allows service providers to better understand their customers’ behavior; for example, deep packet inspection requires transactional internet usage data to explain the internet usage behavior of subscribers. However, current relational database systems limit service providers to maintaining history only at the semantic level, aggregated at the subscriber level. The framework addresses these challenges by leveraging Big Data technology, which optimally manages and allows deep analysis of complex datasets. The framework has been applied to offload the existing Intelligent Network Mediation and relational Data Warehouse of the service provider onto Big Data. The service provider has a subscriber base of 50+ million with yearly growth of 7-10%. The end-to-end process takes no more than 10 minutes and involves binary-to-ASCII decoding of call detail records, stitching of all the interrogations against a call (transformations), and aggregation of all the call records of a subscriber.

Keywords: big data, communication service providers, enterprise data warehouse, stream computing, Telco IN Mediation

Procedia PDF Downloads 147
29199 Big Data and Health: An Australian Perspective Which Highlights the Importance of Data Linkage to Support Health Research at a National Level

Authors: James Semmens, James Boyd, Anna Ferrante, Katrina Spilsbury, Sean Randall, Adrian Brown

Abstract:

‘Big data’ is a relatively new concept that describes data so large and complex that it exceeds the storage or computing capacity of most systems to perform timely and accurate analyses. Health services generate large amounts of data from a wide variety of sources such as administrative records, electronic health records, health insurance claims, and even smart phone health applications. Health data is viewed in Australia and internationally as highly sensitive. Strict ethical requirements must be met for the use of health data to support health research. These requirements differ markedly from those imposed on data use from industry or other government sectors and may have the impact of reducing the capacity of health data to be incorporated into the real time demands of the Big Data environment. This ‘big data revolution’ is increasingly supported by national governments, who have invested significant funds into initiatives designed to develop and capitalize on big data and methods for data integration using record linkage. The benefits to health following research using linked administrative data are recognised internationally and by the Australian Government through the National Collaborative Research Infrastructure Strategy Roadmap, which outlined a multi-million dollar investment strategy to develop national record linkage capabilities. This led to the establishment of the Population Health Research Network (PHRN) to coordinate and champion this initiative. The purpose of the PHRN was to establish record linkage units in all Australian states, to support the implementation of secure data delivery and remote access laboratories for researchers, and to develop the Centre for Data Linkage for the linkage of national and cross-jurisdictional data. 
The Centre for Data Linkage has been established within Curtin University in Western Australia; it provides essential record linkage infrastructure necessary for large-scale, cross-jurisdictional linkage of health-related data in Australia and uses a best practice ‘separation principle’ to support data privacy and security. Privacy-preserving record linkage technology is also being developed to link records without the use of names to overcome important legal and privacy constraints. This paper will present the findings of the first ‘Proof of Concept’ project selected to demonstrate the effectiveness of increased record linkage capacity in supporting nationally significant health research. This project explored how cross-jurisdictional linkage can inform the nature and extent of cross-border hospital use and hospital-related deaths. The technical challenges associated with national record linkage, and the extent of cross-border population movements, were explored as part of this pioneering research project. Access to person-level data linked across jurisdictions identified geographical hot spots of cross-border hospital use and hospital-related deaths in Australia. This has implications for planning of health service delivery and for longitudinal follow-up studies, particularly those involving mobile populations.

Keywords: data integration, data linkage, health planning, health services research

Procedia PDF Downloads 185
29198 Project Financing and Poverty Trends in the Islamic Development Bank Member Countries

Authors: Sennanda Musa, Ahmed Mutunzi Kitunzi, Gerald Kasigwa, Ismail Kintu

Abstract:

This paper analyzes the empirical relationship between project financing by the Islamic Development Bank (IsDB) and poverty trends in countries benefiting from IsDB. Specifically, the study seeks to find out whether there is a statistically significant relationship between the project financing dollar amounts by IsDB (PF) and the GNI Per Capita, PPP of 57 countries for the years 2002 to 2021. The research is a longitudinal, desk-top triangulation of correlation, regression, and hypothesis testing, employing the linear dynamic panel data GMM model as an estimator of the empirical relationships between the key variables of the study. The study results show that there is a significant positive relationship between the PF dollar amounts from the IsDB and the GNI Per Capita, PPP in these 57 countries. Therefore, countries that receive higher PF dollar amounts from the IsDB generally have higher GNI Per Capita, PPP (less poverty) than their counterparts. It is, therefore, recommended that countries formulate policies that facilitate Islamically financed projects to mitigate poverty. This paper develops policy discussions regarding the allocation of political attention to policy topics on poverty mitigation and their relation to financing projects Islamically, thus generating information on policy choices regarding the Islamic financing alternative.

Keywords: gross-national-income, IsDB-project-financing, public policy, poverty

Procedia PDF Downloads 53
29197 1/Sigma Term Weighting Scheme for Sentiment Analysis

Authors: Hanan Alshaher, Jinsheng Xu

Abstract:

Large amounts of data on the web can provide valuable information. For example, product reviews help business owners measure customer satisfaction. Sentiment analysis classifies texts into two polarities: positive and negative. This paper examines movie reviews and tweets using a new term weighting scheme, called one-over-sigma (1/sigma), on benchmark datasets for sentiment classification. The proposed method aims to improve the performance of sentiment classification. The results show that 1/sigma is more accurate than the popular term weighting schemes. In order to verify if the entropy reflects the discriminating power of terms, we report a comparison of entropy values for different term weighting schemes.
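The entropy comparison mentioned at the end of this abstract can be illustrated with a minimal sketch. The abstract does not define the 1/sigma scheme, so the code below does not attempt to implement it; it only shows, under the assumption that "entropy" means the Shannon entropy of a term's distribution over the two sentiment classes, why a low entropy suggests a discriminating term. The function name `term_class_entropy` and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def term_class_entropy(docs, labels, term):
    """Shannon entropy (in bits) of the class labels of documents containing `term`.

    Returns None when the term occurs in no document.
    Assumption: whitespace tokenization is good enough for this illustration.
    """
    counts = Counter(
        label for doc, label in zip(docs, labels) if term in doc.split()
    )
    total = sum(counts.values())
    if total == 0:
        return None
    # p * log2(1/p) avoids returning -0.0 for a perfectly pure term.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy corpus, not data from the paper.
docs = ["great movie", "great acting", "boring movie", "boring plot"]
labels = ["pos", "pos", "neg", "neg"]

for term in ("great", "movie"):
    print(term, term_class_entropy(docs, labels, term))
```

In this toy corpus "great" appears only in positive reviews, so its entropy is zero, while "movie" is split evenly between the classes and reaches the two-class maximum of one bit.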

Keywords: 1/sigma, natural language processing, sentiment analysis, term weighting scheme, text classification

Procedia PDF Downloads 181
29196 Exploration of RFID in Healthcare: A Data Mining Approach

Authors: Shilpa Balan

Abstract:

Radio Frequency Identification, popularly known as RFID, is used to automatically identify and track tags attached to items. This study focuses on the application of RFID in healthcare. RFID is a crucial technology for patient safety and inventory management in healthcare. Data from RFID tags are used to identify the locations of patients and inventory in real time. Medical errors are thought to be a prominent cause of loss of life and injury. The major advantage of RFID application in the healthcare industry is the reduction of medical errors. The healthcare industry has generated huge amounts of data. By discovering patterns and trends within the data, big data analytics can help improve patient care and lower healthcare costs. The increasing number of research publications leading to innovations in RFID applications shows the importance of this technology. This study explores the current state of research on RFID in healthcare using a text mining approach. No study has yet examined the current state of RFID research in healthcare using a data mining approach. In this study, articles related to RFID were collected from healthcare journals and news articles published from 2000 to 2015. Significant keywords on the topic are identified and analyzed using open source data analytics software such as Rapid Miner. These analytical tools help extract pertinent information from massive volumes of data. The main benefits of adopting RFID technology in healthcare include tracking medicines and equipment, upholding patient safety, and improving security. The real-time tracking features of RFID allow for enhanced supply chain management. By productively using big data, healthcare organizations can gain significant benefits. Big data analytics in healthcare enables improved decisions by extracting insights from large volumes of data.

Keywords: RFID, data mining, data analysis, healthcare

Procedia PDF Downloads 197
29195 Hydrothermal Energy Application Technology Using Dam Deep Water

Authors: Yooseo Pang, Jongwoong Choi, Yong Cho, Yongchae Jeong

Abstract:

The climate crisis, including environmental problems related to energy supply, is an emerging issue, so the use of renewable energy is essential to address these problems, which are mainly governed by the Paris Agreement, the international treaty on climate change. The government of the Republic of Korea announced that the key long-term goal of its low-carbon strategy is “Carbon neutrality by 2050”. Attention is focused on the role of internet data centers (IDC), which manage the large amounts of data generated by artificial intelligence (AI) and big data as a result of the 4th industrial revolution. The demand in the cooling system market for IDC was about 9 billion US dollars in 2020, and growth of 15.6% a year is expected in Korea. It is important to control the temperature in an IDC with an efficient air conditioning system, and hydrothermal energy is one of the best options for saving energy in the cooling system. To save energy and optimize operating conditions, a dam deep water air conditioning system has been considered. Deep water drawn from a specific level of the dam can supply a constant water temperature year-round. The amount of energy saved will be tested and analyzed with a pilot plant that has a cooling capacity of 100 RT. The project also targets a PUE (Power Usage Effectiveness) of 1.2; PUE is the key parameter for checking the efficiency of the cooling system.
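To make the 1.2 PUE target concrete, the following minimal sketch uses the standard definition of PUE as total facility energy divided by IT equipment energy. The energy figures are hypothetical and are not measurements from the pilot plant; the function names are illustrative only.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def max_overhead_kwh(it_equipment_kwh, target_pue):
    """Energy budget left for cooling, power distribution, lighting, etc.

    Follows directly from the definition: overhead = IT energy * (PUE - 1).
    """
    return it_equipment_kwh * (target_pue - 1.0)


# Hypothetical figures: 1000 kWh consumed by IT equipment.
it_kwh = 1000.0
print(pue(1200.0, it_kwh))            # facility drawing 1200 kWh in total
print(max_overhead_kwh(it_kwh, 1.2))  # non-IT budget allowed by the target
```

Under these assumed numbers, a PUE of 1.2 leaves roughly 200 kWh of non-IT energy for every 1000 kWh of IT load, which is the budget the cooling system has to fit within.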

Keywords: hydrothermal energy, HVAC, internet data center, free-cooling

Procedia PDF Downloads 55
29194 Characteristics of Cement Pastes Incorporating Different Amounts of Waste Cellular Concrete Powder

Authors: Mohammed Abed, Rita Nemes

Abstract:

In this study, different amounts of waste cellular concrete powder (WCCP) were investigated as a replacement for cement in an attempt to produce a green binder that is useful for sustainable construction applications. Cement was replaced by WCCP at levels from 0 to 60% by mass. Consistency, compressive strength, bending strength, and the activity index of WCCP were examined on specimens aged 7 to 90 days. The optimum WCCP replacement was up to 30%; at this level the activity index was still increasing at the end of the test period (90 days), which may indicate that it continues to increase at later ages. Replacement of up to 30% WCCP also increased the bending strength above that of the control. The main finding of the present study is that it is possible to replace 30% of cement with WCCP; however, a smaller amount is preferable.

Keywords: cellular concrete powder, waste cellular concrete powder (WCCP), supplementary cementitious material, SCM, activity index, mechanical properties

Procedia PDF Downloads 187
29193 Adoption of Big Data by Global Chemical Industries

Authors: Ashiff Khan, A. Seetharaman, Abhijit Dasgupta

Abstract:

The new era of big data (BD) is influencing chemical industries tremendously, providing several opportunities to reshape the way they operate and helping them shift towards intelligent manufacturing. Despite the availability of free software and the large amount of real-time data generated and stored in process plants, chemical industries are still in the early stages of big data adoption. The industry is just starting to realize the importance of the large amount of data it owns for making the right decisions and supporting its strategies. This article explores the importance of the professional competencies and data science that influence BD in chemical industries and help them move towards intelligent manufacturing quickly and reliably. This article uses a literature review and identifies potential applications in the chemical industry for moving from conventional methods to a data-driven approach. The scope of this document is limited to the adoption of BD in chemical industries and the variables identified in this article. To achieve this objective, government, academia, and industry must work together to overcome all present and future challenges.

Keywords: chemical engineering, big data analytics, industrial revolution, professional competence, data science

Procedia PDF Downloads 55
29192 Power Plants between Environmental Pollution and Eco-Sustainable Recycling of Industrial Wastes

Authors: Liliana Crăc, Nicolae Giorgi, Gheorghe Fometescu, Mihai Cruceru

Abstract:

Power plants are the main source of air pollution through combustion processes, both by releasing large amounts of dust, greenhouse gases, and acidifying gases, and by producing large quantities of waste, slag, and ash that are disposed of in landfills covering significant areas. SC Turceni S.A. is one of the largest power generating units in Romania. Its policy is focused on the production and delivery of electricity in a way that increases energy efficiency and reduces environmental impact. The paper presents the environmental impact of slag and ash storage, while pointing out that recovering this waste significantly improves the air quality in the area. An important aspect is the properties of the ash and slag discharged by the Turceni power plant, with a view to using them in the manufacture of building materials.

Keywords: ash and slag properties, air pollution, building materials industry, power plants

Procedia PDF Downloads 297
29191 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data play an increasingly important role in economic growth, innovation, technical advantage, business strategy, and even competition between countries. Analyzing patent data is crucial, since patents cover a large part of the world's technological information. In this paper, we use the linguistic summarization technique to verify hypotheses related to patent data stated in the literature.
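As a rough illustration of linguistic summarization with fuzzy sets, the sketch below computes the truth degree of a summary of the form "Most patents have a high number of claims" in the classic Yager style: average the membership of each record in the summarizer, then pass that proportion through a fuzzy quantifier. The abstract does not state which variant the authors use, so this is an assumption; the membership functions, thresholds, and data are all hypothetical.

```python
def mu_high(value, low=10.0, high=30.0):
    """Hypothetical fuzzy set 'high': 0 below `low`, 1 above `high`, linear between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def mu_most(proportion):
    """Fuzzy quantifier 'most': 0 up to 0.3, 1 from 0.8, linear in between."""
    if proportion <= 0.3:
        return 0.0
    if proportion >= 0.8:
        return 1.0
    return (proportion - 0.3) / 0.5


def truth_degree(values, summarizer, quantifier):
    """Truth of 'Q records are S' = quantifier(mean membership in S)."""
    if not values:
        raise ValueError("at least one record is required")
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


# Made-up claim counts for four patents, for illustration only.
claim_counts = [5, 20, 30, 40]
print(truth_degree(claim_counts, mu_high, mu_most))
```

A truth degree close to 1 would support the hypothesis expressed by the summary, while a value close to 0 would argue against it.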

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 248
29190 IoT Based Information Processing and Computing

Authors: Mannan Ahmad Rasheed, Sawera Kanwal, Mansoor Ahmad Rasheed

Abstract:

The Internet of Things (IoT) has revolutionized the way we collect and process information, making it possible to gather data from a wide range of connected devices and sensors. This has led to the development of IoT-based information processing and computing systems that are capable of handling large amounts of data in real time. This paper provides a comprehensive overview of the current state of IoT-based information processing and computing, as well as the key challenges and gaps that need to be addressed. This paper discusses the potential benefits of IoT-based information processing and computing, such as improved efficiency, enhanced decision-making, and cost savings. Despite the numerous benefits of IoT-based information processing and computing, several challenges need to be addressed to realize the full potential of these systems. These challenges include security and privacy concerns, interoperability issues, scalability and reliability of IoT devices, and the need for standardization and regulation of IoT technologies. Moreover, this paper identifies several gaps in the current research related to IoT-based information processing and computing. One major gap is the lack of a comprehensive framework for designing and implementing IoT-based information processing and computing systems.

Keywords: IoT, computing, information processing, IoT computing

Procedia PDF Downloads 150
29189 Studies on the Use of Sewage Sludge in Agriculture or in Incinerators

Authors: Catalina Iticescu, Lucian Georgescu, Mihaela Timofti, Dumitru Dima, Gabriel Murariu

Abstract:

The amounts of sludge resulting from the treatment of domestic and industrial wastewater can create serious environmental problems if no solutions are found to eliminate them. At present, the predominant method of sewage sludge disposal is to store it and use it in agricultural applications. Sewage sludge has fertilizer properties and can be used to enrich agricultural soils due to its nutrient content. In addition to nutrients for plant growth (nitrogen and phosphorus), the sludge also contains heavy metals in varying amounts. An increasingly used method is the incineration of sludge. Thermal processes can be used to convert large amounts of sludge into useful energy. The sewage sludge analyzed for the present paper was extracted from the Wastewater Treatment Plant (WWTP) in Galati, Romania. The physico-chemical parameters determined were pH (upH), nutrients, and heavy metals. The determination methods were electrochemical, spectrophotometric, and energy dispersive X-ray (EDX) analyses. The results of the tests on the nutrient content of the sewage sludge show that the existing nutrients can be used to increase the fertility of agricultural soils. The conclusion reached was that this sludge can be safely used on agricultural land, with good agricultural productivity results. To be able to use sewage sludge as a fuel, we need to know its calorific values. For wet sludge, the calorific power is low, while for dry sludge it is high. Higher calorific value and lower calorific value are determined only for dry solids. The apparatus used to determine the calorific power was a Parr 6755 Solution Calorimeter (Parr Instrument Company, USA, 2010 model). The calorific values of the studied sludge indicate that it can be used successfully in incinerators. Mixed with coal, it can also be used to produce electricity. The advantages are that it reduces the cost of obtaining electricity and considerably reduces the amount of sewage sludge.

Keywords: agriculture, incinerators, properties, sewage sludge

Procedia PDF Downloads 150
29188 Indexing and Incremental Approach Using Map Reduce Bipartite Graph (MRBG) for Mining Evolving Big Data

Authors: Adarsh Shroff

Abstract:

Big data is a collection of datasets so large and complex that it becomes difficult to process with conventional database management tools. Operations such as search, analysis, and visualization on big data rely on data mining, the process of extracting patterns or knowledge from large datasets. As the underlying data evolves, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. This project uses i2MapReduce, an incremental processing extension to MapReduce, the most widely used framework for mining big data. i2MapReduce performs key-value pair level incremental processing rather than task level re-computation, supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. To optimize the mining results, i2MapReduce is evaluated using a one-step algorithm and three iterative algorithms with diverse computation characteristics for efficient mining.
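The idea of refreshing results from saved state rather than recomputing from scratch can be sketched in a few lines. This is not the i2MapReduce API and does not reproduce its fine-grain state management; it is a plain-Python word count, with hypothetical function names, in which only the key-value pairs emitted by added or removed records are applied to the previously saved result.

```python
from collections import Counter


def map_record(record):
    """Map step: emit (word, 1) pairs for one input record."""
    return [(word, 1) for word in record.split()]


def full_compute(records):
    """Baseline: recompute the whole result from scratch."""
    state = Counter()
    for record in records:
        for key, value in map_record(record):
            state[key] += value
    return state


def incremental_update(state, added=(), removed=()):
    """Refresh a saved result by touching only the changed records."""
    new_state = Counter(state)
    for record in added:
        for key, value in map_record(record):
            new_state[key] += value
    for record in removed:
        for key, value in map_record(record):
            new_state[key] -= value
    # Drop keys whose count fell to zero so the result matches a fresh run.
    return Counter({k: v for k, v in new_state.items() if v > 0})


saved = full_compute(["a b", "b c"])
refreshed = incremental_update(saved, added=["c d"], removed=["a b"])
print(refreshed == full_compute(["b c", "c d"]))
```

The cost of the update depends on the size of the change rather than the size of the whole dataset, which is the motivation the abstract gives for incremental processing. Iterative algorithms need more machinery than this one-step example, because a change can propagate across iterations.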

Keywords: big data, map reduce, incremental processing, iterative computation

Procedia PDF Downloads 316
29187 Evaluation of UI for 3D Visualization-Based Building Information Applications

Authors: Monisha Pattanaik

Abstract:

In scenarios where users have to work with large amounts of hierarchical data combined with visualizations (for example, construction 3D models, manufacturing equipment models, Gantt charts, and building plans), the data structures are dense: they can contain multiple parent nodes up to 50 levels deep, together with their siblings and descendants, and therefore convey an immediate feeling of complexity. With customers moving to consumer-grade enterprise software, it is crucial to make sophisticated features available on touch devices and smaller screen sizes. This paper evaluates a UI component that allows users to scroll through all density levels using a slider overlay on top of the hierarchy table, performing several actions to focus on one set of objects at any point in time. This overlay component also solves the problem of excessive horizontal scrolling of the entire table on a fixed pane for a hierarchical table. The component can be customized to navigate through parents only, siblings only, or a specific part of the hierarchy only. The UI component was evaluated by end users of the application and by Human-Computer Interaction (HCI) experts to test its usability, yielding statistical results and recommendations for handling complex hierarchical data visualizations.

Keywords: building information modeling, digital twin, navigation, UI component, user interface, usability, visualization

Procedia PDF Downloads 114
29186 Luminescent Functionalized Graphene Oxide Based Sensitive Detection of Deadly Explosive TNP

Authors: Diptiman Dinda, Shyamal Kumar Saha

Abstract:

In the 21st century, sensitive and selective detection of trace amounts of explosives has become a serious problem. Generally, nitro compounds and their derivatives are used worldwide to prepare different explosives. Recently, TNP (2,4,6-trinitrophenol) has become the most commonly used constituent for preparing powerful explosives all over the world. It is even more powerful than TNT or RDX. As explosives are electron deficient in nature, it is very difficult to detect one separately from a mixture. Moreover, due to its high water solubility, detecting TNP in water in the presence of other explosives is very challenging. Simple instrumentation, cost-effectiveness, speed, and high sensitivity make fluorescence-based optical sensing very successful compared to other techniques. Graphene oxide (GO), with its large number of epoxy groups, incorporates localized nonradiative electron-hole centres on its surface and therefore gives very weak fluorescence. In this work, GO is functionalized with 2,6-diaminopyridine to remove those epoxy groups through an SN2 reaction. This turns GO into a bright blue luminescent fluorophore (DAP/rGO), which shows an intense PL spectrum at ∼384 nm when excited at a wavelength of 309 nm. We have also characterized the material by FTIR, XPS, UV, XRD, and Raman measurements. Using this material as the fluorophore, a large fluorescence quenching (96%) is observed after addition of only 200 µL of 1 mM TNP in aqueous solution. Other nitro explosives give only moderate PL quenching compared to TNP. Such high selectivity is related to the operation of a FRET mechanism from the fluorophore to TNP during the PL quenching experiment. TCSPC measurement also reveals that the lifetime of DAP/rGO decreases drastically, from 3.7 to 1.9 ns, after addition of TNP. Our material is also sensitive to TNP at the 125 ppb level. Finally, we believe that this graphene-based luminescent material will emerge as a new class of sensing materials for detecting trace amounts of explosives in aqueous solution.

Keywords: graphene, functionalization, fluorescence quenching, FRET, nitroexplosive detection

Procedia PDF Downloads 401
29185 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a dominantly qualitative field, and computational/corpus linguistics and its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. This highly integrated multimodal behaviour is analysed based on video data, with the aim of uncovering so-called “multimodal gestalts”, patterns of linguistic and embodied conduct that reoccur in specific sequential positions and are employed for specific purposes. Multimodal analyses (and other disciplines using videos) have so far depended on time- and resource-intensive manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which are often not within the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data, which are suitable for qualitative analysis but not sufficient for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where the many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data.
The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for the extraction of gestures and other visual cues, and grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in such a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL (which can also extend to other fields using video materials). It will allow large amounts of data to be processed automatically and quantitative analyses to be implemented and combined with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) and conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken, and grammatical information from videos, to correlate those different levels, and to perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 76
29184 Timing and Noise Data Mining Algorithm and Software Tool in Very Large Scale Integration (VLSI) Design

Authors: Qing K. Zhu

Abstract:

Very Large Scale Integration (VLSI) design has become very complex due to the continuous integration of millions of gates in one chip, in line with Moore’s law. Designers encounter numerous report files during design iterations using timing and noise analysis tools. This paper presents our work using data mining techniques combined with HTML tables to extract and represent critical timing/noise data. When this data-mining tool is applied in real applications, running speed is important. Based on performance testing results, the software employs table look-up techniques to achieve a reasonable running speed. We added several advanced features for the application in one industry chip design.

Keywords: VLSI design, data mining, big data, HTML forms, web, VLSI, EDA, timing, noise

Procedia PDF Downloads 228
29183 Toward Cloud E-learning System Based on Smart Tools

Authors: Mohsen Maraoui

Abstract:

In the face of the growth in the quantity of data produced, several methods and techniques have appeared to remedy the problems of processing and analyzing large amounts of information, mainly in the field of teaching. In this paper, we propose an intelligent cloud-based teaching system for E-learning content services. This system simplifies the manipulation of various forms of educational content, including text, images, videos, three-dimensional objects, and virtual reality and augmented reality scenes. We discuss the integration of institutional and external services to provide personalized assistance to university members in their daily activities. The proposed system provides an intelligent solution for media services that can be accessed from smart devices through a cloud-based intelligent service environment with a fully integrated system.

Keywords: cloud computing, e-learning, indexation, IoT, learning in Arabic language, smart tools

Procedia PDF Downloads 107
29182 Big Data Analysis with Rhipe

Authors: Byung Ho Jung, Ji Eun Shin, Dong Hoon Lim

Abstract:

Rhipe, which integrates R and the Hadoop environment, makes it possible to process and analyze massive amounts of data in a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe on actual data of various sizes. Experimental results comparing the performance of our Rhipe implementation with the stats and biglm packages available on bigmemory showed that Rhipe was faster than the other packages, owing to parallel processing in which the number of map tasks increases as the size of the data increases. We also compared the computing speeds of the pseudo-distributed and fully-distributed modes for configuring a Hadoop cluster. The results showed that the fully-distributed mode was faster than the pseudo-distributed mode, and that the fully-distributed mode became faster as the number of data nodes increased.
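One common way to distribute multiple regression over map tasks is to have each task compute the partial sums X^T X and X^T y for its chunk, add the partial results in a reduce step, and solve the normal equations once at the end. The abstract does not say whether the authors' Rhipe implementation works this way, so the following is only a minimal single-process Python sketch of that general idea; the function names (`map_chunk`, `combine`, `solve`, `fit`) are hypothetical and nothing here uses the Rhipe or Hadoop APIs.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: partial X^T X and X^T y for one chunk of (features, target) rows."""
    p = len(rows[0][0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, target in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * target
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(chunks):
    """Return [intercept, coefficient_1, ...] fitted over all chunks."""
    xtx, xty = reduce(combine, map(map_chunk, chunks))
    return solve(xtx, xty)
```

Because the partial sums are simply added together, the map step can run on any number of chunks independently, which is the property that would let the number of map tasks grow with the data size.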

Keywords: big data, Hadoop, Parallel regression analysis, R, Rhipe

Procedia PDF Downloads 477
29181 Analysis of Genomics Big Data in Cloud Computing Using Fuzzy Logic

Authors: Mohammad Vahed, Ana Sadeghitohidi, Majid Vahed, Hiroki Takahashi

Abstract:

In the genomics field, huge amounts of data have been produced by next-generation sequencers (NGS). Data volumes are growing very rapidly, as it is postulated that more than one billion bases will be produced per year in 2020. The growth rate of produced data is much faster than Moore's law in computer technology. This makes it more difficult to deal with genomics data, for example when storing data, searching for information, and finding hidden information. An analysis platform for genomics big data therefore needs to be developed. Recently developed cloud computing enables us to deal with big data more efficiently. Hadoop is one of the frameworks for distributed computing and forms the core of Big Data as a Service (BDaaS). Although many services, e.g. Amazon, have adopted this technology, there are few applications in the biology field. Here, we propose a new algorithm to deal more efficiently with genomics big data, e.g. sequencing data. Our algorithm consists of two parts. First, BDaaS is applied to handle the data more efficiently. Second, a hybrid method of MapReduce and fuzzy logic is applied for data processing. This step can be parallelized in implementation. Our algorithm has great potential in the computational analysis of genomics big data, e.g. de novo genome assembly and sequence similarity search. We will discuss our algorithm and its feasibility.

Keywords: big data, fuzzy logic, MapReduce, Hadoop, cloud computing

Procedia PDF Downloads 269
29180 Multi-Level Clustering Based Congestion Control Protocol for Cyber Physical Systems

Authors: Manpreet Kaur, Amita Rani, Sanjay Kumar

Abstract:

The Internet of Things (IoT), a cyber-physical paradigm, allows a large number of devices to connect and send sensory data over the network simultaneously. The tremendous amount of data generated leads to a very high network load, which in turn results in network congestion. Congestion further leads to frequent loss of useful information and depletion of a significant amount of the nodes’ energy. Therefore, there is a need to control congestion in IoT so as to prolong network lifetime and improve the quality of service (QoS). Hence, we propose a two-level clustering-based routing algorithm that considers congestion score and packet priority metrics and focuses on minimizing network congestion. In the proposed Priority based Congestion Control (PBCC) protocol, the sensor nodes in the IoT network form clusters, which reduces the amount of traffic, and the nodes are prioritized to emphasize important data. Simultaneously, a congestion score determines the occurrence of congestion at a particular node. The proposed protocol outperforms the existing Packet Discard Network Clustering (PDNC) protocol in terms of buffer size, packet transmission range, network region and number of nodes, under various simulation scenarios.

Keywords: internet of things, cyber-physical systems, congestion control, priority, transmission rate

Procedia PDF Downloads 283
29179 Development of Broad Spectrum Nitrilase Biocatalysts and Bioprocesses for Nitrile Biotransformation

Authors: Avinash Vellore Sunder, Shikha Shah, Pramod P. Wangikar

Abstract:

The enzymatic conversion of nitriles to carboxylic acids by nitrilases has gained significance in the green synthesis of several pharmaceutical precursors and fine chemicals. While nitrilases have been characterized from different sources, industrial application requires the identification of nitrilases that possess higher substrate tolerance, wider specificity and better thermostability, along with the development of an efficient bioprocess for producing large amounts of nitrilase. To this end, we developed a fed-batch fermentation process on defined media for the high cell density cultivation of E. coli cells expressing the well-studied nitrilase from Alcaligenes faecalis. A DO-stat feeding approach was combined with an optimized post-induction strategy to achieve a nitrilase titer of 2.5 × 10^5 U/l and 78 g/l dry cell weight. We also identified 16 novel nitrilase sequences from genome mining and analysis of substrate binding residues. The nitrilases were expressed in E. coli and their biocatalytic potential was evaluated on a panel of 22 industrially relevant nitrile substrates using high-throughput screening and HPLC analysis. Nine nitrilases were found to exhibit high activity on structurally diverse nitriles including aliphatic and aromatic dinitriles, heterocyclic, -hydroxy and -keto nitriles. With fed-batch biotransformation, whole-cell Zobellia galactanivorans nitrilase achieved yields of 2.4 M nicotinic acid and 1.8 M isonicotinic acid from 3-cyanopyridine and 4-cyanopyridine respectively within 5 h, while Cupriavidus necator nitrilase enantioselectively converted 740 mM mandelonitrile to (R)-mandelic acid. The nitrilase from Achromobacter insolitus could hydrolyze 542 mM iminodiacetonitrile in 1 h. The availability of highly active nitrilases along with bioprocesses for enzyme production expands the toolbox for industrial biocatalysis.

Keywords: biocatalysis, isonicotinic acid, iminodiacetic acid, mandelic acid, nitrilase

Procedia PDF Downloads 200
29178 Proxisch: An Optimization Approach of Large-Scale Unstable Proxy Servers Scheduling

Authors: Xiaoming Jiang, Jinqiao Shi, Qingfeng Tan, Wentao Zhang, Xuebin Wang, Muqian Chen

Abstract:

Nowadays, big companies such as Google and Microsoft, which have adequate proxy servers, run web crawlers against a given website in parallel without difficulty. For researchers who lack expensive proxy servers, however, crawling large amounts of information from a single website in parallel remains a problem. In this case, a good choice for researchers is to use free public proxy servers crawled from the Internet. To improve the efficiency of the web crawler, two issues must be considered first: (1) tasks may fail owing to the instability of free proxy servers; (2) a proxy server will be blocked if it visits a single website too frequently. In this paper, we propose Proxisch, an optimization approach for scheduling large-scale unstable proxy servers, which allows anyone to run a web crawler efficiently at extremely low cost. To address the first problem, Proxisch is designed to work efficiently by making maximum use of reliable proxy servers. To address the second problem, it establishes a frequency control mechanism that keeps the visiting frequency of any chosen proxy server below the website’s limit. The results show that our approach performs better than the other scheduling algorithms.
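The two concerns named in the abstract, preferring reliable proxies and keeping each proxy under a per-site visiting frequency, can be illustrated with a priority queue. The abstract does not describe Proxisch's actual algorithm, so the sketch below is not the authors' method. It is a hypothetical Python scheduler, where `ProxyScheduler`, `min_interval` and `max_failures` are assumed names: it orders proxies by the earliest time they may be used again, breaks ties in favour of proxies with fewer consecutive failures, and drops a proxy after repeated failures.

```python
import heapq


class ProxyScheduler:
    """Toy scheduler: rate-limits each proxy and prefers reliable ones."""

    def __init__(self, proxies, min_interval, max_failures=3):
        self.min_interval = min_interval    # assumed minimum seconds between uses of one proxy
        self.max_failures = max_failures    # consecutive failures before a proxy is dropped
        self.failures = {p: 0 for p in proxies}
        # Heap entries: (next_allowed_time, consecutive_failures, proxy)
        self._heap = [(0.0, 0, p) for p in proxies]
        heapq.heapify(self._heap)

    def acquire(self, now):
        """Return a proxy that may be used at time `now`, or None if none is ready."""
        if not self._heap or self._heap[0][0] > now:
            return None
        return heapq.heappop(self._heap)[2]

    def release(self, proxy, now, ok):
        """Report the outcome of a request and re-queue the proxy after its cool-down."""
        self.failures[proxy] = 0 if ok else self.failures[proxy] + 1
        if self.failures[proxy] >= self.max_failures:
            return  # treat the proxy as unusable and do not re-queue it
        entry = (now + self.min_interval, self.failures[proxy], proxy)
        heapq.heappush(self._heap, entry)
```

A crawler loop would call `acquire` with the current time before each request and `release` afterwards. Because a proxy is re-queued only at `now + min_interval`, it is never handed out more than once per interval.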

Keywords: proxy server, priority queue, optimization algorithm, distributed web crawling

Procedia PDF Downloads 186
29177 Microarray Gene Expression Data Dimensionality Reduction Using PCA

Authors: Fuad M. Alkoot

Abstract:

Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension, reaching into the thousands, which hinders all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with was generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. Classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim to reduce the dimension of the data and make it better suited for classification. Here, we experiment with applying a multistage PCA to the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates, which increases the possibility of building an automated system for autism detection.
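As background, a single PCA stage centers the data, estimates the covariance matrix, and projects each sample onto the leading eigenvectors. The abstract does not specify how the multistage variant is built, so the Python sketch below shows only a plain, single-stage PCA, not the authors' method. It uses power iteration with deflation on a toy scale, and the function names are hypothetical. For data with thousands of dimensions, a numerical library would be used in practice.

```python
import math


def center(data):
    """Subtract the per-feature mean from every sample."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=500):
    """Leading eigenpair of a symmetric matrix via power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))
    return eigenvalue, v


def pca(data, k):
    """Return the top-k components and the data projected onto them."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        lam, v = top_component(cov)
        components.append(v)
        # Deflation: remove the found direction before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, projected
```

Each sample is reduced from `d` features to `k` coordinates, and those coordinates are what a downstream classifier would be trained on.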

Keywords: PCA, gene expression, dimensionality reduction, classification, autism

Procedia PDF Downloads 533
29176 An Analysis of Sequential Pattern Mining on Databases Using Approximate Sequential Patterns

Authors: J. Suneetha, Vijayalaxmi

Abstract:

Sequential pattern mining involves applying data mining methods to large data repositories to extract usage patterns. Sequential pattern mining methodologies are used to analyze the data and identify patterns. The patterns have been used to implement efficient systems that can make recommendations based on previously observed patterns, make predictions, improve the usability of systems, detect events, and in general help in making strategic product decisions. In this paper, we examine the performance of approximate sequential pattern mining, defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summarize and represent the databases by identifying the underlying trends in the data. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequence databases with long patterns.

Keywords: multiple data, performance analysis, sequential pattern, sequence database scalability

Procedia PDF Downloads 306
29175 Volatile Composition of Sucuks: A Traditional Dry-Fermented Sausage Affected by Meat and Fat Types

Authors: Mina Kargozari, Isabel Revilla Martin, Ángel A. Carbonell-Barrachina, Antoni Szumny

Abstract:

The volatile compound profiles of differently formulated sausages, including CH (camel meat-hump), CB (camel meat-beef fat), BH (beef-hump) and BB (beef-beef fat), were analyzed by gas chromatography/mass spectrometry (GC-MS) using solid phase micro-extraction (SPME) in order to investigate the role of meat and fat type in the release of aroma compounds. A total of 47 compounds were identified, consisting of 3 acids, 1 ester, 3 alcohols, 7 aldehydes, 5 sulphur compounds, and 27 terpenes. Significant differences in the aroma compounds were observed among the four batches. The CH sucuk samples, which contained the highest fat amount of all batches (p<0.05), consequently showed higher amounts of volatiles. The sausages prepared with hump showed higher amounts of aldehydes and lower amounts of terpenes than the sausages made with beef fat (p<0.05). Meat type appeared to have only a minor effect on the volatile profile of the sausages.

Keywords: aromatic compounds, camel meat, hump, SPME

Procedia PDF Downloads 402
29174 Interpreting Privacy Harms from a Non-Economic Perspective

Authors: Christopher Muhawe, Masooda Bashir

Abstract:

With the growth of Internet Communication Technology (ICT), the virtual world has become the new normal. At the same time, there is an unprecedented collection of massive amounts of data by both private and public entities. Unfortunately, this increase in data collection has come in tandem with an increase in data misuse and data breaches. Regrettably, the majority of data breach and data misuse claims have been unsuccessful in United States courts for failure to prove direct injury to physical or economic interests. The requirement to express data privacy harms from an economic or physical stance ignores the fact that not all data harms are physical or economic in nature. The challenge is compounded by the fact that data breach harms and risks do not attach immediately. This research will use a descriptive and normative approach to show that not all data harms can be expressed in economic or physical terms. Expressing privacy harms purely from an economic or physical harm perspective ignores the fact that data insecurity may result in harms that run counter to the functions of privacy in our lives: the promotion of liberty, selfhood and autonomy, the promotion of human social relations, and the furtherance of the existence of a free society. There is no economic value that can be placed on these functions of privacy. The proposed approach addresses data harms from a psychological and social perspective.

Keywords: data breach and misuse, economic harms, privacy harms, psychological harms

Procedia PDF Downloads 167
29173 The Business of American Football: The Kicker Position and Performance to Salary Correlation

Authors: James R. Ogden, Denise T. Ogden

Abstract:

The National Football League (USA) is the largest sporting business in the United States. In order to generate revenue, it is important that NFL teams win. Coaches, owners and general managers of NFL teams want to create powerful teams with reliable players, and they are willing to spend large amounts of money to do so. This research looks at one of the National Football League’s key players, the kicker. It seems intuitively obvious that the kickers who perform best would be paid the most. In this paper, the researchers performed a correlation and regression analysis to determine whether there is a correlation between an NFL kicker’s field goal percentage and salary. The research proposition was that higher-performing kickers receive higher salaries. The data suggest that there is no correlation between salary and on-field performance.

Keywords: business management, sports marketing, tourism, American football

Procedia PDF Downloads 275