Search results for: missing data estimation
25322 Improved K-Means Clustering Algorithm Using RHadoop with Combiner
Authors: Ji Eun Shin, Dong Hoon Lim
Abstract:
Data clustering is a common technique used in data analysis and is used in many applications, such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. K-means clustering is a well-known clustering algorithm aiming to cluster a set of data points to a predefined number of clusters. In this paper, we implement K-means algorithm based on MapReduce framework with RHadoop to make the clustering method applicable to large scale data. RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. The main idea is to introduce a combiner as a function of our map output to decrease the amount of data needed to be processed by reducers. The experimental results demonstrated that K-means algorithm using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also showed that our K-means algorithm using RHadoop with combiner was faster than regular algorithm without combiner as the size of data set increases.Keywords: big data, combiner, K-means clustering, RHadoop
Procedia PDF Downloads 44025321 A Self Organized Map Method to Classify Auditory-Color Synesthesia from Frontal Lobe Brain Blood Volume
Authors: Takashi Kaburagi, Takamasa Komura, Yosuke Kurihara
Abstract:
Absolute pitch is the ability to identify a musical note without a reference tone. Training for absolute pitch often occurs in preschool education. It is necessary to clarify how well the trainee can make use of synesthesia in order to evaluate the effect of the training. To the best of our knowledge, there are no existing methods for objectively confirming whether the subject is using synesthesia. Therefore, in this study, we present a method to distinguish the use of color-auditory synesthesia from the separate use of color and audition during absolute pitch training. This method measures blood volume in the prefrontal cortex using functional Near-infrared spectroscopy (fNIRS) and assumes that the cognitive step has two parts, a non-linear step and a linear step. For the linear step, we assume a second order ordinary differential equation. For the non-linear part, it is extremely difficult, if not impossible, to create an inverse filter of such a complex system as the brain. Therefore, we apply a method based on a self-organizing map (SOM) and are guided by the available data. The presented method was tested using 15 subjects, and the estimation accuracy is reported.Keywords: absolute pitch, functional near-infrared spectroscopy, prefrontal cortex, synesthesia
Procedia PDF Downloads 26325320 Framework for Integrating Big Data and Thick Data: Understanding Customers Better
Authors: Nikita Valluri, Vatcharaporn Esichaikul
Abstract:
With the popularity of data-driven decision making on the rise, this study focuses on providing an alternative outlook towards the process of decision-making. Combining quantitative and qualitative methods rooted in the social sciences, an integrated framework is presented with a focus on delivering a much more robust and efficient approach towards the concept of data-driven decision-making with respect to not only Big data but also 'Thick data', a new form of qualitative data. In support of this, an example from the retail sector has been illustrated where the framework is put into action to yield insights and leverage business intelligence. An interpretive approach to analyze findings from both kinds of quantitative and qualitative data has been used to glean insights. Using traditional Point-of-sale data as well as an understanding of customer psychographics and preferences, techniques of data mining along with qualitative methods (such as grounded theory, ethnomethodology, etc.) are applied. This study’s final goal is to establish the framework as a basis for providing a holistic solution encompassing both the Big and Thick aspects of any business need. The proposed framework is a modified enhancement in lieu of traditional data-driven decision-making approach, which is mainly dependent on quantitative data for decision-making.Keywords: big data, customer behavior, customer experience, data mining, qualitative methods, quantitative methods, thick data
Procedia PDF Downloads 16325319 Adequacy of Advanced Earthquake Intensity Measures for Estimation of Damage under Seismic Excitation with Arbitrary Orientation
Authors: Konstantinos G. Kostinakis, Manthos K. Papadopoulos, Asimina M. Athanatopoulou
Abstract:
An important area of research in seismic risk analysis is the evaluation of expected seismic damage of structures under a specific earthquake ground motion. Several conventional intensity measures of ground motion have been used to estimate their damage potential to structures. Yet, none of them was proved to be able to predict adequately the seismic damage of any structural system. Therefore, alternative advanced intensity measures which take into account not only ground motion characteristics but also structural information have been proposed. The adequacy of a number of advanced earthquake intensity measures in prediction of structural damage of 3D R/C buildings under seismic excitation which attacks the building with arbitrary incident angle is investigated in the present paper. To achieve this purpose, a symmetric in plan and an asymmetric 5-story R/C building are studied. The two buildings are subjected to 20 bidirectional earthquake ground motions. The two horizontal accelerograms of each ground motion are applied along horizontal orthogonal axes forming 72 different angles with the structural axes. The response is computed by non-linear time history analysis. The structural damage is expressed in terms of the maximum interstory drift as well as the overall structural damage index. The values of the aforementioned seismic damage measures determined for incident angle 0° as well as their maximum values over all seismic incident angles are correlated with 9 structure-specific ground motion intensity measures. The research identified certain intensity measures which exhibited strong correlation with the seismic damage of the two buildings. However, their adequacy for estimation of the structural damage depends on the response parameter adopted. Furthermore, it was confirmed that the widely used spectral acceleration at the fundamental period of the structure is a good indicator of the expected earthquake damage level.Keywords: damage indices, non-linear response, seismic excitation angle, structure-specific intensity measures
Procedia PDF Downloads 49425318 Prediction of Formation Pressure Using Artificial Intelligence Techniques
Authors: Abdulmalek Ahmed
Abstract:
Formation pressure is the main function that affects drilling operation economically and efficiently. Knowing the pore pressure and the parameters that affect it will help to reduce the cost of drilling process. Many empirical models reported in the literature were used to calculate the formation pressure based on different parameters. Some of these models used only drilling parameters to estimate pore pressure. Other models predicted the formation pressure based on log data. All of these models required different trends such as normal or abnormal to predict the pore pressure. Few researchers applied artificial intelligence (AI) techniques to predict the formation pressure by only one method or a maximum of two methods of AI. The objective of this research is to predict the pore pressure based on both drilling parameters and log data namely; weight on bit, rotary speed, rate of penetration, mud weight, bulk density, porosity and delta sonic time. A real field data is used to predict the formation pressure using five different artificial intelligence (AI) methods such as; artificial neural networks (ANN), radial basis function (RBF), fuzzy logic (FL), support vector machine (SVM) and functional networks (FN). All AI tools were compared with different empirical models. AI methods estimated the formation pressure by a high accuracy (high correlation coefficient and low average absolute percentage error) and outperformed all previous. The advantage of the new technique is its simplicity, which represented from its estimation of pore pressure without the need of different trends as compared to other models which require a two different trend (normal or abnormal pressure). Moreover, by comparing the AI tools with each other, the results indicate that SVM has the advantage of pore pressure prediction by its fast processing speed and high performance (a high correlation coefficient of 0.997 and a low average absolute percentage error of 0.14%). In the end, a new empirical correlation for formation pressure was developed using ANN method that can estimate pore pressure with a high precision (correlation coefficient of 0.998 and average absolute percentage error of 0.17%).Keywords: Artificial Intelligence (AI), Formation pressure, Artificial Neural Networks (ANN), Fuzzy Logic (FL), Support Vector Machine (SVM), Functional Networks (FN), Radial Basis Function (RBF)
Procedia PDF Downloads 15025317 Incremental Learning of Independent Topic Analysis
Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda
Abstract:
In this paper, we present a method of applying Independent Topic Analysis (ITA) to increasing the number of document data. The number of document data has been increasing since the spread of the Internet. ITA was presented as one method to analyze the document data. ITA is a method for extracting the independent topics from the document data by using the Independent Component Analysis (ICA). ICA is a technique in the signal processing; however, it is difficult to apply the ITA to increasing number of document data. Because ITA must use the all document data so temporal and spatial cost is very high. Therefore, we present Incremental ITA which extracts the independent topics from increasing number of document data. Incremental ITA is a method of updating the independent topics when the document data is added after extracted the independent topics from a just previous the data. In addition, Incremental ITA updates the independent topics when the document data is added. And we show the result applied Incremental ITA to benchmark datasets.Keywords: text mining, topic extraction, independent, incremental, independent component analysis
Procedia PDF Downloads 30925316 Open Data for e-Governance: Case Study of Bangladesh
Authors: Sami Kabir, Sadek Hossain Khoka
Abstract:
Open Government Data (OGD) refers to all data produced by government which are accessible in reusable way by common people with access to Internet and at free of cost. In line with “Digital Bangladesh” vision of Bangladesh government, the concept of open data has been gaining momentum in the country. Opening all government data in digital and customizable format from single platform can enhance e-governance which will make government more transparent to the people. This paper presents a well-in-progress case study on OGD portal by Bangladesh Government in order to link decentralized data. The initiative is intended to facilitate e-service towards citizens through this one-stop web portal. The paper further discusses ways of collecting data in digital format from relevant agencies with a view to making it publicly available through this single point of access. Further, possible layout of this web portal is presented.Keywords: e-governance, one-stop web portal, open government data, reusable data, web of data
Procedia PDF Downloads 35625315 Verification of Space System Dynamics Using the MATLAB Identification Toolbox in Space Qualification Test
Authors: Yuri V. Kim
Abstract:
This article presents a new approach to the Functional Testing of Space Systems (SS). It can be considered as a generic test and used for a wide class of SS that from the point of view of System Dynamics and Control may be described by the ordinary differential equations. Suggested methodology is based on using semi-natural experiment- laboratory stand that doesn’t require complicated, precise and expensive technological control-verification equipment. However, it allows for testing system as a whole totally assembled unit during Assembling, Integration and Testing (AIT) activities, involving system hardware (HW) and software (SW). The test physically activates system input (sensors) and output (actuators) and requires recording their outputs in real time. The data is then inserted in laboratory PC where it is post-experiment processed by Matlab/Simulink Identification Toolbox. It allows for estimating system dynamics in form of estimation of system differential equations by the experimental way and comparing them with expected mathematical model prematurely verified by mathematical simulation during the design process.Keywords: system dynamics, space system ground tests and space qualification, system dynamics identification, satellite attitude control, assembling, integration and testing
Procedia PDF Downloads 16325314 Modeling of Flows in Porous Materials under Pressure Difference
Authors: Nicoleta O. Tanase, Ciprian S. Mateescu
Abstract:
This paper is concerned with the numerical study of the flow through porous media. The purpose of this project is to determine the permeability of a medium and its connection to porosity to be able to identify how the permeability of said medium can be altered without changing the porosity. The numerical simulations are performed in 2D flow configurations with the laminar solvers implemented in Workbench - ANSYS Fluent. The direction of flow of the working fluid (water) is axial, from left to right, and in steady-state conditions. The working fluid is water. The 2D geometry is a channel with 300 mm length and 30 mm width, with a different number of circles that are positioned differently, modelling a porous medium. The permeability of a porous medium can be altered without changing the porosity by positioning the circles differently (by missing the same number of circles) in the flow domain, which induces a change in the flow spectrum. The main goal of the paper is to investigate the flow pattern and permeability under controlled perturbations induced by the variation of velocity and porous medium. Numerical solutions provide insight into all flow magnitudes, one of the most important being the WSS distribution on the circles.Keywords: CFD, porous media, permeability, flow spectrum
Procedia PDF Downloads 5525313 Resource Framework Descriptors for Interestingness in Data
Authors: C. B. Abhilash, Kavi Mahesh
Abstract:
Human beings are the most advanced species on earth; it's all because of the ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in the textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on Interesting data through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate the triple from the given data and derive the interesting patterns. The methodology also illustrates data integration using the RDF for reliable, interesting patterns.Keywords: RDF, interestingness, knowledge base, semantic data
Procedia PDF Downloads 16425312 Rejuvenate: Face and Body Retouching Using Image Inpainting
Authors: Hossam Abdelrahman, Sama Rostom, Reem Yassein, Yara Mohamed, Salma Salah, Nour Awny
Abstract:
In today’s environment, people are becoming increasingly interested in their appearance. However, they are afraid of their unknown appearance after a plastic surgery or treatment. Accidents, burns and genetic problems such as bowing of body parts of people have a negative impact on their mental health with their appearance and this makes them feel uncomfortable and underestimated. The approach presents a revolutionary deep learning-based image inpainting method that analyses the various picture structures and corrects damaged images. In this study, A model is proposed based on the in-painting of medical images with Stable Diffusion Inpainting method. Reconstructing missing and damaged sections of an image is known as image inpainting is a key progress facilitated by deep neural networks. The system uses the input of the user of an image to indicate a problem, the system will then modify the image and output the fixed image, facilitating for the patient to see the final result.Keywords: generative adversarial network, large mask inpainting, stable diffusion inpainting, plastic surgery
Procedia PDF Downloads 7725311 Web-Based Decision Support Systems and Intelligent Decision-Making: A Systematic Analysis
Authors: Serhat Tüzün, Tufan Demirel
Abstract:
Decision Support Systems (DSS) have been investigated by researchers and technologists for more than 35 years. This paper analyses the developments in the architecture and software of these systems, provides a systematic analysis for different Web-based DSS approaches and Intelligent Decision-making Technologies (IDT), with the suggestion for future studies. Decision Support Systems literature begins with building model-oriented DSS in the late 1960s, theory developments in the 1970s, and the implementation of financial planning systems and Group DSS in the early and mid-80s. Then it documents the origins of Executive Information Systems, online analytic processing (OLAP) and Business Intelligence. The implementation of Web-based DSS occurred in the mid-1990s. With the beginning of the new millennia, intelligence is the main focus on DSS studies. Web-based technologies are having a major impact on design, development and implementation processes for all types of DSS. Web technologies are being utilized for the development of DSS tools by leading developers of decision support technologies. Major companies are encouraging its customers to port their DSS applications, such as data mining, customer relationship management (CRM) and OLAP systems, to a web-based environment. Similarly, real-time data fed from manufacturing plants are now helping floor managers make decisions regarding production adjustment to ensure that high-quality products are produced and delivered. Web-based DSS are being employed by organizations as decision aids for employees as well as customers. A common usage of Web-based DSS has been to assist customers configure product and service according to their needs. These systems allow individual customers to design their own products by choosing from a menu of attributes, components, prices and delivery options. The Intelligent Decision-making Technologies (IDT) domain is a fast growing area of research that integrates various aspects of computer science and information systems. This includes intelligent systems, intelligent technology, intelligent agents, artificial intelligence, fuzzy logic, neural networks, machine learning, knowledge discovery, computational intelligence, data science, big data analytics, inference engines, recommender systems or engines, and a variety of related disciplines. Innovative applications that emerge using IDT often have a significant impact on decision-making processes in government, industry, business, and academia in general. This is particularly pronounced in finance, accounting, healthcare, computer networks, real-time safety monitoring and crisis response systems. Similarly, IDT is commonly used in military decision-making systems, security, marketing, stock market prediction, and robotics. Even though lots of research studies have been conducted on Decision Support Systems, a systematic analysis on the subject is still missing. Because of this necessity, this paper has been prepared to search recent articles about the DSS. The literature has been deeply reviewed and by classifying previous studies according to their preferences, taxonomy for DSS has been prepared. With the aid of the taxonomic review and the recent developments over the subject, this study aims to analyze the future trends in decision support systems.Keywords: decision support systems, intelligent decision-making, systematic analysis, taxonomic review
Procedia PDF Downloads 28025310 Data Mining Practices: Practical Studies on the Telecommunication Companies in Jordan
Authors: Dina Ahmad Alkhodary
Abstract:
This study aimed to investigate the practices of Data Mining on the telecommunication companies in Jordan, from the viewpoint of the respondents. In order to achieve the goal of the study, and test the validity of hypotheses, the researcher has designed a questionnaire to collect data from managers and staff members from main department in the researched companies. The results shows improvements stages of the telecommunications companies towered Data Mining.Keywords: data, mining, development, business
Procedia PDF Downloads 49825309 Electrical Investigations of Polyaniline/Graphitic Carbon Nitride Composites Using Broadband Dielectric Spectroscopy
Authors: M. A. Moussa, M. H. Abdel Rehim, G.M. Turky
Abstract:
Polyaniline composites with carbon nitride, to overcome compatibility restriction with graphene, were prepared with the solution method. FTIR and Uv-vis spectra were used for structural conformation. While XRD and XPS confirmed the structures in addition to estimation of nitrogen atom surroundings, the pore sizes and the active surface area were determined from BET adsorption isotherm. The electrical and dielectric parameters were measured and calculated with BDS .Keywords: carbon nitride, dynamic relaxation, electrical conductivity, polyaniline
Procedia PDF Downloads 14325308 The Impact of System and Data Quality on Organizational Success in the Kingdom of Bahrain
Authors: Amal M. Alrayes
Abstract:
Data and system quality play a central role in organizational success, and the quality of any existing information system has a major influence on the effectiveness of overall system performance.Given the importance of system and data quality to an organization, it is relevant to highlight their importance on organizational performance in the Kingdom of Bahrain. This research aims to discover whether system quality and data quality are related, and to study the impact of system and data quality on organizational success. A theoretical model based on previous research is used to show the relationship between data and system quality, and organizational impact. We hypothesize, first, that system quality is positively associated with organizational impact, secondly that system quality is positively associated with data quality, and finally that data quality is positively associated with organizational impact. A questionnaire was conducted among public and private organizations in the Kingdom of Bahrain. The results show that there is a strong association between data and system quality, that affects organizational success.Keywords: data quality, performance, system quality, Kingdom of Bahrain
Procedia PDF Downloads 49725307 Cloud Computing in Data Mining: A Technical Survey
Authors: Ghaemi Reza, Abdollahi Hamid, Dashti Elham
Abstract:
Cloud computing poses a diversity of challenges in data mining operation arising out of the dynamic structure of data distribution as against the use of typical database scenarios in conventional architecture. Due to immense number of users seeking data on daily basis, there is a serious security concerns to cloud providers as well as data providers who put their data on the cloud computing environment. Big data analytics use compute intensive data mining algorithms (Hidden markov, MapReduce parallel programming, Mahot Project, Hadoop distributed file system, K-Means and KMediod, Apriori) that require efficient high performance processors to produce timely results. Data mining algorithms to solve or optimize the model parameters. The challenges that operation has to encounter is the successful transactions to be established with the existing virtual machine environment and the databases to be kept under the control. Several factors have led to the distributed data mining from normal or centralized mining. The approach is as a SaaS which uses multi-agent systems for implementing the different tasks of system. There are still some problems of data mining based on cloud computing, including design and selection of data mining algorithms.Keywords: cloud computing, data mining, computing models, cloud services
Procedia PDF Downloads 48125306 Cross-border Data Transfers to and from South Africa
Authors: Amy Gooden, Meshandren Naidoo
Abstract:
Genetic research and transfers of big data are not confined to a particular jurisdiction, but there is a lack of clarity regarding the legal requirements for importing and exporting such data. Using direct-to-consumer genetic testing (DTC-GT) as an example, this research assesses the status of data sharing into and out of South Africa (SA). While SA laws cover the sending of genetic data out of SA, prohibiting such transfer unless a legal ground exists, the position where genetic data comes into the country depends on the laws of the country from where it is sent – making the legal position less clear.Keywords: cross-border, data, genetic testing, law, regulation, research, sharing, South Africa
Procedia PDF Downloads 12725305 Performance Evaluation of MIMO-OFDM Communication Systems
Authors: M. I. Youssef, A. E. Emam, M. Abd Elghany
Abstract:
This paper evaluates the bit error rate (BER) performance of MIMO-OFDM communication system. MIMO system uses multiple transmitting and receiving antennas with different coding techniques to either enhance the transmission diversity or spatial multiplexing gain. Utilizing alamouti algorithm were the same information transmitted over multiple antennas at different time intervals and then collected again at the receivers to minimize the probability of error, combat fading and thus improve the received signal to noise ratio. While utilizing V-BLAST algorithm, the transmitted signals are divided into different transmitting channels and transferred over the channel to be received by different receiving antennas to increase the transmitted data rate and achieve higher throughput. The paper provides a study of different diversity gain coding schemes and spatial multiplexing coding for MIMO systems. A comparison of various channels' estimation and equalization techniques are given. The simulation is implemented using MATLAB, and the results had shown the performance of transmission models under different channel environments.Keywords: MIMO communication, BER, space codes, channels, alamouti, V-BLAST
Procedia PDF Downloads 17525304 Teaching the Tacit Nuances of Japanese Onomatopoeia through an E-Learning System: An Evaluation Approach of Narrative Interpretation
Authors: Xiao-Yan Li, Takashi Hashimoto, Guanhong Li, Shuo Yang
Abstract:
In Japanese, onomatopoeia is an important element in the lively expression of feelings and experiences. It is very difficult for students of Japanese to acquire onomatopoeia, especially, its nuances. In this paper, based on traditional L2 learning theories, we propose a new method to improve the efficiency of teaching the nuances – both explicit and tacit - to non-native speakers of Japanese. The method for teaching the tacit nuances of onomatopoeia consists of three elements. First is to teach the formal rules representing the explicit nuances of onomatopoeic words. Second is to have the students create new onomatopoeic words by utilizing those formal rules. The last element is to provide feedback by evaluating the onomatopoeias created. Our previous study used five-grade relative estimation. However students were confused about the five-grade system, because they could not understand the evaluation criteria only based on a figure. In this new system, then, we built an evaluation database through native speakers’ narrative interpretation. We asked Japanese native speakers to describe their awareness of the nuances of onomatopoeia in writing. Then they voted on site and defined priorities for showing to learners on the system. To verify the effectiveness of the proposed method and the learning system, we conducted a preliminary experiment involving two groups of subjects. While Group A got feedback about the appropriateness of their onomatopoeic constructions from the native speakers’ narrative interpretation, Group B got feedback just in the form of the five-grade relative estimation. A questionnaire survey administered to all of the learners clarified our learning system availability and also identified areas that should be improved. Repetitive learning of word-formation rules, creating new onomatopoeias and gaining new awareness from narrative interpretation is the total process used to teach the explicit and tacit nuances of onomatopoeia.Keywords: onomatopoeia, tacit nuance, narrative interpretation, e-learning system, second language teaching
Procedia PDF Downloads 39725303 The Study of Security Techniques on Information System for Decision Making
Authors: Tejinder Singh
Abstract:
Information system is the flow of data from different levels to different directions for decision making and data operations in information system (IS). Data can be violated by different manner like manual or technical errors, data tampering or loss of integrity. Security system called firewall of IS is effected by such type of violations. The flow of data among various levels of Information System is done by networking system. The flow of data on network is in form of packets or frames. To protect these packets from unauthorized access, virus attacks, and to maintain the integrity level, network security is an important factor. To protect the data to get pirated, various security techniques are used. This paper represents the various security techniques and signifies different harmful attacks with the help of detailed data analysis. This paper will be beneficial for the organizations to make the system more secure, effective, and beneficial for future decisions making.Keywords: information systems, data integrity, TCP/IP network, vulnerability, decision, data
Procedia PDF Downloads 30825302 Data Integration with Geographic Information System Tools for Rural Environmental Monitoring
Authors: Tamas Jancso, Andrea Podor, Eva Nagyne Hajnal, Peter Udvardy, Gabor Nagy, Attila Varga, Meng Qingyan
Abstract:
The paper deals with the conditions and circumstances of integration of remotely sensed data for rural environmental monitoring purposes. The main task is to make decisions during the integration process when we have data sources with different resolution, location, spectral channels, and dimension. In order to have exact knowledge about the integration and data fusion possibilities, it is necessary to know the properties (metadata) that characterize the data. The paper explains the joining of these data sources using their attribute data through a sample project. The resulted product will be used for rural environmental analysis.Keywords: remote sensing, GIS, metadata, integration, environmental analysis
Procedia PDF Downloads 12125301 Analysis of Genomics Big Data in Cloud Computing Using Fuzzy Logic
Authors: Mohammad Vahed, Ana Sadeghitohidi, Majid Vahed, Hiroki Takahashi
Abstract:
In the genomics field, the huge amounts of data have produced by the next-generation sequencers (NGS). Data volumes are very rapidly growing, as it is postulated that more than one billion bases will be produced per year in 2020. The growth rate of produced data is much faster than Moore's law in computer technology. This makes it more difficult to deal with genomics data, such as storing data, searching information, and finding the hidden information. It is required to develop the analysis platform for genomics big data. Cloud computing newly developed enables us to deal with big data more efficiently. Hadoop is one of the frameworks distributed computing and relies upon the core of a Big Data as a Service (BDaaS). Although many services have adopted this technology, e.g. amazon, there are a few applications in the biology field. Here, we propose a new algorithm to more efficiently deal with the genomics big data, e.g. sequencing data. Our algorithm consists of two parts: First is that BDaaS is applied for handling the data more efficiently. Second is that the hybrid method of MapReduce and Fuzzy logic is applied for data processing. This step can be parallelized in implementation. Our algorithm has great potential in computational analysis of genomics big data, e.g. de novo genome assembly and sequence similarity search. We will discuss our algorithm and its feasibility.Keywords: big data, fuzzy logic, MapReduce, Hadoop, cloud computing
Procedia PDF Downloads 30025300 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data
Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin
Abstract:
Cities are complex systems of diverse and inter-tangled activities. These activities and their complex interrelationships create diverse urban phenomena. And such urban phenomena have considerable influences on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable better understanding of urban activities and, therefrom, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data was provided to people. It is expected that the weights of urban data thereby obtained will provide us with insights into cities and show us how diverse urban activities influence each other and induce feedback.Keywords: big data, machine learning, ontology model, urban data model
Procedia PDF Downloads 41925299 Estimation of Morbidity Level of Industrial Labour Conditions at Zestafoni Ferroalloy Plant
Authors: M. Turmanauli, T. Todua, O. Gvaberidze, R. Javakhadze, N. Chkhaidze, N. Khatiashvili
Abstract:
Background: Mining process has the significant influence on human health and quality of life. In recent years the events in Georgia were reflected on the industry working process, especially minimal requirements of labor safety, hygiene standards of workplace and the regime of work and rest are not observed. This situation is often caused by the lack of responsibility, awareness, and knowledge both of workers and employers. The control of working conditions and its protection has been worsened in many of industries. Materials and Methods: For evaluation of the current situation the prospective epidemiological study by face to face interview method was conducted at Georgian “Manganese Zestafoni Ferroalloy Plant” in 2011-2013. 65.7% of employees (1428 bulletin) were surveyed and the incidence rates of temporary disability days were studied. Results: The average length of a temporary disability single accident was studied taking into consideration as sex groups as well as the whole cohort. According to the classes of harmfulness the following results were received: Class 2.0-10.3%; 3.1-12.4%; 3.2-35.1%; 3.3-12.1%; 3.4-17.6%; 4.0-12.5%. Among the employees 47.5% and 83.1% were tobacco and alcohol consumers respectively. According to the age groups and years of work on the base of previous experience ≥50 ages and ≥21 years of work data prevalence respectively. The obtained data revealed increased morbidity rate according to age and years of work. It was found that the bone and articulate system and connective tissue diseases, aggravation of chronic respiratory diseases, ischemic heart diseases, hypertension and cerebral blood discirculation were the leading among the other diseases. High prevalence of morbidity observed in the workplace with not satisfactory labor conditions from the hygienic point of view. Conclusion: According to received data the causes of morbidity are the followings: unsafety labor conditions; incomplete of preventive medical examinations (preliminary and periodic); lack of access to appropriate health care services; derangement of gathering, recording, and analysis of morbidity data. This epidemiological study was conducted at the JSC “Manganese Ferro Alloy Plant” according to State program “ Prevention of Occupational Diseases” (Program code is 35 03 02 05).Keywords: occupational health, mining process, morbidity level, cerebral blood discirculation
Procedia PDF Downloads 42825298 Data-driven Decision-Making in Digital Entrepreneurship
Authors: Abeba Nigussie Turi, Xiangming Samuel Li
Abstract:
Data-driven business models are more typical for established businesses than early-stage startups that strive to penetrate a market. This paper provided an extensive discussion on the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we developed data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers in the form of poor data access, technical and financial constraints, to state some. The startup DDDM framework proposed in this paper is novel in its form encompassing startup data analytics enablers and metrics aligning with startups' business models ranging from customer-centric product development to servitization which is the future of modern digital entrepreneurship.Keywords: startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship
Procedia PDF Downloads 32925297 An Unified Model for Longshore Sediment Transport Rate Estimation
Authors: Aleksandra Dudkowska, Gabriela Gic-Grusza
Abstract:
Wind wave-induced sediment transport is an important multidimensional and multiscale dynamic process affecting coastal seabed changes and coastline evolution. The knowledge about sediment transport rate is important to solve many environmental and geotechnical issues. There are many types of sediment transport models but none of them is widely accepted. It is bacause the process is not fully defined. Another problem is a lack of sufficient measurment data to verify proposed hypothesis. There are different types of models for longshore sediment transport (LST, which is discussed in this work) and cross-shore transport which is related to different time and space scales of the processes. There are models describing bed-load transport (discussed in this work), suspended and total sediment transport. LST models use among the others the information about (i) the flow velocity near the bottom, which in case of wave-currents interaction in coastal zone is a separate problem (ii) critical bed shear stress that strongly depends on the type of sediment and complicates in the case of heterogeneous sediment. Moreover, LST rate is strongly dependant on the local environmental conditions. To organize existing knowledge a series of sediment transport models intercomparisons was carried out as a part of the project “Development of a predictive model of morphodynamic changes in the coastal zone”. Four classical one-grid-point models were studied and intercompared over wide range of bottom shear stress conditions, corresponding with wind-waves conditions appropriate for coastal zone in polish marine areas. The set of models comprises classical theories that assume simplified influence of turbulence on the sediment transport (Du Boys, Meyer-Peter & Muller, Ribberink, Engelund & Hansen). It turned out that the values of estimated longshore instantaneous mass sediment transport are in general in agreement with earlier studies and measurements conducted in the area of interest. However, none of the formulas really stands out from the rest as being particularly suitable for the test location over the whole analyzed flow velocity range. Therefore, based on the models discussed a new unified formula for longshore sediment transport rate estimation is introduced, which constitutes the main original result of this study. Sediment transport rate is calculated based on the bed shear stress and critical bed shear stress. The dependence of environmental conditions is expressed by one coefficient (in a form of constant or function) thus the model presented can be quite easily adjusted to the local conditions. The discussion of the importance of each model parameter for specific velocity ranges is carried out. Moreover, it is shown that the value of near-bottom flow velocity is the main determinant of longshore bed-load in storm conditions. Thus, the accuracy of the results depends less on the sediment transport model itself and more on the appropriate modeling of the near-bottom velocities.Keywords: bedload transport, longshore sediment transport, sediment transport models, coastal zone
Procedia PDF Downloads 38725296 Sensitivity and Specificity of Some Serological Tests Used for Diagnosis of Bovine Brucellosis in Egypt on Bacteriological and Molecular Basis
Authors: Hosein I. Hosein, Ragab Azzam, Ahmed M. S. Menshawy, Sherin Rouby, Khaled Hendy, Ayman Mahrous, Hany Hussien
Abstract:
Brucellosis is a highly contagious bacterial zoonotic disease of a worldwide spread and has different names; Infectious or enzootic abortion and Bang's disease in animals; and Mediterranean or Malta fever, Undulant Fever and Rock fever in humans. It is caused by the different species of genus Brucella which is a Gram-negative, aerobic, non-spore forming, facultative intracellular bacterium. Brucella affects a wide range of mammals including bovines, small ruminants, pigs, equines, rodents, marine mammals as well as human resulting in serious economic losses in animal populations. In human, Brucella causes a severe illness representing a great public health problem. The disease was reported in Egypt for the first time in 1939; since then the disease remained endemic at high levels among cattle, buffalo, sheep and goat and is still representing a public health hazard. The annual economic losses due to brucellosis were estimated to be about 60 million Egyptian pounds yearly, but actual estimates are still missing despite almost 30 years of implementation of the Egyptian control programme. Despite being the gold standard, bacterial isolation has been reported to show poor sensitivity for samples with low-level of Brucella and is impractical for regular screening of large populations. Thus, serological tests still remain the corner stone for routine diagnosis of brucellosis, especially in developing countries. In the present study, a total of 1533 cows (256 from Beni-Suef Governorate, 445 from Al-Fayoum Governorate and 832 from Damietta Governorate), were employed for estimation of relative sensitivity, relative specificity, positive predictive value and negative predictive value of buffered acidified plate antigen test (BPAT), rose bengal test (RBT) and complement fixation test (CFT). The overall seroprevalence of brucellosis revealed (19.63%). Relative sensitivity, relative specificity, positive predictive value and negative predictive value of BPAT,RBT and CFT were estimated as, (96.27 %, 96.76 %, 87.65 % and 99.10 %), (93.42 %, 96.27 %, 90.16 % and 98.35%) and (89.30 %, 98.60 %, 94.35 %and 97.24 %) respectively. BPAT showed the highest sensitivity among the three employed serological tests. RBT was less specific than BPAT. CFT showed the least sensitivity 89.30 % among the three employed serological tests but showed the highest specificity. Different tissues specimens of 22 seropositive cows (spleen, retropharyngeal udder, and supra-mammary lymph nodes) were subjected for bacteriological studies for isolation and identification of Brucella organisms. Brucella melitensis biovar 3 could be recovered from 12 (54.55%) cows. Bacteriological examinations failed to classify 10 cases (45.45%) and were culture negative. Bruce-ladder PCR was carried out for molecular identification of the 12 Brucella isolates at the species level. Three fragments of 587 bp, 1071 bp and 1682 bp sizes were amplified indicating Brucella melitensis. The results indicated the importance of using several procedures to overcome the problem of escaping of some infected animals from diagnosis.Bruce-ladder PCR is an important tool for diagnosis and epidemiologic studies, providing relevant information for identification of Brucella spp.Keywords: brucellosis, relative sensitivity, relative specificity, Bruce-ladder, Egypt
Procedia PDF Downloads 35625295 Ribotaxa: Combined Approaches for Taxonomic Resolution Down to the Species Level from Metagenomics Data Revealing Novelties
Authors: Oshma Chakoory, Sophie Comtet-Marre, Pierre Peyret
Abstract:
Metagenomic classifiers are widely used for the taxonomic profiling of metagenomic data and estimation of taxa relative abundance. Small subunit rRNA genes are nowadays a gold standard for the phylogenetic resolution of complex microbial communities, although the power of this marker comes down to its use as full-length. We benchmarked the performance and accuracy of rRNA-specialized versus general-purpose read mappers, reference-targeted assemblers and taxonomic classifiers. We then built a pipeline called RiboTaxa to generate a highly sensitive and specific metataxonomic approach. Using metagenomics data, RiboTaxa gave the best results compared to other tools (Kraken2, Centrifuge (1), METAXA2 (2), PhyloFlash (3)) with precise taxonomic identification and relative abundance description, giving no false positive detection. Using real datasets from various environments (ocean, soil, human gut) and from different approaches (metagenomics and gene capture by hybridization), RiboTaxa revealed microbial novelties not seen by current bioinformatics analysis opening new biological perspectives in human and environmental health. In a study focused on corals’ health involving 20 metagenomic samples (4), an affiliation of prokaryotes was limited to the family level with Endozoicomonadaceae characterising healthy octocoral tissue. RiboTaxa highlighted 2 species of uncultured Endozoicomonas which were dominant in the healthy tissue. Both species belonged to a genus not yet described, opening new research perspectives on corals’ health. Applied to metagenomics data from a study on human gut and extreme longevity (5), RiboTaxa detected the presence of an uncultured archaeon in semi-supercentenarians (aged 105 to 109 years) highlighting an archaeal genus, not yet described, and 3 uncultured species belonging to the Enorma genus that could be species of interest participating in the longevity process. RiboTaxa is user-friendly, rapid, allowing microbiota structure description from any environment and the results can be easily interpreted. This software is freely available at https://github.com/oschakoory/RiboTaxa under the GNU Affero General Public License 3.0.Keywords: metagenomics profiling, microbial diversity, SSU rRNA genes, full-length phylogenetic marker
Procedia PDF Downloads 12325294 Fuzzy Multi-Objective Approach for Emergency Location Transportation Problem
Authors: Bidzina Matsaberidze, Anna Sikharulidze, Gia Sirbiladze, Bezhan Ghvaberidze
Abstract:
In the modern world emergency management decision support systems are actively used by state organizations, which are interested in extreme and abnormal processes and provide optimal and safe management of supply needed for the civil and military facilities in geographical areas, affected by disasters, earthquakes, fires and other accidents, weapons of mass destruction, terrorist attacks, etc. Obviously, these kinds of extreme events cause significant losses and damages to the infrastructure. In such cases, usage of intelligent support technologies is very important for quick and optimal location-transportation of emergency service in order to avoid new losses caused by these events. Timely servicing from emergency service centers to the affected disaster regions (response phase) is a key task of the emergency management system. Scientific research of this field takes the important place in decision-making problems. Our goal was to create an expert knowledge-based intelligent support system, which will serve as an assistant tool to provide optimal solutions for the above-mentioned problem. The inputs to the mathematical model of the system are objective data, as well as expert evaluations. The outputs of the system are solutions for Fuzzy Multi-Objective Emergency Location-Transportation Problem (FMOELTP) for disasters’ regions. The development and testing of the Intelligent Support System were done on the example of an experimental disaster region (for some geographical zone of Georgia) which was generated using a simulation modeling. Four objectives are considered in our model. The first objective is to minimize an expectation of total transportation duration of needed products. The second objective is to minimize the total selection unreliability index of opened humanitarian aid distribution centers (HADCs). The third objective minimizes the number of agents needed to operate the opened HADCs. The fourth objective minimizes the non-covered demand for all demand points. Possibility chance constraints and objective constraints were constructed based on objective-subjective data. The FMOELTP was constructed in a static and fuzzy environment since the decisions to be made are taken immediately after the disaster (during few hours) with the information available at that moment. It is assumed that the requests for products are estimated by homeland security organizations, or their experts, based upon their experience and their evaluation of the disaster’s seriousness. Estimated transportation times are considered to take into account routing access difficulty of the region and the infrastructure conditions. We propose an epsilon-constraint method for finding the exact solutions for the problem. It is proved that this approach generates the exact Pareto front of the multi-objective location-transportation problem addressed. Sometimes for large dimensions of the problem, the exact method requires long computing times. Thus, we propose an approximate method that imposes a number of stopping criteria on the exact method. For large dimensions of the FMOELTP the Estimation of Distribution Algorithm’s (EDA) approach is developed.Keywords: epsilon-constraint method, estimation of distribution algorithm, fuzzy multi-objective combinatorial programming problem, fuzzy multi-objective emergency location/transportation problem
Procedia PDF Downloads 32225293 Climate Adaptability of Vernacular Courtyards in Jiangnan Area, Southeast China
Authors: Yu Bingqing
Abstract:
Research on the meteorological observation data of conventional meteorological stations in Jiangnan area from 2001 to 2020 and digital elevation DEM, the "golden section" comfort index calculation method was used to refine the spatial estimation of climate comfort in Jiangnan area under undulating terrain on the Gis platform, and its spatiotemporal distribution characteristics in the region were analyzed. The results can provide reference for the development and utilization of climate resources in Jiangnan area.The results show that: ① there is a significant spatial difference between winter and summer climate comfort from low latitude to high latitude. ②There is a significant trend of decreasing climate comfort from low altitude to high altitude in winter, but the opposite is true in summer. ③There is a trend of decreasing climate comfort from offshore to inland in winter, but the difference is not significant in summer. The climate comfort level in the natural lake area is higher in summer than in the surrounding areas, but not in winter. ⑤ In winter and summer, altitude has the greatest influence on the difference in comfort level.Keywords: vernacular courtyards, thermal environment, depth-to-height ratio, climate adaptability,Southeast China
Procedia PDF Downloads 59