Search results for: maximal data sets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25882

25162 Effect of the Aluminum Fraction “X” on the Laser Wavelengths in GaAs/AlxGa1-xAs Superlattices

Authors: F.Bendahma, S.Bentata

Abstract:

In this paper, we numerically study the eigenstates of a GaAs/AlxGa1-xAs superlattice with structural disorder in trimer height barriers (THB). The aluminium concentration x randomly takes one of two values; one of them appears only in triplets and remains lower than the other throughout the studied structure. Despite the disorder, the system exhibits two kinds of sets of propagating states lying below the barrier, owing to the characteristic structure of the superlattice. This result points to the existence of a single laser emission in the trimer, with wavelengths in the mid-infrared.

Keywords: infrared (IR), laser emission, superlattice, trimer

Procedia PDF Downloads 448
25161 A Method of Detecting the Difference in Two States of Brain Using Statistical Analysis of EEG Raw Data

Authors: Digvijaysingh S. Bana, Kiran R. Trivedi

Abstract:

This paper introduces various methods for analyzing the alpha wave to detect the difference between two states of the brain. One healthy subject participated in the experiment. EEG was measured on the forehead above the eye (FP1 position), with the reference and ground electrodes on an ear clip. The data samples are obtained as raw EEG data. Each reading lasts one minute. Various tests are performed on the alpha-band raw EEG data. The readings are taken at different times throughout the day. The statistical analysis is carried out on the EEG sample data in the form of various tests.

Keywords: electroencephalogram (EEG), biometrics, authentication, EEG raw data

Procedia PDF Downloads 464
25160 European Food Safety Authority (EFSA) Safety Assessment of Food Additives: Data and Methodology Used for the Assessment of Dietary Exposure for Different European Countries and Population Groups

Authors: Petra Gergelova, Sofia Ioannidou, Davide Arcella, Alexandra Tard, Polly E. Boon, Oliver Lindtner, Christina Tlustos, Jean-Charles Leblanc

Abstract:

Objectives: To assess chronic dietary exposure to food additives in different European countries and population groups. Method and Design: The European Food Safety Authority’s (EFSA) Panel on Food Additives and Nutrient Sources added to Food (ANS) estimates chronic dietary exposure to food additives with the purpose of re-evaluating food additives that were previously authorized in Europe. For this, EFSA uses concentration values (usage and/or analytical occurrence data) reported through regular public calls for data by the food industry and European countries. These are combined, at the individual level, with national food consumption data from the EFSA Comprehensive European Food Consumption Database, which includes data from 33 dietary surveys from 19 European countries and considers six different population groups (infants, toddlers, children, adolescents, adults and the elderly). The EFSA ANS Panel estimates dietary exposure for each individual in the EFSA Comprehensive Database by combining the occurrence levels per food group with their corresponding consumption amount per kg body weight. An individual average exposure per day is calculated, resulting in distributions of individual exposures per survey and population group. Based on these distributions, the average and 95th percentile of exposure are calculated per survey and per population group. Dietary exposure is assessed based on two different sets of data: (a) maximum permitted levels (MPLs) of use set down in the EU legislation (defined as the regulatory maximum level exposure assessment scenario) and (b) usage levels and/or analytical occurrence data (defined as the refined exposure assessment scenario). The refined exposure assessment scenario is sub-divided into the brand-loyal consumer scenario and the non-brand-loyal consumer scenario.
For the brand-loyal consumer scenario, the consumer is considered to be exposed on a long-term basis to the highest reported usage/analytical level for one food group, and to the mean level for the remaining food groups. For the non-brand-loyal consumer scenario, the consumer is considered to be exposed on a long-term basis to the mean reported usage/analytical level for all food groups. An additional exposure from sources other than direct addition of food additives (i.e. natural presence, contaminants, and carriers of food additives) is also estimated, as appropriate. Results: Since 2014, this methodology has been applied in about 30 food additive exposure assessments conducted as part of scientific opinions of the EFSA ANS Panel. For example, under the non-brand-loyal scenario, the highest 95th percentile of exposure to α-tocopherol (E 307) and ammonium phosphatides (E 442) was estimated in toddlers, at up to 5.9 and 8.7 mg/kg body weight/day, respectively. The same estimates under the brand-loyal scenario in toddlers resulted in exposures of 8.1 and 20.7 mg/kg body weight/day, respectively. For the regulatory maximum level exposure assessment scenario, the highest 95th percentile of exposure to α-tocopherol (E 307) and ammonium phosphatides (E 442) was estimated in toddlers, at up to 11.9 and 30.3 mg/kg body weight/day, respectively. Conclusions: Detailed and up-to-date information on food additive concentration values (usage and/or analytical occurrence data) and food consumption data enables the assessment of chronic dietary exposure to food additives at more realistic levels.
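
The refined scenarios described in this abstract can be illustrated with a small Python sketch. It is not EFSA's implementation: the food groups, occurrence levels, and consumption figures are invented, the nearest-rank percentile is only one of several conventions, and the rule for choosing the "one food group" in the brand-loyal scenario (the group that raises the individual's exposure the most) is an assumption made here for illustration.

```python
from math import ceil

# Hypothetical occurrence levels in mg additive per kg food, per food group.
# "mean" and "max" stand for the mean and the highest reported level.
LEVELS = {
    "bread": {"mean": 10.0, "max": 20.0},
    "sauces": {"mean": 50.0, "max": 200.0},
}

# Hypothetical consumption in g food per kg body weight per day,
# one dictionary per surveyed individual.
CONSUMPTION = [
    {"bread": 4.0, "sauces": 0.5},
    {"bread": 2.0, "sauces": 1.0},
    {"bread": 6.0, "sauces": 0.0},
]


def non_brand_loyal(person):
    """Mean level for every food group; result in mg/kg body weight/day."""
    return sum(LEVELS[g]["mean"] * grams / 1000.0 for g, grams in person.items())


def brand_loyal(person):
    """Highest level for one food group, mean level for the others.

    Assumption: the single food group is the one whose switch from mean to
    highest level increases this individual's exposure the most.
    """
    extra = max(
        (LEVELS[g]["max"] - LEVELS[g]["mean"]) * grams / 1000.0
        for g, grams in person.items()
    )
    return non_brand_loyal(person) + extra


def percentile(values, p):
    """Nearest-rank percentile of a list of values."""
    ordered = sorted(values)
    rank = max(1, ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(scenario):
    """Return (average, 95th percentile) of individual exposures."""
    exposures = [scenario(person) for person in CONSUMPTION]
    return sum(exposures) / len(exposures), percentile(exposures, 95)
```

Calling `summarize(non_brand_loyal)` and `summarize(brand_loyal)` gives the two summary statistics per scenario; in a real assessment the same calculation would be repeated per survey and per population group.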

Keywords: α-tocopherol, ammonium phosphatides, dietary exposure assessment, European Food Safety Authority, food additives, food consumption data

Procedia PDF Downloads 325
25159 Estimating Knowledge Flow Patterns of Business Method Patents with a Hidden Markov Model

Authors: Yoonjung An, Yongtae Park

Abstract:

Knowledge flows are a critical source of faster technological progress and stronger economic growth. Knowledge flows have been accelerated dramatically by the establishment of a patent system in which each patent is required by law to disclose sufficient technical information for the invention to be recreated. Patent analysis has thus been widely used to help investigate technological knowledge flows. However, the existing research is limited in terms of both subject and approach. In particular, most previous studies did not cover business method (BM) patents, although they are important drivers of knowledge flows like other patents. In addition, these studies usually focus on the static analysis of knowledge flows. Some use approaches that incorporate the time dimension, yet they still fail to trace a truly dynamic process of knowledge flows. Therefore, we investigate dynamic patterns of knowledge flows driven by BM patents using a Hidden Markov Model (HMM). An HMM is a popular statistical tool for modeling a wide range of time series data, with no general theoretical limit in regard to statistical pattern classification. Accordingly, it enables characterizing knowledge patterns that may differ by patent, sector, country and so on. We run the model on sets of backward citations and forward citations to compare the patterns of knowledge utilization and knowledge dissemination.
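
As a rough illustration of how an HMM scores a time series, the sketch below implements the standard forward algorithm for a discrete HMM in Python. The two hidden states, the binary observation coding (0 = few citations in a period, 1 = many), and all probabilities are hypothetical; the abstract does not specify the authors' model structure or parameters.

```python
# Hypothetical two-state model: state 0 = "weak flow", state 1 = "strong flow".
START = [0.6, 0.4]                # initial state probabilities
TRANS = [[0.7, 0.3], [0.2, 0.8]]  # TRANS[i][j] = P(next state j | state i)
EMIT = [[0.9, 0.1], [0.3, 0.7]]   # EMIT[i][o] = P(observation o | state i)


def forward_likelihood(observations, start, trans, emit):
    """Probability of an observation sequence under a discrete HMM."""
    n_states = len(start)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Likelihood of a citation sequence coded as few, few, many, many.
likelihood = forward_likelihood([0, 0, 1, 1], START, TRANS, EMIT)
```

Comparing such likelihoods across models fitted to backward and forward citation sequences is one way the two kinds of pattern could be contrasted.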

Keywords: business method patents, dynamic pattern, Hidden-Markov Model, knowledge flow

Procedia PDF Downloads 328
25158 Framework for Integrating Big Data and Thick Data: Understanding Customers Better

Authors: Nikita Valluri, Vatcharaporn Esichaikul

Abstract:

With the popularity of data-driven decision making on the rise, this study focuses on providing an alternative outlook on the process of decision-making. Combining quantitative and qualitative methods rooted in the social sciences, an integrated framework is presented with a focus on delivering a much more robust and efficient approach to data-driven decision-making with respect to not only Big data but also 'Thick data', a new form of qualitative data. In support of this, an example from the retail sector is presented in which the framework is put into action to yield insights and leverage business intelligence. An interpretive approach to analyzing findings from both quantitative and qualitative data has been used to glean insights. Using traditional point-of-sale data as well as an understanding of customer psychographics and preferences, data mining techniques are applied along with qualitative methods (such as grounded theory, ethnomethodology, etc.). This study’s final goal is to establish the framework as a basis for providing a holistic solution encompassing both the Big and Thick aspects of any business need. The proposed framework is an enhancement of the traditional data-driven decision-making approach, which depends mainly on quantitative data for decision-making.

Keywords: big data, customer behavior, customer experience, data mining, qualitative methods, quantitative methods, thick data

Procedia PDF Downloads 162
25157 Managing the Transition from Voluntary to Mandatory Climate Reporting: The Role of Carbon Accounting

Authors: Qingliang Tang

Abstract:

The transition from voluntary to mandatory carbon reporting (also referred to as climate reporting) poses serious challenges for accounting professionals aiming to support firms in achieving net-zero goals. The accounting literature addresses the topics that currently bewilder accounting academics and professional accountants regarding how to make accounting a useful tool for management to achieve a carbon-neutral business model. This paper explores the evolving role of carbon accounting within corporate financial reporting systems, emphasizing its integration as a crucial component. Key challenges addressed include data availability, climate risk assessment, defining reporting boundaries, selecting appropriate greenhouse gas (GHG) accounting methodologies, and integrating climate-related events into traditional financial statements. A dynamic, integrated carbon accounting framework is proposed to facilitate this transformative process effectively. Furthermore, the paper identifies critical knowledge gaps and sets forth a research agenda aimed at enhancing transparency and relevance in carbon accounting and reporting systems, thereby empowering informed decision-making. The purpose of the paper is to succinctly capture the essence of carbon accounting practice in the transitional period, focusing on the challenges, proposed solutions, and future research directions in the realm of carbon accounting and mandatory climate reporting.

Keywords: mandatory carbon reporting, carbon management, net zero target, sustainability, climate risks

Procedia PDF Downloads 18
25156 Some Results on the Generalized Higher Rank Numerical Ranges

Authors: Mohsen Zahraei

Abstract:

In this paper, the notion of rank-k numerical range of rectangular complex matrix polynomials is introduced. Some algebraic and geometrical properties are investigated. Moreover, for ε>0, the notion of Birkhoff-James approximate orthogonality sets for ε-higher rank numerical ranges of rectangular matrix polynomials is also introduced and studied. The proposed definitions yield a natural generalization of the standard higher rank numerical ranges.

Keywords: Rank-k numerical range, isometry, numerical range, rectangular matrix polynomials

Procedia PDF Downloads 459
25155 Approach for Demonstrating Reliability Targets for Rail Transport during Low Mileage Accumulation in the Field: Methodology and Case Study

Authors: Nipun Manirajan, Heeralal Gargama, Sushil Guhe, Manoj Prabhakaran

Abstract:

In the railway industry, train sets are designed based on contractual requirements (mission profile), where reliability targets are measured in terms of mean distance between failures (MDBF). However, at the beginning of revenue service, trains do not achieve the designed mission profile distance (mileage) within the timeframe because of infrastructure constraints, scarcity of commuters, or other operational challenges, and thus do not respect the original design inputs. Since trains do not run sufficiently and do not achieve the designed mileage within the specified time, the car builder risks not achieving the contractual MDBF target. This paper proposes a constant failure rate based model to deal with situations where mileage accumulation is not a part of the design mission profile. The model provides an appropriate MDBF target to be demonstrated based on the actual accumulated mileage. A case study of rolling stock running in the field is undertaken to analyze the failure data and the MDBF target demonstration during low mileage accumulation. The results of the case study show that, with the proposed method, reliability targets are achieved under low mileage accumulation.

Keywords: mean distance between failures, mileage-based reliability, reliability target appropriations, rolling stock reliability

Procedia PDF Downloads 266
25154 Incremental Learning of Independent Topic Analysis

Authors: Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

Abstract:

In this paper, we present a method of applying Independent Topic Analysis (ITA) to a growing collection of document data. The volume of document data has been increasing since the spread of the Internet. ITA was proposed as one method of analyzing document data: it extracts independent topics from the documents using Independent Component Analysis (ICA), a technique from signal processing. However, ITA is difficult to apply to a growing collection, because it must use all of the document data, so its temporal and spatial costs are very high. Therefore, we present Incremental ITA, which extracts independent topics from a growing collection of document data. Incremental ITA updates the independent topics when new document data is added, starting from the topics extracted from the preceding data. We show the results of applying Incremental ITA to benchmark datasets.

Keywords: text mining, topic extraction, independent, incremental, independent component analysis

Procedia PDF Downloads 309
25153 Open Data for e-Governance: Case Study of Bangladesh

Authors: Sami Kabir, Sadek Hossain Khoka

Abstract:

Open Government Data (OGD) refers to all data produced by government that is accessible, in a reusable way and free of cost, to ordinary people with access to the Internet. In line with the “Digital Bangladesh” vision of the Bangladesh government, the concept of open data has been gaining momentum in the country. Opening all government data in a digital and customizable format from a single platform can enhance e-governance, which will make government more transparent to the people. This paper presents a well-in-progress case study on the OGD portal of the Bangladesh Government, which is intended to link decentralized data. The initiative aims to facilitate e-services for citizens through this one-stop web portal. The paper further discusses ways of collecting data in digital format from relevant agencies with a view to making it publicly available through this single point of access. Further, a possible layout of this web portal is presented.

Keywords: e-governance, one-stop web portal, open government data, reusable data, web of data

Procedia PDF Downloads 355
25152 Subjective Evaluation of Mathematical Morphology Edge Detection on Computed Tomography (CT) Images

Authors: Emhimed Saffor

Abstract:

In this paper, the problem of edge detection in digital images is considered. Three edge detection methods based on mathematical morphology were applied to two sets of CT images (brain and chest), using a 3x3 filter for the first method, a 5x5 filter for the second, and a 7x7 filter for the third, all in the MATLAB programming environment. The results of these methods are evaluated subjectively. The results show that these methods are efficient and suitable for medical images, and that they can be used for other applications.
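
One common morphology-based edge detector is the morphological gradient (dilation minus erosion). The abstract does not say which operator the author used, and the original work was done in MATLAB, so the pure-Python sketch below is only an assumed illustration of how the filter size (3, 5, or 7) enters such a method.

```python
def _window_reduce(image, size, reducer):
    """Apply reducer (min or max) over a size x size square window.

    Windows are clipped at the image borders.
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            row.append(reducer(window))
        out.append(row)
    return out


def morphological_gradient(image, size=3):
    """Edge strength as dilation minus erosion with a square structuring element."""
    dilated = _window_reduce(image, size, max)  # grey-scale dilation
    eroded = _window_reduce(image, size, min)   # grey-scale erosion
    return [
        [d - e for d, e in zip(d_row, e_row)]
        for d_row, e_row in zip(dilated, eroded)
    ]


# Toy 7x7 image with a bright 3x3 square in the middle.
IMAGE = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        IMAGE[r][c] = 1

edges_3x3 = morphological_gradient(IMAGE, size=3)
```

Larger structuring elements give thicker edge responses, which is the kind of difference a subjective comparison of 3x3, 5x5, and 7x7 filters would look at.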

Keywords: CT images, Matlab, medical images, edge detection

Procedia PDF Downloads 336
25151 Parameters of Main Stage of Discharge between Artificial Charged Aerosol Cloud and Ground in Presence of Model Hydrometeor Arrays

Authors: D. S. Zhuravkova, A. G. Temnikov, O. S. Belova, L. L. Chernensky, T. K. Gerastenok, I. Y. Kalugina, N. Y. Lysov, A.V. Orlov

Abstract:

Investigation of the discharges from the artificial charged water aerosol clouds in presence of the arrays of the model hydrometeors could help to receive the new data about the peculiarities of the return stroke formation between the thundercloud and the ground when the large volumes of the hail particles participate in the lightning discharge initiation and propagation stimulation. Artificial charged water aerosol clouds of the negative or positive polarity with the potential up to one million volts have been used. Hail has been simulated by the group of the conductive model hydrometeors of the different form. Parameters of the impulse current of the main stage of the discharge between the artificial positively and negatively charged water aerosol clouds and the ground in presence of the model hydrometeors array and of its corresponding electromagnetic radiation have been determined. It was established that the parameters of the array of the model hydrometeors influence on the parameters of the main stage of the discharge between the artificial thundercloud cell and the ground. The maximal values of the main stage current impulse parameters and the electromagnetic radiation registered by the plate antennas have been found for the array of the model hydrometeors of the cylinder revolution form for the negatively charged aerosol cloud and for the array of the hydrometeors of the plate rhombus form for the positively charged aerosol cloud, correspondingly. It was found that parameters of the main stage of the discharge between the artificial charged water aerosol cloud and the ground in presence of the model hydrometeor array of the different considered forms depend on the polarity of the artificial charged aerosol cloud. 
On average, for all forms of the investigated model hydrometeor arrays, the amplitude and current rise of the main stage impulse current and the amplitude of the corresponding electromagnetic radiation were 1.1-1.9 times higher for the artificial charged aerosol cloud of positive polarity than for the charged aerosol cloud of negative polarity. Thus, the results could indicate a possibly more important role of big volumes of large hail arrays in the thundercloud in determining the parameters of the return stroke for positive lightning.

Keywords: main stage of discharge, hydrometeor form, lightning parameters, negative and positive artificial charged aerosol cloud

Procedia PDF Downloads 256
25150 Complex Fuzzy Evolution Equation with Nonlocal Conditions

Authors: Abdelati El Allaoui, Said Melliani, Lalla Saadia Chadli

Abstract:

The objective of this paper is to study the existence and uniqueness of mild solutions for a complex fuzzy evolution equation with nonlocal conditions that accommodates the notion of fuzzy sets defined by complex-valued membership functions. We first propose a definition of complex fuzzy strongly continuous semigroups. We then give an existence and uniqueness result for the complex fuzzy evolution equation.

Keywords: Complex fuzzy evolution equations, nonlocal conditions, mild solution, complex fuzzy semigroups

Procedia PDF Downloads 281
25149 Resource Framework Descriptors for Interestingness in Data

Authors: C. B. Abhilash, Kavi Mahesh

Abstract:

Human beings are the most advanced species on earth; it's all because of the ability to communicate and share information via human language. In today's world, a huge amount of data is available on the web in text format. This has also resulted in the generation of big data in structured and unstructured formats. In general, the data is in the textual form, which is highly unstructured. To get insights and actionable content from this data, we need to incorporate the concepts of text mining and natural language processing. In our study, we mainly focus on Interesting data through which interesting facts are generated for the knowledge base. The approach is to derive the analytics from the text via the application of natural language processing. Using semantic web Resource framework descriptors (RDF), we generate the triple from the given data and derive the interesting patterns. The methodology also illustrates data integration using the RDF for reliable, interesting patterns.

Keywords: RDF, interestingness, knowledge base, semantic data

Procedia PDF Downloads 162
25148 Data Mining Practices: Practical Studies on the Telecommunication Companies in Jordan

Authors: Dina Ahmad Alkhodary

Abstract:

This study aimed to investigate the practices of Data Mining in the telecommunication companies in Jordan, from the viewpoint of the respondents. To achieve the goal of the study and test the validity of the hypotheses, the researcher designed a questionnaire to collect data from managers and staff members of the main departments in the researched companies. The results show the stages of improvement of the telecommunications companies toward Data Mining.

Keywords: data, mining, development, business

Procedia PDF Downloads 497
25147 Algebraic Characterization of Sheaves over Boolean Spaces

Authors: U. M. Swamy

Abstract:

A compact, Hausdorff, and totally disconnected topological space is known as a Boolean space in view of the Stone duality between Boolean algebras and such topological spaces. A sheaf over X is a triple (S, p, X) where S and X are topological spaces and p is a local homeomorphism of S onto X (that is, for each element s in S, there exist open sets U and G containing s and p(s) in S and X respectively such that the restriction of p to U is a homeomorphism of U onto G). Here we are mainly concerned with sheaves over Boolean spaces. From a given sheaf over a Boolean space, we obtain an algebraic structure in such a way that there is a one-to-one correspondence between these algebraic structures and sheaves over Boolean spaces.

Keywords: Boolean algebra, Boolean space, sheaf, Stone duality

Procedia PDF Downloads 349
25146 The Impact of System and Data Quality on Organizational Success in the Kingdom of Bahrain

Authors: Amal M. Alrayes

Abstract:

Data and system quality play a central role in organizational success, and the quality of any existing information system has a major influence on the effectiveness of overall system performance. Given the importance of system and data quality to an organization, it is relevant to highlight their effect on organizational performance in the Kingdom of Bahrain. This research aims to discover whether system quality and data quality are related, and to study the impact of system and data quality on organizational success. A theoretical model based on previous research is used to show the relationship between data and system quality and organizational impact. We hypothesize, first, that system quality is positively associated with organizational impact; second, that system quality is positively associated with data quality; and finally, that data quality is positively associated with organizational impact. A questionnaire survey was conducted among public and private organizations in the Kingdom of Bahrain. The results show that there is a strong association between data and system quality that affects organizational success.

Keywords: data quality, performance, system quality, Kingdom of Bahrain

Procedia PDF Downloads 493
25145 Cloud Computing in Data Mining: A Technical Survey

Authors: Ghaemi Reza, Abdollahi Hamid, Dashti Elham

Abstract:

Cloud computing poses a diversity of challenges for data mining operations, arising from the dynamic structure of data distribution as opposed to the typical database scenarios of conventional architectures. Because an immense number of users seek data on a daily basis, security is a serious concern for cloud providers as well as for data providers who put their data in the cloud computing environment. Big data analytics use compute-intensive data mining algorithms and tools (hidden Markov models, MapReduce parallel programming, the Mahout project, the Hadoop distributed file system, K-Means and K-Medoid, Apriori) that require efficient high-performance processors to produce timely results. Data mining algorithms solve or optimize the model parameters. The challenges the operation has to meet are establishing successful transactions with the existing virtual machine environment and keeping the databases under control. Several factors have led from normal, centralized mining to distributed data mining. The approach is offered as SaaS, which uses multi-agent systems to implement the different tasks of the system. Some problems of data mining based on cloud computing remain, including the design and selection of data mining algorithms.

Keywords: cloud computing, data mining, computing models, cloud services

Procedia PDF Downloads 479
25144 Kernel-Based Double Nearest Proportion Feature Extraction for Hyperspectral Image Classification

Authors: Hung-Sheng Lin, Cheng-Hsuan Li

Abstract:

Over the past few years, kernel-based algorithms have been widely used to extend some linear feature extraction methods such as principal component analysis (PCA), linear discriminate analysis (LDA), and nonparametric weighted feature extraction (NWFE) to their nonlinear versions, kernel principal component analysis (KPCA), generalized discriminate analysis (GDA), and kernel nonparametric weighted feature extraction (KNWFE), respectively. These nonlinear feature extraction methods can detect nonlinear directions with the largest nonlinear variance or the largest class separability based on the given kernel function. Moreover, they have been applied to improve the target detection or the image classification of hyperspectral images. The double nearest proportion feature extraction (DNP) can effectively reduce the overlap effect and have good performance in hyperspectral image classification. The DNP structure is an extension of the k-nearest neighbor technique. For each sample, there are two corresponding nearest proportions of samples, the self-class nearest proportion and the other-class nearest proportion. The term “nearest proportion” used here consider both the local information and other more global information. With these settings, the effect of the overlap between the sample distributions can be reduced. Usually, the maximum likelihood estimator and the related unbiased estimator are not ideal estimators in high dimensional inference problems, particularly in small data-size situation. Hence, an improved estimator by shrinkage estimation (regularization) is proposed. Based on the DNP structure, LDA is included as a special case. In this paper, the kernel method is applied to extend DNP to kernel-based DNP (KDNP). In addition to the advantages of DNP, KDNP surpasses DNP in the experimental results. 
According to the experiments on the real hyperspectral image data sets, the classification performance of KDNP is better than that of PCA, LDA, NWFE, and their kernel versions, KPCA, GDA, and KNWFE.

Keywords: feature extraction, kernel method, double nearest proportion feature extraction, kernel double nearest feature extraction

Procedia PDF Downloads 344
25143 Cross-border Data Transfers to and from South Africa

Authors: Amy Gooden, Meshandren Naidoo

Abstract:

Genetic research and transfers of big data are not confined to a particular jurisdiction, but there is a lack of clarity regarding the legal requirements for importing and exporting such data. Using direct-to-consumer genetic testing (DTC-GT) as an example, this research assesses the status of data sharing into and out of South Africa (SA). While SA laws cover the sending of genetic data out of SA, prohibiting such transfer unless a legal ground exists, the position where genetic data comes into the country depends on the laws of the country from where it is sent – making the legal position less clear.

Keywords: cross-border, data, genetic testing, law, regulation, research, sharing, South Africa

Procedia PDF Downloads 125
25142 The Study of Security Techniques on Information System for Decision Making

Authors: Tejinder Singh

Abstract:

An information system (IS) is the flow of data between different levels and in different directions for decision making and data operations. Data can be compromised in different ways, such as manual or technical errors, data tampering, or loss of integrity. The security system of an IS, called the firewall, is affected by such violations. The flow of data among the various levels of an information system is handled by the networking system. Data flows over the network in the form of packets or frames. To protect these packets from unauthorized access and virus attacks, and to maintain the integrity level, network security is an important factor. To protect the data from being pirated, various security techniques are used. This paper presents the various security techniques and identifies different harmful attacks with the help of detailed data analysis. This paper will help organizations make their systems more secure and effective, and will benefit future decision making.

Keywords: information systems, data integrity, TCP/IP network, vulnerability, decision, data

Procedia PDF Downloads 307
25141 Data Integration with Geographic Information System Tools for Rural Environmental Monitoring

Authors: Tamas Jancso, Andrea Podor, Eva Nagyne Hajnal, Peter Udvardy, Gabor Nagy, Attila Varga, Meng Qingyan

Abstract:

The paper deals with the conditions and circumstances of integration of remotely sensed data for rural environmental monitoring purposes. The main task is to make decisions during the integration process when we have data sources with different resolution, location, spectral channels, and dimension. In order to have exact knowledge about the integration and data fusion possibilities, it is necessary to know the properties (metadata) that characterize the data. The paper explains the joining of these data sources using their attribute data through a sample project. The resulted product will be used for rural environmental analysis.

Keywords: remote sensing, GIS, metadata, integration, environmental analysis

Procedia PDF Downloads 120
25140 Comparison of Multivariate Adaptive Regression Splines and Random Forest Regression in Predicting Forced Expiratory Volume in One Second

Authors: P. V. Pramila, V. Mahesh

Abstract:

Pulmonary function tests are important non-invasive diagnostic tests that assess respiratory impairments and provide quantifiable measures of lung function. Spirometry is the most frequently used measure of lung function and plays an essential role in the diagnosis and management of pulmonary diseases. However, the test requires considerable patient effort and cooperation, which is markedly related to the age of the patient, resulting in incomplete data sets. This paper presents a nonlinear model built using multivariate adaptive regression splines (MARS) and a random forest (RF) regression model to predict the missing spirometric features. Random forest based feature selection is used to enhance both the generalization capability and the model interpretability. In the present study, flow-volume data are recorded for N = 198 subjects. The ranked order of the feature importance index calculated by the random forest model shows that the spirometric features FVC, FEF 25, PEF, FEF 25-75, FEF50, and the demographic parameter height are the important descriptors. A comparison of the performance of both models shows that the prediction ability of MARS with the top two ranked features, namely FVC and FEF 25, is higher, yielding a model fit of R2 = 0.96 and R2 = 0.99 for normal and abnormal subjects. The root mean square error analysis of the RF model and the MARS model also shows that the latter is capable of predicting the missing values of FEV1 with a notably lower error value of 0.0191 (normal subjects) and 0.0106 (abnormal subjects). It is concluded that combining feature selection with a prediction model provides a minimum subset of predominant features to train the model, yielding better prediction performance. This analysis can assist clinicians with an intelligence support system in medical diagnosis and the improvement of clinical care.
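
The two goodness-of-fit figures quoted in this abstract, R2 and root mean square error, are usually computed as in the Python sketch below. The standard textbook definitions are assumed here (the abstract does not state the exact formulas used), and the FEV1 values are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and model-predicted FEV1 values in litres.
measured_fev1 = [2.8, 3.1, 3.6, 4.0]
predicted_fev1 = [2.9, 3.0, 3.5, 4.1]

error = rmse(measured_fev1, predicted_fev1)
fit = r_squared(measured_fev1, predicted_fev1)
```

A lower RMSE and an R2 closer to 1 both indicate that the predicted values track the measured ones more closely, which is the sense in which the two models are compared.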

Keywords: FEV, multivariate adaptive regression splines, pulmonary function test, random forest

Procedia PDF Downloads 310
25139 Analysis of Genomics Big Data in Cloud Computing Using Fuzzy Logic

Authors: Mohammad Vahed, Ana Sadeghitohidi, Majid Vahed, Hiroki Takahashi

Abstract:

In the genomics field, huge amounts of data have been produced by next-generation sequencers (NGS). Data volumes are growing very rapidly; it is postulated that more than one billion bases will be produced per year in 2020. The growth rate of produced data is much faster than Moore's law in computer technology. This makes it more difficult to deal with genomics data, including storing data, searching for information, and finding hidden information. An analysis platform for genomics big data therefore needs to be developed. Newly developed cloud computing enables us to deal with big data more efficiently. Hadoop is one of the frameworks for distributed computing and is at the core of Big Data as a Service (BDaaS). Although many services, e.g. Amazon, have adopted this technology, there are few applications in the biology field. Here, we propose a new algorithm to deal more efficiently with genomics big data, e.g. sequencing data. Our algorithm consists of two parts: first, BDaaS is applied to handle the data more efficiently; second, a hybrid method of MapReduce and fuzzy logic is applied for data processing. This step can be parallelized in implementation. Our algorithm has great potential in the computational analysis of genomics big data, e.g. de novo genome assembly and sequence similarity search. We will discuss our algorithm and its feasibility.

Keywords: big data, fuzzy logic, MapReduce, Hadoop, cloud computing

Procedia PDF Downloads 299
25138 Forthcoming Big Data on Smart Buildings and Cities: An Experimental Study on Correlations among Urban Data

Authors: Yu-Mi Song, Sung-Ah Kim, Dongyoun Shin

Abstract:

Cities are complex systems of diverse and intertangled activities. These activities and their complex interrelationships create diverse urban phenomena, which in turn have considerable influence on the lives of citizens. This research aimed to develop a method to reveal the causes and effects among diverse urban elements in order to enable a better understanding of urban activities and, from that, to make better urban planning strategies. Specifically, this study was conducted to solve a data-recommendation problem found on a Korean public data homepage. First, a correlation analysis was conducted to find the correlations among random urban data. Then, based on the results of that correlation analysis, the weighted data network of each urban data set was provided to people. It is expected that the weights of urban data thereby obtained will provide insights into cities and show how diverse urban activities influence each other and induce feedback.
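The two steps described above, a correlation analysis followed by a weighted data network used for recommendation, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Pearson coefficient, the 0.5 threshold, the data set names, and all values are hypothetical choices made here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant series: treated as uncorrelated (a convention chosen here)
    return cov / (sx * sy)


def correlation_network(series, threshold=0.5):
    """Weighted edges {(a, b): r} for every pair with |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


def recommend(edges, name):
    """Data sets linked to `name`, strongest correlation first."""
    neighbours = []
    for (a, b), r in edges.items():
        if name in (a, b):
            neighbours.append((b if a == name else a, r))
    return sorted(neighbours, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical urban data series (illustration only).
urban_data = {
    "bus_riders": [1, 2, 3, 4],
    "traffic": [2, 4, 6, 8],
    "park_visits": [4, 3, 2, 1],
    "noise": [1, -1, 1, -1],
}
edges = correlation_network(urban_data)
print(recommend(edges, "bus_riders"))
```

Keeping the sign of each weight lets a recommendation distinguish data sets that rise together from those that move in opposite directions.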

Keywords: big data, machine learning, ontology model, urban data model

Procedia PDF Downloads 418
25137 Regional Dynamics of Innovation and Entrepreneurship in the Optics and Photonics Industry

Authors: Mustafa İlhan Akbaş, Özlem Garibay, Ivan Garibay

Abstract:

The economic entities in innovation ecosystems form various industry clusters, in which they compete and cooperate to survive and grow. Within a successful and stable industry cluster, the entities acquire different roles that complement each other in the system. Universities and research centers are accepted to have a critical role in these systems for the creation and development of innovations. However, the real effect of research institutions on regional economic growth is difficult to assess. In this paper, we present our approach for identifying the impact of research activities on regional entrepreneurship for a specific high-tech industry: optics and photonics. Optics and photonics has been defined as an enabling industry, which combines high-tech photonics technology with the developing optics industry. The recent literature suggests that the growth of optics and photonics firms depends on three important factors: the embedded regional specializations in the labor market, the research and development infrastructure, and a dynamic small firm network capable of absorbing new technologies, products and processes. Therefore, the role of each factor and the dynamics among them must be understood to identify the requirements of entrepreneurship activities in the optics and photonics industry. There are three main contributions of our approach. Recent studies show that innovation in the optics and photonics industry is mostly located around metropolitan areas. There are also studies mentioning the importance of research center locations and universities in the regional development of the optics and photonics industry. These studies are mostly limited to the number of patents received within a short period of time or to some limited survey results. Therefore, the first contribution of our approach is a comprehensive analysis of the state and recent history of photonics and optics research in the US.
For this purpose, both the research centers specialized in optics and photonics and the related research groups in various departments of institutions (e.g. Electrical Engineering, Materials Science) are identified and a geographical study of their locations is presented. The second contribution of the paper is the analysis of regional entrepreneurship activities in optics and photonics in recent years. We use the membership data of the International Society for Optics and Photonics (SPIE) and the regional photonics clusters to identify the optics and photonics companies in the US. Then the profiles and activities of these companies are gathered by extracting and integrating the related data from the National Establishment Time Series (NETS) database, ES-202 database and the data sets from the regional photonics clusters. The number of start-ups, their employee numbers and sales are some examples of the extracted data for the industry. Our third contribution is the utilization of collected data to investigate the impact of research institutions on the regional optics and photonics industry growth and entrepreneurship. In this analysis, the regional and periodical conditions of the overall market are taken into consideration while discovering and quantifying the statistical correlations.

Keywords: entrepreneurship, industrial clusters, optics, photonics, emerging industries, research centers

Procedia PDF Downloads 406
25136 Data-driven Decision-Making in Digital Entrepreneurship

Authors: Abeba Nigussie Turi, Xiangming Samuel Li

Abstract:

Data-driven business models are more typical for established businesses than for early-stage startups that strive to penetrate a market. This paper provides an extensive discussion of the principles of data analytics for early-stage digital entrepreneurial businesses. Here, we develop a data-driven decision-making (DDDM) framework that applies to startups prone to multifaceted barriers in the form of poor data access and technical and financial constraints, to name a few. The startup DDDM framework proposed in this paper is novel in its form, encompassing startup data analytics enablers and metrics that align with startups' business models, ranging from customer-centric product development to servitization, which is the future of modern digital entrepreneurship.

Keywords: startup data analytics, data-driven decision-making, data acquisition, data generation, digital entrepreneurship

Procedia PDF Downloads 328
25135 Brain Tumor Segmentation Based on Minimum Spanning Tree

Authors: Simeon Mayala, Ida Herdlevær, Jonas Bull Haugsøen, Shamundeeswari Anandan, Sonia Gavasso, Morten Brun

Abstract:

In this paper, we propose a minimum spanning tree-based method for segmenting brain tumors. The proposed method performs interactive segmentation based on the minimum spanning tree without tuning parameters. The steps involve preprocessing, making a graph, constructing a minimum spanning tree, and a newly implemented way of interactively segmenting the region of interest. In the preprocessing step, a Gaussian filter is applied to 2D images to remove the noise. Then, the pixel neighbor graph is weighted by intensity differences and the corresponding minimum spanning tree is constructed. The image is loaded in an interactive window for segmenting the tumor. The region of interest and the background are selected by clicking to split the minimum spanning tree into two trees. One of these trees represents the region of interest and the other represents the background. Finally, the segmentation given by the two trees is visualized. The proposed method was tested by segmenting two different 2D brain T1-weighted magnetic resonance image data sets. The comparison between our results and the gold standard segmentation confirmed the validity of the minimum spanning tree approach. The proposed method is simple to implement and the results indicate that it is accurate and efficient.
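The graph construction and tree splitting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it skips the Gaussian filtering step, uses a 4-neighbour grid and Kruskal's algorithm, and assumes that two seed pixels (standing in for the clicks) split the tree by cutting the heaviest edge on the tree path between them. The abstract does not state the exact splitting rule, and the toy image is hypothetical.

```python
from collections import deque


def pixel_graph(image):
    """4-neighbour edges (weight, p, q), weighted by absolute intensity difference."""
    rows, cols = len(image), len(image[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                edges.append((abs(image[r][c] - image[r + 1][c]), (r, c), (r + 1, c)))
            if c + 1 < cols:
                edges.append((abs(image[r][c] - image[r][c + 1]), (r, c), (r, c + 1)))
    return edges


def minimum_spanning_tree(edges):
    """Kruskal's algorithm with union-find; returns the tree as an adjacency dict."""
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    tree = {}
    for w, p, q in sorted(edges):
        root_p, root_q = find(p), find(q)
        if root_p != root_q:
            parent[root_p] = root_q
            tree.setdefault(p, {})[q] = w
            tree.setdefault(q, {})[p] = w
    return tree


def reachable(tree, start):
    """Breadth-first search; maps each reachable pixel to its predecessor."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in tree[p]:
            if q not in previous:
                previous[q] = p
                queue.append(q)
    return previous


def split_tree(tree, roi_seed, background_seed):
    """Cut the heaviest edge on the seed-to-seed path (assumed rule); return ROI pixels."""
    if roi_seed == background_seed:
        raise ValueError("seeds must be different pixels")
    previous = reachable(tree, roi_seed)
    heaviest, node = None, background_seed
    while previous[node] is not None:
        edge = (tree[node][previous[node]], node, previous[node])
        if heaviest is None or edge[0] > heaviest[0]:
            heaviest = edge
        node = previous[node]
    _, a, b = heaviest
    del tree[a][b], tree[b][a]
    return set(reachable(tree, roi_seed))


# Hypothetical 4x4 image: a bright 2x2 "tumor" on a dark background.
image = [
    [10, 10, 10, 10],
    [10, 90, 92, 10],
    [10, 91, 93, 10],
    [10, 10, 10, 10],
]
tree = minimum_spanning_tree(pixel_graph(image))
roi = split_tree(tree, roi_seed=(1, 1), background_seed=(0, 0))
print(sorted(roi))
```

Because the only tree edge joining the bright block to the dark border carries a large intensity difference, cutting the heaviest edge on the path separates the block from the background in this toy case.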

Keywords: brain tumor, brain tumor segmentation, minimum spanning tree, segmentation, image processing

Procedia PDF Downloads 122
25134 Genomic Prediction Reliability Using Haplotypes Defined by Different Methods

Authors: Sohyoung Won, Heebal Kim, Dajeong Lim

Abstract:

Genomic prediction is an effective way to measure the breeding abilities of livestock based on genomic estimated breeding values, which are statistically predicted from genotype data using best linear unbiased prediction (BLUP). Using haplotypes, clusters of linked single nucleotide polymorphisms (SNPs), as markers instead of individual SNPs can improve the reliability of genomic prediction, since the probability of a quantitative trait locus being in strong linkage disequilibrium (LD) with the markers is higher. To use haplotypes efficiently in genomic prediction, optimal ways to define haplotypes need to be found. In this study, 770K SNP chip data were collected from a Hanwoo (Korean cattle) population consisting of 2506 cattle. Haplotypes were first defined in three different ways using the 770K SNP chip data: based on 1) the length of haplotypes (bp), 2) the number of SNPs, and 3) k-medoids clustering by LD. To compare the methods in parallel, haplotypes defined by all methods were set to have comparable sizes; in each method, haplotypes defined to have an average of 5, 10, 20, or 50 SNPs were tested. A modified genomic BLUP (GBLUP) method using haplotype alleles as predictor variables was implemented to test the prediction reliability of each haplotype set. The conventional GBLUP method, which uses individual SNPs, was also tested to evaluate the performance of the haplotype sets in genomic prediction. Carcass weight was used as the phenotype for testing. As a result, haplotypes defined by all three methods showed increased reliability compared to conventional GBLUP. There were few differences in reliability between the haplotype-defining methods. The reliability of genomic prediction was highest when the average number of SNPs per haplotype was 20 in all three methods, implying that haplotypes including around 20 SNPs can be optimal to use as markers for genomic prediction.
When the numbers of alleles generated by each haplotype-defining method were compared, clustering by LD generated the fewest alleles. Using haplotype alleles for genomic prediction showed better performance, suggesting improved accuracy in genomic selection. The number of predictor variables decreased when the LD-based method was used, while all three haplotype-defining methods showed similar performance. This suggests that defining haplotypes based on LD can reduce computational costs and allow efficient prediction. Finding optimal ways to define haplotypes and using the haplotype alleles as markers can provide improved performance and efficiency in genomic prediction.
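To make the link between haplotype definition and the number of predictor variables concrete, the sketch below implements only the simplest of the three definitions, blocks of a fixed number of SNPs, and counts the distinct haplotype alleles per block. It assumes phased data written as strings of 0/1 alleles; the data, block size, and function names are hypothetical, and the study's own pipeline is not described in enough detail to reproduce here.

```python
def haplotype_blocks(n_snps, block_size):
    """Partition SNP indices 0..n_snps-1 into consecutive blocks of block_size SNPs."""
    return [range(start, min(start + block_size, n_snps))
            for start in range(0, n_snps, block_size)]


def haplotype_alleles(haplotypes, blocks):
    """For each block, the set of distinct allele strings seen across phased haplotypes."""
    return [{"".join(h[i] for i in block) for h in haplotypes} for block in blocks]


# Hypothetical phased haplotypes: one string of 0/1 alleles per chromosome copy.
phased = ["001101", "001111", "110101", "001101"]

blocks = haplotype_blocks(n_snps=6, block_size=3)
alleles = haplotype_alleles(phased, blocks)

# Each distinct haplotype allele becomes one predictor variable.
n_predictors = sum(len(block_alleles) for block_alleles in alleles)
print(alleles, n_predictors)
```

A definition that groups SNPs in strong LD tends to produce fewer distinct alleles per block, which is the mechanism the abstract credits for the reduced number of predictor variables.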

Keywords: best linear unbiased predictor, genomic prediction, haplotype, linkage disequilibrium

Procedia PDF Downloads 141
25133 Relationship of Silent Myocardial Ischemia to Erectile Dysfunction in Patients with Diabetes Mellitus

Authors: Ali Kassem, Esam Nada, Amro Abdelhamed, Shigeo Horie

Abstract:

Objective: Diabetes mellitus (DM) is associated with macrovascular complications, including coronary artery disease (CAD), and microvascular complications that contribute to the pathogenesis of erectile dysfunction (ED). On the other hand, silent myocardial ischemia (SMI) is more common in diabetic patients and is a strong predictor of cardiac events and mortality in diabetic and non-diabetic patients. Recently, multidetector computed tomographic coronary angiography (MDCT-CA) has become a reliable non-invasive imaging modality for screening diabetic patients for SMI. We aim to evaluate the presence of SMI using MDCT-CA in patients with type 2 DM who have ED. Methods: This study evaluated 20 patients (mean age 61.45 ± 10.7 years) with DM and ED without any history of angina or angina equivalent. ED was tested with the Sexual Health Inventory for Men score, erection hardness score (EHS), and maximal penile circumferential change by an erect meter. Results: Of the twenty patients studied, coronary artery stenosis was detected in 13 (65%) patients in the form of one-vessel disease (n = 6, 30%), two-vessel disease (n = 2, 10%), and three-vessel disease (n = 5, 25%). Maximum coronary artery stenosis was positively correlated with age (P < 0.016) and negatively correlated with EHS (P <04). Multivariate regression analysis using age and EHS showed that age was the only independent predictor of SMI (P <04). Conclusion: MDCT-CA is a useful tool to identify SMI in patients with diabetes mellitus and ED. One should consider the possibility of SMI, especially in elderly patients with DM who have ED.

Keywords: diabetes mellitus, erectile dysfunction, microvascular, silent ischemia

Procedia PDF Downloads 172