Search results for: data labelling
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 24200

24170 U11 Functionalised Luminescent Gold Nanoclusters for Pancreatic Tumor Cells Labelling

Authors: Regina M. Chiechio, Rémi Leguevél, Helene Solhi, Marie Madeleine Gueguen, Stephanie Dutertre, Xavier, Jean-Pierre Bazureau, Olivier Mignen, Pascale Even-Hernandez, Paolo Musumeci, Maria Jose Lo Faro, Valerie Marchi

Abstract:

Thanks to their ultra-small size, high electron density, and low toxicity, gold nanoclusters (Au NCs) have unique photoelectrochemical and luminescence properties that make them very interesting for diagnosis, bio-imaging, and theranostics. These applications require control of their delivery and interaction with cells; for this reason, the surface chemistry of Au NCs is essential to determine their interaction with the targeted biological objects. Here we demonstrate their ability as markers of pancreatic tumor cells. By functionalizing the surface of the NCs with a recognition peptide (U11), the nanostructures are able to preferentially bind to pancreatic cancer cells via a receptor (uPAR) overexpressed by these cells. Furthermore, the NCs can mark even the nucleus without the need to fix the cells. These nanostructures can therefore be used as a non-toxic, multivalent luminescent platform, capable of selectively recognizing tumor cells for bioimaging, drug delivery, and radiosensitization.

Keywords: gold nanoclusters, luminescence, biomarkers, pancreatic cancer, biomedical applications, bioimaging, fluorescent probes, drug delivery

Procedia PDF Downloads 119
24169 Evaluation of Labelling Conditions, Quality Control, and Biodistribution Study of 99mTc- D-Aminolevulinic Acid (5-ALA)

Authors: Kalimullah Khan, Samina Roohi, Mohammad Rafi, Rizwana Zahoor

Abstract:

Labeling of 5-aminolevulinic acid (5-ALA) with 99mTc was achieved by using tin chloride dihydrate (SnCl2·2H2O) as the reducing agent. Radiochemical purity and labeling efficiency were determined using Whatman No. 3 paper and instant thin-layer chromatographic strips impregnated with silica gel (ITLC/SG). Labeling efficiency depended on many parameters, such as the amount of ligand, the amount of reducing agent, pH, and incubation time. Therefore, optimum conditions for maximum labeling were selected. The stability of 99mTc-5-ALA was also checked in fresh human serum. Tissue bio-distribution of 99mTc-5-ALA was evaluated in Sprague Dawley rats. 5-ALA was 98% labeled with 99mTc under optimum conditions, i.e. 100 µg of 5-ALA, pH 4, 10 µg of SnCl2·2H2O and 30 minutes of incubation at room temperature. 99mTc-labelled 5-ALA remained stable for 24 hours in human serum. The bio-distribution study (%ID/g) in rats revealed that maximum accumulation of 99mTc-5-ALA was in the liver, spleen, stomach and intestine after half an hour, 4 hours, and 24 hours. Significant activity in the bladder and urine indicated a urinary mode of excretion.

Keywords: 99mTc-ALA, aminolevulinic acid, quality control, radiopharmaceuticals

Procedia PDF Downloads 360
24168 JavaScript Object Notation Data against eXtensible Markup Language Data in Software Applications a Software Testing Approach

Authors: Theertha Chandroth

Abstract:

This paper presents a comparative study on how to check JSON (JavaScript Object Notation) data against XML (eXtensible Markup Language) data from a software testing point of view. JSON and XML are widely used data interchange formats, each with its own syntax and structure. The objective is to explore various techniques and methodologies for validating, comparing, and integrating JSON data against XML data and vice versa. By understanding the process of checking JSON data against XML data, testers, developers and data practitioners can ensure accurate data representation, seamless data interchange, and effective data validation.
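
One way to make such a check concrete is to parse both documents into the same in-memory shape and compare the results. The sketch below is only an illustration and not the paper's method: the helper names (`xml_to_dict`, `normalise`, `json_matches_xml`) are hypothetical, XML attributes are ignored, JSON scalars are converted to strings because XML text is untyped, and the JSON document is assumed to have a single top-level key that matches the XML root tag.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Convert an XML element into nested dicts and strings (attributes ignored)."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list, mirroring a JSON array.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def normalise(value):
    """Turn JSON scalars into strings so they can be compared with XML text."""
    if isinstance(value, dict):
        return {key: normalise(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalise(item) for item in value]
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return ""
    return str(value)


def json_matches_xml(json_text, xml_text):
    """Return True when both documents describe the same nested data."""
    json_data = normalise(json.loads(json_text))
    root = ET.fromstring(xml_text)
    xml_data = {root.tag: xml_to_dict(root)}
    return json_data == xml_data


if __name__ == "__main__":
    sample_json = '{"order": {"id": 42, "item": ["pen", "ink"]}}'
    sample_xml = "<order><id>42</id><item>pen</item><item>ink</item></order>"
    print(json_matches_xml(sample_json, sample_xml))
```

Converting everything to strings is a deliberate simplification: a stricter test could instead validate each side against its own schema before comparing values, and numeric formatting differences (for example `1.0` versus `1`) would need an explicit rule.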

Keywords: XML, JSON, data comparison, integration testing, Python, SQL

Procedia PDF Downloads 90
24167 Multi-Source Data Fusion for Urban Comprehensive Management

Authors: Bolin Hua

Abstract:

In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflect different aspects of people, events and activities. Data generated by various systems differ in form, and their sources differ because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need to be fused together. Data collected from different sources in different ways raise several issues that need to be resolved. The problems of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, and resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining and scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of multi-source data fusion is a uniform central database, which includes people data, location data, object data, institution data, business data and space data. Metadata is needed so that it can be referred to and read when an application needs to access, manipulate and display the data. Uniform metadata management ensures the effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.

Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data

Procedia PDF Downloads 352
24166 Reviewing Privacy Preserving Distributed Data Mining

Authors: Sajjad Baghernezhad, Saeideh Baghernezhad

Abstract:

Nowadays, given the human role in ever-growing data production, methods such as data mining for extracting knowledge are unavoidable. One issue in data mining is the inherent distribution of the data: the parties that create or receive such data are usually corporate or non-corporate persons who do not give their information freely to others. Yet there is no guarantee that someone can mine particular data without intruding on the owner’s privacy. How data are sent and then gathered, whether they are partitioned vertically or horizontally, depends on the type of privacy preservation used and is carried out so as to improve data privacy. This study attempts a comprehensive comparison of privacy-preserving methods; general methods such as data randomization and encoding are also examined, along with the strengths and weaknesses of each.

Keywords: data mining, distributed data mining, privacy protection, privacy preserving

Procedia PDF Downloads 490
24165 Information Processing and Visual Attention: An Eye Tracking Study on Nutrition Labels

Authors: Rosa Hendijani, Amir Ghadimi Herfeh

Abstract:

Nutrition labels are diet-related health policies. They help individuals improve food-choice decisions and reduce intake of calories and unhealthy food elements, like cholesterol. However, many individuals do not pay attention to nutrition labels or fail to appropriately understand them. According to the literature, thinking and cognitive styles can have significant effects on attention to nutrition labels. To the authors' knowledge, the effect of global/local processing on attention to nutrition labels has not been previously studied. Global/local processing encourages individuals to attend to the whole/specific parts of an object and can have a significant impact on people's visual attention. In this study, this effect was examined with an experimental design using the eye-tracking technique. The research hypothesis was that individuals with local processing would pay more attention to nutrition labels, including nutrition tables and traffic lights. An experiment was designed with two conditions: global and local information processing. Forty participants were randomly assigned to either global or local conditions, and their processing style was manipulated accordingly. Results supported the hypothesis for nutrition tables but not for traffic lights.

Keywords: eye-tracking, nutrition labelling, global/local information processing, individual differences

Procedia PDF Downloads 127
24164 The Right to Data Portability and Its Influence on the Development of Digital Services

Authors: Roman Bieda

Abstract:

The General Data Protection Regulation (GDPR), which will come into force on 25 May 2018, will create a new legal framework for the protection of personal data in the European Union. Article 20 of the GDPR introduces a right to data portability. This right allows data subjects to receive the personal data which they have provided to a data controller, in a structured, commonly used and machine-readable format, and to transmit this data to another data controller. The right to data portability, by facilitating the transfer of personal data between IT environments (e.g., applications), will also facilitate changing the provider of services (e.g., changing a bank or a cloud computing service provider). Therefore, it will contribute to the development of competition and the digital market. The aim of this paper is to discuss the right to data portability and its influence on the development of new digital services.

Keywords: data portability, digital market, GDPR, personal data

Procedia PDF Downloads 444
24163 Detection of Safety Goggles on Humans in Industrial Environment Using Faster-Region Based on Convolutional Neural Network with Rotated Bounding Box

Authors: Ankit Kamboj, Shikha Talwar, Nilesh Powar

Abstract:

To successfully deliver our products in the market, the employees need to be in a safe environment, especially in an industrial and manufacturing environment. Failing to wear safety glasses while working in industrial plants can put employees at high risk, hence the need to develop a real-time automatic detection system that detects persons (violators) not wearing safety glasses. In this study a convolutional neural network (CNN) algorithm called faster region-based CNN (Faster RCNN) with rotated bounding box has been used for detecting safety glasses on persons; the algorithm has the advantage of detecting safety glasses at different orientation angles on the persons. The proposed method of rotated bounding boxes with a convolutional neural network first detects a person in the images, and then detects whether the person is wearing safety glasses or not. The video data is captured at the entrance of restricted zones of the industrial environment (manufacturing plant) and is then converted into images at 2 frames per second. In the first step, a CNN with weights pre-trained on the COCO dataset is used for person detection, and the detections are cropped as images. Then the safety goggles are labelled on the cropped images using the image labelling tool called roLabelImg, which is used to annotate the ground truth values of rotated objects more accurately, and the annotations obtained are further modified to depict the four coordinates of the rectangular bounding box. Next, the Faster RCNN with rotated bounding box is used to detect safety goggles, and it is then compared with the traditional bounding box Faster RCNN in terms of detection accuracy (average precision), which shows the effectiveness of the proposed method for detection of rotated objects. The deep learning benchmarking is done on a Dell workstation with a 16GB Nvidia GPU.
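
The step of turning a rotated-box annotation into four corner coordinates can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the box is assumed to be given as a centre point, a width, a height and a rotation angle in radians, and the function name `rotated_box_corners` is hypothetical.

```python
import math


def rotated_box_corners(cx, cy, width, height, angle):
    """Return the four corners of a box rotated by `angle` radians about its centre.

    Corners are listed starting from the corner that is top-left before rotation,
    then top-right, bottom-right and bottom-left (image coordinates, y pointing down).
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = width / 2.0, height / 2.0
    # Corner offsets relative to the centre, before rotation.
    offsets = [
        (-half_w, -half_h),
        (half_w, -half_h),
        (half_w, half_h),
        (-half_w, half_h),
    ]
    # Apply the standard 2D rotation matrix to each offset, then translate.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


if __name__ == "__main__":
    for corner in rotated_box_corners(10.0, 20.0, 4.0, 2.0, math.pi / 6):
        print(corner)
```

With an angle of zero the result reduces to an ordinary axis-aligned box, which makes the function easy to sanity-check against conventional bounding-box annotations.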

Keywords: CNN, deep learning, faster RCNN, roLabelImg rotated bounding box, safety goggle detection

Procedia PDF Downloads 111
24162 Model-Based Field Extraction from Different Class of Administrative Documents

Authors: Jinen Daghrir, Anis Kricha, Karim Kalti

Abstract:

The amount of incoming administrative documents is massive, and manually processing these documents is a costly task, especially in terms of time. This problem has led to a significant amount of research and development on automatically extracting fields from administrative documents, in order to reduce costs and to increase citizen satisfaction with administrations. To this end, we introduce an administrative document understanding system. Given a document in which a user selects the fields that have to be retrieved from a document class, a document model is automatically built. A document model is represented by an attributed relational graph (ARG) where nodes represent fields to extract, and edges represent the relations between them. Feature vectors are attached to both vertices and edges. When another document arrives at the system, the layout objects are extracted and an ARG is generated. Field extraction is translated into a problem of matching two ARGs, which relies mainly on the comparison of the spatial relationships between layout objects. Experimental results yield accuracy rates from 75% to 100% on eight document classes. Our proposed method performs well given that the document model is constructed using only a single document.

Keywords: administrative document understanding, logical labelling, logical layout analysis, fields extraction from administrative documents

Procedia PDF Downloads 185
24161 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 368
24160 How to Use Big Data in Logistics Issues

Authors: Mehmet Akif Aslan, Mehmet Simsek, Eyup Sensoy

Abstract:

Big Data stands for today’s cutting-edge technology. As the technology becomes widespread, so does data. Utilizing massive data sets enables companies to gain competitive advantages over their rivals. Among the many areas of Big Data usage, logistics has a significant role in both the commercial sector and the military. This paper lays out what big data is and how it is used in both military and commercial logistics.

Keywords: big data, logistics, operational efficiency, risk management

Procedia PDF Downloads 611
24159 A Comprehensive Methodology for Voice Segmentation of Large Sets of Speech Files Recorded in Naturalistic Environments

Authors: Ana Londral, Burcu Demiray, Marcus Cheetham

Abstract:

Speech recording is a methodology used in many different studies related to cognitive and behavioural research. Modern advances in digital equipment brought the possibility of continuously recording hours of speech in naturalistic environments and building rich sets of sound files. Speech analysis can then extract from these files multiple features for different scopes of research in Language and Communication. However, tools for analysing a large set of sound files and automatically extracting relevant features from these files are often inaccessible to researchers who are not familiar with programming languages. Manual analysis is a common alternative, with a high time and efficiency cost. In the analysis of long sound files, the first step is voice segmentation, i.e. detecting and labelling segments containing speech. We present a comprehensive methodology aiming to support researchers in voice segmentation, as the first step for data analysis of a big set of sound files. Praat, an open-source software package, is suggested as a tool to run a voice detection algorithm, label segments and files and extract other quantitative features on a structure of folders containing a large number of sound files. We present the validation of our methodology with a set of 5000 sound files that were collected in the daily life of a group of voluntary participants aged over 65. A smartphone device was used to collect sound using the Electronically Activated Recorder (EAR): an app programmed to record 30-second sound samples that were randomly distributed throughout the day. Results demonstrated that automatic segmentation and labelling of files containing speech segments was 74% faster when compared to a manual analysis performed with two independent coders. Furthermore, the methodology presented allows manual adjustments of voiced segments with visualisation of the sound signal and the automatic extraction of quantitative information on speech.
In conclusion, we propose a comprehensive methodology for voice segmentation, to be used by researchers that have to work with large sets of sound files and are not familiar with programming tools.

Keywords: automatic speech analysis, behavior analysis, naturalistic environments, voice segmentation

Procedia PDF Downloads 258
24158 Pandemic-Era WIC Participation in Delaware, U.S.: Participants' Experiences and Challenges

Authors: McKenna Halverson, Allison Karpyn

Abstract:

Introduction: The COVID-19 pandemic posed unprecedented challenges for families with young children in the United States. The Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), a federal nutrition assistance program that provides low-income mothers and young children with access to healthy foods (e.g., infant formula, milk, and peanut butter), mitigated some financial challenges for families. However, the U.S. experienced a national infant formula shortage and rising inflation rates during the pandemic, which likely impacted WIC participants’ shopping experiences and well-being. As such, this study aimed to characterize how the COVID-19 pandemic and related events impacted Delaware WIC participants’ in-store benefit redemption experiences and overall well-being. Method: The authors conducted semi-structured interviews with 51 WIC participants in Wilmington, Delaware. Survey measures included demographic questions and open-ended questions regarding participants’ experiences with WIC benefit redemption during the COVID-19 pandemic. Data were analyzed using a hybrid inductive and deductive coding approach. Findings: The COVID-19 pandemic significantly impacted WIC participants’ shopping experiences and well-being. Specifically, participants were forced to alter their shopping behaviors to account for rising food prices (e.g., used coupons, bought less food, used food banks). Additionally, WIC participants experienced significant distress during the national infant formula shortage resulting from difficulty finding formula to feed their children. Participants also struggled with in-store benefit redemption due to inconsistencies in shelf labelling, the WIC app, and low stock of WIC foods. 
These findings highlight the need to reexamine WIC operations and emergency food response policy in the United States during times of crisis to optimize public health and ensure federal nutrition assistance programs meet the needs of low-income families with young children.

Keywords: benefit redemption, COVID-19 pandemic, infant formula shortage, inflation, shopping, WIC

Procedia PDF Downloads 48
24157 Europium Chelates as a Platform for Biosensing

Authors: Eiman A. Al-Enezi, Gin Jose, Sikha Saha, Paul Millner

Abstract:

Rare earth nanotechnology has gained a considerable amount of interest in the field of biosensing due to the unique luminescence properties of lanthanides. Chelating rare earth ions plays a significant role in biological labelling applications, including medical diagnostics, due to their different excitation and emission wavelengths, the variety of their spectral properties, sharp emission peaks and long fluorescence lifetimes. We aimed to develop a platform for biosensors based on Europium (Eu³⁺) chelates against biomarkers of cardiac injury (heart-type fatty acid binding protein; H-FABP3) and stroke (glial fibrillary acidic protein; GFAP). An additional novelty of this project is the use of synthetic binding proteins (Affimers), which could offer an excellent alternative targeting strategy to the existing antibodies. Anti-GFAP and anti-HFABP3 Affimer binders were modified to increase the number of carboxy functionalities. Europium nitrate was then incubated with the modified Affimer. The luminescence characteristics of the Eu³⁺ complex with modified Affimers and antibodies against anti-GFAP and anti-HFABP3 were measured against different concentrations of the respective analytes at an excitation wavelength of 395 nm. Bovine serum albumin (BSA) was used as a control against the IgG/Affimer Eu³⁺ complexes. The emission spectrum of the Eu³⁺ complex showed 5 emission peaks ranging between 550 and 750 nm, with the highest-intensity peaks at 592 and 698 nm. The fluorescence intensity of Eu³⁺ chelates with the modified Affimer or antibodies increased significantly, by 4-7 fold, compared to the emission spectrum of the Eu³⁺ complex. The fluorescence intensity of the Affimer complex was quenched proportionally with increased analyte concentration, but this did not occur with the antibody complex. In contrast, the fluorescence intensity of the Eu³⁺ complex increased slightly with increased concentration of BSA.
These data demonstrate that modified Affimer Eu³⁺ complexes can function as nanobiosensors with potential diagnostic and analytical applications.

Keywords: lanthanides, europium, chelates, biosensors

Procedia PDF Downloads 497
24156 Implementation of an IoT Sensor Data Collection and Analysis Library

Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee

Abstract:

Due to the development of information technology and wireless Internet technology, various data are being generated in various fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a wider variety of information can be extracted. In addition, the development and dissemination of boards such as Arduino and Raspberry Pi have made it possible to easily test various sensors, and it is possible to collect sensor data directly by using database application tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, collecting data with such a board is difficult, particularly when the user is not a computer programmer or is using the board for the first time. Even if data are collected, lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.

Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data

Procedia PDF Downloads 347
24155 Government (Big) Data Ecosystem: Definition, Classification of Actors, and Their Roles

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Organizations, including governments, generate (big) data that are high in volume, velocity, veracity, and come from a variety of sources. Public Administrations are using (big) data, implementing base registries, and enforcing data sharing within the entire government to deliver (big) data related integrated services, provide insights to users, and support good governance. Government (big) data ecosystem actors represent distinct entities that provide data, consume data, manipulate data to offer paid services, and extend data services like data storage and hosting services to other actors. In this research work, we perform a systematic literature review. The key objectives of this paper are to propose a robust definition of the government (big) data ecosystem and a classification of government (big) data ecosystem actors and their roles. We showcase a graphical view of actors, roles, and their relationships in the government (big) data ecosystem. We also discuss our research findings. We did not find many published research articles about the government (big) data ecosystem, including its definition and the classification of actors and their roles. Therefore, we borrowed ideas for the government (big) data ecosystem from numerous areas in the literature, including scientific research data, humanitarian data, open government data and industry data.

Keywords: big data, big data ecosystem, classification of big data actors, big data actors roles, definition of government (big) data ecosystem, data-driven government, eGovernment, gaps in data ecosystems, government (big) data, public administration, systematic literature review

Procedia PDF Downloads 132
24154 Government Big Data Ecosystem: A Systematic Literature Review

Authors: Syed Iftikhar Hussain Shah, Vasilis Peristeras, Ioannis Magnisalis

Abstract:

Data that is high in volume, velocity, veracity and comes from a variety of sources is usually generated in all sectors including the government sector. Globally, public administrations are pursuing (big) data as a new technology and trying to adopt a data-centric architecture for hosting and sharing data. Properly executed, big data and data analytics in the government (big) data ecosystem can lead to data-driven government and have a direct impact on the way policymakers work and citizens interact with governments. In this research paper, we conduct a systematic literature review. The main aims of this paper are to highlight essential aspects of the government (big) data ecosystem and to explore the most critical socio-technical factors that contribute to the successful implementation of a government (big) data ecosystem. The essential aspects of the government (big) data ecosystem include definition, data types, data lifecycle models, and actors and their roles. We also discuss the potential impact of (big) data in public administration and gaps in the government data ecosystems literature. As this is a new topic, we did not find specific articles on the government (big) data ecosystem and therefore focused our research on various relevant areas like humanitarian data, open government data, scientific research data, industry data, etc.

Keywords: applications of big data, big data, big data types, big data ecosystem, critical success factors, data-driven government, egovernment, gaps in data ecosystems, government (big) data, literature review, public administration, systematic review

Procedia PDF Downloads 182
24153 A Machine Learning Decision Support Framework for Industrial Engineering Purposes

Authors: Anli Du Preez, James Bekker

Abstract:

Data is currently one of the most critical and influential emerging technologies. However, the true potential of data is yet to be exploited since, currently, only about 1% of generated data is ever actually analyzed for value creation. There is a data gap where data is not explored due to the lack of data analytics infrastructure and the required data analytics skills. This study developed a decision support framework for data analytics by following Jabareen’s framework development methodology. The study focused on machine learning algorithms, which are a subset of data analytics. The developed framework is designed to assist data analysts with little experience in choosing the appropriate machine learning algorithm given the purpose of their application.

Keywords: Data analytics, Industrial engineering, Machine learning, Value creation

Procedia PDF Downloads 141
24152 Providing Security to Private Cloud Using Advanced Encryption Standard Algorithm

Authors: Annapureddy Srikant Reddy, Atthanti Mahendra, Samala Chinni Krishna, N. Neelima

Abstract:

In our present world, we are generating a lot of data and we need a specific device to store all these data. Generally, we store data in pen drives, hard drives, etc. Sometimes we may lose the data due to the corruption of devices. To overcome these issues, we implemented a cloud space for storing the data, which provides more security for the data. We can access the data from anywhere in the world using just the internet. We implemented all of this in Java using the NetBeans IDE. Once a user uploads the data, he does not have any rights to change the data. Users' uploaded files are stored in the cloud with the system time as the file name, and the directory is created with some random words. The cloud accepts the data only if the size of the file is less than 2MB.
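
The storage rules described above (a size limit of 2MB, a file name taken from the system time, and a randomly named directory) can be sketched in a few lines. This is a hedged illustration only: the paper's implementation is in Java and is not shown in the abstract, the function `store_upload` and the constant `MAX_UPLOAD_BYTES` are hypothetical, a random hexadecimal token stands in for the "random words", and the AES encryption step is left out because it would need a third-party library.

```python
import secrets
import tempfile
import time
from pathlib import Path

# Assumption: "2MB" is read as 2 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024


def store_upload(data: bytes, storage_root: Path) -> Path:
    """Store an upload under a randomly named directory, named by the system time."""
    if len(data) >= MAX_UPLOAD_BYTES:
        raise ValueError("file must be smaller than 2 MB")
    # A random token stands in for the random directory name.
    directory = storage_root / secrets.token_hex(8)
    directory.mkdir(parents=True, exist_ok=False)
    # The file name is the current system time in nanoseconds.
    target = directory / str(time.time_ns())
    target.write_bytes(data)
    return target


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    print(store_upload(b"example payload", root))
```

Because the stored name carries no trace of the original file name, a real system would also need an index that maps each user to the stored paths; that part is outside this sketch.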

Keywords: cloud space, AES, FTP, NetBeans IDE

Procedia PDF Downloads 176
24151 Graphic Calculator Effectiveness in Biology Teaching and Learning

Authors: Nik Azmah Nik Yusuff, Faridah Hassan Basri, Rosnidar Mansor

Abstract:

The purpose of the study is to find out the effectiveness of using graphic calculators (GC) with Calculator Based Laboratory 2 (CBL2) in the teaching and learning of form four biology for these topics: Nutrition, Respiration and Dynamic Ecosystem. Sixty form four science stream students were the participants of this study. The participants were divided equally into the treatment and control groups. The treatment group used GC with CBL2 during experiments while the control group used ordinary conventional laboratory apparatus without GC with CBL2. The instruments in this study were a set of pre-tests and post-tests and a questionnaire. A t-test was used to compare the students’ biology achievement while descriptive statistics were used to analyze the outcome of the questionnaire. The findings of this study indicated that the use of GC with CBL2 in biology had a significant positive effect. The highest mean was 4.43 for the item stating that the use of GC with CBL2 had saved time in collecting experiment results. The second highest mean was 4.10 for the item stating that GC with CBL2 had saved time in drawing and labelling graphs. The outcome from the questionnaire also showed that GC with CBL2 were easy to use and saved time. Thus, teachers should use GC with CBL2 in support of efforts by Malaysia's Ministry of Education to encourage technology-enhanced lessons.

Keywords: biology experiments, Calculator-Based Laboratory 2 (CBL2), graphic calculators, Malaysia Secondary School, teaching/learning

Procedia PDF Downloads 379
24150 Business Intelligence for Profiling of Telecommunication Customer

Authors: Rokhmatul Insani, Hira Laksmiwati Soemitro

Abstract:

Business Intelligence is a methodology that systematically exploits data to produce information and knowledge, and it can support the decision-making process. Two methods in business intelligence are the data warehouse and data mining. A data warehouse can store historical data derived from transactional data. For data modelling in the data warehouse, we apply Kimball's dimensional modelling. Data mining is used to extract patterns from the data and gain insight from them. Data mining has many techniques, one of which is segmentation. For profiling telecommunication customers, we use customer segmentation according to customers’ usage of services, customer invoices and customer payments. Customers can be grouped according to their characteristics, and the profitable customers can be identified. We apply the K-Means clustering algorithm for segmentation. As input variables for the algorithm, we use the RFM (Recency, Frequency and Monetary) model. For all data mining processes, we use the IBM SPSS Modeler tool.
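
The combination of RFM variables and K-Means described above can be illustrated with a small, dependency-free sketch. It is not the authors' workflow, which uses IBM SPSS Modeler: the transaction layout `(customer_id, date, amount)`, the function names and the choice of the first `k` points as starting centroids are all assumptions made for the example.

```python
from datetime import date


def rfm_features(transactions, today):
    """Build (recency in days, frequency, monetary) per customer.

    `transactions` is a list of (customer_id, date, amount) tuples.
    """
    summary = {}
    for customer_id, when, amount in transactions:
        last, frequency, monetary = summary.get(customer_id, (when, 0, 0.0))
        summary[customer_id] = (max(last, when), frequency + 1, monetary + amount)
    return {
        customer_id: ((today - last).days, frequency, monetary)
        for customer_id, (last, frequency, monetary) in summary.items()
    }


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no RFM variable dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    spans = [(max(column) - min(column)) or 1 for column in columns]
    return [
        tuple((value - low) / span for value, low, span in zip(row, lows, spans))
        for row in rows
    ]


def kmeans(points, k, iterations=20):
    """Plain K-Means; the first k points serve as the initial centroids."""
    centroids = [list(point) for point in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for index, point in enumerate(points):
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            assignment[index] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for cluster in range(k):
            members = [p for p, a in zip(points, assignment) if a == cluster]
            if members:
                centroids[cluster] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


if __name__ == "__main__":
    sample = [
        ("a", date(2024, 1, 1), 10.0),
        ("a", date(2024, 1, 10), 5.0),
        ("b", date(2024, 1, 5), 20.0),
        ("c", date(2023, 6, 1), 300.0),
    ]
    features = rfm_features(sample, date(2024, 1, 11))
    customers = sorted(features)
    labels, _ = kmeans(min_max_scale([features[c] for c in customers]), k=2)
    print(dict(zip(customers, labels)))
```

Scaling before clustering matters here because monetary totals are typically orders of magnitude larger than frequency counts; in practice the number of clusters and the initialisation would also need to be chosen with more care than in this sketch.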

Keywords: business intelligence, customer segmentation, data warehouse, data mining

Procedia PDF Downloads 448
24149 Imputation Technique for Feature Selection in Microarray Data Set

Authors: Younies Saeed Hassan Mahmoud, Mai Mabrouk, Elsayed Sallam

Abstract:

Analysing DNA microarray data sets is a great challenge for bioinformaticians due to the complexity of using statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing data, which happens regularly, because these techniques cannot deal with missing data. One of the most important data analysis processes on a microarray data set is feature selection. This process finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
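
The abstract does not say which imputation technique is used, so the following is only a baseline illustration of what imputing a microarray matrix means, not the paper's method. It assumes one row per gene and one column per sample, with missing values stored as `None` or NaN, and fills each gap with the mean of the observed values for that gene; the function name `impute_row_means` is hypothetical.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing measurements."""
    return value is None or math.isnan(value)


def impute_row_means(matrix):
    """Replace missing entries in each row (gene) with that row's observed mean."""
    imputed = []
    for row in matrix:
        observed = [value for value in row if not _is_missing(value)]
        # A gene with no observed values at all falls back to 0.0.
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if _is_missing(value) else value for value in row])
    return imputed


if __name__ == "__main__":
    expression = [
        [1.0, None, 3.0],
        [float("nan"), 4.0, 4.0],
    ]
    print(impute_row_means(expression))
```

Row-mean filling is the simplest possible choice and ignores correlations between genes; it is shown here only because downstream feature selection requires a complete matrix.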

Keywords: DNA microarray, feature selection, missing data, bioinformatics

Procedia PDF Downloads 535
24148 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are used in various applications such as agriculture, health monitoring, air and water pollution monitoring, and traffic monitoring and control, and hence play a vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensor data are important for designing an efficient big data framework. Current research does not focus on aggregating and filtering data at multiple layers of a sensor-based big data framework. This paper therefore introduces (i) a three-layer data aggregation framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer, at the sensors. Simulation results show that PDDA outperforms existing tree-based and cluster-based data aggregation schemes in terms of overall network energy consumption and end-to-end data transmission delay.

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

Procedia PDF Downloads 305
24147 Hacking's 'Between Goffman and Foucault': A Theoretical Frame for Criminology

Authors: Tomás Speziale

Abstract:

This paper aims to analyse how Ian Hacking states the theoretical basis of his research on the classification of people. Although all his early philosophical education had been based on Foucault, it is also true that Erving Goffman’s perspective provided him with epistemological and methodological tools for understanding face-to-face relationships. Hence, all his works must be thought of as social science texts that combine the research on how individuals are constituted ‘top-down’ (as in Foucault) with the inquiry into how people renegotiate ‘bottom-up’ the classifications about them. Thus, Hacking’s proposal constitutes a middle ground between the French philosopher and the American sociologist. Placing himself between both authors allows Hacking to build a frame that is expected to adjust to the main particularity of the social sciences: the fact that they study interactive kinds. These are kinds of people, which implies that those who are classified can change in certain ways that prompt the need to change the previous classifications themselves. It is all about the interaction between the labelling of people and the people who are classified. Consequently, understanding the way in which Hacking uses Foucault’s and Goffman’s theories is essential to fully comprehend the social dynamic between individuals and concepts, what Bert Hansen had called dialectical realism. His theoretical proposal, therefore, is valuable not only because it combines diverse perspectives, but also because it constitutes an utterly original and relevant framework for sociological theory and particularly for criminology.

Keywords: classification of people, Foucault's archaeology, Goffman's interpersonal sociology, interactive kinds

Procedia PDF Downloads 315
24146 Control the Flow of Big Data

Authors: Shizra Waris, Saleem Akhtar

Abstract:

Big data is a research area receiving attention from academia and IT communities. In the digital world, the amounts of data produced and stored have grown within a short period of time. Consequently, this fast-increasing rate of data has created many challenges. In this paper, we use functionalism and structuralism paradigms to analyze the genesis of big data applications and their current trends. This paper presents a complete discussion of state-of-the-art big data technologies based on group and stream data processing. Moreover, the strengths and weaknesses of these technologies are analyzed. This study also covers big data analytics techniques, processing methods, some reported case studies from different vendors, several open research challenges and the opportunities brought about by big data. The similarities and differences of these techniques and technologies based on important limitations are also investigated. Emerging technologies are suggested as a solution for big data problems.

Keywords: computer, IT community, industry, big data

Procedia PDF Downloads 161
24145 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because data volumes have multiplied, many computer science tools have been developed to process and analyze Big Data. High-performance computing architectures have been designed to meet the processing needs of Big Data (from the standpoints of transaction processing and of strategic and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend in high-performance computing architectures, especially as it relates to analytics and data mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 485
24144 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore the re3data.org registry in order to identify the registration workflow for research data repositories. A further objective is to chart the present development of research data repositories in India. The study first examines the framework and schema design of the re3data.org registry and then explores the status of Indian research data repositories in it. Research data repositories are gaining wider relevance due to e-research concepts. The re3data.org registry is now a good tool for users and researchers to identify appropriate research data repositories for their research requirements. In the Indian environment, a compatible National Research Data Policy is needed to boost the management of research data. A registry of research data repositories is a crucial tool for discovering specific information in a specific domain. Research data repositories in India have not previously been studied. Both the re3data.org registry and the status of Indian research data repositories are discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 297
24143 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processing of big data on transportation ridership (e.g., smartcard data) and traffic operation (e.g., traffic detector data), which requires a lot of computational power, is incontrovertible in Intelligent Transportation Systems. Nowadays, cloud computing is an important subject and a popular information technology solution for data processing. It enables users to process enormous amounts of data without having their own computing power. Thus, it can also be a good choice for transportation big data processing. This paper examines how cloud computing can enhance transportation big data processing by contrasting its advantages and disadvantages and discussing its features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 424
24142 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of the databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in the analysis is data mining. Preparing raw data for data mining exploration takes up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data were collected over three years from an actual harmonic monitoring system in a distribution system in Australia. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes that lead to a better understanding of the data are presented. Underlying classes in the data are then identified using a clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 220
24141 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data play an increasingly important role in economic growth, innovation, technical advantage and business strategy, and even in competition between countries. Analyzing patent data is crucial since patents cover a large part of the world's technological information. In this paper, we use the linguistic summarization technique to prove the validity of hypotheses about patent data stated in the literature.
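For readers unfamiliar with linguistic summarization, the sketch below shows one common fuzzy-set formulation (a Yager-style summary of the form "Q objects are S"), in which the truth degree is the quantifier's membership applied to the mean membership of the objects in the summarizer. The abstract does not say which formulation the authors used. The membership shapes, thresholds, function names and citation counts here are assumptions made for this example.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def truth_of_summary(values, summarizer, quantifier):
    """Truth degree of 'Q objects are S'.

    `summarizer` maps a value to its membership in S (e.g. 'highly cited');
    `quantifier` maps a proportion in [0, 1] to its membership in Q (e.g. 'most').
    """
    if not values:
        raise ValueError("at least one object is required")
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


if __name__ == "__main__":
    # Hypothetical citation counts for five patents.
    citations = [0, 2, 30, 45, 60]
    highly_cited = lambda c: ramp(c, 10, 40)  # assumed thresholds
    most = lambda p: ramp(p, 0.3, 0.8)        # assumed shape of the quantifier 'most'
    print(truth_of_summary(citations, highly_cited, most))
```

A hypothesis such as "most patents in field X are highly cited" can then be accepted or rejected by comparing the truth degree with a chosen threshold.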

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 248