Search results for: big data ecosystem
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25752

Search results for: big data ecosystem

24252 Unlocking Health Insights: Studying Data for Better Care

Authors: Valentina Marutyan

Abstract:

Healthcare data mining is a rapidly developing field at the intersection of technology and medicine that has the potential to change our understanding and approach to providing healthcare. Healthcare and data mining is the process of examining huge amounts of data to extract useful information that can be applied in order to improve patient care, treatment effectiveness, and overall healthcare delivery. This field looks for patterns, trends, and correlations in a variety of healthcare datasets, such as electronic health records (EHRs), medical imaging, patient demographics, and treatment histories. To accomplish this, it uses advanced analytical approaches. Predictive analysis using historical patient data is a major area of interest in healthcare data mining. This enables doctors to get involved early to prevent problems or improve results for patients. It also assists in early disease detection and customized treatment planning for every person. Doctors can customize a patient's care by looking at their medical history, genetic profile, current and previous therapies. In this way, treatments can be more effective and have fewer negative consequences. Moreover, helping patients, it improves the efficiency of hospitals. It helps them determine the number of beds or doctors they require in regard to the number of patients they expect. In this project are used models like logistic regression, random forests, and neural networks for predicting diseases and analyzing medical images. Patients were helped by algorithms such as k-means, and connections between treatments and patient responses were identified by association rule mining. Time series techniques helped in resource management by predicting patient admissions. These methods improved healthcare decision-making and personalized treatment. Also, healthcare data mining must deal with difficulties such as bad data quality, privacy challenges, managing large and complicated datasets, ensuring the reliability of models, managing biases, limited data sharing, and regulatory compliance. Finally, secret code of data mining in healthcare helps medical professionals and hospitals make better decisions, treat patients more efficiently, and work more efficiently. It ultimately comes down to using data to improve treatment, make better choices, and simplify hospital operations for all patients.

Keywords: data mining, healthcare, big data, large amounts of data

Procedia PDF Downloads 75
24251 Assessment on Rumen Microbial Diversity of Bali Cattle Using 16S rRNA Sequencing

Authors: Asmuddin Natsir, A. Mujnisa, Syahriani Syahrir, Marhamah Nadir, Nurul Purnomo

Abstract:

Bacteria, protozoa, Archaea, and fungi are the dominant microorganisms found in the rumen ecosystem that has an important role in converting feed ingredients into components that can be digested and utilized by the livestock host. This study was conducted to assess the diversity of rumen bacteria of bali cattle raised under traditional farming condition. Three adult bali cattle were used in this experiment. The rumen fluid samples from the three experimental animals were obtained by the Stomach Tube method before the morning feeding. The results of study indicated that the Illumina sequencing was successful in identifying 301,589 sequences, averaging 100,533 sequences, from three rumen fluid samples of three cattle. Furthermore, based on the SILVA taxonomic database, there were 19 kinds of phyla that had been successfully identified. Of the 19 phyla, there were only two dominant groups across the three samples, namely Bacteroidetes and Firmicutes, with an average percentage of 83.68% and 13.43%, respectively. Other groups such as Synergistetes, Spirochaetae, Planctomycetes can also be identified but in relatively small percentage. At the genus level, there were 157 sequences obtained from all three samples. Of this number, the most dominant group was Prevotella 1 with a percentage of 71.82% followed by 6.94% of Christencenellaceae R-7 group. Other groups such as Prevotellaceae UCG-001, Ruminococcaceae NK4A214 group, Sphaerochaeta, Ruminococcus 2, Rikenellaceae RC9 gut group, Quinella were also identified but with very low percentages. The sequencing results were able to detect the presence of 3.06% and 3.92% respectively for uncultured rumen bacterium and uncultured bacterium. In conclusion, the results of this experiment can provide an opportunity for a better understanding of the rumen bacterial diversity of the bali cattle raised under traditional farming condition and insight regarding the uncultured rumen bacterium and uncultured bacterium that need to be further explored.

Keywords: 16S rRNA sequencing, bali cattle, rumen microbial diversity, uncultured rumen bacterium

Procedia PDF Downloads 335
24250 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 315
24249 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 105
24248 Foundation of the Information Model for Connected-Cars

Authors: Hae-Won Seo, Yong-Gu Lee

Abstract:

Recent progress in the next generation of automobile technology is geared towards incorporating information technology into cars. Collectively called smart cars are bringing intelligence to cars that provides comfort, convenience and safety. A branch of smart cars is connected-car system. The key concept in connected-cars is the sharing of driving information among cars through decentralized manner enabling collective intelligence. This paper proposes a foundation of the information model that is necessary to define the driving information for smart-cars. Road conditions are modeled through a unique data structure that unambiguously represent the time variant traffics in the streets. Additionally, the modeled data structure is exemplified in a navigational scenario and usage using UML. Optimal driving route searching is also discussed using the proposed data structure in a dynamically changing road conditions.

Keywords: connected-car, data modeling, route planning, navigation system

Procedia PDF Downloads 373
24247 Combined Use of Microbial Consortia for the Enhanced Degradation of Type-IIx Pyrethroids

Authors: Parminder Kaur, Chandrajit B. Majumder

Abstract:

The unrestrained usage of pesticides to meet the burgeoning demand of enhanced crop productivity has led to the serious contamination of both terrestrial and aquatic ecosystem. The remediation of mixture of pesticides is a challenging affair regarding inadvertent mixture of pesticides from agricultural lands treated with various compounds. Global concerns about the excessive use of pesticides have driven the need to develop more effective and safer alternatives for their remediation. We focused our work on the microbial degradation of a mixture of three Type II-pyrethroids, namely Cypermethrin, Cyhalothrin and Deltamethrin commonly applied for both agricultural and domestic purposes. The fungal strains (Fusarium strain 8-11P and Fusarium sp. zzz1124) had previously been isolated from agricultural soils and their ability to biotransform this amalgam was studied. In brief, the experiment was conducted in two growth systems (added carbon and carbon-free) enriched with variable concentrations of pyrethroids between 100 to 300 mgL⁻¹. Parameter optimization (pH, temperature, concentration and time) was done using a central composite design matrix of Response Surface Methodology (RSM). At concentrations below 200 mgL⁻¹, complete removal was observed; however, degradation of 95.6%/97.4 and 92.27%/95.65% (in carbon-free/added carbon) was observed for 250 and 300 mgL⁻¹ respectively. The consortium has been shown to degrade the pyrethroid mixture (300 mg L⁻¹) within 120 h. After 5 day incubation, the residual pyrethroids concentration in unsterilized soil were much lower than in sterilized soil, indicating that microbial degradation predominates in pyrethroids elimination with the half-life (t₁/₂) of 1.6 d and R² ranging from 0.992-0.999. Overall, these results showed that microbial consortia might be more efficient than single degrader strains. The findings will complement our current understanding of the bioremediation of mixture of Type II pyrethroids with microbial consortia and potentially heighten the importance for considering bioremediation as an effective alternative for the remediation of such pollutants.

Keywords: bioremediation, fungi, pyrethroids, soil

Procedia PDF Downloads 145
24246 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 335
24245 Automated Multisensory Data Collection System for Continuous Monitoring of Refrigerating Appliances Recycling Plants

Authors: Georgii Emelianov, Mikhail Polikarpov, Fabian Hübner, Jochen Deuse, Jochen Schiemann

Abstract:

Recycling refrigerating appliances plays a major role in protecting the Earth's atmosphere from ozone depletion and emissions of greenhouse gases. The performance of refrigerator recycling plants in terms of material retention is the subject of strict environmental certifications and is reviewed periodically through specialized audits. The continuous collection of Refrigerator data required for the input-output analysis is still mostly manual, error-prone, and not digitalized. In this paper, we propose an automated data collection system for recycling plants in order to deduce expected material contents in individual end-of-life refrigerating appliances. The system utilizes laser scanner measurements and optical data to extract attributes of individual refrigerators by applying transfer learning with pre-trained vision models and optical character recognition. Based on Recognized features, the system automatically provides material categories and target values of contained material masses, especially foaming and cooling agents. The presented data collection system paves the way for continuous performance monitoring and efficient control of refrigerator recycling plants.

Keywords: automation, data collection, performance monitoring, recycling, refrigerators

Procedia PDF Downloads 162
24244 Evaluating the Impact of Future Scenarios on Water Availability and Demand Based on Stakeholders Prioritized Water Management Options in the Upper Awash Basin, Ethiopia

Authors: Adey Nigatu Mersha, Ilyas Masih, Charlotte de Fraiture, Tena Alamirew

Abstract:

Conflicts over water are increasing mainly as a result of water scarcity in response to higher water demand and climatic variability. There is often not enough water to meet all demands for different uses. Thus, decisions have to be made as to how the available resources can be managed and utilized. Correspondingly water allocation goals, practically national water policy goals, need to be revised accordingly as the pressure on water increases from time to time. A case study is conducted in the Upper Awash Basin, Ethiopia, to assess and evaluate prioritized comprehensive water demand management options based on the framework of integrated water resources management in account of stakeholders’ knowledge and preferences as well as practical prominence within the Upper Awash Basin. Two categories of alternative management options based on policy analysis and stakeholders' consultation were evaluated against the business-as-usual scenario by using WEAP21 model as an analytical tool. Strong effects on future (unmet) demands are observed with major socio-economic assumptions and forthcoming water development plans. Water management within the basin will get more complex with further abstraction which may lead to an irreversible damage to the ecosystem. It is further confirmed through this particular study that efforts to maintain users’ preferences alone cannot insure economically viable and environmentally sound development and vice versa. There is always a tradeoff between these factors. Hence, all of these facets must be analyzed separately, related with each other in equal footing, and ultimately taken up in decision making in order for the whole system to function properly.

Keywords: water demand, water availability, WEAP21, scenarios

Procedia PDF Downloads 278
24243 Sales Patterns Clustering Analysis on Seasonal Product Sales Data

Authors: Soojin Kim, Jiwon Yang, Sungzoon Cho

Abstract:

As a seasonal product is only in demand for a short time, inventory management is critical to profits. Both markdowns and stockouts decrease the return on perishable products; therefore, researchers have been interested in the distribution of seasonal products with the aim of maximizing profits. In this study, we propose a data-driven seasonal product sales pattern analysis method for individual retail outlets based on observed sales data clustering; the proposed method helps in determining distribution strategies.

Keywords: clustering, distribution, sales pattern, seasonal product

Procedia PDF Downloads 595
24242 Optimization of the Jatropha curcas Supply Chain as a Criteria for the Implementation of Future Collection Points in Rural Areas of Manabi-Ecuador

Authors: Boris G. German, Edward Jiménez, Sebastián Espinoza, Andrés G. Chico, Ricardo A. Narváez

Abstract:

The unique flora and fauna of The Galapagos Islands has leveraged a tourism-driven growth in the islands. Nonetheless, such development is energy-intensive and requires thousands of gallons of diesel each year for thermoelectric electricity generation. The needed transport of fossil fuels from the continent has generated oil spillages and affectations to the fragile ecosystem of the islands. The Zero Fossil Fuels initiative for The Galapagos proposed by the Ecuadorian government as an alternative to reduce the use of fossil fuels in the islands, considers the replacement of diesel in thermoelectric generators, by Jatropha curcas vegetable oil. However, the Jatropha oil supply cannot entirely cover yet the demand for electricity generation in Galapagos. Within this context, the present work aims to provide an optimization model that can be used as a selection criterion for approving new Jatropha Curcas collection points in rural areas of Manabi-Ecuador. For this purpose, existing Jatropha collection points in Manabi were grouped under three regions: north (7 collection points), center (4 collection points) and south (9 collection points). Field work was carried out in every region in order to characterize the collection points, to establish local Jatropha supply and to determine transportation costs. Data collection was complemented using GIS software and an objective function was defined in order to determine the profit associated to Jatropha oil production. The market price of both Jatropha oil and residual cake, were considered for the total revenue; whereas Jatropha price, transportation and oil extraction costs were considered for the total cost. The tonnes of Jatropha fruit and seed, transported from collection points to the extraction plant, were considered as variables. The maximum and minimum amount of the collected Jatropha from each region constrained the optimization problem. The supply chain was optimized using linear programming in order to maximize the profits. Finally, a sensitivity analysis was performed in order to find a profit-based criterion for the acceptance of future collection points in Manabi. The maximum profit reached a value of $ 4,616.93 per year, which represented a total Jatropha collection of 62.3 tonnes Jatropha per year. The northern region of Manabi had the biggest collection share (69%), followed by the southern region (17%). The criteria for accepting new Jatropha collection points in the rural areas of Manabi can be defined by the current maximum profit of the zone and by the variation in the profit when collection points are removed one at a time. The definition of new feasible collection points plays a key role in the supply chain associated to Jatropha oil production. Therefore, a mathematical model that assists decision makers in establishing new collection points while assuring profitability, contributes to guarantee a continued Jatropha oil supply for Galapagos and a sustained economic growth in the rural areas of Ecuador.

Keywords: collection points, Jatropha curcas, linear programming, supply chain

Procedia PDF Downloads 430
24241 Probability Sampling in Matched Case-Control Study in Drug Abuse

Authors: Surya R. Niraula, Devendra B Chhetry, Girish K. Singh, S. Nagesh, Frederick A. Connell

Abstract:

Background: Although random sampling is generally considered to be the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of sampling. Method: We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug users (controls) who then identified “friend cases.” Models to predict drug abuse based on risk factors were developed for each data set using conditional logistic regression. We compared the precision of each model using bootstrapping method and the predictive properties of each model using receiver operating characteristics (ROC) curves. Results: Analysis of 100 random bootstrap samples drawn from the snowball-sample data set showed a wide variation in the standard errors of the beta coefficients of the predictive model, none of which achieved statistical significance. One the other hand, bootstrap analysis of the random-sample data set showed less variation, and did not change the significance of the predictors at the 5% level when compared to the non-bootstrap analysis. Comparison of the area under the ROC curves using the model derived from the random-sample data set was similar when fitted to either data set (0.93, for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p < .001). Conclusion: The proposed method of random sampling of controls appears to be superior from a statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.

Keywords: drug abuse, matched case-control study, non-probability sampling, probability sampling

Procedia PDF Downloads 492
24240 Bioinformatics High Performance Computation and Big Data

Authors: Javed Mohammed

Abstract:

Right now, bio-medical infrastructure lags well behind the curve. Our healthcare system is dispersed and disjointed; medical records are a bit of a mess; and we do not yet have the capacity to store and process the crazy amounts of data coming our way from widespread whole-genome sequencing. And then there are privacy issues. Despite these infrastructure challenges, some researchers are plunging into bio medical Big Data now, in hopes of extracting new and actionable knowledge. They are doing delving into molecular-level data to discover bio markers that help classify patients based on their response to existing treatments; and pushing their results out to physicians in novel and creative ways. Computer scientists and bio medical researchers are able to transform data into models and simulations that will enable scientists for the first time to gain a profound under-standing of the deepest biological functions. Solving biological problems may require High-Performance Computing HPC due either to the massive parallel computation required to solve a particular problem or to algorithmic complexity that may range from difficult to intractable. Many problems involve seemingly well-behaved polynomial time algorithms (such as all-to-all comparisons) but have massive computational requirements due to the large data sets that must be analyzed. High-throughput techniques for DNA sequencing and analysis of gene expression have led to exponential growth in the amount of publicly available genomic data. With the increased availability of genomic data traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types. Computing systems are now so powerful it is possible for researchers to consider modeling the folding of a protein or even the simulation of an entire human body. This research paper emphasizes the computational biology's growing need for high-performance computing and Big Data. It illustrates this article’s indispensability in meeting the scientific and engineering challenges of the twenty-first century, and how Protein Folding (the structure and function of proteins) and Phylogeny Reconstruction (evolutionary history of a group of genes) can use HPC that provides sufficient capability for evaluating or solving more limited but meaningful instances. This article also indicates solutions to optimization problems, and benefits Big Data and Computational Biology. The article illustrates the Current State-of-the-Art and Future-Generation Biology of HPC Computing with Big Data.

Keywords: high performance, big data, parallel computation, molecular data, computational biology

Procedia PDF Downloads 362
24239 Isolation and Identification of Microorganisms from Marine-Associated Samples under Laboratory Conditions

Authors: Sameen Tariq, Saira Bano, Sayyada Ghufrana Nadeem

Abstract:

The Ocean, which covers over 70% of the world's surface, is wealthy in biodiversity as well as a rich wellspring of microorganisms with huge potential. The oceanic climate is home to an expansive scope of plants, creatures, and microorganisms. Marine microbial networks, which incorporate microscopic organisms, infections, and different microorganisms, enjoy different benefits in biotechnological processes. Samples were collected from marine environments, including soil and water samples, to cultivate the uncultured marine organisms by using Zobell’s medium, Sabouraud’s dextrose agar, and casein media for this purpose. Following isolation, we conduct microscopy and biochemical tests, including gelatin, starch, glucose, casein, catalase, and carbohydrate hydrolysis for further identification. The results show that more gram-positive and gram-negative bacteria. The isolation process of marine organisms is essential for understanding their ecological roles, unraveling their biological secrets, and harnessing their potential for various applications. Marine organisms exhibit remarkable adaptations to thrive in the diverse and challenging marine environment, offering vast potential for scientific, medical, and industrial applications. The isolation process plays a crucial role in unlocking the secrets of marine organisms, understanding their biological functions, and harnessing their valuable properties. They offer a rich source of bioactive compounds with pharmaceutical potential, including antibiotics, anticancer agents, and novel therapeutics. This study is an attempt to explore the diversity and dynamics related to marine microflora and their role in biofilm formation.

Keywords: marine microorganisms, ecosystem, fungi, biofilm, gram-positive, gram-negative

Procedia PDF Downloads 44
24238 Evaluating the Effectiveness of Science Teacher Training Programme in National Colleges of Education: a Preliminary Study, Perceptions of Prospective Teachers

Authors: A. S. V Polgampala, F. Huang

Abstract:

This is an overview of what is entailed in an evaluation and issues to be aware of when class observation is being done. This study examined the effects of evaluating teaching practice of a 7-day ‘block teaching’ session in a pre -service science teacher training program at a reputed National College of Education in Sri Lanka. Effects were assessed in three areas: evaluation of the training process, evaluation of the training impact, and evaluation of the training procedure. Data for this study were collected by class observation of 18 teachers during 9th February to 16th of 2017. Prospective teachers of science teaching, the participants of the study were evaluated based on newly introduced format by the NIE. The data collected was analyzed qualitatively using the Miles and Huberman procedure for analyzing qualitative data: data reduction, data display and conclusion drawing/verification. It was observed that the trainees showed their confidence in teaching those competencies and skills. Teacher educators’ dissatisfaction has been a great impact on evaluation process.

Keywords: evaluation, perceptions & perspectives, pre-service, science teachering

Procedia PDF Downloads 313
24237 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm

Authors: Sukhleen Kaur

Abstract:

In security groundwork, Intrusion Detection System (IDS) has become an important component. The IDS has received increasing attention in recent years. IDS is one of the effective way to detect different kinds of attacks and malicious codes in a network and help us to secure the network. Data mining techniques can be implemented to IDS, which analyses the large amount of data and gives better results. Data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. So far the study has been carried out on finding the attacks but this paper detects the malicious files. Some intruders do not attack directly, but they hide some harmful code inside the files or may corrupt those file and attack the system. These files are detected according to some defined parameters which will form two lists of files as normal files and harmful files. After that data mining will be performed. In this paper a hybrid classifier has been used via Naive Bayes and Ripper classification methods. The results show how the uploaded file in the database will be tested against the parameters and then it is characterised as either normal or harmful file and after that the mining is performed. Moreover, when a user tries to mine on harmful file it will generate an exception that mining cannot be made on corrupted or harmful files.

Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper

Procedia PDF Downloads 413
24236 Generalized Approach to Linear Data Transformation

Authors: Abhijith Asok

Abstract:

This paper presents a generalized approach for the simple linear data transformation, Y=bX, through an integration of multidimensional coordinate geometry, vector space theory and polygonal geometry. The scaling is performed by adding an additional ’Dummy Dimension’ to the n-dimensional data, which helps plot two dimensional component-wise straight lines on pairs of dimensions. The end result is a set of scaled extensions of observations in any of the 2n spatial divisions, where n is the total number of applicable dimensions/dataset variables, created by shifting the n-dimensional plane along the ’Dummy Axis’. The derived scaling factor was found to be dependent on the coordinates of the common point of origin for diverging straight lines and the plane of extension, chosen on and perpendicular to the ’Dummy Axis’, respectively. This result indicates the geometrical interpretation of a linear data transformation and hence, opportunities for a more informed choice of the factor ’b’, based on a better choice of these coordinate values. The paper follows on to identify the effect of this transformation on certain popular distance metrics, wherein for many, the distance metric retained the same scaling factor as that of the features.

Keywords: data transformation, dummy dimension, linear transformation, scaling

Procedia PDF Downloads 297
24235 Using Learning Apps in the Classroom

Authors: Janet C. Read

Abstract:

UClan set collaboration with Lingokids to assess the Lingokids learning app's impact on learning outcomes in classrooms in the UK for children with ages ranging from 3 to 5 years. Data gathered during the controlled study with 69 children includes attitudinal data, engagement, and learning scores. Data shows that children enjoyment while learning was higher among those children using the game-based app compared to those children using other traditional methods. It’s worth pointing out that engagement when using the learning app was significantly higher than other traditional methods among older children. According to existing literature, there is a direct correlation between engagement, motivation, and learning. Therefore, this study provides relevant data points to conclude that Lingokids learning app serves its purpose of encouraging learning through playful and interactive content. That being said, we believe that learning outcomes should be assessed with a wider range of methods in further studies. Likewise, it would be beneficial to assess the level of usability and playability of the app in order to evaluate the learning app from other angles.

Keywords: learning app, learning outcomes, rapid test activity, Smileyometer, early childhood education, innovative pedagogy

Procedia PDF Downloads 68
24234 Road Safety in the Great Britain: An Exploratory Data Analysis

Authors: Jatin Kumar Choudhary, Naren Rayala, Abbas Eslami Kiasari, Fahimeh Jafari

Abstract:

The Great Britain has one of the safest road networks in the world. However, the consequences of any death or serious injury are devastating for loved ones, as well as for those who help the severely injured. This paper aims to analyse the Great Britain's road safety situation and show the response measures for areas where the total damage caused by accidents can be significantly and quickly reduced. In this paper, we do an exploratory data analysis using STATS19 data. For the past 30 years, the UK has had a good record in reducing fatalities. The UK ranked third based on the number of road deaths per million inhabitants. There were around 165,000 accidents reported in the Great Britain in 2009 and it has been decreasing every year until 2019 which is under 120,000. The government continues to scale back road deaths empowering responsible road users by identifying and prosecuting the parameters that make the roads less safe.

Keywords: road safety, data analysis, openstreetmap, feature expanding.

Procedia PDF Downloads 139
24233 Intrusion Detection System Using Linear Discriminant Analysis

Authors: Zyad Elkhadir, Khalid Chougdali, Mohammed Benattou

Abstract:

Most of the existing intrusion detection systems works on quantitative network traffic data with many irrelevant and redundant features, which makes detection process more time’s consuming and inaccurate. A several feature extraction methods, such as linear discriminant analysis (LDA), have been proposed. However, LDA suffers from the small sample size (SSS) problem which occurs when the number of the training samples is small compared with the samples dimension. Hence, classical LDA cannot be applied directly for high dimensional data such as network traffic data. In this paper, we propose two solutions to solve SSS problem for LDA and apply them to a network IDS. The first method, reduce the original dimension data using principal component analysis (PCA) and then apply LDA. In the second solution, we propose to use the pseudo inverse to avoid singularity of within-class scatter matrix due to SSS problem. After that, the KNN algorithm is used for classification process. We have chosen two known datasets KDDcup99 and NSLKDD for testing the proposed approaches. Results showed that the classification accuracy of (PCA+LDA) method outperforms clearly the pseudo inverse LDA method when we have large training data.

Keywords: LDA, Pseudoinverse, PCA, IDS, NSL-KDD, KDDcup99

Procedia PDF Downloads 225
24232 The Ecological Urbanism as an Oppurtunity to Solve City Problem

Authors: Fairuz A. Ulinnuha, Bimo K. Fuadi

Abstract:

The world’s population continues to grow resulting in steady migration from rural to urban areas. Increased numbers of people and cities hand in hand with greater exploitation of world’s resource. Every year, more cities are feeling the devastating of this impact of this situation. During the 1970’s, some of eco-concept were applied to urban settings, one of them is Ecological Cities. A non-profit organization, Urban Ecology, was founded in California in 1975 to 'rebuild cities in balance with nature'. Efforts to synthesize ecological and urban planning approaches were slowed somewhat in the 1980s, but useful refinements were made. Consideration of social impact acknowledges that the ecological design is not just about ecology itself. It is also about questioning and redefining our understanding of the ecology. When ecologist did recognize the existence of cities, they were usually concerned with resource flows. One popular approach was to study the flow and transformation of energy through urban ecosystem. This research method is descriptive method, following LEED Certification which is the international standard of the sustainable building, is more widely applied. But there remains problem that the moral imperative of sustainability and by implication of sustainable design, tends to supplant the disciplinary contribution. Sustainable design is not always seen as design excellence or design innovation. This can provoke the skepticism and cause the tension those who promote disciplinary knowledge and those who push for sustainability. The challenges of rapid urbanization and limited of global resources has become more pressing. So, there is a need to find an alternative design approaches. The urban, as the site of complex relation (economy, political, social, cultural), need a complex problem solving that can solve current and future condition. The aim of this study is to discussed about conjoining of ecology such as public park and sustainable design.

Keywords: ecology, cities, urban, sustainability

Procedia PDF Downloads 138
24231 A Mixed-Methods Approach to Developing and Evaluating an SME Business Support Model for Innovation in Rural England

Authors: Steve Fish, Chris Lambert

Abstract:

Cumbria is a geo-political county in Northwest England within which the Lake District National Park, a UNESCO World Heritage site is located. Whilst the area has a formidable reputation for natural beauty and historic assets, the innovation ecosystem is described as ‘patchy’ for a number of reasons. The county is one of the largest in England by area and is sparsely populated. This paper describes the needs, development and delivery of an SME business-support programme funded by the European Regional Development Fund, Lancaster University and the University of Cumbria. The Cumbria Innovations Platform (CUSP) Project has been designed to respond to the nuanced needs of SMEs in this locale, whilst promoting the adoption of research and innovation. CUSP utilizes a funnel method to support rural businesses with access to university innovation intervention. CUSP has been built on a three-tier model: Communicate, Collaborate and Create. The paper describes this project in detail and presents results in terms of output indicators achieved, a beneficiary telephone survey and wider economic forecasts. From a pragmatic point-of-view, the paper provides experiences and reflections of those people who are delivering and evaluating knowledge exchange. The authors discuss some of the benefits, challenges and implications for both policy makers and practitioners. Finally, the paper aims to serve as an invitation to others who may consider adopting a similar method of university-industry collaboration in their own region.

Keywords: regional business support, rural business support, university-industry collaboration, collaborative R&D, SMEs, knowledge exchange

Procedia PDF Downloads 120
24230 Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — in the Case of Critical Dataset Size —

Authors: Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno

Abstract:

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducting true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical size of dataset derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to realworld data.

Keywords: rule induction, decision table, missing data, noise

Procedia PDF Downloads 395
24229 Seven Brothers and Sisters of Severely Disabled Children Speak up about Their Everyday Challenges and Needs : A Multiple Case Study

Authors: Myriam Castonguay, Florence Vinit

Abstract:

This study aims to gain a better understanding of the lived experience of seven children growing up in a family where another child is severely disabled, informed by family systems theory and the socio-ecological model of development. In depth semi-structured interviews were conducted with seven children who described they everyday life since their brother’s or sister’s diagnosis. Thematic analysis revealed four themes : struggling with loneliness inside the family, supporting the disabled child through its journey, accommodating to a changing routine and keeping a “bubble” for oneself. Brothers and sisters depict a family life characterized by much loneliness, with severe disabilities requiring ongoing care and prolonged hospitalizations. In the midst of adversity, siblings describe themselves as highly committed to supporting the disabled child and to preserve family cohesion, even if that means getting exposed to emotionally challenging situations and adjusting their daily routine frequently. Children recount that keeping up with schoolwork and leisure activities of their own is central to their well-being. Having a space where one can reconnect with his ordinary life as a kid is also deemed very important. This study reminds us that more needs to be done to counteract the loneliness experienced by siblings through the family experience of disability. Family members and clinicians need to be extra vigilant to ensure siblings’ needs don’t go unnoticed or dismissed, as it may be difficult for this population of children to voice their own experience and needs. Family, school and other actors in the community may help brothers and sisters pursue their personal dreams, goals and projects, to continue experiencing well-being despite adverse life circumstances.

Keywords: sibling’s lived experience of disability, sibling’s needs at various levels of the ecosystem, family adjustment to the disability experience, supporting family wellness through the disability experience

Procedia PDF Downloads 116
24228 Risk Assessment of Heavy Metals in Soils at Electronic Waste Activity Sites within the Vicinity of Alaba International Market, Nigeria

Authors: A. A. Adebayo, A. O. Ogunkeyede, A. O. Adeigbe

Abstract:

Digital globalisation and yarn of Nigeria society to overcome the digital divide have resulted in contamination of soil by heavy metals (HMs) from e-waste activities at Alaba international market, Lagos, Nigeria. The aim of this research was to determine the concentration of various metals {Cadmium (Cd), Chromium (Cr), Copper (Cu), and Lead (Pb)} and identify their ecological and health risks for the people within the study area. A total of 60 soil samples were collected at Alaba market study area. Two types of samples were collected from each sampling points: topsoil (0-15 cm), subsoil (15 -30 cm). The metal concentration results showed that the soils were heavily contaminated by HMs at topsoil and subsoil. The geoaccummulation and ecological risk indices revealed high pollution level from all studied site. The health risk assessment results suggested that there is high possibility of carcinogenic risk to humans because the carcinogenic risk via corresponding exposure pathways exceeded the safety limit of 10-6 (the acceptable level of carcinogenic risk for human). Furthermore, inhalation of soil particles is the main exposure pathway for Cr to enter the human body for all ages. Children in the vicinity are exposed more to ingestion of Pb since they tend to eat earth (pica) and repeatedly suck their fingers. This study provides basic information to create awareness for a need to introduce pollution control measures and the need to protect the ecosystem and human health within the study area at Alaba international market.

Keywords: contaminated soil, ecological risk, hazard index, risk factor, exposure pathways, heavy metals

Procedia PDF Downloads 251
24227 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 109
24226 Model Predictive Controller for Pasteurization Process

Authors: Tesfaye Alamirew Dessie

Abstract:

Our study focuses on developing a Model Predictive Controller (MPC) and evaluating it against a traditional PID for a pasteurization process. Utilizing system identification from the experimental data, the dynamics of the pasteurization process were calculated. Using best fit with data validation, residual, and stability analysis, the quality of several model architectures was evaluated. The validation data fit the auto-regressive with exogenous input (ARX322) model of the pasteurization process by roughly 80.37 percent. The ARX322 model structure was used to create MPC and PID control techniques. After comparing controller performance based on settling time, overshoot percentage, and stability analysis, it was found that MPC controllers outperform PID for those parameters.

Keywords: MPC, PID, ARX, pasteurization

Procedia PDF Downloads 161
24225 Point Estimation for the Type II Generalized Logistic Distribution Based on Progressively Censored Data

Authors: Rana Rimawi, Ayman Baklizi

Abstract:

Skewed distributions are important models that are frequently used in applications. Generalized distributions form a class of skewed distributions and gain widespread use in applications because of their flexibility in data analysis. More specifically, the Generalized Logistic Distribution with its different types has received considerable attention recently. In this study, based on progressively type-II censored data, we will consider point estimation in type II Generalized Logistic Distribution (Type II GLD). We will develop several estimators for its unknown parameters, including maximum likelihood estimators (MLE), Bayes estimators and linear estimators (BLUE). The estimators will be compared using simulation based on the criteria of bias and Mean square error (MSE). An illustrative example of a real data set will be given.

Keywords: point estimation, type II generalized logistic distribution, progressive censoring, maximum likelihood estimation

Procedia PDF Downloads 196
24224 Omni: Data Science Platform for Evaluate Performance of a LoRaWAN Network

Authors: Emanuele A. Solagna, Ricardo S, Tozetto, Roberto dos S. Rabello

Abstract:

Nowadays, physical processes are becoming digitized by the evolution of communication, sensing and storage technologies which promote the development of smart cities. The evolution of this technology has generated multiple challenges related to the generation of big data and the active participation of electronic devices in society. Thus, devices can send information that is captured and processed over large areas, but there is no guarantee that all the obtained data amount will be effectively stored and correctly persisted. Because, depending on the technology which is used, there are parameters that has huge influence on the full delivery of information. This article aims to characterize the project, currently under development, of a platform that based on data science will perform a performance and effectiveness evaluation of an industrial network that implements LoRaWAN technology considering its main parameters configuration relating these parameters to the information loss.

Keywords: Internet of Things, LoRa, LoRaWAN, smart cities

Procedia PDF Downloads 147
24223 Cybervetting and Online Privacy in Job Recruitment – Perspectives on the Current and Future Legislative Framework Within the EU

Authors: Nicole Christiansen, Hanne Marie Motzfeldt

Abstract:

In recent years, more and more HR professionals have been using cyber-vetting in job recruitment in an effort to find the perfect match for the company. These practices are growing rapidly, accessing a vast amount of data from social networks, some of which is privileged and protected information. Thus, there is a risk that the right to privacy is becoming a duty to manage your private data. This paper investigates to which degree a job applicant's fundamental rights are protected adequately in current and future legislation in the EU. This paper argues that current data protection regulations and forthcoming regulations on the use of AI ensure sufficient protection. However, even though the regulation on paper protects employees within the EU, the recruitment sector may not pay sufficient attention to the regulation as it not specifically targeting this area. Therefore, the lack of specific labor and employment regulation is a concern that the social partners should attend to.

Keywords: AI, cyber vetting, data protection, job recruitment, online privacy

Procedia PDF Downloads 85