Search results for: data lake
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25056

24816 High Performance Computing and Big Data Analytics

Authors: Branci Sarra, Branci Saadia

Abstract:

Because of rapid data growth, many computer science tools have been developed to process and analyze Big Data. High-performance computing architectures have been designed to meet the processing needs of Big Data (from the standpoint of transaction processing and of strategic and tactical analytics). The purpose of this article is to provide a historical and global perspective on the recent trend in high-performance computing architectures, especially as it relates to analytics and data mining.

Keywords: high performance computing, HPC, big data, data analysis

Procedia PDF Downloads 514
24815 A Landscape of Research Data Repositories in Re3data.org Registry: A Case Study of Indian Repositories

Authors: Prashant Shrivastava

Abstract:

The purpose of this study is to explore the re3data.org registry to identify the registration workflow for research data repositories. A further objective is to chart the present development of research data repositories in India. The study first examines the re3data.org registry framework and schema design, then explores the status of Indian research data repositories in the registry. Research data repositories are gaining wider relevance due to e-research concepts. The re3data.org registry is a good tool for users and researchers to identify appropriate research data repositories for their research requirements. In the Indian environment, a compatible National Research Data Policy is needed to boost the management of research data. A registry of research data repositories is a crucial tool for discovering specific information in a specific domain. Research data repositories in India have also not been studied previously. Both the re3data.org registry and the status of Indian research data repositories are discussed in this study.

Keywords: research data, research data repositories, research data registry, re3data.org

Procedia PDF Downloads 318
24814 A Study of Cloud Computing Solution for Transportation Big Data Processing

Authors: Ilgin Gökaşar, Saman Ghaffarian

Abstract:

The need for fast processing of big data on transportation ridership (e.g., smartcard data) and traffic operation (e.g., traffic detector data), which requires a lot of computational power, is incontrovertible in Intelligent Transportation Systems. Cloud computing is now an important and popular information technology solution for data processing. It enables users to process enormous amounts of data without owning the computing power themselves. Thus, it can also be a good choice for transportation big data processing. This paper examines how cloud computing can enhance transportation big data processing by contrasting its advantages and disadvantages and discussing cloud computing features.

Keywords: big data, cloud computing, Intelligent Transportation Systems, ITS, traffic data processing

Procedia PDF Downloads 458
24813 Harmonic Data Preparation for Clustering and Classification

Authors: Ali Asheibi

Abstract:

The rapid increase in the size of databases required to store power quality monitoring data has demanded new techniques for analysing and understanding the data. One suggested technique to assist in analysis is data mining. Preparing raw data for data mining exploration takes up most of the effort and time spent in the whole data mining process. Clustering is an important technique in data mining and machine learning in which underlying and meaningful groups of data are discovered. Large amounts of harmonic data have been collected over three years from an actual harmonic monitoring system in a distribution system in Australia. This amount of acquired data makes it difficult to identify operational events that significantly impact the harmonics generated on the system. In this paper, harmonic data preparation processes that support a better understanding of the data are presented. Underlying classes in the data are then identified using a clustering technique based on the Minimum Message Length (MML) method. The underlying operational information contained within the clusters can be rapidly visualised by engineers. The C5.0 algorithm was used for classification and interpretation of the generated clusters.

Keywords: data mining, harmonic data, clustering, classification

Procedia PDF Downloads 241
24812 Linguistic Summarization of Structured Patent Data

Authors: E. Y. Igde, S. Aydogan, F. E. Boran, D. Akay

Abstract:

Patent data play an increasingly important role in economic growth, innovation, technical advantage, and business strategy, and even in competition between countries. Analyzing patent data is crucial since patents cover a large part of the world's technological information. In this paper, we have used the linguistic summarization technique to prove the validity of the hypotheses related to patent data stated in the literature.

Keywords: data mining, fuzzy sets, linguistic summarization, patent data

Procedia PDF Downloads 266
24811 Proposal of Data Collection from Probes

Authors: M. Kebisek, L. Spendla, M. Kopcek, T. Skulavik

Abstract:

In our paper, we describe the security capabilities of data collection. Data are collected with probes located in the near and distant surroundings of the company. Given the numerous obstacles (e.g., forests, hills, and urban areas), data collection is realized in several ways. The collection of data uses wireless communication, LAN, and GSM networks, and in certain areas data are collected using vehicles. To ensure the connection to the server, most of the probes have the ability to communicate in several ways. Collected data are archived and subsequently used in supervisory applications. To ensure the collection of the required data, it is necessary to propose algorithms that allow the probes to select a suitable communication channel.

Keywords: communication, computer network, data collection, probe

Procedia PDF Downloads 357
24810 A Review on Big Data Movement with Different Approaches

Authors: Nay Myo Sandar

Abstract:

With the growth of technologies and applications, a large amount of data is being produced at an increasing rate from various sources such as social media networks, sensor devices, and other information-serving devices. This large collection of massive, complex, and exponentially growing datasets is called big data. Traditional database systems cannot store and process such data because of its size and complexity. Consequently, cloud computing is a potential solution for data storage and processing since it can provide a pool of resources for servers and storage. However, moving large amounts of data to and from the cloud is challenging since it can incur high latency due to the large data size. With respect to the big data movement problem, this paper reviews the literature of previous works, discusses research issues, and identifies approaches for dealing with the problem.

Keywords: Big Data, Cloud Computing, Big Data Movement, Network Techniques

Procedia PDF Downloads 79
24809 The Role of Halloysite’s Surface Area and Aspect Ratio on Tensile Properties of Ethylene Propylene Diene Monomer Nanocomposites

Authors: Pooria Pasbakhsh, Rangika T. De Silva, Vahdat Vahedi, Hanafi Ismail

Abstract:

The influence of three types of halloysite nanotubes (HNTs) with different dimensions, namely Camel Lake (CLA), Jarrahdale (JA), and Matauri Bay (MB), on the reinforcement of ethylene propylene diene monomer (EPDM) was investigated by varying the HNT loading (from 0 to 15 phr). Mechanical properties of the nanocomposites improved with the addition of all three HNTs, but CLA-based nanocomposites exhibited a significant enhancement compared to the other HNTs. For instance, tensile properties of EPDM nanocomposites increased by 120%, 256%, and 340% for MB, JA, and CLA, respectively, with the addition of 15 phr of HNTs. This could be due to the higher aspect ratio and higher surface area of CLA compared to the others. Scanning electron microscopy (SEM) of nanocomposites at 15 phr HNT loading showed low amounts of pulled-out nanotubes, which confirmed the presence of more embedded nanotubes inside the EPDM matrix, as well as aggregates within the fracture surface of EPDM/HNT nanocomposites.

Keywords: aspect ratio, halloysite nanotubes (HNTs), mechanical properties, rubber/clay nanocomposites

Procedia PDF Downloads 370
24808 Optimized Approach for Secure Data Sharing in Distributed Database

Authors: Ahmed Mateen, Zhu Qingsheng, Ahmad Bilal

Abstract:

In the current age of technology, information is the most precious asset of a company. Today, companies hold large amounts of data. As the data grow, retrieving particular information from them becomes slower day by day. Processing data quickly enough to turn it into information is the biggest issue. The major problems in distributed databases are the efficiency and the response time of data distribution. The security of data distribution is also a big issue. For these problems, we propose a strategy that can maximize the efficiency of data distribution and also improve its response time. This technique gives better results for secure data distribution from multiple heterogeneous sources. The newly proposed technique helps companies share data securely, efficiently, and quickly.

Keywords: ER-schema, electronic record, P2P framework, API, query formulation

Procedia PDF Downloads 330
24807 Assessment of Heavy Metal Contamination for the Sustainable Management of Vulnerable Mangrove Ecosystem, the Sundarbans

Authors: S. Begum, T. Biswas, M. A. Islam

Abstract:

The present research investigates the distribution and contamination of heavy metals in core sediments collected from three locations of the Sundarbans mangrove forest. In this research, the quality of the analysis is evaluated by analyzing the certified reference materials IAEA-SL-1 (lake sediment), IAEA-Soil-7, and NIST-1633b (coal fly ash). Total concentrations of 28 heavy metals (Na, Al, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Zn, Ga, As, Sb, Cs, La, Ce, Sm, Eu, Tb, Dy, Ho, Yb, Hf, Ta, Th, and U) were determined in core sediments of the Sundarbans mangrove by the neutron activation analysis (NAA) technique. Compared with upper continental crustal (UCC) values, the mean concentrations of K, Ti, Zn, Cs, La, Ce, Sm, Hf, and Th are elevated in the research area. The assessments of metal contamination levels using different environmental contamination indices (EF, Igeo, CF) indicate that Ti, Sb, Cs, REEs, and Th show minor enrichment in the sediments of the Sundarbans. The modified degree of contamination (mCd) of the studied samples of the Sundarbans ecosystem shows low contamination. The pollution load index (PLI) values for the cores suggest that the sampling points are moderately polluted. The possible sources of the deterioration of the sediment quality can be attributed to accidents involving chemical-carrying cargo, port activities, ship breaking, and agricultural and aquaculture run-off in the area. A Pearson correlation matrix (PCM) established relationships among elements. The PCM indicates that the distributions of most of the metals are controlled by the same factors, such as Fe-oxy-hydroxides and clay minerals, and that they have a similar origin. The poor correlations of Ca with most of the elements in the sediment cores indicate that calcium carbonate has a less significant role in this mangrove sediment.
Finally, the data from this research will be used as a benchmark for future research and will help to quantify levels of metal pollution, as well as to manage future ecological risks to the vulnerable mangrove ecosystem, the Sundarbans.
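
The indices named in the abstract (CF, EF, Igeo, PLI, mCd) can be sketched as follows. This is a minimal illustration using the commonly cited textbook definitions, not the authors' code; the abstract does not state which formulas, normalizing element, or background values were used, so the function names and the sample concentrations below are hypothetical.

```python
from math import log2, prod

def contamination_factor(c_sample, c_background):
    """CF: measured concentration divided by the background concentration."""
    return c_sample / c_background

def enrichment_factor(c_sample, ref_sample, c_background, ref_background):
    """EF: element/reference ratio in the sample over the same ratio in the background."""
    return (c_sample / ref_sample) / (c_background / ref_background)

def geoaccumulation_index(c_sample, c_background):
    """Igeo: log2 of the concentration over 1.5 times the background value."""
    return log2(c_sample / (1.5 * c_background))

def pollution_load_index(cfs):
    """PLI: geometric mean of the contamination factors."""
    return prod(cfs) ** (1 / len(cfs))

def modified_degree_of_contamination(cfs):
    """mCd: arithmetic mean of the contamination factors."""
    return sum(cfs) / len(cfs)

# Hypothetical concentrations (same units throughout), for illustration only.
cf = contamination_factor(30.0, 15.0)
ef = enrichment_factor(30.0, 4.0, 15.0, 4.0)
igeo = geoaccumulation_index(30.0, 10.0)
pli = pollution_load_index([2.0, 8.0])
mcd = modified_degree_of_contamination([2.0, 8.0])
```

In practice the background values would come from a reference such as the UCC composition mentioned in the abstract, and the reference element for EF is a choice the analyst has to justify.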

Keywords: contamination, core sediment, trace element, sundarbans, vulnerable

Procedia PDF Downloads 117
24806 Clinical Parameters Response to Low Level Laser Versus Monochromatic Near Infrared Photo Energy in Diabetic Patient with Peripheral Neuropathy

Authors: Abeer Ahmed Abdehameed

Abstract:

Background: Diabetic sensorimotor polyneuropathy (DSP) is one of the most common microvascular complications of type 2 diabetes. Loss of sensation is thought to contribute to a lack of static and dynamic stability and an increased risk of falling. Purpose: The purpose of this study was to compare the effects of low level laser (LLL) and monochromatic near infrared photo energy (MIRE) on pain, cutaneous sensation, static stability, and an index of lower limb blood flow in diabetic patients with peripheral neuropathy. Methods: Forty subjects with diabetic peripheral neuropathy were recruited for the study. They were divided into two groups: the MIRE group, which included 20 patients, and the LLL group, which included 20 patients. All patients in the study underwent various physical assessment procedures, including pain, cutaneous sensation, Doppler flow meter, and static stability assessments. The baseline measurements were followed by treatment sessions conducted twice a week for 6 successive weeks. Results: The statistical analysis of the data revealed significant improvement of pain in both groups, with significant improvement in cutaneous sensation and static balance in the MIRE group compared to the LLL group; on the other hand, the results showed no significant differences in lower limb blood flow in either group. Conclusion: Low level laser and monochromatic near infrared therapy can improve painful symptoms in patients with diabetic neuropathy. In addition, MIRE is useful in improving cutaneous sensation and static stability in patients with diabetic neuropathy.

Keywords: diabetic neuropathy, doppler flow meter, low level laser, monochromatic near infrared photo energy

Procedia PDF Downloads 312
24805 Climate Change Impacts, Vulnerability, and Adaptation among Rural Households in Ethiopia

Authors: Birtukan Atinkut Asmare

Abstract:

Climate change disproportionately affects many Africans who rely heavily on climate-exposed sectors such as rain-fed agriculture and fishing, rendering them highly vulnerable. Gender plays a significant role, as men and women experience unequal impacts and vulnerabilities due to gender norms, labor divisions, resource access, and power dynamics. Drawing on an integrated framework, this study sheds light on the gendered impacts of climate change on households' livelihoods, their vulnerability, and their adaptation in rural Ethiopia's Lake Tana Basin. The study used mixed research methods, integrating diverse qualitative techniques such as focus group discussions, key informant interviews, and field observations with quantitative data gathered through household surveys. The findings reveal that women-headed households were more vulnerable to climate change than male-headed households. Flooding was the major climate-induced hazard in the area, threatening the lives and livelihoods of households. In response to climate change, households undertook different adaptation measures such as agroforestry practices, crop diversification, seasonal migration, petty trading, and charcoal and fuelwood sales. However, the adaptation strategies varied slightly with the gender of the household head. Women-headed households specifically engaged in fuelwood collection and selling and in petty trading. The main constraints on adaptation were limited access to technologies, extension services, information, and financial services. Therefore, this research urges attention from research, policy, and advisory services to rural households who are trying to survive in the face of climate change.

Keywords: agriculture, climate change impacts, ethiopia, gender

Procedia PDF Downloads 56
24804 Estimation of Ribb Dam Catchment Sediment Yield and Reservoir Effective Life Using Soil and Water Assessment Tool Model and Empirical Methods

Authors: Getalem E. Haylia

Abstract:

The Ribb dam is one of the irrigation projects in the Upper Blue Nile basin, Ethiopia, built to irrigate the Fogera plain. Reservoir sedimentation is a major problem because it reduces the useful reservoir capacity through the accumulation of sediments coming from the watersheds. Estimates of sediment yield are needed for studies of reservoir sedimentation and planning of soil and water conservation measures. The objective of this study was to simulate the Ribb dam catchment sediment yield using the SWAT model and to estimate the Ribb reservoir effective life according to trap efficiency methods. The Ribb dam catchment lies in the north-western part of the Ethiopian highlands and belongs to the Upper Blue Nile and Lake Tana basins. The Soil and Water Assessment Tool (SWAT) was selected to simulate flow and sediment yield in the Ribb dam catchment. The model sensitivity, calibration, and validation analyses at the Ambo Bahir site were performed with Sequential Uncertainty Fitting (SUFI-2). The flow data at this site were obtained by transforming the Lower Ribb gauge station (2002-2013) flow data using the Area Ratio Method. The sediment load was derived from the sediment concentration yield curve of the Ambo site. Stream flow results showed that the Nash-Sutcliffe efficiency coefficient (NSE) was 0.81 and the coefficient of determination (R²) was 0.86 in the calibration period (2004-2010), and 0.74 and 0.77 in the validation period (2011-2013), respectively. Using the same periods, the NSE and R² for the sediment load calibration were 0.85 and 0.79, and for the validation 0.83 and 0.78, respectively. The simulated average daily flow rate and sediment yield generated from the Ribb dam watershed were 3.38 m³/s and 1772.96 tons/km²/yr, respectively. The effective life of the Ribb reservoir was estimated using the empirical methods of Brune (1953), Churchill (1948), and Brown (1958) and found to be 30, 38, and 29 years, respectively.
To conclude, massive sediment comes from the steep-slope agricultural areas, and approximately 98-100% of the incoming annual sediment load is trapped by the Ribb reservoir. In the Ribb catchment as well as the reservoir, systematic and thorough consideration of technical, social, environmental, and catchment management practices should be made to lengthen the useful life of the Ribb reservoir.
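
The two goodness-of-fit measures reported in the abstract can be sketched as follows. This is a minimal illustration and not the SWAT or SUFI-2 implementation; it assumes the standard Nash-Sutcliffe definition and takes R² to be the squared Pearson correlation between observed and simulated series, which the abstract does not spell out. The function names and the flow values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance

def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)

# Hypothetical flow series (m^3/s), for illustration only.
observed = [1.0, 2.0, 3.0]
nse_perfect = nash_sutcliffe(observed, [1.0, 2.0, 3.0])
nse_mean_only = nash_sutcliffe(observed, [2.0, 2.0, 2.0])
r2_scaled = r_squared(observed, [2.0, 4.0, 6.0])
```

An NSE of 1 means a perfect match, while an NSE of 0 means the model does no better than predicting the observed mean. R² is insensitive to a constant scaling of the simulated series, which is one reason the two measures are usually reported together.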

Keywords: catchment, reservoir effective life, reservoir sedimentation, Ribb, sediment yield, SWAT model

Procedia PDF Downloads 183
24803 Data Mining Algorithms Analysis: Case Study of Price Predictions of Lands

Authors: Julio Albuja, David Zaldumbide

Abstract:

Data analysis is an important step before making a decision about money. The aim of this work is to analyze the factors that influence the final price of houses through data mining algorithms. To the best of our knowledge, previous work was researched only to compare results. Before the data set was used, the Z-transformation was applied to standardize the data to the same range. The data were then classified into two groups to visualize them in a readable format. A decision tree was built, and the data are displayed graphically so that the results and the influence of the factors are easy to see. The definitions of these methods are described, as well as the descriptions of the results. Finally, conclusions and recommendations are presented based on the results of our research, making it easier to apply these algorithms to a customized data set.
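
The Z-transformation step mentioned in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `z_transform`, the use of the population standard deviation, and the sample prices are assumptions.

```python
from statistics import mean, pstdev

def z_transform(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

# Hypothetical land prices, for illustration only.
prices = [100.0, 150.0, 200.0]
scaled = z_transform(prices)
```

Standardizing each feature this way keeps a feature with large raw values, such as price, from dominating features measured on smaller scales.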

Keywords: algorithms, data, decision tree, transformation

Procedia PDF Downloads 370
24802 Integrating Personality Traits and Travel Motivations for Enhanced Small and Medium-sized Tourism Enterprises (SMEs) Strategies: A Case Study of Cumbria, United Kingdom

Authors: Delia Gabriela Moisa, Demos Parapanos, Tim Heap

Abstract:

The tourism sector mainly comprises small and medium-sized tourism enterprises (SMEs), representing approximately 80% of global businesses in this field. These entities require focused attention and support to address challenges, ensuring their competitiveness and relevance in a dynamic industry characterized by continuously changing customer preferences. To address these challenges, it becomes imperative to consider not only socio-demographic factors but also delve into the intricate interplay of psychological elements influencing consumer behavior. This study investigates the impact of personality traits and travel motivations on visitor activities in Cumbria, United Kingdom, an iconic region marked by UNESCO World Heritage Sites, including The Lake District National Park and Hadrian's Wall. With a £4.1 billion tourism industry primarily driven by SMEs, Cumbria serves as an ideal setting for examining the relationship between tourist psychology and activities. Employing the Big Five personality model and the Travel Career Pattern motivation theory, this study aims to explain the relationship between psychological factors and tourist activities. The study further explores SME perspectives on personality-based market segmentation, providing strategic insights into addressing evolving tourist preferences. This pioneering mixed-methods study integrates quantitative data from 330 visitor surveys, subsequently complemented by qualitative insights from tourism SME representatives. The findings unveil that socio-demographic factors do not exhibit statistically significant variations in the activities pursued by visitors in Cumbria. However, significant correlations emerge between personality traits and motivations with preferred visitor activities. Open-minded tourists gravitate towards events and cultural activities, while Conscientious individuals favor cultural pursuits. 
Extraverted tourists lean towards adventurous, recreational, and wellness activities, while Agreeable personalities opt for lake cruises. Interestingly, a contrasting trend emerges as Extraversion increases, leading to a decrease in interest in cultural activities. Similarly, heightened Agreeableness corresponds to a decrease in interest in adventurous activities. Furthermore, travel motivations, including nostalgia and building relationships, drive event participation, while self-improvement and novelty-seeking lead to adventurous activities. Additionally, qualitative insights from tourism SME representatives underscore the value of targeted messaging aligned with visitor personalities for enhancing loyalty and experiences. This study contributes significantly to scholarship through its novel framework, integrating tourist psychology with activities and industry perspectives. The proposed conceptual model holds substantial practical implications for SMEs to formulate personalized offerings, optimize marketing, and strategically allocate resources tailored to tourist personalities. While the focus is on Cumbria, the methodology's universal applicability offers valuable insights for destinations globally seeking a competitive advantage. Future research addressing scale reliability and geographic specificity limitations can further advance knowledge on this critical relationship between visitor psychology, individual preferences, and industry imperatives. Moreover, by extending the investigation to other districts, future studies could draw comparisons and contrasts in the results, providing a more nuanced understanding of the factors influencing visitor psychology and preferences.

Keywords: personality trait, SME, tourist behaviour, tourist motivation, visitor activity

Procedia PDF Downloads 66
24801 Application of Blockchain Technology in Geological Field

Authors: Mengdi Zhang, Zhenji Gao, Ning Kang, Rongmei Liu

Abstract:

Management and application of geological big data is an important part of China's national big data strategy. With the implementation of a national big data strategy, geological big data management becomes more and more critical. At present, there are still many technical barriers, as well as conceptual confusion, in many aspects of geological big data management and application, such as data sharing, intellectual property protection, and application technology. Therefore, it is a key task to make better use of new technologies for deeper exploration and wider application of geological big data. In this paper, we first briefly introduce the basic principle of blockchain technology and then analyze the application dilemma of geological data. Based on this analysis, we bring forward some feasible patterns and scenarios for blockchain application in geological big data and put forward several suggestions for future work in geological big data management.

Keywords: blockchain, intellectual property protection, geological data, big data management

Procedia PDF Downloads 83
24800 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent item sets play an essential role in many data mining tasks that try to find interesting patterns in a database. Typically, the term refers to a set of items that frequently appear together in a transaction dataset. Several mining algorithms are used for frequent item set mining, yet most do not scale to the type of data we are presented with today, so-called "Big Data". Big Data is a collection of large data sets. Our approach is to perform frequent item set mining over large datasets in a scalable and speedy way. MapReduce, together with HDFS, is used to find frequent item sets from Big Data on a large cluster. This paper focuses on using pre-processing and a mining algorithm as a hybrid approach for big data on the Hadoop platform.
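
The map and reduce roles in frequent item set counting can be sketched in plain Python as follows. This is a single-process illustration of the idea only; it does not use Hadoop or HDFS, it is not the hybrid algorithm the paper proposes, and the function names, transactions, and support threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    items = sorted(set(transaction))
    return [(itemset, 1) for itemset in combinations(items, k)]

def reduce_phase(pairs):
    """Sum the counts emitted for each itemset key."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return dict(counts)

def frequent_itemsets(transactions, k, min_support):
    """Keep the k-itemsets that occur in at least min_support transactions."""
    pairs = [pair for t in transactions for pair in map_phase(t, k)]
    counts = reduce_phase(pairs)
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

# Hypothetical transactions, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]
frequent_pairs = frequent_itemsets(transactions, k=2, min_support=2)
```

On a real cluster the map calls would run in parallel over blocks of the transaction file and the framework would group the emitted keys before the reduce step. Enumerating every k-subset grows quickly with transaction length, which is why practical algorithms prune candidates first.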

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 429
24799 The Role Of Data Gathering In NGOs

Authors: Hussaini Garba Mohammed

Abstract:

Background/Significance: The lack of data gathering prevents NGOs worldwide from having good data on educational and health-related issues among communities in any country and around the world. For example, HIV/AIDS, smoking (tuberculosis), and COVID-19 virus carriers are becoming a serious public health problem, especially among elderly men and women. But in some countries there is no fully detailed survey assessment from communities, villages, and rural areas to show the percentage of victims and patients, especially for COVID-19. These data are essential to inform programming targets, strategies, and priorities and to obtain good information through data gathering in any society.

Keywords: reliable information, data assessment, data mining, data communication

Procedia PDF Downloads 176
24798 Pricing Techniques to Mitigate Recurring Congestion on Interstate Facilities Using Dynamic Feedback Assignment

Authors: Hatem Abou-Senna

Abstract:

Interstate 4 (I-4) is a primary east-west transportation corridor between the cities of Tampa and Daytona, serving commuter, commercial, and recreational traffic. I-4 is known to have severe recurring congestion during peak hours. The congestion spans about 11 miles in the evening peak period in the central corridor area, as I-4 is considered the only non-tolled limited access facility connecting the Orlando Central Business District (CBD) and the tourist attractions area (Walt Disney World). Florida officials had been skeptical of tolling I-4 prior to the recent legislation, and the public, through the media, had been complaining about the excessive toll facilities in Central Florida. So, in search of a plausible mitigation of the congestion on the I-4 corridor, this research evaluates the effectiveness of different toll pricing alternatives that might divert traffic from I-4 to the toll facilities during the peak period. The network is composed of two main diverging limited access highways, a freeway (I-4) and a toll road (SR 417), in addition to two east-west parallel toll roads, SR 408 and SR 528, intersecting the above-mentioned highways at both ends. I-4 and toll road SR 408 are the routes most frequently used by commuters. SR 417 is a relatively uncongested toll road that is 15 miles longer than I-4 and carries $5 in tolls, compared to no monetary cost on I-4 for the same trip. The results of the calibrated Orlando PARAMICS network showed that percentages of route diversion vary from one route to another and depend primarily on the travel cost between specific origin-destination (O-D) pairs. Most drivers going from Disney (O1) or Lake Buena Vista (O2) to Lake Mary (D1) were found to have a high propensity towards using I-4, even when eliminating tolls and/or providing real-time information. 
However, a diversion from I-4 to SR 417 for these OD pairs occurred only in the cases of the incident and lane closure on I-4, due to the increase in delay and travel costs, and when information is provided to travelers. Furthermore, drivers that diverted from I-4 to SR 417 and SR 528 did not gain significant travel-time savings. This was attributed to the limited extra capacity of the alternative routes in the peak period and the longer traveling distance. When the remaining origin-destination pairs were analyzed, average travel time savings on I-4 ranged between 10 and 16% amounting to 10 minutes at the most with a 10% increase in the network average speed. High propensity of diversion on the network increased significantly when eliminating tolls on SR 417 and SR 528 while doubling the tolls on SR 408 along with the incident and lane closure scenarios on I-4 and with real-time information provided. The toll roads were found to be a viable alternative to I-4 for these specific OD pairs depending on the user perception of the toll cost which was reflected in their specific travel times. However, on the macroscopic level, it was concluded that route diversion through toll reduction or elimination on surrounding toll roads would only have a minimum impact on reducing I-4 congestion during the peak period.

Keywords: congestion pricing, dynamic feedback assignment, microsimulation, paramics, route diversion

Procedia PDF Downloads 176
24797 The Application of Data Mining Technology in Building Energy Consumption Data Analysis

Authors: Liang Zhao, Jili Zhang, Chongquan Zhong

Abstract:

Energy consumption data, in particular those involving public buildings, are impacted by many factors: the building structure, climate/environmental parameters, construction, system operating condition, and user behavior patterns. Traditional methods for data analysis are insufficient. This paper examines data mining technology to determine its application in the analysis of building energy consumption data, including energy consumption prediction, fault diagnosis, and optimal operation. Recent literature is reviewed and summarized, the problems faced by data mining technology in the area of energy consumption data analysis are enumerated, and research points for future studies are given.

Keywords: data mining, data analysis, prediction, optimization, building operational performance

Procedia PDF Downloads 848
24796 To Handle Data-Driven Software Development Projects Effectively

Authors: Shahnewaz Khan

Abstract:

Machine learning (ML) techniques are often used in projects for creating data-driven applications. These tasks typically demand additional research and analysis. The proper technique and strategy must be chosen to ensure the success of data-driven projects. Otherwise, even with a lot of effort, the necessary development might not always be possible. This work examines the workflow of data-driven software development projects and its implementation process in order to describe how to manage a project successfully, which will help minimize the added workload.

Keywords: data, data-driven projects, data science, NLP, software project

Procedia PDF Downloads 78
24795 Rare Earth Element (REE) Geochemistry of Tepeköy Sandstones (Central Anatolia, Turkey)

Authors: Mehmet Yavuz Hüseyinca, Şuayip Küpeli

Abstract:

Sandstones from the Upper Eocene-Oligocene Tepeköy formation (Member of Mezgit Group), which is exposed on the eastern edge of Tuz Gölü (Salt Lake), were analyzed for their rare earth element (REE) contents. Average concentrations of ΣREE, ΣLREE (Total light rare earth elements) and ΣHREE (Total heavy rare earth elements) were determined as 31.37, 26.47 and 4.55 ppm, respectively. These values are lower than those of UCC (Upper continental crust), which indicates a grain size and/or CaO dilution effect. The chondrite-normalized REE pattern is characterized by the average ratios of (La/Yb)cn = 6.20, (La/Sm)cn = 4.06, (Gd/Lu)cn = 1.10, Eu/Eu* = 0.99 and Ce/Ce* = 0.94. Lower values of ΣLREE/ΣHREE (Average 5.97) and (La/Yb)cn suggest lower fractionation of overall REE. Moreover, the (La/Sm)cn and (Gd/Lu)cn ratios define a less inclined LREE and almost flat HREE pattern when compared with UCC. The near absence of a Ce anomaly (Ce/Ce*) emphasizes that the REE originated from terrigenous material. Also, depleted LREE and no Eu anomaly (Eu/Eu*) suggest an undifferentiated mafic provenance for the sandstones.

Keywords: central Anatolia, provenance, rare earth elements, REE, Tepeköy sandstone

Procedia PDF Downloads 471
24794 The Relationship Between Artificial Intelligence, Data Science, and Privacy

Authors: M. Naidoo

Abstract:

Artificial intelligence often requires large amounts of good quality data. Within important fields, such as healthcare, the training of AI systems predominately relies on health and personal data; however, the usage of this data is complicated by various layers of law and ethics that seek to protect individuals’ privacy rights. This research seeks to establish the challenges AI and data sciences pose to (i) informational rights, (ii) privacy rights, and (iii) data protection. To solve some of the issues presented, various methods are suggested, such as embedding values in technological development, proper balancing of rights and interests, and others.

Keywords: artificial intelligence, data science, law, policy

Procedia PDF Downloads 102
24793 Simulation Data Summarization Based on Spatial Histograms

Authors: Jing Zhao, Yoshiharu Ishikawa, Chuan Xiao, Kento Sugiura

Abstract:

In order to analyze large-scale scientific data, research on data exploration and visualization has gained popularity. In this paper, we focus on the exploration and visualization of scientific simulation data, and define a spatial V-Optimal histogram for data summarization. We propose histogram construction algorithms based on a general binary hierarchical partitioning as well as a more specific one, the l-grid partitioning. For effective data summarization and efficient data visualization in scientific data analysis, we propose an optimal algorithm as well as a heuristic algorithm for histogram construction. To verify the effectiveness and efficiency of the proposed methods, we conduct experiments on the massive evacuation simulation data.
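The V-Optimal criterion mentioned above chooses bucket boundaries that minimize the total squared deviation of values from their bucket means. The following is a minimal sketch of that idea, assuming a one-dimensional sequence rather than the spatial partitionings the abstract proposes; the function name `v_optimal_histogram` and its parameters are hypothetical, and this textbook dynamic program is not the authors' algorithm.

```python
from itertools import accumulate


def v_optimal_histogram(values, num_buckets):
    """Split `values` into contiguous buckets minimizing total squared error.

    Returns (total_sse, boundaries), where boundaries is a list of
    (start, end) index pairs with `end` exclusive.
    Simplifying assumption: 1-D data, not a spatial grid.
    """
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(values)")

    # Prefix sums let us compute the SSE of any slice in O(1).
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Sum of squared deviations of values[i:j] from their mean.
        s = prefix[j] - prefix[i]
        sq = prefix_sq[j] - prefix_sq[i]
        return sq - s * s / (j - i)

    inf = float("inf")
    # cost[b][j]: minimum SSE when values[:j] is split into b buckets.
    cost = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    split = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    cost[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = cost[b - 1][i] + sse(i, j)
                if candidate < cost[b][j]:
                    cost[b][j] = candidate
                    split[b][j] = i

    # Walk the split table backwards to recover the bucket boundaries.
    boundaries = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = split[b][j]
        boundaries.append((i, j))
        j = i
    boundaries.reverse()
    return cost[num_buckets][n], boundaries
```

This exhaustive dynamic program takes time quadratic in the number of values per bucket count, which is one reason a heuristic construction, as the abstract proposes alongside the optimal one, can be attractive for large simulation outputs.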

Keywords: simulation data, data summarization, spatial histograms, exploration, visualization

Procedia PDF Downloads 173
24792 Economic Assessment of the Fish Solar Tent Dryers

Authors: Collen Kawiya

Abstract:

In an effort to reduce post-harvest losses and improve the supply of quality fish products in Malawi, fish solar tent dryers have been designed in the southern part of Lake Malawi for processing small fish species under the Cultivate Africa’s Future (CultiAF) project. This study was done to promote the adoption of the fish solar tent dryers by the many small-scale fish processors in Malawi through an assessment of the economic viability of these dryers. Using the project’s baseline survey data, a business model for a constructed ‘ready for use’ solar tent dryer was developed, for which investment appraisal measures were calculated in addition to a sensitivity analysis. The study also conducted a risk analysis using the Monte Carlo simulation technique, and a probabilistic net present value was found. The investment appraisal results showed that the net present value was US$8,756.85, the internal rate of return was 62%, higher than the 16.32% cost of capital, and the payback period was 1.64 years. The sensitivity analysis results showed that only two input variables influenced the net present value of the fish solar dryer investment: the dried fish selling prices, which correlated positively with the net present value, and the fresh fish buying prices, which correlated negatively with it. Risk analysis results showed that the chances that fish processors will make a loss from this type of investment are 17.56%. It was also observed that there exists only a 0.20 probability of experiencing a negative net present value from this type of investment. Lastly, the study found that the net present value of the fish solar tent dryer investment remains robust in spite of any changes in the levels of investors’ risk preferences.
With these results, it is concluded that the fish solar tent dryers in Malawi are an economically viable investment because they are able to improve the returns in the fish processing activity. As such, fish processors need to adopt them by investing their money to construct and use them.
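The appraisal measures named in this abstract (net present value, internal rate of return, payback period, and a Monte Carlo estimate of the chance of a negative net present value) can be illustrated with a short sketch. All function names and every cash-flow figure used with them are hypothetical; the study's survey data and model are not reproduced here. The sketch assumes a conventional cash-flow pattern, one outlay at time zero followed by yearly inflows, and normally distributed inflows for the simulation.

```python
import random


def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs at t=0 (usually negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=0.0, high=10.0, tol=1e-9):
    """Internal rate of return by bisection.

    Assumes NPV is positive at `low` and decreases as the rate rises,
    which holds for one outlay followed by inflows.
    """
    if npv(low, cash_flows) * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign inside the bracket")
    while high - low > tol:
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def payback_period(cash_flows):
    """Undiscounted payback in years, interpolating within the recovery year."""
    cumulative = cash_flows[0]
    if cumulative >= 0:
        return 0.0
    for year, cf in enumerate(cash_flows[1:], start=1):
        if cumulative + cf >= 0:
            return (year - 1) + (-cumulative) / cf
        cumulative += cf
    return None  # the outlay is never recovered


def probability_of_negative_npv(rate, outlay, mean_inflow, sd_inflow,
                                years, trials=10_000, seed=0):
    """Monte Carlo estimate of P(NPV < 0) with normally distributed inflows."""
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        flows = [-outlay] + [rng.gauss(mean_inflow, sd_inflow)
                             for _ in range(years)]
        if npv(rate, flows) < 0:
            losses += 1
    return losses / trials
```

For instance, `npv(0.1632, [-1000, 700, 700, 700])` would discount three hypothetical yearly inflows at the 16.32% cost of capital quoted above; the inflow figures themselves are invented for illustration.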

Keywords: investment appraisal, risk analysis, sensitivity analysis, solar tent drying

Procedia PDF Downloads 273
24791 Research Trends in Using Virtual Reality for the Analysis and Treatment of Lower-Limb Musculoskeletal Injury of Athletes: A Literature Review

Authors: Hannah K. M. Tang, Muhammad Ateeq, Mark J. Lake, Badr Abdullah, Frederic A. Bezombes

Abstract:

There is little research applying virtual reality (VR) to the treatment of musculoskeletal injury in athletes. This is despite the prevalence of such injuries and their implications for physical and psychological health. Nevertheless, developments in wireless VR headsets better facilitate dynamic movement in VR environments (VREs), and more research is expected in this emerging field. This systematic review identified publications that used VR interventions for the analysis or treatment of lower-limb musculoskeletal injury of athletes. It established a search protocol, and through narrative discussion, identified existing trends. Database searches encompassed four term sets: 1) VR systems; 2) musculoskeletal injuries; 3) sporting population; 4) movement outcome analysis. Overall, a total of 126 publications were identified through database searching, and twelve were included in the final analysis and discussion. Many of the studies were pilot and proof of concept work. Seven of the twelve publications were observational studies. However, this may provide preliminary data from which clinical trials will branch. If specified, the focus of the literature was very narrow, with very similar population demographics and injuries. The trends in the literature findings emphasised the role of VR and attentional focus, the strategic manipulation of movement outcomes, and the transfer of skill to the real world. Causal inferences may have been undermined by flaws, as most studies were limited by the practicality of conducting a two-factor clinical-VR-based study. In conclusion, by assessing the exploratory studies, and combining this with the use of numerous developments, techniques, and tools, a novel application could be established to utilise VR with dynamic movement, for the effective treatment of specific musculoskeletal injuries of athletes.

Keywords: athletes, lower-limb musculoskeletal injury, rehabilitation, return-to-sport, virtual reality

Procedia PDF Downloads 227
24790 Algorithms used in Spatial Data Mining GIS

Authors: Vahid Bairami Rad

Abstract:

Extracting knowledge from spatial data such as GIS data is important for reducing the data and extracting information. Therefore, the development of new techniques and tools that support users in transforming data into useful knowledge has been the focus of the relatively new and interdisciplinary research area ‘knowledge discovery in databases’. We introduce a set of database primitives, or basic operations, for spatial data mining which are sufficient to express most of the spatial data mining algorithms from the literature. This approach has several advantages. As with the relational standard language SQL, the use of standard primitives will speed up the development of new data mining algorithms and will also make them more portable. We introduce a database-oriented framework for spatial data mining which is based on the concepts of neighborhood graphs and paths. A small set of basic operations on these graphs and paths is defined as database primitives for spatial data mining. Furthermore, techniques to efficiently support the database primitives in a commercial DBMS are presented.
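To make the notions of neighborhood graphs and paths concrete, here is a minimal in-memory sketch. It assumes point objects and a purely distance-based neighbor relation; the abstract does not specify its primitives, so the function names `neighborhood_graph` and `neighborhood_paths` are hypothetical, and nothing here reflects the DBMS-level support the paper describes.

```python
from math import dist


def neighborhood_graph(points, max_distance):
    """Map each object id to the set of ids lying within max_distance.

    `points` maps an object id to its (x, y) coordinates. The neighbor
    relation here is distance-based; topological or directional relations
    would be other possible choices.
    """
    graph = {name: set() for name in points}
    names = list(points)
    for index, a in enumerate(names):
        for b in names[index + 1:]:
            if dist(points[a], points[b]) <= max_distance:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def neighborhood_paths(graph, start, length):
    """Return all simple paths with `length` edges that begin at `start`."""
    paths = [[start]]
    for _ in range(length):
        paths = [
            path + [neighbor]
            for path in paths
            for neighbor in sorted(graph[path[-1]])
            if neighbor not in path  # keep paths free of repeated objects
        ]
    return paths
```

A mining algorithm can then be phrased in terms of these two operations (find neighbors, extend paths) rather than raw geometry, which is the portability argument the abstract makes by analogy with SQL.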

Keywords: spatial data base, knowledge discovery database, data mining, spatial relationship, predictive data mining

Procedia PDF Downloads 457
24789 Data Stream Association Rule Mining with Cloud Computing

Authors: B. Suraj Aravind, M. H. M. Krishna Prasad

Abstract:

There exist emerging applications of data streams that require association rule mining, such as network traffic monitoring, web click stream analysis, sensor data, and data from satellites. Data streams typically arrive continuously at high speed, in huge volumes, and with changing data distributions. This raises new issues that need to be considered when developing association rule mining techniques for stream data. This paper proposes an improved data stream association rule mining algorithm that eliminates the limitation of resources. For this, the concept of cloud computing is used. Its inclusion may lead to additional unknown problems which need further research.
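As a rough illustration of mining frequent itemsets and rule confidence over a stream with bounded memory, the sketch below keeps counts only for the most recent transactions in a sliding window. It is a single-machine simplification and not the cloud-based algorithm the paper proposes; the class name `SlidingWindowItemsets` and its parameters are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowItemsets:
    """Track itemset counts over the last `window_size` transactions."""

    def __init__(self, window_size, max_itemset_size=2):
        self.window_size = window_size
        self.max_itemset_size = max_itemset_size
        self.window = deque()
        self.counts = Counter()

    def _itemsets(self, transaction):
        # Enumerate every itemset of size 1..max_itemset_size as a sorted tuple.
        items = sorted(set(transaction))
        for size in range(1, self.max_itemset_size + 1):
            yield from combinations(items, size)

    def add(self, transaction):
        """Add a new transaction and expire the oldest one if the window is full."""
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._itemsets(expired))

    def frequent(self, min_support):
        """Return {itemset: support} for itemsets meeting min_support."""
        total = len(self.window)
        return {
            itemset: count / total
            for itemset, count in self.counts.items()
            if count > 0 and count / total >= min_support
        }

    def confidence(self, antecedent, consequent):
        """Confidence of the rule antecedent -> consequent within the window."""
        left = tuple(sorted(set(antecedent)))
        both = tuple(sorted(set(antecedent) | set(consequent)))
        if self.counts[left] == 0:
            return 0.0
        return self.counts[both] / self.counts[left]
```

Enumerating all itemsets up to a fixed size grows combinatorially with transaction length, which hints at the resource limitation the abstract refers to.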

Keywords: data stream, association rule mining, cloud computing, frequent itemsets

Procedia PDF Downloads 497
24788 A Comprehensive Survey and Improvement to Existing Privacy Preserving Data Mining Techniques

Authors: Tosin Ige

Abstract:

Ethics must be a condition of the world, like logic (Ludwig Wittgenstein, 1889-1951). As important as data mining is, it poses a significant threat to ethics, privacy, and legality, since data mining makes it difficult for an individual or consumer (in the case of a company) to control the accessibility and usage of his data. This research focuses on current issues and the latest research and development on privacy preserving data mining methods as of 2022. It also discusses some advances in those techniques while at the same time highlighting and providing a new technique as a solution to an existing technique of privacy preserving data mining methods. This paper also bridges the wide gap between data mining and the Web Application Programming Interface (web API), where research is urgently needed for an added layer of security in data mining while at the same time introducing a seamless and more efficient way of data mining.

Keywords: data, privacy, data mining, association rule, privacy preserving, mining technique

Procedia PDF Downloads 165
24787 Big Data: Concepts, Technologies and Applications in the Public Sector

Authors: A. Alexandru, C. A. Alexandru, D. Coardos, E. Tudora

Abstract:

Big Data (BD) is associated with a new generation of technologies and architectures which can harness the value of extremely large volumes of very varied data through real time processing and analysis. It involves changes in (1) data types, (2) accumulation speed, and (3) data volume. This paper presents the main concepts related to the BD paradigm, and introduces architectures and technologies for BD and BD sets. The integration of BD with the Hadoop Framework is also underlined. BD has attracted a lot of attention in the public sector due to the newly emerging technologies that allow the availability of network access. The volume of different types of data has exponentially increased. Some applications of BD in the public sector in Romania are briefly presented.

Keywords: big data, big data analytics, Hadoop, cloud

Procedia PDF Downloads 304