Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 26547

Search results for: data infrastructure

25347 Strategic Citizen Participation in Applied Planning Investigations: How Planners Use Etic and Emic Community Input Perspectives to Fill-in the Gaps in Their Analysis

Abstract:

Planners regularly use citizen input as empirical data to help them better understand community issues they know very little about. This type of community data is based on the lived experiences of local residents and is known as "emic" data. What is becoming more common practice for planners is their use of data from local experts and stakeholders (known as "etic" data or the outsider perspective) to help them fill in the gaps in their analysis of applied planning research projects. Utilizing international Health Impact Assessment (HIA) data, I look at who planners invite to their citizen input investigations. Research presented in this paper shows that planners access a wide range of emic and etic community perspectives in their search for the “community’s view.” The paper concludes with how planners can chart out a new empirical path in their execution of emic/etic citizen participation strategies in their applied planning research projects.

Keywords: citizen participation, emic data, etic data, Health Impact Assessment (HIA)

Procedia PDF Downloads 488

25346 The Analysis of Internet and Social Media Behaviors of the Students in Vocational High School

Authors: Mehmet Balci, Sakir Tasdemir, Mustafa Altin, Ozlem Bozok

Abstract:

Our globalizing world has become almost a small village in which everyone can access any information at any time. People let each other know who is doing what and where, and we can learn which social events are occurring anywhere in the world. From the perspective of education, the course notes that a lecturer uses at a university in any state of America can be examined by a student studying in a city in Africa or the Far East. This dizzying communication has come about thanks to rapid developments in computer technologies and, in parallel, internet technology. Turkey, which has a very large young population and a rapidly evolving electronic communications infrastructure, has been affected by these developments. Research has shown that almost all young people in Turkey have an account on a social network. In particular, the spread of mobile devices is increasing data traffic on social networks. In this study, students in different age groups at the Department of Computer Technology of the Selcuk University Vocational School of Technical Sciences were surveyed, and their opinions about the use of the internet and social media were collected. Internet and social media skills, purposes of use, frequency of use, access facilities and tools, and effects on social life and vocational education were explored. The positive and negative effects of internet and social media use on the students of this department were determined by evaluating the findings from various aspects. Relationships and differences were identified statistically.

Keywords: computer technologies, internet use, social network, higher vocational school

Procedia PDF Downloads 546

25345 Data Augmentation for Automatic Graphical User Interface Generation Based on Generative Adversarial Network

Authors: Xulu Yao, Moi Hoon Yap, Yanlong Zhang

Abstract:

As a branch of artificial neural networks, deep learning is widely used in the field of image recognition, but a lack of data leads to imperfect model learning. By analysing the data scale requirements of deep learning with a view to its application in GUI generation, we find that collecting a GUI dataset is a time-consuming and labor-intensive task, which makes it difficult to meet the needs of current deep learning networks. To solve this problem, this paper proposes a semi-supervised deep learning model that relies on the original small-scale datasets to produce a large number of reliable datasets. By combining a recurrent neural network with a generative adversarial network, the recurrent neural network can learn the sequence relationships and characteristics of the data and guide the generative adversarial network to generate reasonable data, thereby expanding the Rico dataset. Relying on this network structure, the characteristics of the collected data can be analysed well, and a large amount of reasonable data can be generated according to these characteristics. After data processing, a reliable dataset for model training can be formed, which alleviates the problem of dataset shortage in deep learning.

Keywords: GUI, deep learning, GAN, data augmentation

Procedia PDF Downloads 188

25344 Modelling Rainfall-Induced Shallow Landslides in the Northern New South Wales

Authors: S. Ravindran, Y. Liu, I. Gratchev, D. Jeng

Abstract:

Rainfall-induced shallow landslides are common in northern New South Wales (NSW), Australia. From 2009 to 2017, around 105 rainfall-induced landslides occurred along road corridors and caused temporary road closures in northern NSW. The rainfall that causes shallow landslides follows different distributions, ranging from uniform and normal to decreasing and increasing rainfall intensity. According to historical data, the duration of rainfall varied from one day to 18 days. The objective of this research is to analyse the slope instability of some of the sites in northern NSW by varying cumulative rainfall using SLOPE/W and SEEP/W and to compare the results with field data on rainfall causing shallow landslides. Rainfall and topographical data from public authorities and soil data obtained from laboratory tests will be used for this modelling. According to field data, shallow landslides are likely if the cumulative rainfall is between 100 mm and 400 mm.

Keywords: landslides, modelling, rainfall, suction

Procedia PDF Downloads 188

25343 Machine Learning-Enabled Classification of Climbing Using Small Data

Authors: Nicholas Milburn, Yu Liang, Dalei Wu

Abstract:

Athlete performance scoring within the climbing domain presents interesting challenges, as the sport does not have an objective way to assign skill. Assessing skill levels within any sport is valuable, as it can be used to mark progress while training and can help an athlete choose appropriate climbs to attempt. Machine learning-based methods are popular for complex problems like this. The available dataset was composed of dynamic force data recorded during climbing; however, this dataset came with challenges such as data scarcity, imbalance, and temporal heterogeneity. Investigated solutions to these challenges include data augmentation, temporal normalization, conversion of time series to the spectral domain, and cross-validation strategies. The investigated solutions to the classification problem included the lightweight machine learning classifiers KNN and SVM as well as deep learning with a CNN. The best-performing model had an accuracy of 80%. In conclusion, there seems to be enough information within climbing force data to accurately categorize climbers by skill.

Keywords: classification, climbing, data imbalance, data scarcity, machine learning, time sequence

Procedia PDF Downloads 147
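The temporal normalization step mentioned in the abstract above (entry 25343) could look roughly like the following minimal sketch. It assumes that "temporal normalization" means resampling every force recording to a common length by linear interpolation; the abstract does not say which method the authors used, and the function name `resample` and the sample values are hypothetical.

```python
def resample(series, length):
    """Linearly interpolate a time series onto `length` evenly spaced points.

    Assumption: recordings of different durations are made comparable by
    stretching or shrinking them to one fixed number of samples.
    """
    if length < 2 or len(series) < 2:
        raise ValueError("need at least two samples in and out")
    resampled = []
    for i in range(length):
        # Fractional position of output sample i in the input series.
        pos = i * (len(series) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        resampled.append(series[lo] * (1 - frac) + series[hi] * frac)
    return resampled


# Hypothetical force traces of different lengths, brought to 3 samples each.
short_climb = resample([0.0, 10.0], 3)
long_climb = resample([0.0, 1.0, 2.0, 3.0, 4.0], 3)
```

After resampling, every recording has the same number of samples, so fixed-input classifiers such as KNN or SVM can be applied directly.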

25342 Analysis of Expression Data Using Unsupervised Techniques

Authors: M. A. I. Perera, C. R. Wijesinghe, A. R. Weerasinghe

Abstract:

This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to better identify subtypes of tumors. Identifying subtypes of cancer helps to improve the efficacy and reduce the toxicity of treatments by providing clues for finding targeted therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Feature selection is important since genomic data are high dimensional, with a large number of features compared to samples. Hierarchical clustering and K-means are often used in the analysis of gene expression data. Several cluster validation techniques are used to validate the clusters. Heatmaps are an effective external validation method that allows the identified classes to be compared with clinical variables and analyzed visually.

Keywords: cancer subtypes, gene expression data analysis, clustering, cluster validation

Procedia PDF Downloads 152
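As a rough illustration of the clustering step named in the abstract above (entry 25342), the following sketch runs K-means (Lloyd's algorithm) on a tiny, made-up expression matrix. The data, the function name `kmeans` and the deterministic centroid initialisation are assumptions made for this example only; a real analysis would add the preprocessing and validation steps the abstract describes.

```python
import math


def kmeans(samples, k, iterations=20):
    """Cluster samples (lists of floats) into k groups with Lloyd's algorithm."""
    # Deterministic initialisation for this sketch: pick evenly spaced samples.
    centroids = [list(samples[i * len(samples) // k]) for i in range(k)]
    labels = [0] * len(samples)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(sample, centroids[c]))
            for sample in samples
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles: rows are samples, columns are genes.
expression = [
    [0.1, 0.2, 0.1],
    [0.2, 0.1, 0.0],
    [0.0, 0.3, 0.2],
    [5.1, 4.9, 5.0],
    [4.8, 5.2, 5.1],
    [5.0, 5.0, 4.9],
]
labels, centroids = kmeans(expression, k=2)
```

The resulting labels would then be passed to a cluster validation step, for example a comparison with clinical variables.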

25341 Decentralised Edge Authentication in the Industrial Enterprise IoT Space

Authors: C. P. Autry, A. W. Roscoe

Abstract:

Authentication protocols based on public key infrastructure (PKI) and trusted third party (TTP) are no longer adequate for industrial-scale IoT networks thanks to issues such as low compute and power availability, the use of widely distributed and commercial off-the-shelf (COTS) systems, and the increasingly sophisticated attackers and attacks we now have to counter. For example, there is increasing concern about nation-state-based interference and future quantum computing capability. We have examined this space from first principles and have developed several approaches to group and point-to-point authentication for IoT that do not depend on the use of a centralised client-server model. We emphasise the use of quantum-resistant primitives such as strong cryptographic hashing and the use of multi-factor authentication.

Keywords: authentication, enterprise IoT cybersecurity, PKI/TTP, IoT space

Procedia PDF Downloads 177

25340 Learning Analytics in a HiFlex Learning Environment

Authors: Matthew Montebello

Abstract:

Student engagement within a virtual learning environment generates masses of data points that can significantly contribute to the learning analytics that lead to decision support. Ideally, similar data is collected during student interaction with a physical learning space, and as a consequence, data is present at a large scale, even in relatively small classes. In this paper, we report on such an occurrence during classes held in a HiFlex modality as we investigate the advantages of adopting such a methodology. We plan to take full advantage of the learner-generated data in an attempt to further enhance the effectiveness of the adopted learning environment. This could shed crucial light on operating modalities that higher education institutions around the world will switch to in a post-COVID era.

Keywords: HiFlex, big data in higher education, learning analytics, virtual learning environment

Procedia PDF Downloads 206

25339 Simulation Programs to Education of Crisis Management Members

Authors: Jiri Barta

Abstract:

This paper deals with simulation programs and technologies used in the educational process for members of crisis management. Risk analysis, simulation, preparation and planning are among the main activities of crisis management workers. A correctly executed simulation of an emergency defines the extent of the danger. On this basis, it is possible to effectively prepare and plan measures to minimize damage. The paper focuses on the simulation programs taught at the University of Defence. Implementing the outputs of simulation programs in the decision-making processes of crisis staffs is one of the main tasks of the research project.

Keywords: crisis management, continuity, critical infrastructure, dangerous substance, education, flood, simulation programs

Procedia PDF Downloads 468

25338 Prototype of Over Dimension Over Loading (ODOL) Freight Transportation Monitoring System Based on Arduino Mega 'Sabrang': A Case Study in Klaten, Indonesia

Authors: Chairul Fajar, Muhammad Nur Hidayat, Muksalmina

Abstract:

The issue of Over Dimension Over Loading (ODOL) in Indonesia remains a significant challenge, causing traffic accidents, disrupting traffic flow, accelerating road damage, and potentially leading to bridge collapses. Klaten Regency, located on the slopes of Mount Merapi along the Woro River in Kemalang District, has potential Class C excavation materials such as sand and stone. Data from the Klaten Regency Transportation Department indicates that ODOL violations account for 72%, while non-violating vehicles make up only 28%. ODOL involves modifying factory-standard vehicles beyond the limits specified in the Type Test Registration Certificate (SRUT) to save costs and travel time. This study aims to develop a prototype ‘Sabrang’ monitoring system based on Arduino Mega to control and monitor ODOL freight transportation in the mining of Class C excavation materials in Klaten Regency. The prototype is designed to automatically measure the dimensions and weight of objects using a microcontroller. The data analysis techniques used in this study include the Normality Test and Paired T-Test, comparing sensor measurement results on scaled objects. The study results indicate differences in measurement validation under room temperature and ambient temperature conditions. Measurements at room temperature showed that the majority of H0 was accepted, meaning there was no significant difference in measurements when the prototype tool was used. Conversely, measurements at ambient temperature showed that the majority of H0 was rejected, indicating a significant difference in measurements when the prototype tool was used. In conclusion, the ‘Sabrang’ monitoring system prototype is effective for controlling ODOL, although measurement results are influenced by temperature conditions. This study is expected to assist in the monitoring and control of ODOL, thereby enhancing traffic safety and road infrastructure.

Keywords: over dimension over loading, prototype, microcontroller, Arduino, normality test, paired t-test

Procedia PDF Downloads 40
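The paired t-test used in the study above (entry 25338) compares two measurements of the same objects. A minimal sketch of the test statistic follows; the reference and sensor values are invented for illustration and are not data from the paper, and the function name `paired_t_statistic` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(reference, measured):
    """Return (t, degrees_of_freedom) for paired measurements of the same objects."""
    diffs = [m - r for r, m in zip(reference, measured)]
    n = len(diffs)
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (spread / math.sqrt(n))
    return t, n - 1


# Hypothetical example: known object heights (cm) versus sensor readings (cm).
reference = [10.0, 20.0, 30.0, 40.0]
measured = [10.2, 19.9, 30.3, 40.0]
t, dof = paired_t_statistic(reference, measured)

# H0 (no systematic difference) is rejected at the 5% level only when |t|
# exceeds the two-sided critical value, which is 3.182 for 3 degrees of freedom.
reject_h0 = abs(t) > 3.182
```

With these invented readings H0 is accepted, which corresponds to the "no significant difference" outcome the abstract reports for room-temperature measurements.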

25337 Li-Fi Technology: Data Transmission through Visible Light

Authors: Shahzad Hassan, Kamran Saeed

Abstract:

People are always in search of Wi-Fi hotspots because Internet access is in great demand nowadays. But like all other technologies, there is still room for improvement in Wi-Fi technology with regard to the speed and quality of connectivity. In order to address these aspects, Harald Haas, a professor at the University of Edinburgh, proposed what we know as Li-Fi (Light Fidelity). Li-Fi is a new technology in the field of wireless communication that provides connectivity within a network environment. It is a two-way mode of wireless communication using light. Basically, the data is transmitted through Light Emitting Diodes, which can vary the intensity of light very fast, even faster than the blink of an eye. From the research and experiments conducted so far, it can be said that Li-Fi can increase the speed and reliability of data transfer. This paper pays particular attention to the assessment of the performance of this technology. In other words, it is a 5G technology which uses LEDs as the medium of data transfer. For coverage within buildings, Wi-Fi is good, but Li-Fi can be considered favorable in situations where large amounts of data are to be transferred in areas with electromagnetic interference. It brings many data-related qualities, such as efficiency, security and high throughput, to wireless communication. All in all, it can be said that Li-Fi is going to be a future phenomenon where the presence of light will mean access to the Internet as well as speedy data transfer.

Keywords: communication, LED, Li-Fi, Wi-Fi

Procedia PDF Downloads 350

25336 Healthy Nutrition Within Institutions

Authors: Khalil Boukfoussa

Abstract:

It is important to provide students with food that contains complete nutrients to give them mental and physical energy during the school day. Since the time students spend in school is equivalent to 50% of their day, proper nutrition in schools is all the more important and is an ideal way to inculcate the foundations of a healthy lifestyle and healthy eating habits. Proper nutrition is one of the most important factors affecting the health, growth and development of children, in addition to being a key factor in supporting the ability to focus, supporting mental abilities and developing the student's academic achievement. In addition to the importance of a healthy diet for the development and growth of the child's body, proper nutrition can significantly contribute to protecting the body from catching viruses and helping it to pass the winter safely. Effective food control systems in different countries are essential to protect the health and safety of domestic consumers. These systems are also crucial in enabling countries to ensure the safety and quality of food entering international trade and to ensure that imported food conforms to national requirements. The current global food trade environment places significant obligations on both importing and exporting countries to strengthen their food control systems and to apply and implement risk-based food control strategies. Consumers are becoming more interested in the way food is produced, processed and marketed, and are increasingly demanding that governments assume greater responsibility for consumer protection and food safety. In many countries, food control is weak because of the abundance of legislation, the multiplicity of jurisdictions and weaknesses in control, monitoring and enforcement.
The following guidelines seek to advise national authorities on strategies to strengthen food control systems to protect public health, prevent fraud, avoid food contamination and help facilitate trade. These guidelines will assist authorities in selecting the most appropriate food control system options in terms of legislation, infrastructure and enforcement mechanisms. The document clarifies the broad principles that govern food control systems and provides examples of the infrastructure and methods by which national systems can operate.

Keywords: food, nutrition, school, safety

Procedia PDF Downloads 72

25335 A Study on the Measurement of Spatial Mismatch and the Influencing Factors of “Job-Housing” in Affordable Housing from the Perspective of Commuting

Authors: Daijun Chen

Abstract:

Affordable housing is subsidized by the government to meet the housing demand of low and middle-income urban residents in the process of urbanization and to alleviate the housing inequality caused by market-based housing reforms. It is widely recognized that the construction of subsidized housing has improved the living conditions of its beneficiaries. However, affordable housing is mostly sited in the suburbs, where the surrounding urban functions and infrastructure are incomplete, resulting in a "jobs-housing" spatial mismatch in affordable housing. The main reason for this problem is that the residents of affordable housing are more sensitive to the spatial location of their residence, but their ability to choose and control the housing location is relatively weak, which leads to higher commuting costs. Their real cost of living has therefore not been effectively reduced. In this regard, 92 subsidized housing communities in Nanjing, China, are selected as the research sample in this paper. The residents of the affordable housing and the spatio-temporal characteristics of their commuting behavior are identified based on LBS (location-based service) data. Based on spatial mismatch theory, spatial mismatch indicators such as commuting distance and commuting time are established to measure the degree of spatial mismatch of subsidized housing in different districts of Nanjing. Furthermore, a geographically weighted regression model is used to analyze the factors influencing the spatial mismatch of affordable housing in terms of the provision of employment opportunities, traffic accessibility and supporting service facilities, using spatial, functional and other multi-source spatio-temporal big data. The results show that the spatial mismatch of affordable housing in Nanjing generally presents a "concentric circle" pattern of decreasing from the central urban area to the periphery.
The factors affecting the spatial mismatch of affordable housing differ between spatial zones. The main factors are the number of enterprises within 1 km of the affordable housing district and the shortest distance to a subway station, while low spatial mismatch is due to the diversity of services and facilities. Based on this, a spatial optimization strategy for different levels of spatial mismatch in subsidized housing is proposed, and feasible suggestions for the future site selection of subsidized housing are provided. The aim is to avoid or mitigate the impact of "spatial mismatch," promote the "spatial adaptation" of "jobs-housing," and truly improve the overall welfare level of affordable housing residents.

Keywords: affordable housing, spatial mismatch, commuting characteristics, spatial adaptation, welfare benefits

Procedia PDF Downloads 116
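One of the indicators named in the abstract above (entry 25335) is commuting distance. The sketch below shows one plausible way to derive it from home and work coordinates using the haversine great-circle distance. The abstract does not state how distance was actually computed (a road-network distance is equally possible), and the coordinates and function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_commute_km(trips):
    """Average straight-line commuting distance over (home, work) coordinate pairs."""
    return sum(haversine_km(*home, *work) for home, work in trips) / len(trips)


# Hypothetical (home, work) pairs for residents of one housing community.
trips = [
    ((32.00, 118.90), (32.05, 118.78)),
    ((32.00, 118.90), (32.04, 118.80)),
]
community_mean_km = mean_commute_km(trips)
```

Computing this mean per community gives one value per housing site, which is the kind of quantity a geographically weighted regression could then take as its dependent variable.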

25334 An Approach for Estimation in Hierarchical Clustered Data Applicable to Rare Diseases

Authors: Daniel C. Bonzo

Abstract:

Practical considerations lead to the use of units of analysis within subjects, e.g., bleeding episodes or treatment-related adverse events, in rare disease settings. This is coupled with data augmentation techniques such as extrapolation to enlarge the subject base. In general, one can think about extrapolation of data as extending information and conclusions from one estimand to another estimand. This approach induces hierarchical clustered data with varying cluster sizes. Extrapolation of clinical trial data is increasingly being accepted by regulatory agencies as a means of generating data in diverse situations during the drug development process. Under certain circumstances, data can be extrapolated to a different population, a different but related indication, and a different but similar product. We consider here the problem of estimation (point and interval) using a mixed-models approach under extrapolation. It is proposed that estimators (point and interval) be constructed using weighting schemes for the clusters, e.g., equally weighted and with weights proportional to cluster size. Simulated data generated under varying scenarios are then used to evaluate the performance of this approach. In conclusion, the evaluation results showed that the approach is a useful means of improving statistical inference in rare disease settings and thus aids not only signal detection but also risk-benefit evaluation.

Keywords: clustered data, estimand, extrapolation, mixed model

Procedia PDF Downloads 140
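The two weighting schemes mentioned in the abstract above (entry 25334) can be illustrated with a simple point estimate built from cluster means. This is a deliberately simplified sketch: it is not the mixed-models estimator of the paper, and the data and the function name `weighted_estimate` are made up for the example.

```python
def weighted_estimate(clusters, scheme="equal"):
    """Combine per-cluster means into one point estimate.

    scheme="equal": every cluster (e.g. subject) counts the same.
    scheme="size":  weights are proportional to the cluster size.
    """
    means = [sum(cluster) / len(cluster) for cluster in clusters]
    if scheme == "equal":
        weights = [1.0] * len(clusters)
    elif scheme == "size":
        weights = [float(len(cluster)) for cluster in clusters]
    else:
        raise ValueError(f"unknown weighting scheme: {scheme}")
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


# Hypothetical outcome values per subject; cluster sizes vary (2, 1 and 4 events).
clusters = [[2.0, 4.0], [6.0], [1.0, 1.0, 1.0, 1.0]]
equal_estimate = weighted_estimate(clusters, "equal")  # (3 + 6 + 1) / 3
size_estimate = weighted_estimate(clusters, "size")    # (2*3 + 1*6 + 4*1) / 7
```

The two estimates differ whenever cluster size is related to the outcome, which is why the choice of weighting scheme matters when cluster sizes vary.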

25333 Authorization of Commercial Communication Satellite Grounds for Promoting Turkish Data Relay System

Authors: Celal Dudak, Aslı Utku, Burak Yağlioğlu

Abstract:

Uninterrupted, continuous satellite communication throughout the whole orbit is becoming more indispensable every day. Data relay systems such as TDRSS of the USA and EDRSS of Europe are developed and built for various high and low data rate information exchanges. These missions rely on a couple of task-dedicated communication satellites. In this regard, a data relay system is proposed for Turkey that exchanges low data rate information (i.e. TTC) for Earth-observing LEO satellites by employing commercial GEO communication satellites all over the world. First, the justification for this attempt is given by demonstrating the increase in link duration. The preference for RF communication over laser communication is also discussed. Then, the preferred communication GEOs (including TURKSAT4A, which already belongs to Turkey) are presented, together with the coverage enhancements obtained through STK simulations and the corresponding link budget. Finally, a block diagram of the communication system on the LEO satellite is given.

Keywords: communication, GEO satellite, data relay system, coverage

Procedia PDF Downloads 446
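A link budget such as the one mentioned in the abstract above (entry 25333) is typically dominated by free-space path loss. The sketch below applies the standard free-space formula; the slant range, carrier frequency, EIRP and antenna gain are hypothetical values chosen for illustration and do not come from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20 * math.log10(4 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT)


def received_power_dbw(eirp_dbw, rx_gain_dbi, distance_m, frequency_hz, other_losses_db=0.0):
    """Received power: EIRP plus receive gain minus path loss and other losses."""
    path_loss = free_space_path_loss_db(distance_m, frequency_hz)
    return eirp_dbw + rx_gain_dbi - path_loss - other_losses_db


# Hypothetical LEO-to-GEO link: 40,000 km slant range at 2.2 GHz.
loss_db = free_space_path_loss_db(40_000e3, 2.2e9)
power_dbw = received_power_dbw(
    eirp_dbw=10.0, rx_gain_dbi=30.0, distance_m=40_000e3, frequency_hz=2.2e9
)
```

Comparing the received power with the receiver sensitivity gives the link margin, which is the figure a link budget is normally built to check.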

25332 Data Hiding by Vector Quantization in Color Image

Authors: Yung Gi Wu

Abstract:

With the growth of computers and networks, digital data can be spread anywhere in the world quickly. In addition, digital data can be copied or tampered with easily, so security has become an important topic in the protection of digital data. Digital watermarking is a method for protecting the ownership of digital data. Embedding a watermark inevitably affects image quality. In this paper, Vector Quantization (VQ) is used to embed the watermark into the image to achieve data hiding. This kind of watermarking is invisible, which means that users are not aware of the embedded watermark even though the embedded image differs slightly from the original image. Because VQ carries a heavy computational burden, we adopt a fast VQ encoding scheme based on partial distortion searching (PDS) and a mean approximation scheme to speed up the data hiding process. The watermarks hidden in the image can be gray-level, bi-level or color images. Text can also be embedded as a watermark. To test the robustness of the system, we use Photoshop to apply sharpening, cropping and alteration and check whether the extracted watermark is still recognizable. Experimental results demonstrate that the proposed system can resist these three kinds of tampering in general cases.

Keywords: data hiding, vector quantization, watermark, color image

Procedia PDF Downloads 368
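The partial distortion search (PDS) named in the abstract above (entry 25332) speeds up VQ encoding by abandoning a codeword as soon as its accumulated distortion can no longer beat the best match found so far. The following is a minimal sketch of that idea only; the codebook and image block are invented, and the watermark embedding and mean approximation steps of the paper are not shown.

```python
def nearest_codeword_pds(vector, codebook):
    """Return (index, squared_distance) of the closest codeword using PDS."""
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = 0.0
        for v, c in zip(vector, codeword):
            dist += (v - c) ** 2
            if dist >= best_dist:
                break  # partial distortion already too large: skip this codeword
        else:
            # Loop finished without an early exit, so this codeword is the new best.
            best_index, best_dist = index, dist
    return best_index, best_dist


# Hypothetical 4-pixel block and 3-entry codebook.
codebook = [
    [0, 0, 0, 0],
    [10, 10, 10, 10],
    [5, 5, 5, 5],
]
block = [4, 6, 5, 5]
index, distortion = nearest_codeword_pds(block, codebook)
```

PDS returns the same codeword as an exhaustive search; it only avoids finishing distance computations that cannot win.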

25331 Anomaly Detection in a Data Center with a Reconstruction Method Using a Multi-Autoencoders Model

Authors: Victor Breux, Jérôme Boutet, Alain Goret, Viviane Cattin

Abstract:

Early detection of anomalies in data centers is important to reduce downtimes and the costs of periodic maintenance. However, there is little research on this topic and even less on the fusion of sensor data for the detection of abnormal events. The goal of this paper is to propose a method for anomaly detection in data centers by combining sensor data (temperature, humidity, power) and deep learning models. The model described in the paper uses one autoencoder per sensor to reconstruct the inputs. The autoencoders contain Long Short-Term Memory (LSTM) layers and are trained using the normal samples of the relevant sensors selected by correlation analysis. The difference signal between the input and its reconstruction is then used to classify the samples using feature extraction and a random forest classifier. The data measured by the sensors of a data center between January 2019 and May 2020 are used to train the model, while the data between June 2020 and May 2021 are used to assess it. The performance of the model is assessed a posteriori through the F1-score by comparing detected anomalies with the data center's history. The proposed model outperforms the state-of-the-art reconstruction method, which uses only one autoencoder taking multivariate sequences and detects an anomaly with a threshold on the reconstruction error, with an F1-score of 83.60% compared to 24.16%.

Keywords: anomaly detection, autoencoder, data centers, deep learning

Procedia PDF Downloads 198
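To make the evaluation in the abstract above (entry 25331) concrete, the sketch below flags anomalies with a plain threshold on the reconstruction error and scores the result with the F1-score. Note that thresholding is the baseline the abstract compares against, not the proposed random forest classifier. The sensor values, reconstructions, labels and threshold are all hypothetical.

```python
def flag_anomalies(inputs, reconstructions, threshold):
    """Mark a sample as anomalous when its reconstruction error exceeds the threshold."""
    return [abs(x - r) > threshold for x, r in zip(inputs, reconstructions)]


def f1_score(actual, predicted):
    """Harmonic mean of precision and recall for boolean anomaly labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical temperature readings and the autoencoder's reconstructions of them.
inputs = [20.0, 20.5, 35.0, 21.0, 19.5, 40.0]
reconstructions = [20.1, 20.4, 21.0, 20.8, 24.0, 22.0]
history = [False, False, True, False, False, True]  # anomalies logged by operators

detected = flag_anomalies(inputs, reconstructions, threshold=3.0)
score = f1_score(history, detected)
```

In this toy run one false alarm lowers precision to 2/3 while recall stays at 1, giving an F1-score of 0.8.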

25330 Integration Process and Analytic Interface of different Environmental Open Data Sets with Java/Oracle and R

Authors: Pavel H. Llamocca, Victoria Lopez

Abstract:

The main objective of our work is the comparative analysis of environmental data from Open Data bases belonging to different governments. This requires integrating data from various sources. Nowadays, many governments intend to publish thousands of data sets for people and organizations to use, so the number of applications based on Open Data is increasing. However, each government has its own procedures for publishing its data, which leads to a variety of data set formats because there are no international standards that specify the formats of data sets from Open Data bases. Due to this variety of formats, we must build a data integration process that is able to bring together all kinds of formats. Some software tools have been developed to support the integration process, e.g. Data Tamer and Data Wrangler. The problem with these tools is that they need a data scientist to take part in the integration process as a final step. In our case, we do not want to depend on a data scientist, because environmental data are usually similar and these processes can be automated by programming. The main idea of our tool is to build Hadoop procedures adapted to the data sources of each government in order to achieve an automated integration. Our work focuses on environmental data such as temperature, energy consumption, air quality, solar radiation, wind speed, etc. For the past two years, the government of Madrid has been publishing its Open Data bases on environmental indicators in real time. In the same way, other governments (such as Andalucia or Bilbao) have published Open Data sets relating to the environment. All of those data sets have different formats, and our solution is able to integrate all of them; furthermore, it allows the user to perform and visualize analyses of the real-time data.
Once the integration task is done, all the data from any government have the same format, and the analysis process can be carried out in a computationally better way. The tool presented in this work therefore has two goals: 1. Integration process; and 2. Graphic and analytic interface. As a first approach, the integration process was developed using Java and Oracle, and the graphic and analytic interface with Java (JSP). However, in order to open up our software tool, as a second approach we also developed an implementation in the R language, a mature open source technology. R is a powerful open source programming language that allows us to process and analyze a huge amount of data with high performance. There are also R libraries for building a graphic interface, such as shiny. A performance comparison between the two implementations was made and no significant differences were found. In addition, our work provides an official real-time integrated data set of environmental data in Spain so that any developer can build their own applications.

Keywords: open data, R language, data integration, environmental data

Procedia PDF Downloads 317
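The integration idea in the abstract above (entry 25330), where each publisher's format is mapped onto one common record layout, can be sketched as follows. The authors' implementations use Java/Oracle and R; this Python sketch is only an illustration, and every field name, mapping and sample record in it is hypothetical.

```python
import csv
import io
import json

COMMON_FIELDS = ("station", "indicator", "timestamp", "value")


def normalise(row, mapping):
    """Rename a source row's fields to the common schema and parse the value."""
    record = {field: row[mapping[field]] for field in COMMON_FIELDS}
    # Some sources write decimals with a comma; convert to a float either way.
    record["value"] = float(str(record["value"]).replace(",", "."))
    return record


def load_csv(text, mapping, delimiter=";"):
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [normalise(row, mapping) for row in reader]


def load_json(text, mapping):
    return [normalise(row, mapping) for row in json.loads(text)]


# Hypothetical sources: one publishes semicolon-separated CSV, the other JSON.
csv_source = "estacion;magnitud;fecha;valor\nS1;temperature;2015-01-01T10:00;12,5\n"
json_source = '[{"site": "S2", "param": "temperature", "time": "2015-01-01T10:00", "reading": 11.0}]'

csv_mapping = {"station": "estacion", "indicator": "magnitud", "timestamp": "fecha", "value": "valor"}
json_mapping = {"station": "site", "indicator": "param", "timestamp": "time", "value": "reading"}

integrated = load_csv(csv_source, csv_mapping) + load_json(json_source, json_mapping)
```

Adding a new publisher then only requires a new mapping and, if its file format is new, one more loader function.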

25329 Transforming Data into Knowledge: Mathematical and Statistical Innovations in Data Analytics

Authors: Zahid Ullah, Atlas Khan

Abstract:

The rapid growth of data in various domains has created a pressing need for effective methods to transform this data into meaningful knowledge. In this era of big data, mathematical and statistical innovations play a crucial role in unlocking insights and facilitating informed decision-making in data analytics. This abstract aims to explore the transformative potential of these innovations and their impact on converting raw data into actionable knowledge. Drawing upon a comprehensive review of existing literature, this research investigates the cutting-edge mathematical and statistical techniques that enable the conversion of data into knowledge. By evaluating their underlying principles, strengths, and limitations, we aim to identify the most promising innovations in data analytics. To demonstrate the practical applications of these innovations, real-world datasets will be utilized through case studies or simulations. This empirical approach will showcase how mathematical and statistical innovations can extract patterns, trends, and insights from complex data, enabling evidence-based decision-making across diverse domains. Furthermore, a comparative analysis will be conducted to assess the performance, scalability, interpretability, and adaptability of different innovations. By benchmarking against established techniques, we aim to validate the effectiveness and superiority of the proposed mathematical and statistical innovations in data analytics. Ethical considerations surrounding data analytics, such as privacy, security, bias, and fairness, will be addressed throughout the research. Guidelines and best practices will be developed to ensure the responsible and ethical use of mathematical and statistical innovations in data analytics. 
The expected contributions of this research include advancements in mathematical and statistical sciences, improved data analysis techniques, enhanced decision-making processes, and practical implications for industries and policymakers. The outcomes will guide the adoption and implementation of mathematical and statistical innovations, empowering stakeholders to transform data into actionable knowledge and drive meaningful outcomes.

Keywords: data analytics, mathematical innovations, knowledge extraction, decision-making

Procedia PDF Downloads 77

25328 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule

Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu

Abstract:

Instance selection (IS) techniques reduce data size to improve the performance of data mining methods. Recently, to process very large data sets, several proposed methods divide the training set into disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitations of these methods and give our viewpoint on how to divide and conquer in the IS procedure. Then, based on the fast condensed nearest neighbor (FCNN) rule, we propose an instance selection method for large data sets built on the MapReduce framework. Besides ensuring prediction accuracy and reduction rate, it has two desirable properties: first, it reduces the workload on the aggregation node; second, and most important, it produces the same result as the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.
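To make the sequential rule concrete, the following is a minimal single-machine sketch of a fast-condensed-nearest-neighbor style selection in Python. It is a simplified reading of the FCNN idea and not the authors' MapReduce implementation; the function names, the squared Euclidean distance, the index-based tie-breaking, and the toy data in the usage note are assumptions made for illustration.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, subset, points):
    """Index (into points) of the subset member closest to point."""
    # Ties are broken by the smaller index so the result is deterministic.
    return min(subset, key=lambda i: (sq_dist(point, points[i]), i))


def fcnn_select(points, labels):
    """Return sorted indices of a subset that classifies all points correctly.

    Simplified FCNN-style loop (assumed variant):
    1. seed the subset with the point closest to each class centroid;
    2. for every subset member, find the closest misclassified point
       that falls in its Voronoi cell and add that point;
    3. repeat until no training point is misclassified by the subset.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    subset = set()
    for idxs in by_class.values():
        dim = len(points[idxs[0]])
        centroid = tuple(
            sum(points[i][d] for i in idxs) / len(idxs) for d in range(dim)
        )
        subset.add(min(idxs, key=lambda i: (sq_dist(points[i], centroid), i)))

    while True:
        # subset member -> (distance, index) of its nearest misclassified point
        representatives = {}
        for i, p in enumerate(points):
            if i in subset:
                continue
            s = nearest(p, subset, points)
            if labels[s] != labels[i]:
                candidate = (sq_dist(p, points[s]), i)
                if s not in representatives or candidate < representatives[s]:
                    representatives[s] = candidate
        if not representatives:
            return sorted(subset)
        subset.update(i for _, i in representatives.values())
```

For example, `fcnn_select([(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)], [0, 0, 0, 0, 1, 1])` keeps only the points needed to classify the whole set with a 1-nearest-neighbor rule. Because each pass adds at least one new point, the loop always terminates.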

Keywords: instance selection, data reduction, MapReduce, kNN

Procedia PDF Downloads 257

25327 A Design Framework for an Open Market Platform of Enriched Card-Based Transactional Data for Big Data Analytics and Open Banking

Authors: Trevor Toy, Josef Langerman

Abstract:

Around a quarter of the world’s data is generated by financial with an estimated 708.5 billion global non-cash transactions reached between 2018 and. And with Open Banking still a rapidly developing concept within the financial industry, there is an opportunity to create a secure mechanism for connecting its stakeholders to openly, legitimately and consensually share the data required to enable it. Integration and sharing of anonymised transactional data are still operated in silos and centralised among the large corporate entities in the ecosystem that have the resources to do so. Smaller fintechs generating data and businesses looking to consume data are largely excluded from the process. Therefore, there is a growing demand for accessible transactional data for analytical purposes and also to support the rapid global adoption of Open Banking. This research provides a solution framework that aims to offer a secure decentralised marketplace for (1) data providers to list their transactional data, (2) data consumers to find and access that data, and (3) data subjects (the individuals making the transactions that generate the data) to manage and sell the data that relates to themselves. The platform also provides an integrated system for downstream transaction-related data from merchants, enriching the data product available to build a comprehensive view of a data subject’s spending habits. A robust and sustainable data market can be developed by providing a more accessible mechanism for data producers to monetise their data investments and by encouraging data subjects to share their data through the same financial incentives. At the centre of the platform is the market mechanism that connects the data providers and their data subjects to the data consumers.
This core component of the platform is developed on a decentralised blockchain contract with a market layer that manages transaction, user, pricing, payment, tagging, contract, control, and lineage features that pertain to the user interactions on the platform. One of the platform’s key features is enabling the participation and management of personal data by the individuals from whom the data is being generated. A proof of concept of this framework was developed on the Ethereum blockchain, where an individual can securely manage access to their own personal data and to that individual’s identifiable relationship to the card-based transaction data provided by financial institutions. This gives data consumers access to a complete view of transactional spending behaviour in correlation to key demographic information. This platform solution can ultimately support the growth, prosperity, and development of economies, businesses, communities, and individuals by providing accessible and relevant transactional data for big data analytics and open banking.

Keywords: big data markets, open banking, blockchain, personal data management

Procedia PDF Downloads 76

25326 Experimental Evaluation of Succinct Ternary Tree

Authors: Dmitriy Kuptsov

Abstract:

Tree data structures, such as binary or, in general, k-ary trees, are essential in computer science. The applications of these data structures range from data search and retrieval to sorting and ranking algorithms. Naive implementations of these data structures can consume prohibitively large volumes of random access memory, limiting their applicability in certain solutions. Thus, in these cases, a more advanced representation of these data structures is essential. In this paper, we present the design of a compact version of the ternary tree data structure and demonstrate the results of an experimental evaluation using the static dictionary problem. We compare these results with the results for binary and regular ternary trees. The evaluation shows that our design, in the best case, consumes up to 12 times less memory (for the dictionary used in our experimental evaluation) than a regular ternary tree and, in certain configurations, shows performance comparable to regular ternary trees. We have evaluated the performance of the algorithms on both 32-bit and 64-bit operating systems.
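For orientation, here is a minimal Python sketch of a regular (pointer-based) ternary tree used as a static dictionary, i.e. the kind of baseline a compact design would be compared against. It assumes the ternary tree is a ternary search tree keyed on characters, which the abstract does not confirm; the class and attribute names are hypothetical, and the succinct representation itself is not shown.

```python
class Node:
    """One character plus three child links: lower, equal, higher."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False  # True if a stored word ends at this node


class TernarySearchTree:
    def __init__(self, words=()):
        self.root = None
        for word in words:
            self.insert(word)

    def insert(self, word):
        if word:  # the empty string is not stored in this sketch
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_end = True
        return node

    def __contains__(self, word):
        if not word:
            return False
        node, i = self.root, 0
        while node is not None:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.is_end
        return False
```

Each node in this layout stores three references and a flag in addition to its character, which is the per-node overhead a compact encoding tries to reduce.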

Keywords: algorithms, data structures, succinct ternary tree, performance evaluation

Procedia PDF Downloads 167

25325 Assessing Local Authorities’ Interest in Addressing Urban Challenges through Nature Based Solutions in Romania

Authors: Athanasios A. Gavrilidis, Mihai R. Nita, Larissa N. Stoia, Diana A. Onose

Abstract:

Contemporary global environmental challenges must be primarily addressed at local levels. Cities are under continuous pressure, as they must ensure a high quality of life for their citizens and at the same time adapt to and address specific environmental issues. Innovative solutions using natural features or mimicking natural systems are endorsed by the scientific community as efficient approaches both for mitigating climate change effects and the decrease of environmental quality and for maintaining high standards of living for urban dwellers. The aim of this study was to assess whether Romanian cities’ authorities are considering nature-based innovations as solutions for their planning, management, and environmental issues. Data were gathered by applying 140 questionnaires to urban authorities throughout the country. The questionnaire was designed to assess local policy makers’ perspective on the efficiency of nature-based innovations as a tool to address specific challenges. It also focused on extracting data about financing sources and the challenges they must overcome to adopt nature-based approaches. The results gathered from the municipalities participating in our study were statistically processed, and they revealed that Romanian city managers acknowledge the benefits of nature-based innovations, but investments in this sector are not at the top of their priorities. More than 90% of the selected cities agreed that in the last 10 years, their major concern was to expand the grey infrastructure (roads and public amenities) using traditional approaches. When asked how they would react if faced with different socio-economic and environmental challenges, local urban managers indicated investments in nature-based solutions as a priority only in the case of biodiversity loss and extreme weather, while for the other 14 proposed scenarios, they would embrace the business-as-usual approach.
Our study indicates that while new concepts of sustainable urban planning emerge within the scientific community, local authorities need more time to understand and implement them. Without the proper knowledge, personnel, policies, or dedicated budgets, local administrators will not embrace nature-based innovations as solutions for their challenges.

Keywords: nature based innovations, perception analysis, policy making, urban planning

Procedia PDF Downloads 179

25324 Review of Urbanization Pattern in Kabul City

Authors: Muhammad Hanif Amiri, Edris Sadeqy, Ahmad Freed Osman

Abstract:

The International Conference on Architectural Engineering and Skyscraper (ICAES 2016) on January 18 - 19, 2016 aims to exchange new ideas and application experiences face to face, to establish business or research relations, and to find global partners for future collaboration. Therefore, we are very keen to participate and share our issues in order to get valuable feedback from the conference participants. Urbanization is a controversial issue all around the world. Substandard and unplanned urbanization has many implications for the social, cultural and economic situation of the population. Unplanned and illegal construction has become a critical issue in Afghanistan, particularly in Kabul city. In addition, a lack of municipal bylaws, poor municipal governance, a lack of development policies and strategies, budget limitations, low professional capacity of the private sector involved in development, and poor coordination among stakeholders are other factors that have made the problem more complicated. The main purpose of this research paper is to review the urbanization pattern of Kabul city, to identify solutions for improvement, and to evaluate the increase in population density that caused vast illegal and unplanned development, which has ultimately turned Kabul city as a whole into a slum area. The Kabul city Master Plan was reviewed in 1978 and revised for a planned population of 2 million. In 2001, the interim administration took office and the city received an influx of returnees from neighboring countries and other provinces of Afghanistan, mostly seeking employment opportunities, security and a better quality of life; as a result, Kabul faced unusual population growth. According to the Central Statistics Organization of Afghanistan, the population of Kabul was estimated at approximately 5 million in 2015. Although a new Master Plan was prepared in 2009, the existing challenges have not yet been resolved.
On the other hand, 70% of Kabul’s population lives in unplanned (slum) areas and faces a shortage of drinking water, the absence of sewerage and drainage networks, the absence of a proper management system for solid waste collection, a lack of public transportation and traffic management, environmental degradation and a shortage of social infrastructure. Although there are many problems in Kabul city, the development of 22 townships is still in progress, which has attracted a large population. The research is completed with a detailed analysis of four main issues: elimination of duplicated administrations, development of regions, rehabilitation and improvement of infrastructure, and prevention of the establishment of new townships in the Kabul Central Core. Addressing these issues is intended to mitigate the problems and constraints and to provide the foundation and point of departure for an objective-based future development of Kabul city. The conclusion is framed to reflect stage-wise development in light of the prepared policy and strategies, development of a procedure for the improvement of infrastructure, conducting a preliminary EIA, defining the scope of stakeholders’ contributions and preparation of a project list for initial development. In conclusion, this paper will help the transformation of Kabul city.

Keywords: development of regions, illegal construction, population density, urbanization pattern

Procedia PDF Downloads 321

25323 Predicting Data Center Resource Usage Using Quantile Regression to Conserve Energy While Fulfilling the Service Level Agreement

Authors: Ahmed I. Alutabi, Naghmeh Dezhabad, Sudhakar Ganti

Abstract:

Data centers have been growing in size and demand continuously over the last two decades. Planning for the deployment of resources has been shallow and has always resorted to over-provisioning. Data center operators try to maximize the availability of their services by allocating multiples of the needed resources. One resource that has been wasted, with little thought, is energy. In recent years, programmable resource allocation has paved the way for more efficient and robust data centers. In this work, we examine the predictability of resource usage in a data center environment. We use a number of models that cover a wide spectrum of machine learning categories. Then we establish a framework to guarantee the client service level agreement (SLA). Our results show that using prediction can cut energy loss by up to 55%.
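The title refers to quantile regression, which is typically trained by minimizing the pinball (quantile) loss. The sketch below, in Python, shows that loss and why a high quantile fits a provisioning setting: under-predicting usage is penalized more heavily than over-predicting it. The function name, the quantile level 0.9, and the usage figures are assumptions for illustration and are not taken from the paper.

```python
def pinball_loss(actual, predicted, q):
    """Mean pinball loss for quantile level q, with 0 < q < 1.

    An under-prediction (actual above predicted) costs q per unit;
    an over-prediction costs (1 - q) per unit.
    """
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)


# Hypothetical CPU usage (percent) against a constant forecast of 50.
usage = [40.0, 50.0, 60.0]
forecast = [50.0, 50.0, 50.0]
loss_at_p90 = pinball_loss(usage, forecast, 0.9)
```

With q = 0.9, missing demand by 10 units costs about nine times as much as over-allocating by 10 units, so a model trained on this loss leans toward forecasts that are rarely exceeded without resorting to blanket over-provisioning.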

Keywords: machine learning, artificial intelligence, prediction, data center, resource allocation, green computing

Procedia PDF Downloads 111

25322 Prosperous Digital Image Watermarking Approach by Using DCT-DWT

Authors: Prabhakar C. Dhavale, Meenakshi M. Pawar

Abstract:

Every day, tons of data are embedded in digital media or distributed over the internet. Such data can easily be replicated without error, putting the rights of its owners at risk. Even when encrypted for distribution, data can easily be decrypted and copied. One way to discourage illegal duplication is to insert information, known as a watermark, into potentially valuable data in such a way that it is impossible to separate the watermark from the data. These challenges have motivated researchers to carry out intense research in the field of watermarking. A watermark is a form, image or text that is impressed onto paper, which provides evidence of its authenticity. Digital watermarking is an extension of the same concept. There are two types of watermarks: visible and invisible. In this project, we have concentrated on implementing a watermark in an image. The main consideration for any watermarking scheme is its robustness to various attacks.

Keywords: watermarking, digital, DCT-DWT, security

Procedia PDF Downloads 426

25321 Landslide Vulnerability Assessment in Context with Indian Himalayan

Authors: Neha Gupta

Abstract:

Landslide vulnerability is considered a crucial parameter for the assessment of landslide risk. The term vulnerability is defined as the degree of damage to elements at risk across different dimensions, i.e., physical, social, economic, and environmental. The Himalaya region is very prone to multiple hazards such as floods, forest fires, earthquakes, and landslides. Increasing fatality rates and losses of infrastructure and economic value due to landslides in the Himalaya region call for an assessment of vulnerability. This study presents a methodology to measure a combination of vulnerability dimensions, i.e., social, physical, and environmental vulnerability, in one framework. A combined assessment of these vulnerabilities has rarely been carried out, and no such approach has been applied in the Indian scenario. The methodology was applied in an area of the east Sikkim Himalaya, India. The physical vulnerability is based on a building footprint layer extracted from remote sensing data and Google Earth imagery. The social vulnerability was assessed using population density based on land use. The land use map was derived from a high-resolution satellite image, and for the environmental vulnerability assessment, NDVI, forest, agricultural land, and distance from the river were derived from remote sensing data and a DEM. The classes of social, physical, and environmental vulnerability were normalized to a scale of 0 (no loss) to 1 (loss) to obtain a homogeneous dataset. Then Multi-Criteria Analysis (MCA) was used to assign individual weights to each dimension and integrate them into one framework. The final vulnerability was further classified into four classes from very low to very high.
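The normalize, weight, and classify steps can be sketched as follows in Python. This is only an illustration of a weighted-sum form of multi-criteria analysis: the weights, the equal-width class breaks, the names of the two middle classes, and the per-cell scores are hypothetical, since the abstract gives none of them.

```python
def min_max_normalize(values):
    """Rescale raw scores to the range 0 (no loss) to 1 (loss)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_overlay(dimensions, weights):
    """Combine normalized per-cell scores using a weighted average.

    dimensions: dict of dimension name -> list of scores, one per map cell
    weights:    dict of dimension name -> weight (hypothetical values)
    """
    total_weight = sum(weights[name] for name in dimensions)
    n_cells = len(next(iter(dimensions.values())))
    return [
        sum(weights[name] * dimensions[name][i] for name in dimensions) / total_weight
        for i in range(n_cells)
    ]


def classify(score, breaks=(0.25, 0.5, 0.75),
             names=("very low", "low", "high", "very high")):
    """Map a combined score to one of four classes (assumed equal-width breaks)."""
    for upper, name in zip(breaks, names):
        if score < upper:
            return name
    return names[-1]


# Hypothetical raw scores for three map cells.
layers = {
    "social": min_max_normalize([120.0, 480.0, 840.0]),   # population density
    "physical": min_max_normalize([2.0, 6.0, 10.0]),      # building footprint
    "environmental": min_max_normalize([0.1, 0.3, 0.5]),  # e.g. an NDVI-derived score
}
weights = {"social": 0.5, "physical": 0.25, "environmental": 0.25}
classes = [classify(s) for s in weighted_overlay(layers, weights)]
```

Normalizing each layer before weighting keeps a layer with large raw values, such as population density, from dominating the combined score purely because of its units.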

Keywords: landslide, multi-criteria analysis, MCA, physical vulnerability, social vulnerability

Procedia PDF Downloads 303

25320 A Comparison of Image Data Representations for Local Stereo Matching

Authors: André Smith, Amr Abdel-Dayem

Abstract:

The stereo matching problem, while having been present for several decades, continues to be an active area of research. The goal of this research is to find correspondences between elements found in a set of stereoscopic images. With these pairings, it is possible to infer the distance of objects within a scene, relative to the observer. Advancements in this field have led to experimentation with various techniques, from graph-cut energy minimization to artificial neural networks. At the basis of these techniques is a cost function, which is used to evaluate the likelihood of a particular match between points in each image. While the cost is, at its core, based on comparing image pixel data, there is a general lack of consistency as to which image data representation to use. This paper presents an experimental analysis to compare the effectiveness of the more common image data representations. The goal is to determine the effectiveness of these data representations in reducing the cost for the correct correspondence relative to other possible matches.
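As an illustration of a matching cost in local stereo matching, the Python sketch below uses the sum of absolute differences (SAD) over a small window on a single scanline, followed by a winner-take-all choice of disparity. SAD on grayscale intensities is one common choice and is an assumption here; the abstract does not say which cost functions or representations the paper compares, and the function names and pixel rows are hypothetical.

```python
def sad_cost(left, right, x, d, half_window):
    """Sum of absolute differences between a window centred at x in the left
    row and the window centred at x - d in the right row."""
    cost = 0
    for dx in range(-half_window, half_window + 1):
        cost += abs(left[x + dx] - right[x + dx - d])
    return cost


def best_disparity(left, right, x, max_disparity, half_window=1):
    """Winner-take-all: return the disparity with the lowest matching cost."""
    # Only keep disparities whose right-image window stays inside the row.
    valid = [d for d in range(max_disparity + 1) if x - half_window - d >= 0]
    return min(valid, key=lambda d: sad_cost(left, right, x, d, half_window))


# One grayscale scanline; the right view sees the same peak 2 pixels to the left.
left_row = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
right_row = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
disparity_at_peak = best_disparity(left_row, right_row, x=4, max_disparity=3)
```

Swapping the intensity values for another representation, for example colour channels or gradients, changes only what `sad_cost` compares, which is the kind of variation an experimental comparison of data representations would isolate.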

Keywords: colour data, local stereo matching, stereo correspondence, disparity map

Procedia PDF Downloads 374

25319 Modeling Geogenic Groundwater Contamination Risk with the Groundwater Assessment Platform (GAP)

Authors: Joel Podgorski, Manouchehr Amini, Annette Johnson, Michael Berg

Abstract:

One-third of the world’s population relies on groundwater for its drinking water. Natural geogenic arsenic and fluoride contaminate ~10% of wells. Prolonged exposure to high levels of arsenic can result in various internal cancers, while high levels of fluoride are responsible for the development of dental and crippling skeletal fluorosis. In poor urban and rural settings, the provision of drinking water free of geogenic contamination can be a major challenge. In order to efficiently apply limited resources in the testing of wells, water resource managers need to know where geogenically contaminated groundwater is likely to occur. The Groundwater Assessment Platform (GAP) fulfills this need by providing state-of-the-art global arsenic and fluoride contamination hazard maps as well as enabling users to create their own groundwater quality models. The global risk models were produced by logistic regression of arsenic and fluoride measurements using predictor variables of various soil, geological and climate parameters. The maps display the probability of encountering concentrations of arsenic or fluoride exceeding the World Health Organization’s (WHO) stipulated concentration limits of 10 µg/L or 1.5 mg/L, respectively. In addition to a reconsideration of the relevant geochemical settings, these second-generation maps represent a great improvement over the previous risk maps due to a significant increase in data quantity and resolution. For example, there is a 10-fold increase in the number of measured data points, and the resolution of predictor variables is generally 60 times greater. These same predictor variable datasets are available on the GAP platform for visualization as well as for use with a modeling tool. The latter requires that users upload their own concentration measurements and select the predictor variables that they wish to incorporate in their models. 
In addition, users can upload additional predictor variable datasets either as features or coverages. Such models can represent an improvement over the global models already supplied, since (a) users may be able to use their own, more detailed datasets of measured concentrations and (b) the various processes leading to arsenic and fluoride groundwater contamination can be isolated more effectively on a smaller scale, thereby resulting in a more accurate model. All maps, including user-created risk models, can be downloaded as PDFs. There is also the option to share data in a secure environment as well as the possibility to collaborate in a secure environment through the creation of communities. In summary, GAP provides users with the means to reliably and efficiently produce models specific to their region of interest by making available the latest datasets of predictor variables along with the necessary modeling infrastructure.
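A logistic regression model of this kind turns predictor values at a location into a probability that a concentration limit is exceeded. The Python sketch below shows only that final step. The predictor names, coefficients, and intercept are invented placeholders for illustration; they are not GAP's fitted values, and the platform's actual interface is not represented.

```python
import math


def exceedance_probability(predictors, coefficients, intercept):
    """Probability that a well exceeds the limit, from a fitted logistic model.

    predictors:   dict of predictor name -> value at the location
    coefficients: dict of predictor name -> fitted coefficient
    """
    z = intercept + sum(coefficients[name] * value
                        for name, value in predictors.items())
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder model with two hypothetical predictors.
coefficients = {"soil_ph": 0.8, "annual_precip_m": -1.5}
intercept = -5.0
location = {"soil_ph": 7.5, "annual_precip_m": 0.4}
p_exceed = exceedance_probability(location, coefficients, intercept)
```

Evaluating the function on every grid cell of the predictor datasets yields a probability map; a cell would then be flagged as high hazard when its probability passes a chosen cut-off.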

Keywords: arsenic, fluoride, groundwater contamination, logistic regression

Procedia PDF Downloads 350

25318 Floodnet: Classification for Post Flood Scene with a High-Resolution Aerial Imaginary Dataset

Authors: Molakala Mourya Vardhan Reddy, Kandimala Revanth, Koduru Sumanth, Beena B. M.

Abstract:

Emergency response and recovery operations are severely hampered by natural catastrophes, especially floods. Understanding post-flood scenarios is essential to disaster management because it facilitates quick evaluation and decision-making. To this end, we introduce FloodNet, a brand-new high-resolution aerial picture collection created especially for comprehending post-flood scenes. A varied collection of excellent aerial photos taken during and after flood occurrences make up FloodNet, which offers comprehensive representations of flooded landscapes, damaged infrastructure, and changed topographies. The dataset provides a thorough resource for training and assessing computer vision models designed to handle the complexity of post-flood scenarios, including a variety of environmental conditions and geographic regions. Pixel-level semantic segmentation masks are used to label the pictures in FloodNet, allowing for a more detailed examination of flood-related characteristics, including debris, water bodies, and damaged structures. Furthermore, temporal and positional metadata improve the dataset's usefulness for longitudinal research and spatiotemporal analysis. For activities like flood extent mapping, damage assessment, and infrastructure recovery projection, we provide baseline standards and evaluation metrics to promote research and development in the field of post-flood scene comprehension. By integrating FloodNet into machine learning pipelines, it will be easier to create reliable algorithms that will help politicians, urban planners, and first responders make choices both before and after floods. The goal of the FloodNet dataset is to support advances in computer vision, remote sensing, and disaster response technologies by providing a useful resource for researchers. FloodNet helps to create creative solutions for boosting communities' resilience in the face of natural catastrophes by tackling the particular problems presented by post-flood situations.
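Pixel-level segmentation masks are commonly scored with per-class intersection over union (IoU). The abstract does not name the evaluation metrics it provides, so the Python sketch below is an assumption about one plausible metric, and the class ids and tiny masks are hypothetical.

```python
def per_class_iou(predicted, reference, num_classes):
    """Intersection over union for each class id in two equally sized masks.

    Masks are lists of rows of integer class ids. A class that appears in
    neither mask gets None instead of a score.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            if p == r:
                intersection[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[r] += 1
    return [intersection[c] / union[c] if union[c] else None
            for c in range(num_classes)]


# Hypothetical 2x2 masks: 0 = background, 1 = water, 2 = debris (unused here).
predicted_mask = [[0, 1], [1, 1]]
reference_mask = [[0, 1], [0, 1]]
scores = per_class_iou(predicted_mask, reference_mask, num_classes=3)
```

Reporting IoU per class rather than overall pixel accuracy keeps small but important classes, such as debris, from being hidden by large background regions.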

Keywords: image classification, segmentation, computer vision, natural disaster, unmanned aerial vehicle (UAV), machine learning

Procedia PDF Downloads 86