Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 458

Search results for: centralized server

8 A Clustering-Based Approach for Weblog Data Cleaning

Authors: Amine Ganibardi, Cherif Arab Ali

Abstract:

This paper addresses data cleaning as part of web usage data preprocessing within the scope of Web Usage Mining. Weblog data recorded by web servers in log files reflect usage activity, i.e., end-users’ clicks and the underlying user-agents’ hits. As Web Usage Mining is interested in end-users’ behavior, user-agents’ hits are regarded as noise to be cleaned off before mining. Filtering hits from clicks is not trivial for two reasons: a server records requests interlaced in sequential order regardless of their source or type, and website resources may be set up as requestable interchangeably by end-users and user-agents. The current methods are content-centric, based on filtering heuristics of relevant/irrelevant items in terms of some cleaning attributes, i.e., the website’s resource filetype extensions, the website’s resources pointed to by hyperlinks/URIs, HTTP methods, user-agents, etc. These methods need exhaustive extra-weblog data and prior knowledge of the relevant and/or irrelevant items to be assumed as clicks or hits within the filtering heuristics. Such methods are not appropriate for the dynamic/responsive Web for three reasons: resources may be set up as clickable by end-users regardless of their type, website resources are indexed by frame names without filetype extensions, and web contents are generated and cancelled differently from one end-user to another. In order to overcome these constraints, a clustering-based cleaning method centered on the logging structure is proposed. This method focuses on the statistical properties of the logging structure at the level of the requested and referring resource attributes. It is insensitive to logging content and does not need extra-weblog data. The statistical property used is the structure of the logging feature generated by webpage requests in terms of clicks and hits.
Since a webpage consists of its single URI and several components, this feature results in a single-click-to-multiple-hits ratio in terms of the requested and referring resources. Thus, the clustering-based method is meant to identify two clusters by applying an appropriate distance to the frequency matrix of the requested and referring resources levels. As the ratio of clicks to hits is single to multiple, the clicks’ cluster is the smaller one in number of requests. Hierarchical Agglomerative Clustering based on a pairwise distance (Gower) and average linkage has been applied to four log files of dynamic/responsive websites whose click-to-hit ratios range from 1/2 to 1/15. The optimal clustering, set on the basis of average linkage and maximum inter-cluster inertia, always results in two clusters. Evaluating the smaller cluster, referred to as the clicks cluster, with confusion matrix indicators yields a true positive rate of 97%. The content-centric cleaning methods, i.e., conventional and advanced cleaning, resulted in a lower rate of 91%. Thus, the proposed clustering-based cleaning outperforms the content-centric methods within dynamic and responsive web design without the need for any extra-weblog data. Such an improvement in cleaning quality is likely to refine dependent analysis.
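
The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frequency matrix, the resource names, and the reading of "smallest cluster" as the one with the fewest total requests are assumptions made for illustration. Gower distance is reduced here to its numeric-only form (mean range-normalised absolute difference), and average linkage is computed naively in pure Python.

```python
from itertools import combinations

def gower_distance(a, b, ranges):
    """Gower distance for numeric features: mean range-normalised absolute difference."""
    parts = [abs(x - y) / r if r else 0.0 for x, y, r in zip(a, b, ranges)]
    return sum(parts) / len(parts)

def average_linkage_two_clusters(rows):
    """Agglomerative clustering with average linkage, stopped at two clusters."""
    ranges = [max(col) - min(col) for col in zip(*rows)]
    dist = {frozenset((i, j)): gower_distance(rows[i], rows[j], ranges)
            for i, j in combinations(range(len(rows)), 2)}
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        return sum(dist[frozenset((i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 2:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters

# Hypothetical frequency matrix: one row per logged resource,
# columns = (times requested, times seen as referrer).
resources = ["/home", "/news", "/style.css", "/app.js", "/logo.png", "/font.woff"]
freq = [(40, 120), (35, 110), (75, 0), (75, 1), (74, 0), (70, 0)]

clusters = average_linkage_two_clusters(freq)
# Assumption: the clicks cluster is the one with the fewest total requests.
click_cluster = min(clusters, key=lambda c: sum(freq[i][0] for i in c))
click_resources = sorted(resources[i] for i in click_cluster)
print(click_resources)
```

In this toy data, pages are requested less often but appear frequently as referrers, while embedded components are requested often and almost never refer to anything, which is what separates the two clusters.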

Keywords: clustering approach, data cleaning, data preprocessing, weblog data, web usage data

Procedia PDF Downloads 154

7 Introducing, Testing, and Evaluating a Unified JavaScript Framework for Professional Online Studies

Authors: Caspar Goeke, Holger Finger, Dorena Diekamp, Peter König

Abstract:

Online-based research has recently gained increasing attention from various fields of research in the cognitive sciences. Technological advances in the form of online crowdsourcing (Amazon Mechanical Turk), open data repositories (Open Science Framework), and online analysis (IPython notebook) offer rich possibilities to improve, validate, and speed up research. However, to date there is no cross-platform integration of these subsystems. Furthermore, online studies still suffer from complex implementation (server infrastructure, database programming, security considerations, etc.). Here we propose and test a new JavaScript framework that enables researchers to conduct any kind of behavioral research in the browser without the need to program a single line of code. In particular, our framework offers the possibility to manipulate and combine the experimental stimuli via a graphical editor, directly in the browser. Moreover, we included an action-event system that can be used to handle user interactions, interactively change stimulus properties, or store participants’ responses. Besides traditional recordings such as reaction time and mouse and keyboard presses, the tool offers webcam-based eye and face tracking. On top of these features, our framework also takes care of participant recruitment via crowdsourcing platforms such as Amazon Mechanical Turk. Furthermore, the built-in functionality of Google Translate ensures automatic text translation of the experimental content. Thereby, thousands of participants from different cultures and nationalities can be recruited literally within hours. Finally, the recorded data can be visualized and cleaned online and then exported into the desired formats (csv, xls, sav, mat) for statistical analysis. Alternatively, the data can also be analyzed online within our framework using the integrated IPython notebook.
The framework was designed such that studies can be used interchangeably between researchers. This not only supports the idea of open data repositories but also makes it possible to share and reuse experimental designs and analyses, such that the validity of the paradigms will be improved. In particular, sharing and integrating experimental designs and analyses will lead to an increased consistency of experimental paradigms. To demonstrate the functionality of the framework, we present the results of a pilot study in the field of spatial navigation that was conducted using the framework. Specifically, we recruited over 2000 subjects with various cultural backgrounds and subsequently analyzed performance differences depending on the factors culture, gender, and age. Overall, our results demonstrate a strong influence of cultural factors on spatial cognition. Such an influence has not been reported before and would not have been possible to show without the massive amount of data collected via our framework. In fact, these findings shed new light on cultural differences in spatial navigation. As a consequence, we conclude that our new framework offers a wide range of advantages for online research and constitutes a methodological innovation by which new insights can be revealed on the basis of massive data collection.

Keywords: cultural differences, crowdsourcing, JavaScript framework, methodological innovation, online data collection, online study, spatial cognition

Procedia PDF Downloads 234

6 Machine Learning Approach for Automating Electronic Component Error Classification and Detection

Authors: Monica Racha, Siva Chandrasekaran, Alex Stojcevski

Abstract:

Engineering programs focus on promoting students' personal and professional development by ensuring that students acquire technical and professional competencies during their four-year studies. The traditional engineering laboratory provides an opportunity for students to "practice by doing," and laboratory facilities aid them in obtaining insight and understanding of their discipline. Due to rapid technological advancements and the current COVID-19 outbreak, traditional labs have been transforming into virtual learning environments. Aim: To better understand the limitations of the physical laboratory, this research study aims to use a Machine Learning (ML) algorithm that interfaces with the Augmented Reality HoloLens and predicts the image behavior to classify and detect the electronic components. The automated electronic component error classification and detection system automatically detects and classifies the position of all components on a breadboard by using the ML algorithm. This research will assist first-year undergraduate engineering students in conducting laboratory practices without any supervision. With the help of the HoloLens and the ML algorithm, students will reduce component placement errors on a breadboard and increase the efficiency of simple laboratory practices virtually. Method: The images of breadboards, resistors, capacitors, transistors, and other electrical components will be collected using HoloLens 2 and stored in a database. The collected image dataset will then be used for training a machine learning model. The raw images will be cleaned, processed, and labeled to facilitate further analysis of component error classification and detection. For instance, when students conduct laboratory experiments, the HoloLens captures images of students placing different components on a breadboard. The images are forwarded to the server for detection in the background.
A hybrid algorithm combining Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) will be trained on the dataset for object recognition and classification. The convolution layer extracts image features, which are then classified using an SVM. With adequately labeled training data, the model will predict, categorize, and assess whether students place components correctly. As a result, the data acquired through HoloLens includes images of students assembling electronic components. The system constantly checks whether students position components appropriately on the breadboard and connect them so that the circuit functions. When students are about to misplace a component, the HoloLens predicts the error before the user places the component in the incorrect position and prompts students to correct their mistakes. This hybrid CNN and SVM approach to automating electronic component error classification and detection eliminates component connection problems and minimizes the risk of component damage. Conclusion: These augmented reality smart glasses powered by machine learning provide a wide range of benefits to supervisors, professionals, and students. They help customize the learning experience, which is particularly beneficial in large classes with limited time. The study determines the accuracy with which machine learning algorithms can forecast whether students are making the correct decisions and completing their laboratory tasks.

Keywords: augmented reality, machine learning, object recognition, virtual laboratories

Procedia PDF Downloads 112

5 The Underground Ecosystem of Credit Card Frauds

Authors: Abhinav Singh

Abstract:

Point of Sale (POS) malware has been stealing the limelight this year. It has been the elemental factor in some of the biggest breaches uncovered in the past couple of years. Some of them include:
• Target: a retail giant reported close to 40 million credit card records stolen
• Home Depot: a home product retailer reported a breach of close to 50 million credit records
• Kmart: a US retailer recently announced a breach of 800 thousand credit card details
In 2014 alone, there have been reports of over 15 major breaches of payment systems around the globe. Memory-scraping malware infecting point-of-sale devices has been the lethal weapon used in these attacks. This malware is capable of reading payment information from the payment device’s memory before it is encrypted. Later, the malware sends the stolen details to its parent server. It is capable of recording all the critical payment information, such as the card number, security number, owner, etc. All this information is delivered in raw format. This talk will cover what happens after these details have been sent to the malware authors. The entire ecosystem of credit card fraud can be broadly classified into these three steps:
• Purchase of raw details and dumps
• Converting them to plastic cash/cards
• Shop! Shop! Shop!
The focus of this talk will be on the above-mentioned points and how they form an organized network of cyber-crime. The first step involves buying and selling of the stolen details. The key points to emphasize are:
• How this raw information is sold in the underground market
• The buyer and seller anatomy
• Building your shopping cart and preferences
• The importance of reputation and vouches
• Customer support and replacements/refunds
These are some of the key points that will be discussed. But the story doesn’t end here. As of now, the buyer only has the raw card information. How will this raw information be converted to plastic cash?
The second part of this underground economy now comes into the picture, in which these raw details are converted into actual cards. There are well-organized services running underground that can help convert these details into plastic cards. We will discuss this technique in detail. The final step involves shopping with the stolen cards. The cards generated with the stolen details can easily be used to swipe-and-pay for purchased goods at different retail shops. Usually these purchases are of expensive items that have good resale value. Apart from using the cards at stores, there are underground services that let you deliver online orders to their dummy addresses. Once the package is received, it is delivered to the original buyer. These services charge based on the value of the item being delivered. The overall underground ecosystem of credit card fraud works in a bulletproof way, and it involves people working in close groups and making heavy profits. This is a brief summary of what I plan to present at the talk. I have done extensive research and have collected a good deal of material to present as samples. Some of them include:
• List of underground forums
• Credit card dumps
• IRC chats among these groups
• Personal chat with big card sellers
• Inside view of these forum owners
The talk will conclude by throwing light on how these breaches are tracked during investigation: how credit card breaches are tracked down and what steps financial institutions can take to build an incident response around them.

Keywords: POS malware, credit card frauds, enterprise security, underground ecosystem

Procedia PDF Downloads 411

4 Development of Advanced Virtual Radiation Detection and Measurement Laboratory (AVR-DML) for Nuclear Science and Engineering Students

Authors: Lily Ranjbar, Haori Yang

Abstract:

Online education has been around for several decades, but its importance became evident after the COVID-19 pandemic. Even though the online delivery approach works well for knowledge building through delivering content and oversight processes, it has limitations in developing hands-on laboratory skills, especially in the STEM field. During the pandemic, many educational institutions faced numerous challenges in delivering lab-based courses. Also, many students worldwide were unable to practice working with lab equipment due to social distancing or the significant cost of highly specialized equipment. The laboratory plays a crucial role in nuclear science and engineering education. It can engage students and improve their learning outcomes. In addition, online education and virtual labs have gained substantial popularity in engineering and science education. Therefore, developing virtual labs is vital for institutions to deliver high-class education to their students, including their online students. The School of Nuclear Science and Engineering (NSE) at Oregon State University, in partnership with the company SpectralLabs, has developed an Advanced Virtual Radiation Detection and Measurement Lab (AVR-DML) to offer a fully online Master of Health Physics (MHP) program. It was essential for us to use a system that could simulate nuclear modules that accurately replicate the underlying physics, the nature of radiation and radiation transport, and the mechanics of the instrumentation used in a real radiation detection lab. This was all accomplished using a Realistic, Adaptive, Interactive Learning System (RAILS). RAILS is a comprehensive software simulation-based learning system for use in training. It comprises a web-based learning management system located on a central server, as well as a 3D-simulation package that is downloaded locally to user machines.
Users will find that the graphics, animations, and sounds in RAILS create a realistic, immersive environment to practice detecting different radiation sources. These features allow students to coexist, interact and engage with a real STEM lab in all its dimensions. It enables them to feel like they are in a real lab environment and to see the same system they would in a lab. Unique interactive interfaces were designed and developed by integrating all the tools and equipment needed to run each lab. These interfaces provide students full functionality for data collection, changing the experimental setup, and live data collection with real-time updates for each experiment. Students can manually do all experimental setups and parameter changes in this lab. Experimental results can then be tracked and analyzed in an oscilloscope, a multi-channel analyzer, or a single-channel analyzer (SCA). The advanced virtual radiation detection and measurement laboratory developed in this study enabled the NSE school to offer a fully online MHP program. This flexibility of course modality helped us to attract more non-traditional students, including international students. It is a valuable educational tool as students can walk around the virtual lab, make mistakes, and learn from them. They have an unlimited amount of time to repeat and engage in experiments. This lab will also help us speed up training in nuclear science and engineering.
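
The two analyzers named in this abstract can be illustrated with a minimal sketch. It is not RAILS code: the pulse heights, channel count, and window limits are hypothetical values, and the functions only show the general idea that a multi-channel analyzer sorts pulse heights into many equal-width channels while a single-channel analyzer counts pulses falling inside one window.

```python
def multichannel_spectrum(pulse_heights, n_channels, full_scale):
    """Histogram pulse heights into n_channels equal-width channels (MCA-style)."""
    counts = [0] * n_channels
    for h in pulse_heights:
        if 0 <= h < full_scale:  # pulses outside the range are discarded
            counts[int(h / full_scale * n_channels)] += 1
    return counts

def single_channel_count(pulse_heights, lower, upper):
    """Count pulses whose height falls inside one window (SCA-style)."""
    return sum(1 for h in pulse_heights if lower <= h < upper)

# Hypothetical pulse heights in volts.
pulses = [0.5, 1.2, 1.3, 2.7, 2.8, 2.9, 3.9, 4.5]

spectrum = multichannel_spectrum(pulses, n_channels=4, full_scale=4.0)
window = single_channel_count(pulses, lower=2.5, upper=3.0)
print(spectrum, window)
```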

Keywords: advanced radiation detection and measurement, virtual laboratory, realistic adaptive interactive learning system (RAILS), online education in STEM fields, student engagement, STEM online education, STEM laboratory, online engineering education

Procedia PDF Downloads 66

3 Cloud-Based Multiresolution Geodata Cube for Efficient Raster Data Visualization and Analysis

Authors: Lassi Lehto, Jaakko Kahkonen, Juha Oksanen, Tapani Sarjakoski

Abstract:

The use of raster-formatted data sets in geospatial analysis is increasing rapidly. At the same time, geographic data are being introduced into disciplines outside the traditional domain of geoinformatics, like climate change, intelligent transport, and immigration studies. These developments call for better methods to deliver raster geodata in an efficient and easy-to-use manner. Data cube technologies have traditionally been used in the geospatial domain for managing Earth Observation data sets that have strict requirements for effective handling of time series. The same approach and methodologies can also be applied in managing other types of geospatial data sets. A cloud service-based geodata cube, called GeoCubes Finland, has been developed to support online delivery and analysis of the most important geospatial data sets with national coverage. The main target group of the service is the academic research institutes in the country. The most significant aspects of the GeoCubes data repository include the use of multiple resolution levels, a cloud-optimized file structure, and a customized, flexible content access API. Input data sets are pre-processed while being ingested into the repository to bring them into a harmonized form in aspects like georeferencing, sampling resolutions, spatial subdivision, and value encoding. All the resolution levels are created using an appropriate generalization method, selected depending on the nature of the source data set. Multiple pre-processed resolutions enable new kinds of online analysis approaches to be introduced. Analysis processes based on interactive visual exploration can be effectively carried out, as the resolution level closest to the visual scale can always be used. In the same way, statistical analysis can be carried out on resolution levels that best reflect the scale of the phenomenon being studied. Access times remain close to constant, independent of the scale applied in the application.
The cloud service-based approach, applied in the GeoCubes Finland repository, enables analysis operations to be performed on the server platform, thus making high-performance computing facilities easily accessible. The developed GeoCubes API supports this kind of approach for online analysis. The use of cloud-optimized file structures in data storage enables the fast extraction of subareas. The access API allows for the use of vector-formatted administrative areas and user-defined polygons as definitions of subareas for data retrieval. Administrative areas of the country at four levels are readily available from the GeoCubes platform. In addition to direct delivery of raster data, the service also supports the so-called virtual file format, in which only a small text file is first downloaded. The text file contains links to the raster content on the service platform. The actual raster data is downloaded on demand, from the spatial area and resolution level required in each stage of the application. Through the geodata cube approach, pre-harmonized geospatial data sets are made accessible to new categories of inexperienced users in an easy-to-use manner. At the same time, the multiresolution nature of the GeoCubes repository helps expert users introduce new kinds of interactive online analysis operations.
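
The multiresolution idea in this abstract, always reading the stored level closest to the visual scale, can be sketched as follows. This is an illustrative sketch only: the list of resolution levels, the function names, and the viewport numbers are assumptions and do not come from the GeoCubes API.

```python
def metres_per_pixel(extent_width_m, viewport_width_px):
    """Ground distance covered by one screen pixel at the current zoom."""
    return extent_width_m / viewport_width_px

def pick_resolution(levels_m, target_m):
    """Return the stored resolution (metres per pixel) closest to the target scale."""
    return min(levels_m, key=lambda r: abs(r - target_m))

# Hypothetical set of pre-processed resolution levels, in metres per pixel.
LEVELS_M = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]

view_scale = metres_per_pixel(extent_width_m=48_000, viewport_width_px=1200)
chosen = pick_resolution(LEVELS_M, view_scale)
pixels_across = 48_000 // chosen  # raster columns actually read for this view
print(view_scale, chosen, pixels_across)
```

Because the chosen level tracks the zoom, the number of raster cells read for a view stays roughly the same whatever the extent, which is consistent with the near-constant access times the abstract reports.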

Keywords: cloud service, geodata cube, multiresolution, raster geodata

Procedia PDF Downloads 109

2 Long-Term Subcentimeter-Accuracy Landslide Monitoring Using a Cost-Effective Global Navigation Satellite System Rover Network: Case Study

Authors: Vincent Schlageter, Maroua Mestiri, Florian Denzinger, Hugo Raetzo, Michel Demierre

Abstract:

Precise landslide monitoring with differential global navigation satellite system (GNSS) is well known, but technical or economic reasons limit its application by geotechnical companies. This study demonstrates the reliability and the usefulness of Geomon (Infrasurvey Sàrl, Switzerland), a stand-alone and cost-effective rover network. The system permits deploying up to 15 rovers, plus one reference station for differential GNSS. A dedicated radio communication links all the modules to a base station, where an embedded computer automatically provides all the relative positions (L1 phase, open-source RTKLib software) and populates an Internet server. Each measurement also contains information from an internal inclinometer, battery level, and position quality indices. Contrary to standard GNSS survey systems, which suffer from a limited number of beacons that must be placed in areas with good GSM signal, Geomon offers greater flexibility and permits a real overview of the whole landslide with good spatial resolution. Each module is powered with solar panels, ensuring autonomous long-term recordings. In this study, we tested the system on several sites in the Swiss mountains, setting up to 7 rovers per site, for an 18-month-long survey. The aim was to assess the robustness and the accuracy of the system in different environmental conditions. In one case, we ran forced blind tests (vertical movements of a given amplitude) and compared various session parameters (duration from 10 to 90 minutes). The other cases were surveys of real landslide sites using fixed optimized parameters. Subcentimetric accuracy with few outliers was obtained using the best parameters (session duration of 60 minutes, baseline of 1 km or less), with the noise level on the horizontal component half that of the vertical one. The performance (percentage of aborted solutions, outliers) was reduced with sessions shorter than 30 minutes.
The environment also had a strong influence on the percentage of aborted solutions (ambiguity search problem), due to multiple reflections or satellites obstructed by trees and mountains. The length of the baseline (reference-to-rover distance, single baseline processing) reduced the accuracy above 1 km but had no significant effect below this limit. In critical weather conditions, the system’s robustness was limited: snow, avalanches, and frost covered some rovers, including the antenna and vertically oriented solar panels, leading to data interruption, and strong wind damaged a reference station. The possibility of changing the session parameters remotely was very useful. In conclusion, the rover network tested provided the foreseen subcentimetric accuracy while providing a landslide survey with dense spatial resolution. The ease of implementation and the fully automatic long-term survey saved time. Performance strongly depends on surrounding conditions, but short preliminary measurements should allow moving a rover to a better final placement. The system offers a promising hazard mitigation technique. Improvements could include data post-processing for alerts and automatic modification of the duration and number of sessions based on battery level and rover displacement velocity.
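
The abstract mentions outliers and possible post-processing for alerts without stating how outliers are identified. The sketch below shows one common option, a median absolute deviation (MAD) filter, purely as an assumption for illustration; the threshold `k`, the function name, and the sample displacements are hypothetical and not taken from the study.

```python
from statistics import median

def flag_outliers(values_mm, k=3.0):
    """Flag values lying more than k scaled MADs away from the median."""
    med = median(values_mm)
    mad = median(abs(v - med) for v in values_mm)
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [scale > 0 and abs(v - med) > k * scale for v in values_mm]

# Hypothetical vertical positions (mm) from successive sessions of one rover.
vertical_mm = [0.0, 1.0, -1.0, 2.0, 0.5, 48.0, 1.5, -0.5]

flags = flag_outliers(vertical_mm)
clean = [v for v, bad in zip(vertical_mm, flags) if not bad]
print(flags, clean)
```

A median-based filter is used here because a single gross error, such as a wrongly fixed ambiguity, barely shifts the median, whereas it would inflate a mean and standard deviation.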

Keywords: GNSS, GSM, landslide, long-term, network, solar, spatial resolution, sub-centimeter

Procedia PDF Downloads 93

1 Developing a Cloud Intelligence-Based Energy Management Architecture Facilitated with Embedded Edge Analytics for Energy Conservation in Demand-Side Management

Authors: Yu-Hsiu Lin, Wen-Chun Lin, Yen-Chang Cheng, Chia-Ju Yeh, Yu-Chuan Chen, Tai-You Li

Abstract:

Demand-Side Management (DSM) has the potential to reduce the electricity costs and carbon emissions associated with electricity use in modern society. A home Energy Management System (EMS), commonly used by residential consumers in the downstream sector of a smart grid to monitor, control, and optimize the energy efficiency of domestic appliances, is a system of computer-aided functionalities that acts as an energy audit for residential DSM. Implementing fault detection and classification for the domestic appliances being monitored, controlled, and optimized is one of the most important steps toward preventive maintenance, such as preventive maintenance of residential air conditioning and heating, in residential/industrial DSM. In this study, a cloud intelligence-based green EMS that comes with an Internet of Things (IoT) technology stack for residential DSM is developed. In the EMS, smart sockets based on Arduino MEGA with Ethernet communication, which incorporate a Real Time Clock chip to keep track of the current time as timestamps via Network Time Protocol, are designed and implemented to read load phenomena reflected in the sensed voltage and current signals. Also, a Network-Attached Storage device providing data access to a heterogeneous group of IoT clients via Hypertext Transfer Protocol (HTTP) methods is configured as the data store for parsed sensor readings. Lastly, a desktop computer with a WAMP software bundle (the Microsoft® Windows operating system, Apache HTTP Server, MySQL relational database management system, and PHP programming language) serves as a data science analytics engine for the dynamic Web APP/RESTful (REpresentational State Transfer) web service of the residential DSM, which features globally-Advanced Internet of Artificial Intelligence (AI)/Computational Intelligence. Here, an abstract computing machine, the Java Virtual Machine, enables the desktop computer to run Java programs, and a mash-up of Java, the R language, and Python is well suited and configured for AI in this study.
To send real-time push notifications to IoT clients, the desktop computer implements Google-maintained Firebase Cloud Messaging to engage IoT clients across Android/iOS devices and provide a mobile notification service for residential/industrial DSM. In this study, in order to realize edge intelligence, in which edge devices that avoid network latency and the need for constant Internet connectivity for Internet of Services can support secure access to data stores and provide immediate analytical and real-time actionable insights at the edge of the network, we upgrade the designed and implemented smart sockets to embedded AI Arduino ones (called embedded AIduino). To realize edge analytics with the proposed embedded AIduino, an Arduino Ethernet shield, WizNet W5100, with a micro SD card connector is used. The SD library is included for reading parsed data from and writing parsed data to an SD card. In addition, an Artificial Neural Network library for Arduino MEGA, ArduinoANN, is imported and used for locally embedded AI implementation. The embedded AIduino in this study can be developed for further applications in manufacturing industry energy management and sustainable energy management; in the latter, for example, rotating machinery diagnostics identifies energy loss from gross misalignment and unbalance of rotating machines in power plants.
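
The locally embedded neural network mentioned in this abstract can be pictured as a small feed-forward pass run on each reading. The sketch below is written in Python for readability rather than as Arduino code, and it does not reproduce the ArduinoANN library: the network size, the weights, the input normalisation, and the "fault" interpretation of the output are all hypothetical values chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_w, output_w):
    """One forward pass; the last weight in every row is the bias term."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
              for row in hidden_w]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + row[-1])
            for row in output_w]

# Hypothetical pre-trained weights: 2 inputs -> 2 hidden nodes -> 1 output.
HIDDEN_W = [[0.0, 6.0, -3.0],
            [0.0, -6.0, 3.0]]
OUTPUT_W = [[5.0, -5.0, 0.0]]

# Hypothetical normalised readings: (RMS voltage, RMS current), both in [0, 1].
high_current = forward([1.0, 0.9], HIDDEN_W, OUTPUT_W)[0]
low_current = forward([1.0, 0.2], HIDDEN_W, OUTPUT_W)[0]

# Assumed decision rule: an output above 0.5 is treated as a fault.
print(high_current > 0.5, low_current > 0.5)
```

Running inference of this kind on the socket itself only needs the stored weights and a few multiplications per reading, which is why it can work without a network round trip.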

Keywords: demand-side management, edge intelligence, energy management system, fault detection and classification

Procedia PDF Downloads 229