Search results for: file format
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 704

704 An Automatic Bayesian Classification System for File Format Selection

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach to classifying unstructured format descriptions for the identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection using only an unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format identification method employs a file format classifier and associated configurations to support digital preservation experts with an estimation of the required file format. Our goal is to make use of a format specification knowledge base aggregated from different Web sources in order to select a file format for a particular institution. Using the naive Bayes method, the decision support system recommends a file format for the expert's institution. The proposed methods facilitate the selection of file formats and improve the quality of the digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision making, the aggregated information about the file formats is presented as a file format vocabulary that comprises the most common terms characteristic of all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. A sample file format calculation and the calculation results, including probabilities, are presented in the evaluation section.
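
A minimal sketch of the kind of vocabulary-based naive Bayes classification the abstract describes, using scikit-learn; the format labels and description snippets below are invented placeholders, not data from the paper:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data: unstructured format descriptions and labels.
descriptions = [
    "lossless raster image, wide adoption, open specification",
    "compressed raster image, lossy, historically patent-encumbered",
    "page layout document, fixed rendering, ISO standardised",
]
labels = ["PNG", "JPEG", "PDF"]

# Build the format vocabulary from the descriptions and train the classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(descriptions)
clf = MultinomialNB().fit(X, labels)

# An institution's requirements, phrased as free text.
query = ["open specification lossless image for long-term preservation"]
probs = clf.predict_proba(vectorizer.transform(query))[0]
for fmt, p in sorted(zip(clf.classes_, probs), key=lambda t: -t[1]):
    print(f"{fmt}: {p:.3f}")  # recommendation with class probabilities
```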

Keywords: data mining, digital libraries, digital preservation, file format

Procedia PDF Downloads 465
703 Restoration of Digital Design Using Row and Column Major Parsing Technique from the Old/Used Jacquard Punched Cards

Authors: R. Kumaravelu, S. Poornima, Sunil Kumar Kashyap

Abstract:

The optimized and digitalized restoration of information from old and used manual jacquard punched cards in the textile industry is referred to as a Jacquard Punched Card (JPC) reader. In this paper, we present a novel design and development of a photoelectronics-based system for reading old and used punched cards and storing their binary information for transformation into an effective image file format. In our textile industry, the jacquard punched card holes have diameters of 3 mm and 5 mm at a 5.5 mm pitch. Before the adoption of computing systems in the textile industry, those punched cards were prepared manually without a digital design source, yet they hold rich woven designs. The idea is to retrieve the binary information from the jacquard punched cards and store it in a digital (non-graphics) format before processing. After processing, the digital (non-graphics) format is converted into an effective image file format through either a row-major or a column-major parsing technique. To accomplish these activities, an embedded-system-based device with integrated software was developed. As part of the test and trial activity, the device was tested and installed for industrial service at the Weavers Service Centre, Kanchipuram, Tamil Nadu, India.
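
An illustrative sketch of the row-major versus column-major parsing step, assuming the holes have already been read into a linear bit stream; the card dimensions and the random stream below are stand-ins for real reader output:

```python
import numpy as np
from PIL import Image

# Hypothetical linear bit stream captured from the photo-diode reader:
# 1 = hole punched, 0 = no hole. The design dimensions are assumed.
ROWS, COLS = 64, 48
stream = np.random.randint(0, 2, size=ROWS * COLS, dtype=np.uint8)

def rebuild(bits: np.ndarray, order: str) -> Image.Image:
    """Reassemble the serial hole stream into a 2-D design image.
    order='C' assumes the card was scanned row by row (row-major);
    order='F' assumes column-by-column scanning (column-major)."""
    grid = bits.reshape((ROWS, COLS), order=order)
    return Image.fromarray(grid * 255)  # 0 -> black, 1 -> white pixel

rebuild(stream, "C").save("design_row_major.png")
rebuild(stream, "F").save("design_column_major.png")
```

The same stream interpreted under the two orders yields different images, which is why the scanning direction must be known before the design can be restored.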

Keywords: file system, SPI, UART, ARM controller, jacquard, punched card, photo LED, photo diode

Procedia PDF Downloads 135
702 A Tool for Facilitating an Institutional Risk Profile Definition

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the easy creation of an institutional risk profile for endangerment analysis of file formats. The main contribution of this work is the employment of data mining techniques to support setting up risk factors with just the values that are most important for a particular organisation. Subsequently, the risk profile employs fuzzy models and associated configurations for the file format metadata aggregator to support digital preservation experts with a semi-automatic estimation of the endangerment level of file formats. Our goal is to make use of a domain expert knowledge base aggregated from a digital preservation survey in order to detect preservation risks for a particular institution. Another contribution is support for the visualisation and analysis of risk factors along a required dimension. The proposed methods improve the visibility of risk factor information and the quality of the digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and file format metadata automatically aggregated from linked open data sources. To facilitate decision making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert. A sample risk profile calculation and the visualisation of some risk factor dimensions are presented in the evaluation section.
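
An illustrative sketch of representing a risk profile as a multidimensional vector and mapping it to an endangerment level with simple triangular fuzzy memberships; the factor names, weights, and membership breakpoints are invented for illustration and are not the paper's actual model:

```python
# Hypothetical risk factor dimensions for one file format (0 = safe, 1 = risky).
profile = {"software_support": 0.2, "open_specification": 0.1,
           "adoption": 0.3, "complexity": 0.7}
weights = {"software_support": 0.4, "open_specification": 0.3,
           "adoption": 0.2, "complexity": 0.1}

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular fuzzy membership with peak at b and feet at a and c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

score = sum(weights[k] * profile[k] for k in profile)  # weighted risk vector
levels = {  # illustrative breakpoints for the endangerment levels
    "low":    triangular(score, -0.5, 0.0, 0.5),
    "medium": triangular(score,  0.0, 0.5, 1.0),
    "high":   triangular(score,  0.5, 1.0, 1.5),
}
print(max(levels, key=levels.get), levels)  # e.g. "low" for score 0.24
```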

Keywords: digital information management, file format, endangerment analysis, fuzzy models

Procedia PDF Downloads 371
701 Developing a Relational Database Management System (RDBMS) Supporting Product Life Cycle Applications

Authors: Yusri Yusof, Chen Wong Keong

Abstract:

This paper presents the implementation details of a relational database management system for a STEP-technology product model repository. It is able to support the implementation of any EXPRESS language schema, although it has been primarily implemented to support mechanical product life cycle applications. The database supports the input of the STEP Part 21 file format from CAD, carrying geometrical and topological data, and supports a range of queries for mechanical product life cycle applications. The proposed relational database management system uses the entity-to-table method (R1) rather than the type-to-table method (R4); the two mapping methods have their own strengths and drawbacks.
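
A minimal sketch of the entity-to-table (R1) idea, in which each EXPRESS entity gets its own relational table; the entity and attribute names are a simplified illustration, not the schema from the paper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Entity-to-table (R1): one table per EXPRESS entity, one column per attribute.
# A hypothetical fragment of a geometry schema:
#   ENTITY cartesian_point; coordinates : LIST [3:3] OF REAL; END_ENTITY;
#   ENTITY vertex_point;    point : cartesian_point;          END_ENTITY;
conn.executescript("""
CREATE TABLE cartesian_point (
    id INTEGER PRIMARY KEY,   -- STEP Part 21 instance id, e.g. #12
    x REAL, y REAL, z REAL
);
CREATE TABLE vertex_point (
    id INTEGER PRIMARY KEY,
    point INTEGER REFERENCES cartesian_point(id)  -- entity reference
);
""")

# Loading two instances as they would appear in a Part 21 file:
# #12 = CARTESIAN_POINT('', (0.0, 1.0, 2.0));  #13 = VERTEX_POINT('', #12);
conn.execute("INSERT INTO cartesian_point VALUES (12, 0.0, 1.0, 2.0)")
conn.execute("INSERT INTO vertex_point VALUES (13, 12)")

for row in conn.execute("""SELECT v.id, p.x, p.y, p.z
                           FROM vertex_point v JOIN cartesian_point p
                           ON v.point = p.id"""):
    print(row)  # (13, 0.0, 1.0, 2.0)
```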

Keywords: RDBMS, CAD, ISO 10303, part-21 file

Procedia PDF Downloads 504
700 A Bayesian Classification System for Facilitating an Institutional Risk Profile Definition

Authors: Roman Graf, Sergiu Gordea, Heather M. Ryan

Abstract:

This paper presents an approach for the easy creation and classification of institutional risk profiles supporting endangerment analysis of file formats. The main contribution of this work is the employment of data mining techniques to support the setup of the most important risk factors. Subsequently, risk profiles employ a risk factor classifier and associated configurations to support digital preservation experts with a semi-automatic estimation of the endangerment group for file format risk profiles. Our goal is to make use of an expert knowledge base, acquired through a digital preservation survey, in order to detect preservation risks for a particular institution. Another contribution is support for the visualisation of risk factors along a required dimension for analysis. Using the naive Bayes method, the decision support system recommends to an expert the matching risk profile group for the previously selected institutional risk profile. The proposed methods improve the visibility of risk factor values and the quality of the digital preservation process. The presented approach is designed to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and the values of file format risk profiles. To facilitate decision making, the aggregated information about the risk factors is presented as a multidimensional vector. The goal is to visualise particular dimensions of this vector for analysis by an expert and to define its profile group. A sample risk profile calculation and the visualisation of some risk factor dimensions are presented in the evaluation section.

Keywords: linked open data, information integration, digital libraries, data mining

Procedia PDF Downloads 394
699 Design Approach for the Development of Format-Flexible Packaging Machines

Authors: G. Götz, P. Stich, J. Backhaus, G. Reinhart

Abstract:

The rising demand for format-flexible packaging machines is caused by current market changes. Increasing format-flexibility is a new goal for the packaging machine manufacturers' product development process, yet there are no methodical or design-orientated tools for a comprehensive consideration of this target. This paper defines the term format-flexibility in the context of packaging machines and shows the state of the art for improving the changeover of production machines. The requirements for a new approach and the concept itself are introduced, and the method elements are explained. Finally, the use of the concept and the result of the development of a format-flexible packaging machine are shown.

Keywords: packaging machine, format-flexibility, changeover, design method

Procedia PDF Downloads 398
698 Developing an Online Library for Faster Retrieval of Mold Base and Standard Parts of Injection Molding

Authors: Alan C. Lin, Ricky N. Joevan

Abstract:

This paper focuses on developing a system to transfer mold base plates and standard parts faster during the injection mold design stage. The system not only provides a way to compare file versions, but also utilizes Siemens NX 10 to isolate the updated information into a single executable file (.dll); that file can then be transferred without the need to transfer the whole part file. In this way, the system helps the user download only the necessary mold base plates and standard parts, and only the updated portions of those parts.

Keywords: CAD, injection molding, mold base, data retrieval

Procedia PDF Downloads 268
697 Proactive Disk Defragmentation through User's File-Access Patterns

Authors: Gordon Wong

Abstract:

This paper shows how the task of disk defragmentation can be handled by modern operating systems in a transparent, automated, efficient, and confined way through users' file-access patterns. Files tend to gradually fragment over time through file creation, deletion, growth, and shrinking, and the problem gets worse when a disk becomes so fragmented that file accesses cannot be made reasonably efficient without defragmenting the "entire" disk, which is done manually by the user by launching the disk defragmentation utility program normally bundled with the operating system. In this paper, we argue that the disk defragmentation problem described can be solved without having to manually use the utility program to defragment the entire disk. The argument is based on the observation that system users tend to access certain files in a particular time interval, much as programs exhibit temporal locality of memory references during their execution. The task of disk defragmentation can then be initiated and acted upon for those files contained in the current file-access locality detected and identified by the operating system. The paper also discusses how to use this locality-of-file-references approach to quantitatively measure and determine the locality of a user's file-access patterns, on which the task of disk defragmentation is based.
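
An illustrative sketch of how an operating system might track the current file-access locality from a sliding window of recent accesses and pick defragmentation candidates from it; the window length and access threshold are invented parameters, not values from the paper:

```python
import time
from collections import Counter, deque

WINDOW_SECONDS = 300   # assumed length of the locality window
MIN_ACCESSES = 5       # assumed threshold for membership in the locality

access_log: deque[tuple[float, str]] = deque()  # (timestamp, path) events

def record_access(path: str) -> None:
    """Called by the file system layer on every file access."""
    now = time.time()
    access_log.append((now, path))
    while access_log and now - access_log[0][0] > WINDOW_SECONDS:
        access_log.popleft()  # expire events outside the window

def current_locality() -> list[str]:
    """Files accessed often enough within the window form the locality;
    only these would be handed to the defragmenter, not the whole disk."""
    counts = Counter(path for _, path in access_log)
    return [p for p, n in counts.items() if n >= MIN_ACCESSES]
```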

Keywords: operating systems, disk defragmentation, locality of file accesses, system performance

Procedia PDF Downloads 19
696 Determinants of Standard Audit File for Tax Purposes Accounting Legal Obligation Compliance Costs: Empirical Study for Portuguese SMEs of Leiria District

Authors: Isa Raquel Alves Soeiro, Cristina Isabel Branco de Sá

Abstract:

In Portugal, since 2008, there has been a requirement to export the Standard Audit File for Tax Purposes (SAF-T) standard file (in XML format). This file gathers a company's tax-relevant information relating to a specific taxation period. There are two types of SAF-T files that serve different purposes: the SAF-T of revenues and the SAF-T of accounting, which requires taxpayers and accounting firms to invest in adapting their accounting programs to the legal requirements. The implementation of the SAF-T accounting file aims to facilitate the collection of relevant tax data by tax inspectors as support for taxpayers' tax returns in the analysis of accounting records or other information with tax relevance (Portaria No. 321-A/2007 of March 26 and Portaria No. 302/2016 of December 2). The main objective of this research project is to verify, through quantitative analysis, the compliance costs incurred by Small and Medium Enterprises (SMEs) in the district of Leiria in the introduction and implementation of the SAF-T accounting tax obligation. The information was collected through a questionnaire sent to a population of companies selected through the SABI Bureau van Dijk database in 2020. Based on the responses obtained, the companies were divided into two groups: Group 1, companies that are self-employed and whose main activity is accounting services; and Group 2, companies that do not belong to the accounting sector. In general terms, the conclusion is that there are no statistically significant differences in the costs of complying with the accounting SAF-T between the companies in Group 1 and Group 2 and that, on average, internal costs represent the largest component of the total compliance cost for both groups. The results show that, in both groups, the total costs of complying with the SAF-T of accounting are regressive, which appears similar to international studies, although those relate to different tax obligations. Additionally, we verified that the variables business volume, software used, number of employees, and legal form explain the differences in the costs of complying with the accounting SAF-T among Leiria district SMEs.

Keywords: compliance costs, SAF-T accounting, SME, Portugal

Procedia PDF Downloads 46
695 O.MG- It’s a Cyber-Enabled Fraud

Authors: Damola O. Lawal, David W. Gresty, Diane E. Gan, Louise Hewitt

Abstract:

This paper investigates the feasibility of using a programmable USB device such as the O.MG Cable to perform a file tampering attack. Here, the O.MG Cable, an apparently harmless mobile device charger, is used in an unauthorized way to alter the content of a file (an accounts record, January_Contributions.xlsx). The aim is to determine whether a forensic analyst can reliably determine who altered the target file: the O.MG Cable or the user of the machine. This work highlights some of the traces the O.MG Cable leaves behind on the target computer itself, such as its Product ID (PID) and Vendor ID (VID), and discusses the O.MG Cable's behavior during the experiments. We determine whether a forensic analyst could identify any evidence left behind by the programmable device on the target file once the device has been removed from the computer, to establish whether the analyst would be able to link the traces left by the O.MG Cable to the file tampering. It was discovered that the forensic analyst might mistake the actions of the O.MG Cable for those of the computer user. The experiments carried out in this work could further the discussion as to whether an innocent user could be punished for unauthorized changes made by a programmable device.

Keywords: O.MG cable, programmable USB, file tampering attack, digital evidence credibility, miscarriage of justice, cyber fraud

Procedia PDF Downloads 127
694 Utilizing Hybrid File Mapping for High-Performance I/O

Authors: Jaechun No

Abstract:

As NAND flash memory technology rapidly advances, SSDs are becoming an excellent alternative for storage solutions because of their high random I/O throughput and low power consumption. These potentials have drawn great attention from IT enterprises seeking better I/O performance. However, the high SSD cost per capacity makes it less desirable to construct a large-scale storage subsystem composed solely of SSD devices. An alternative is to build a hybrid storage subsystem in which both HDD and SSD devices are incorporated in an economic manner while exploiting the strengths of both devices. This paper presents a hybrid file system, called hybridFS, that attempts to utilize the advantages of HDD and SSD devices to provide a single, virtual address space integrating both devices. HybridFS not only proposes an efficient implementation of file management in the hybrid storage subsystem but also suggests an experimental framework for making use of the excellent features of existing file systems. Several performance evaluations were conducted to verify the effectiveness and suitability of hybridFS.
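
A toy sketch of the kind of placement decision a hybrid file mapping might make, routing files to SSD or HDD by expected access pattern; the heuristic, size threshold, and mount points are invented for illustration and are not hybridFS's actual policy:

```python
from dataclasses import dataclass

SSD_ROOT, HDD_ROOT = "/mnt/ssd", "/mnt/hdd"   # assumed mount points
SMALL_FILE_BYTES = 1 << 20                    # assumed 1 MiB threshold

@dataclass
class FileHint:
    size: int
    random_access: bool   # e.g. database files, metadata

def place(hint: FileHint) -> str:
    """Map a file to a backing device. Small or randomly accessed files
    go to the SSD (fast random I/O); large sequential files go to the
    HDD (cheap capacity). A real system would also handle migration."""
    if hint.random_access or hint.size < SMALL_FILE_BYTES:
        return SSD_ROOT
    return HDD_ROOT

print(place(FileHint(size=4096, random_access=True)))        # /mnt/ssd
print(place(FileHint(size=8 << 30, random_access=False)))    # /mnt/hdd
```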

Keywords: hybrid file mapping, data layout, hybrid device integration, extent allocation

Procedia PDF Downloads 471
693 Developing NAND Flash-Memory SSD-Based File System Design

Authors: Jaechun No

Abstract:

This paper focuses on the I/O optimizations of N-hybrid (New-Form of hybrid), which provides a hybrid file system space constructed on SSD and HDD. Although the promising potentials of SSD, such as the absence of mechanical moving overhead and high random I/O throughput, have drawn a lot of attention from IT enterprises, its high cost-to-capacity ratio makes it less desirable to build a large-scale data storage subsystem composed only of SSDs. In this paper, we present N-hybrid, which attempts to integrate the strengths of SSD and HDD to offer a single, large hybrid file system space. Several experiments were conducted to verify the performance of N-hybrid.

Keywords: SSD, data section, I/O optimizations, hybrid system

Procedia PDF Downloads 385
692 Enhance Security in XML Databases: XLog File for Severity-Aware Trust-Based Access Control

Authors: A. Asmawi, L. S. Affendey, N. I. Udzir, R. Mahmod

Abstract:

The topic of enhancing security in XML databases is important, as it includes protecting sensitive data and providing a secure environment for users. In order to improve security and provide dynamic access control for XML databases, we present the XLog file, which calculates user trust values by recording users' bad transactions, errors, and query severities. Severity-aware trust-based access control for XML databases manages the access policy depending on users' trust values and prevents unauthorized processes, malicious transactions, and insider threats. Privileges are automatically modified and adjusted over time depending on user behaviour and query severity. Logging in databases is an important process used for recovery and security purposes. In this paper, the XLog file is presented as a dynamic and temporary log file for XML databases to enhance the level of security.
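
An illustrative sketch of the severity-aware trust idea: each logged violation lowers a user's trust value in proportion to its severity, and privileges follow the trust level. The severity weights and thresholds here are invented, not the paper's actual formulas:

```python
# Assumed severity weights for logged events (higher = more serious).
SEVERITY = {"syntax_error": 0.02, "unauthorized_read": 0.15,
            "schema_tamper": 0.40}

trust: dict[str, float] = {}  # user -> trust value in [0, 1]

def log_event(user: str, event: str) -> None:
    """Record a bad transaction in the XLog-style file and update trust."""
    t = trust.get(user, 1.0)                  # new users start fully trusted
    trust[user] = max(0.0, t - SEVERITY[event])

def allowed_ops(user: str) -> set[str]:
    """Privileges shrink as trust drops below illustrative thresholds."""
    t = trust.get(user, 1.0)
    if t >= 0.8:
        return {"read", "write", "update"}
    if t >= 0.5:
        return {"read"}
    return set()                              # access revoked

log_event("alice", "unauthorized_read")
log_event("alice", "schema_tamper")
print(trust["alice"], allowed_ops("alice"))   # 0.45 set()
```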

Keywords: XML database, trust-based access control, severity-aware, trust values, log file

Procedia PDF Downloads 268
691 When Messages Cause Distraction from Advertising: An Eye-Tracking Study

Authors: Nilamadhab Mohanty

Abstract:

It is essential to use message formats that make communication understandable and correct, because the information format can influence consumer decisions on the purchase of a product. This study combines information from qualitative inquiry, media trend analysis, an eye-tracking experiment, and questionnaire data to examine the impact of specific message formats and consumer perceived risk on attention to information and risk retention. We investigated the influence of message framing (goal framing, attribute framing, and mixed framing) on consumer memory, study time, and decisional uncertainty while deciding on the purchase of drugs. Furthermore, we explored the impact of consumer perceived risk (risk associated with the use of the drug, i.e., RISK-AB, and risk associated with the non-use of the drug, i.e., RISK-EB) on message format preference. The study used eye-tracking methods to understand the differences in message processing. The findings suggest that message format influences information processing and that participants' risk perception impacts message format preference. Eye tracking can be used to understand format differences and design effective advertisements.

Keywords: message framing, consumer perceived risk, advertising, eye tracking

Procedia PDF Downloads 91
690 The Development of Encrypted Near Field Communication Data Exchange Format Transmission in an NFC Passive Tag for Checking the Genuine Product

Authors: Tanawat Hongthai, Dusit Thanapatay

Abstract:

This paper presents the development of encrypted near field communication (NFC) data exchange format transmission in an NFC passive tag, to assess the feasibility of implementing genuine product authentication. We organise the research on encryption and genuine product checking into four major categories: concept, infrastructure, development, and applications. The results show that a passive NFC Forum Type 2 tag can be configured to be compatible with the NFC Data Exchange Format (NDEF) and can have its data automatically and partially updated when an NFC field is present.
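
A minimal sketch of one way a tag payload could be encrypted before being written as an NDEF record, using AES-GCM from the `cryptography` package; the abstract does not name a cipher, so the algorithm choice, key handling, and record layout here are assumptions:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)   # provisioned to verifier devices
aesgcm = AESGCM(key)

def encrypt_payload(product_id: str) -> bytes:
    """Encrypt the product identifier; the nonce is prepended so the
    verifier can decrypt. The result is stored as an NDEF record payload."""
    nonce = os.urandom(12)
    return nonce + aesgcm.encrypt(nonce, product_id.encode(), None)

def verify_payload(blob: bytes) -> str:
    """Raises InvalidTag if the tag data was forged or altered."""
    return aesgcm.decrypt(blob[:12], blob[12:], None).decode()

blob = encrypt_payload("SN-000123")
print(verify_payload(blob))   # SN-000123 on a genuine tag
```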

Keywords: near field communication, NFC data exchange format, checking the genuine product, encrypted NFC

Procedia PDF Downloads 249
689 Design and Development of a Computerized Medical Record System for Hospitals in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

A computerized medical record system is a collection of medical information about a person that is stored on a computer. One principal problem of most hospitals in rural areas is the use of a manual file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause something unexpected to happen to the patient. This data mining application is to be designed using a structured system analysis and design method, which will help in a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. This computerized system will replace the file management system and help to quickly retrieve a patient's record with increased data security, provide access to clinical records for decision-making, and reduce the time it takes for a patient to be attended to.

Keywords: programming, computing, data, innovation

Procedia PDF Downloads 90
688 A Guide to the Implementation of Ambisonics Super Stereo

Authors: Alessio Mastrorillo, Giuseppe Silvi, Francesco Scagliola

Abstract:

In this work, we introduce an Ambisonics decoder with an implementation of the C-format, also called Super Stereo. This format is an alternative to conventional stereo and binaural decoding. Unlike those, it conveys audio information from the horizontal plane and works with stereo speakers and headphones. The two C-format channels can also return a reconstructed planar B-format. This work provides an open-source implementation of this format. We implement an all-pass filter for signal quadrature, as required by the decoding equations. This filter works with six biquads in a cascade configuration, with values for control frequency and quality factor discovered experimentally. The phase response of the filter delivers a small error in the 20 Hz-14 kHz range. The decoder has been tested with audio sources up to a 192 kHz sample rate, returning pristine sound quality and a detailed stereo image. It has been included in the Envelop for Live suite and is available as an open-source repository. This decoder has applications in virtual reality and 360° audio productions, music composition, and online streaming.
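
A sketch of a six-section all-pass biquad cascade of the kind described, built from the standard RBJ cookbook all-pass form with SciPy; the control frequencies and Q factors below are invented, since the paper's experimentally discovered values are not given in the abstract:

```python
import numpy as np
from scipy.signal import sosfilt

FS = 48_000  # assumed sample rate

def allpass_sos(f0: float, q: float, fs: float = FS) -> list[float]:
    """Second-order all-pass section (RBJ cookbook): unit magnitude,
    frequency-dependent phase shift centred on f0."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = [1 - alpha, -2 * np.cos(w0), 1 + alpha]
    a = [1 + alpha, -2 * np.cos(w0), 1 - alpha]
    return [c / a[0] for c in b] + [c / a[0] for c in a]

# Hypothetical control frequencies and Q factors for the six cascaded biquads.
sections = np.array([allpass_sos(f, 0.6)
                     for f in (50, 180, 650, 2300, 8200, 14000)])

x = np.random.randn(FS)            # one second of test signal
y = sosfilt(sections, x)           # quadrature (phase-shifted) branch
```

In a real decoder, two such cascades would be tuned so that their phase responses differ by approximately 90° across the audio band, which is what the decoding equations require.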

Keywords: ambisonics, UHJ, quadrature filter, virtual reality, Gerzon, decoder, stereo, binaural, biquad

Procedia PDF Downloads 62
687 Design and Development of a Computerized Medical Record System for Hospitals in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

A computerized medical record system is a collection of medical information about a person that is stored on a computer. One principal problem of most hospitals in rural areas is the use of a manual file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause something unexpected to happen to the patient. This data mining application is to be designed using a structured system analysis and design method, which will help in a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. This computerized system will replace the file management system and help to quickly retrieve a patient's record with increased data security, provide access to clinical records for decision-making, and reduce the time it takes for a patient to be attended to.

Keywords: programming, data, software development, innovation

Procedia PDF Downloads 50
686 An End-to-end Piping and Instrumentation Diagram Information Recognition System

Authors: Taekyong Lee, Joon-Young Kim, Jae-Min Cha

Abstract:

A piping and instrumentation diagram (P&ID) is an essential design drawing describing the interconnection of process equipment and the instrumentation installed to control the process. P&IDs are modified and managed throughout the whole life cycle of a process plant. For ease of data transfer, P&IDs are generally handed over from a design company to an engineering company in portable document format (PDF), which is hard to modify. Therefore, engineering companies have to devote a great deal of time and human resources solely to manually converting P&ID images into a computer-aided design (CAD) file format. To reduce the inefficiency of this P&ID conversion, the various symbols and texts in P&ID images should be recognized automatically. However, recognizing the information in P&ID images is not an easy task: a P&ID image usually contains hundreds of symbol and text objects, and most objects are quite small compared to the size of the whole image and are densely packed together. Traditional recognition methods based on geometrical features are not capable of recognizing every element of a P&ID image. To overcome these difficulties, the state-of-the-art deep learning models RetinaNet and the connectionist text proposal network (CTPN) were used to build a system for recognizing symbols and texts in P&ID images. Using RetinaNet and CTPN models carefully modified and tuned for a P&ID image dataset, the developed system recognizes texts, equipment symbols, piping symbols, and instrumentation symbols from an input P&ID image and saves the recognition results in a pre-defined extensible markup language (XML) format. In a test using a commercial P&ID image, the P&ID information recognition system correctly recognized 97% of the symbols and 81.4% of the texts.
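
A sketch of the symbol detection half of such a pipeline using torchvision's stock RetinaNet; the paper's network was modified and retrained on P&ID data, so the COCO weights and the file name "pid_sheet.png" here are stand-ins for a trained checkpoint and a real drawing:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Generic RetinaNet inference flow; a P&ID system would load weights
# fine-tuned on symbol classes rather than the COCO defaults used here.
model = torchvision.models.detection.retinanet_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("pid_sheet.png").convert("RGB"))
with torch.no_grad():
    detections = model([image])[0]   # dict with boxes, labels, scores

for box, label, score in zip(detections["boxes"],
                             detections["labels"],
                             detections["scores"]):
    if score >= 0.5:                 # confidence threshold
        print(label.item(), [round(v) for v in box.tolist()], f"{score:.2f}")
```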

Keywords: object recognition system, P&ID, symbol recognition, text recognition

Procedia PDF Downloads 112
685 Dynamics of Chirped RZ Modulation Format in GEPON Fiber to the Home (FTTH) Network

Authors: Anurag Sharma, Manoj Kumar, Ashima, Sooraj Parkash

Abstract:

The work in this paper presents a simulative comparison of different modulation formats, namely NRZ, Manchester, and CRZ, in a 100-subscriber, 5 Gbps Gigabit Ethernet Passive Optical Network (GEPON) FTTH network. The simulation results show that the CRZ modulation format is best suited for the designed system. A link design with a 1:100 splitter is used as the Passive Optical Network (PON) element, which creates communication between the central office and the different users. The bit error rate (BER) for the CRZ modulation format is found to be 2.8535e-10 in the 5 Gbit/s system.

Keywords: PON, FTTH, OLT, ONU, CO, GEPON

Procedia PDF Downloads 664
684 Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Authors: L. Hamsaveni, Navya Prakash, Suresha

Abstract:

Document image analysis recognizes text and graphics in documents acquired as images. An approach without optical character recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as image fusing and Speeded Up Robust Features (SURF) detection to identify and extract the degraded regions from a set of document images and obtain an original document with complete information. If the captured degraded document image is skewed, it has to be straightened (deskewed) before further processing. A special image storage format known as YCbCr is used as a tool in converting the grayscale image to RGB format. The presented algorithm is tested on various types of degraded documents, such as printed documents, handwritten documents, old script documents, and handwritten image sketches in documents. The purpose of this research is to obtain an original document from a given set of degraded documents of the same source.

Keywords: grayscale image format, image fusing, RGB image format, SURF detection, YCbCr image format

Procedia PDF Downloads 341
683 Design and Development of Data Mining Application for Medical Centers in Remote Areas

Authors: Grace Omowunmi Soyebi

Abstract:

Data mining is the extraction of information from a large database, which helps in predicting a trend or behavior, thereby helping management make knowledge-driven decisions. One principal problem of most hospitals in rural areas is the use of a manual file management system for keeping records. A lot of time is wasted when a patient visits the hospital, probably in an emergency, and the nurse or attendant has to search through voluminous files before the patient's file can be retrieved; this may cause something unexpected to happen to the patient. This data mining application is to be designed using a structured system analysis and design method, which will help in a well-articulated analysis of the existing file management system, a feasibility study, and proper documentation of the design and implementation of a computerized medical record system. This computerized system will replace the file management system and help to easily retrieve a patient's record with increased data security, provide access to clinical records for decision-making, and reduce the time it takes for a patient to be attended to.

Keywords: data mining, medical record system, systems programming, computing

Procedia PDF Downloads 178
682 Searching for Forensic Evidence in a Compromised Virtual Web Server against SQL Injection Attacks and PHP Web Shell

Authors: Gigih Supriyatno

Abstract:

SQL injection is one of the most common types of attacks and has a very critical impact on web servers. In the worst case, an attacker can perform post-exploitation after a successful SQL injection attack. In web server forensics, the analysis is closely related to log file analysis, but large file sizes and different log types sometimes make it difficult for investigators to look for traces of attackers on the server. The purpose of this paper is to help investigators take appropriate steps when a web server is attacked. We use attack scenarios based on SQL injection, including PHP backdoor injection as post-exploitation. We perform post-mortem analysis of web server logs based on Hypertext Transfer Protocol (HTTP) POST and HTTP GET method approaches that are characteristic of SQL injection attacks. In addition, we propose a structured analysis method spanning the web server application log file, the database application log, and other additional logs that exist on the web server. This method gives the investigator a more structured way to analyze the log files so as to produce evidence of the attack within acceptable time. There is also the possibility that other attack techniques can be detected with this method. On the other side, it can help web administrators prepare their systems for forensic readiness.
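
A minimal sketch of the HTTP GET side of such log triage, scanning an Apache-style access log for SQL injection signatures; the pattern list and log path are illustrative. POST payloads do not appear in standard access logs, which is one reason a structured method must correlate several log sources:

```python
import re

# Illustrative SQL injection signatures; a real investigation would use a
# broader, tuned set. The log path and format (Apache combined) are assumed.
SQLI_PATTERNS = re.compile(
    r"(union\s+select|or\s+1=1|information_schema|sleep\(|%27|--\s)",
    re.IGNORECASE)
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(.*?)\] "(GET|POST) (\S+)')

with open("access.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        m = LOG_LINE.match(line)
        if m and SQLI_PATTERNS.search(m.group(4)):
            ip, ts, method, uri = m.groups()
            print(f"{ts} {ip} {method} suspicious request: {uri[:120]}")
```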

Keywords: web forensic, SQL injection, investigation, web shell

Procedia PDF Downloads 115
681 Signs-Only Compressed Row Storage Format for Exact Diagonalization Study of Quantum Fermionic Models

Authors: Michael Danilov, Sergei Iskakov, Vladimir Mazurenko

Abstract:

This paper describes a high-performance parallel realization of an exact diagonalization solver for quantum electron models on a shared-memory computing system. The proposed algorithm contains a storage format for efficiently computing the eigenvalues and eigenvectors of a quantum electron Hamiltonian matrix. The results of test calculations carried out for a 15-site Hubbard model demonstrate a reduction in the required memory and good multiprocessor scalability, while maintaining performance of the same order as compressed row storage.
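
A toy illustration of the signs-only idea: when every stored off-diagonal Hamiltonian element shares one amplitude (such as the Hubbard hopping t), the value array of compressed row storage can be replaced by one sign bit per element. The 3x3 matrix below is a hand-made symmetric example, not an actual Hubbard Hamiltonian:

```python
import numpy as np

t = 1.0                                        # common hopping amplitude
indptr  = np.array([0, 2, 4, 6])               # CSR row pointers (3x3 example)
indices = np.array([1, 2, 0, 2, 0, 1])         # CSR column indices
signs   = np.array([1, 0, 1, 1, 0, 1], dtype=np.uint8)  # 1 -> +t, 0 -> -t

def matvec(x: np.ndarray) -> np.ndarray:
    """Matrix-vector product reconstructing values from sign bits only."""
    y = np.zeros_like(x)
    for row in range(len(indptr) - 1):
        lo, hi = indptr[row], indptr[row + 1]
        vals = np.where(signs[lo:hi] == 1, t, -t)   # rebuild +/- t on the fly
        y[row] = np.dot(vals, x[indices[lo:hi]])
    return y

print(matvec(np.array([1.0, 2.0, 3.0])))   # [-1.  4.  1.]
```

Storing one bit instead of one double per element is where the memory reduction relative to plain compressed row storage comes from.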

Keywords: sparse matrix, compressed format, Hubbard model, Anderson model

Procedia PDF Downloads 363
680 Building User Behavioral Models by Processing Web Logs and Clustering Mechanisms

Authors: Madhuka G. P. D. Udantha, Gihan V. Dias, Surangika Ranathunga

Abstract:

Today's websites contain very interesting applications, but there are only a few methodologies for analyzing user navigation through a website and determining whether the website is put to correct use. Web logs are typically consulted only when a major attack or malfunction occurs, yet they contain a great deal of interesting information about the users of a system. Analyzing web logs has become a challenge due to the huge log volume, and finding interesting patterns is not easy given the size and distribution of the logs and the importance of minor details in each entry. Web logs thus contain very important data about users and the site that have not been put to good use. Retrieving this information gives an idea of what users need, allows grouping users according to their various needs, and helps improve the site to make it effective and efficient. The model we built is able to detect attacks and malfunctioning of the system and to perform anomaly detection. Logs become more complex as the volume of traffic and the size and complexity of the website grow. Unsupervised techniques are used in this solution, which is fully automated; expert knowledge is used only in validation. In our approach, we first clean and purify the logs to bring them to a common platform with a standard format and structure. After the cleaning module, the web session builder is executed. It outputs two files: a web sessions file and an indexed URLs file. The indexed URLs file contains the list of URLs accessed and their indices, and the web sessions file lists the indices of each web session. Then the DBSCAN and EM algorithms are used iteratively and recursively to get the best clustering of the web sessions. Using homogeneity, completeness, V-measure, intra- and inter-cluster distance, and the silhouette coefficient as parameters, these algorithms self-evaluate in order to feed better parameter values into the next run. If a cluster is found to be too large, micro-clustering is used. Using the cluster signature module, the clusters are annotated with a unique signature called a fingerprint. In this module, each cluster is fed to an association rule learning module; if it outputs confidence and support values of 1 for an access sequence, that sequence is a potential signature for the cluster. The occurrences of the access sequence are then checked in the other clusters, and if the sequence is found to be unique to the cluster considered, the cluster is annotated with the signature. These signatures are used in anomaly detection, the prevention of cyber attacks, real-time dashboards that visualize users accessing web pages, the prediction of user actions, and various other applications in finance, university websites, news and media websites, etc.
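
A minimal sketch of the DBSCAN stage with a simple silhouette-based self-evaluation loop; synthetic two-dimensional blobs stand in for real session vectors, and the eps grid is invented:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical session vectors: each row is one web session; synthetic 2-D
# blobs stand in for the indexed-URL encoding used in the paper.
sessions, _ = make_blobs(n_samples=200, centers=3, n_features=2,
                         cluster_std=1.0, random_state=0)

best = None
for eps in (0.5, 1.0, 2.0, 4.0):         # self-evaluation loop over eps
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(sessions)
    if len(set(labels) - {-1}) < 2:       # need at least two real clusters
        continue
    score = silhouette_score(sessions, labels)
    if best is None or score > best[0]:
        best = (score, eps, labels)

score, eps, labels = best
print(f"eps={eps}: silhouette={score:.2f}, clusters={sorted(set(labels))}")
```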

Keywords: anomaly detection, clustering, pattern recognition, web sessions

Procedia PDF Downloads 256
679 Estimating Tree Height and Forest Classification from Multi-Temporal RISAT-1 HH and HV Polarized Synthetic Aperture Radar Interferometric Phase Data

Authors: Saurav Kumar Suman, P. Karthigayani

Abstract:

In this paper, tree height is estimated and forest types are classified from multi-temporal RISAT-1 Horizontal-Horizontal (HH) and Horizontal-Vertical (HV) polarised Synthetic Aperture Radar (SAR) data. The novelty of the proposed project is the combined use of the backscattering coefficients (sigma naught) and the coherence, employing the Water Cloud Model (WCM). The approach uses three main steps: (a) extraction of the different forest parameter data from the Product.xml, BAND-META, and Grid-xxx.txt files that come with the HH and HV polarized data from ISRO (Indian Space Research Organisation); these files contain the parameters required during height estimation; (b) calculation of the vegetation and ground backscattering, the coherence, and other forest parameters; (c) classification of forest types using the ENVI 5.0 tool and ROI (region of interest) calculation.
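
An illustrative inversion of the Water Cloud Model for height, using its common two-parameter form; the extinction coefficient, incidence angle, and asymptotic backscatter levels below are invented, whereas the paper derives its parameters from the RISAT-1 data and metadata files:

```python
import numpy as np

# Common WCM form: sigma0 = veg * (1 - gamma2) + ground * gamma2, with
# two-way canopy attenuation gamma2 = exp(-2 * B * h / cos(theta)).
B = 0.4                   # extinction coefficient (per metre), assumed
theta = np.deg2rad(33)    # incidence angle, assumed
veg, ground = 0.20, 0.02  # asymptotic canopy / bare-ground sigma0 (linear)

def wcm_forward(h: np.ndarray) -> np.ndarray:
    gamma2 = np.exp(-2 * B * h / np.cos(theta))
    return veg * (1 - gamma2) + ground * gamma2

def wcm_invert(sigma0: np.ndarray) -> np.ndarray:
    """Solve the forward model for height h."""
    gamma2 = (sigma0 - veg) / (ground - veg)
    return -np.cos(theta) * np.log(gamma2) / (2 * B)

h = np.array([2.0, 8.0, 20.0])          # test heights in metres
print(wcm_invert(wcm_forward(h)))       # recovers [ 2.  8. 20.]
```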

Keywords: RISAT-1, classification, forest, SAR data

Procedia PDF Downloads 374
678 The Effect of Penalizing Wrong Answers in the Computerized Modified Multiple Choice Testing System

Authors: Min Hae Song, Jooyong Park

Abstract:

Even though assessment using information and communication technology will most likely lead the future of educational assessment, there is little research on this topic. Computerized assessment will not only cut costs but also measure students' performance in ways not possible before. In this context, this study introduces a tool that can overcome the problems of multiple-choice tests. Multiple-choice (MC) tests are efficient for automatic grading; however, their structural problems allow students to find the correct answer among the options even when they do not know the answer. A computerized modified multiple-choice testing system (CMMT) was developed using the interactivity of computers: it presents the question first, and the options later, for a short time, when the student requests them. This study was conducted to find out whether penalizing wrong answers in CMMT could reduce random guessing. We checked whether students knew the answers by having them respond to short-answer tests before choosing among the given options in the CMMT or MC format. Ninety-four students were tested with the instruction that they would be penalized for wrong answers but not for no response. There were four experimental conditions: a high and a low penalty percentage, each in the traditional multiple-choice or CMMT format. In the low-penalty condition, the penalty rate was the probability of getting the correct answer by random guessing; in the high-penalty condition, students were penalized at twice that percentage. The results showed that the number of no responses was significantly higher for the CMMT format and the number of random guesses was significantly lower for the CMMT format. There were no significant differences between the two penalty conditions, which may be because the actual score difference between the two conditions was too small. In the discussion, the possibility of applying the CMMT format with penalties for wrong answers in actual testing settings is addressed.

Keywords: computerized modified multiple choice test format, multiple-choice test format, penalizing, test format

Procedia PDF Downloads 139
677 Large-Scale Simulations of Turbulence Using Discontinuous Spectral Element Method

Authors: A. Peyvan, D. Li, J. Komperda, F. Mashayek

Abstract:

Turbulence can be observed in a variety of fluid motions in nature and industrial applications. Recent investment in high-speed aircraft and propulsion systems has revitalized fundamental research on turbulent flows. In these systems, capturing chaotic fluid structures with different length and time scales is accomplished through the Direct Numerical Simulation (DNS) approach, since it accurately simulates flows down to the smallest dissipative scales, i.e., Kolmogorov's scales. The discontinuous spectral element method (DSEM) is a high-order technique that uses spectral functions for approximating the solution. The DSEM code has been developed by our research group over the course of more than two decades and has recently been improved to run large cases on the order of billions of solution points. Running big simulations requires a considerable amount of RAM, so the DSEM code must be highly parallelized and able to start on multiple computational nodes of an HPC cluster with distributed memory. However, some pre-processing procedures, such as determining global element information, creating a global face list, and assigning global partitioning and element connection information of the domain for communication, must be done sequentially on a single processing core. A separate code has been written to perform the pre-processing procedures on a local machine. It stores the minimum amount of information required for the DSEM code to start in parallel, extracted from the mesh file, into pre-files, packing integer-type information in a stream binary format so that the pre-files are portable between machines. The files are generated to ensure fast read performance on different file systems, such as Lustre and the General Parallel File System (GPFS). A new subroutine has been added to the DSEM code to read the startup files using parallel MPI I/O for Lustre, in such a way that each MPI rank acquires its information from the file in parallel. In the case of GPFS, on each computational node a single MPI rank reads data from the file generated specifically for that node and sends the data to the other ranks on the node using point-to-point non-blocking MPI communication. This way, communication takes place locally on each node, and signals do not cross the switches of the cluster. The read subroutine has been tested on Argonne National Laboratory's Mira (GPFS), the National Center for Supercomputing Applications' Blue Waters (Lustre), the San Diego Supercomputer Center's Comet (Lustre), and UIC's Extreme (Lustre). The tests showed that one file per node is suited to GPFS and that parallel MPI I/O is the best choice for the Lustre file system. The DSEM code relies on heavily optimized linear algebra operations, such as matrix-matrix and matrix-vector products, for the calculation of the solution in every time step; for this, the code can use its own matrix math library, BLAS, Intel MKL, or ATLAS. This fact and the discontinuous nature of the method make the DSEM code run efficiently in parallel. The results of weak scaling tests performed on Blue Waters showed scalable and efficient parallel performance of the code.
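
A minimal mpi4py sketch of the Lustre-style startup read, in which every rank pulls its own slice of a packed integer pre-file with collective MPI I/O; the file name and the fixed per-rank record size are illustrative assumptions, not the DSEM pre-file layout:

```python
import numpy as np
from mpi4py import MPI

INTS_PER_RANK = 1024                    # assumed fixed-size record per rank

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

fh = MPI.File.Open(comm, "startup.pre", MPI.MODE_RDONLY)
buf = np.empty(INTS_PER_RANK, dtype=np.int32)
offset = rank * INTS_PER_RANK * buf.itemsize   # byte offset of this rank
fh.Read_at_all(offset, buf)             # collective read, one slice per rank
fh.Close()

print(f"rank {rank}: first ints {buf[:4]}")
# Run with, e.g.: mpiexec -n 8 python read_pre.py
```

The GPFS variant described in the abstract would instead have one rank per node read its node-specific file and fan the data out with non-blocking point-to-point sends.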

Keywords: computational fluid dynamics, direct numerical simulation, spectral element, turbulent flow

Procedia PDF Downloads 103
676 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm

Authors: Sukhleen Kaur

Abstract:

In security groundwork, the intrusion detection system (IDS) has become an important component and has received increasing attention in recent years. An IDS is one of the effective ways to detect different kinds of attacks and malicious code in a network and helps us secure the network. Data mining techniques can be applied in an IDS to analyse the large amount of data involved and give better results; data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. So far, studies have been carried out on finding attacks, but this paper detects malicious files. Some intruders do not attack directly; they hide harmful code inside files or corrupt those files and then attack the system. These files are detected according to some defined parameters, which form two lists of files: normal files and harmful files. After that, data mining is performed. In this paper, a hybrid classifier is used via the naive Bayes and RIPPER classification methods. The results show how an uploaded file in the database is tested against the parameters and characterised as either a normal or a harmful file, after which the mining is performed. Moreover, when a user tries to mine a harmful file, an exception is generated stating that mining cannot be performed on corrupted or harmful files.

Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper

Procedia PDF Downloads 387
675 Multimedia Container for Autonomous Car

Authors: Janusz Bobulski, Mariusz Kubanek

Abstract:

The main goal of the research is to develop a multimedia container structure containing three types of images: RGB, lidar, and infrared, properly calibrated to each other. An additional goal is to develop program libraries for creating and saving this type of file and for restoring it. It will also be necessary to develop a method for synchronizing the data from the lidar, RGB, and infrared cameras. Autonomous cars are increasingly breaking into our consciousness, and no one seems to have any doubts that self-driving cars are the future of motoring. Manufacturers promise that the first of them will reach showrooms within the next few years. Many experts believe that creating a network of communicating autonomous cars will make it possible to completely eliminate accidents. However, to make this possible, it is necessary to develop effective methods for detecting objects around the moving vehicle. In bad weather conditions, this task is difficult on the basis of the RGB (red, green, blue) image alone; in such situations, it should be supported by information from other sources, such as lidar or infrared cameras. The problem is the different data formats that individual types of devices return, along with the synchronization and formatting of these data. The goal of the project is therefore to develop a file structure that can contain different types of data; such a file is called a multimedia container. A multimedia container holds many data streams, which allows complete multimedia material to be stored in one file. The data streams in such a container include streams of images, video, sound, and subtitles, as well as additional information, i.e., metadata. This type of file could be used in autonomous vehicles, which would certainly facilitate data processing by the intelligent autonomous vehicle management system. As shown by preliminary studies, combining RGB and infrared images with lidar data allows for easier data analysis; thanks to this, it is possible to display the distance to an object in a color photo. Such information can be very useful for drivers and for the systems in autonomous cars.
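
A toy sketch of a frame-oriented container of the kind described, packing timestamped RGB, lidar, and infrared payloads into one file; the header layout, field order, and file extension are invented, since the abstract does not specify the actual structure:

```python
import struct
import numpy as np

# Per frame: a little-endian header (timestamp + three payload lengths)
# followed by the three raw payloads, so all sensors share one timestamp.
HEADER = struct.Struct("<dIII")         # timestamp, rgb/lidar/ir byte sizes

def write_frame(f, t: float, rgb: np.ndarray,
                lidar: np.ndarray, ir: np.ndarray) -> None:
    blobs = [a.tobytes() for a in (rgb, lidar, ir)]
    f.write(HEADER.pack(t, *map(len, blobs)))
    for b in blobs:
        f.write(b)

def read_frame(f):
    t, n_rgb, n_lidar, n_ir = HEADER.unpack(f.read(HEADER.size))
    return t, f.read(n_rgb), f.read(n_lidar), f.read(n_ir)

with open("frames.mmc", "wb") as f:
    write_frame(f, 0.033,
                np.zeros((480, 640, 3), np.uint8),    # RGB image
                np.zeros((10_000, 4), np.float32),    # lidar points x,y,z,i
                np.zeros((240, 320), np.uint16))      # infrared image

with open("frames.mmc", "rb") as f:
    t, rgb, lidar, ir = read_frame(f)
    print(t, len(rgb), len(lidar), len(ir))
```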

Keywords: an autonomous car, image processing, lidar, obstacle detection

Procedia PDF Downloads 181