Search results for: Parallel efficiency.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2949

Search results for: Parallel efficiency.

2889 Solving Facility Location Problem on Cluster Computing

Authors: Ei Phyo Wai, Nay Min Tun

Abstract:

Computation of facility location problem for every location in the country is not easy simultaneously. Solving the problem is described by using cluster computing. A technique is to design parallel algorithm by using local search with single swap method in order to solve that problem on clusters. Parallel implementation is done by the use of portable parallel programming, Message Passing Interface (MPI), on Microsoft Windows Compute Cluster. In this paper, it presents the algorithm that used local search with single swap method and implementation of the system of a facility to be opened by using MPI on cluster. If large datasets are considered, the process of calculating a reasonable cost for a facility becomes time consuming. The result shows parallel computation of facility location problem on cluster speedups and scales well as problem size increases.

Keywords: cluster, cost, demand, facility location

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1444
2888 Energy-Efficient Electrical Power Distribution with Multi-Agent Control at Parallel DC/DC Converters

Authors: Janos Hamar, Peter Bartal, Daniel T. Sepsi

Abstract:

Consumer electronics are pervasive. It is impossible to imagine a household or office without DVD players, digital cameras, printers, mobile phones, shavers, electrical toothbrushes, etc. All these devices operate at different voltage levels ranging from 1.8 to 20 VDC, in the absence of universal standards. The voltages available are however usually 120/230 VAC at 50/60 Hz. This situation makes an individual electrical energy conversion system necessary for each device. Such converters usually involve several conversion stages and often operate with excessive losses and poor reliability. The aim of the project presented in this paper is to design and implement a multi-channel DC/DC converter system, customizing the output voltage and current ratings according to the requirements of the load. Distributed, multi-agent techniques will be applied for the control of the DC/DC converters.

Keywords: DC/DC converter, energy efficiency, multi-agentcontrol, parallel converters.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1403
2887 Security over OFDM Fading Channels with Friendly Jammer

Authors: Munnujahan Ara

Abstract:

In this paper, we investigate the effect of friendly jamming power allocation strategies on the achievable average secrecy rate over a bank of parallel fading wiretap channels. We investigate the achievable average secrecy rate in parallel fading wiretap channels subject to Rayleigh and Rician fading. The achievable average secrecy rate, due to the presence of a line-of-sight component in the jammer channel is also evaluated. Moreover, we study the detrimental effect of correlation across the parallel sub-channels, and evaluate the corresponding decrease in the achievable average secrecy rate for the various fading configurations. We also investigate the tradeoff between the transmission power and the jamming power for a fixed total power budget. Our results, which are applicable to current orthogonal frequency division multiplexing (OFDM) communications systems, shed further light on the achievable average secrecy rates over a bank of parallel fading channels in the presence of friendly jammers.

Keywords: Fading parallel channels, Wire-tap channel, OFDM, Secrecy capacity, Power allocation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2210
2886 Efficient Implementation of Serial and Parallel Support Vector Machine Training with a Multi-Parameter Kernel for Large-Scale Data Mining

Authors: Tatjana Eitrich, Bruno Lang

Abstract:

This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode we introduce a data transformation that allows for the usage of an expensive generalized kernel without additional costs. In order to speed up the decomposition algorithm we analyze the problem of working set selection for large data sets and analyze the influence of the working set sizes onto the scalability of the parallel decomposition scheme. Our modifications and settings lead to improvement of support vector learning performance and thus allow using extensive parameter search methods to optimize classification accuracy.

Keywords: Support Vector Machines, Shared Memory Parallel Computing, Large Data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1541
2885 Motion Estimator Architecture with Optimized Number of Processing Elements for High Efficiency Video Coding

Authors: Seongsoo Lee

Abstract:

Motion estimation occupies the heaviest computation in HEVC (high efficiency video coding). Many fast algorithms such as TZS (test zone search) have been proposed to reduce the computation. Still the huge computation of the motion estimation is a critical issue in the implementation of HEVC video codec. In this paper, motion estimator architecture with optimized number of PEs (processing element) is presented by exploiting early termination. It also reduces hardware size by exploiting parallel processing. The presented motion estimator architecture has 8 PEs, and it can efficiently perform TZS with very high utilization of PEs.

Keywords: Motion estimation, test zone search, high efficiency video coding, processing element, optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1511
2884 Modified Montgomery for RSA Cryptosystem

Authors: Rupali Verma, Maitreyee Dutta, Renu Vig

Abstract:

Encryption and decryption in RSA are done by modular exponentiation which is achieved by repeated modular multiplication. Hence efficiency of modular multiplication directly determines the efficiency of RSA cryptosystem. This paper designs a Modified Montgomery Modular Multiplication in which addition of operands is computed by 4:2 compressor. The basic logic operations in addition are partitioned over two iterations such that parallel computations are performed. This reduces the critical path delay of proposed Montgomery design. The proposed design and RSA are implemented on Virtex 2 and Virtex 5 FPGAs. The two factors partitioning and parallelism have improved the frequency and throughput of proposed design.

Keywords: RSA, Montgomery modular multiplication, 4:2 compressor, FPGA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2553
2883 Measuring the Efficiency of Medical Equipment

Authors: Panagiotis H. Tsarouhas

Abstract:

the reliability analysis of the medical equipments can help to increase the availability and the efficiency of the systems. In this manuscript we present a simple method of decomposition that could be easily applied on the complex medical systems. Using this method we can easily calculate the effect of the subsystems or components on the reliability of the overall system. Furthermore, to investigate the effect of subsystems or components on system performance, we perform a numerical study varying every time the worst reliability of subsystem or component with another which has higher reliability. It can also be useful to engineers and designers of medical equipment, who wishes to optimize the complex systems.

Keywords: Reliability, Availability, Series-parallel System, medical equipment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2347
2882 Forward Kinematics Analysis of a 3-PRS Parallel Manipulator

Authors: Ghasem Abbasnejad, Soheil Zarkandi, Misagh Imani

Abstract:

In this article the homotopy continuation method (HCM) to solve the forward kinematic problem of the 3-PRS parallel manipulator is used. Since there are many difficulties in solving the system of nonlinear equations in kinematics of manipulators, the numerical solutions like Newton-Raphson are inevitably used. When dealing with any numerical solution, there are two troublesome problems. One is that good initial guesses are not easy to detect and another is related to whether the used method will converge to useful solutions. Results of this paper reveal that the homotopy continuation method can alleviate the drawbacks of traditional numerical techniques.

Keywords: Forward kinematics, Homotopy continuationmethod, Parallel manipulators, Rotation matrix

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2613
2881 Anaerobic Treatment of Petroleum Refinery Wastewater

Authors: H. A. Gasim, S. R. M. Kutty, M. Hasnain Isa

Abstract:

Anaerobic treatment has many advantages over other biological method particularly when used to treat complex wastewater such as petroleum refinery wastewater. In this study two Up-flow Anaerobic Sludge Blanket (UASB) reactors were operated in parallel to treat six volumetric organic loads (0.58, 1.21, 0.89, 2.34, 1.47 and 4.14 kg COD/m3·d) to evaluate the chemical oxygen demand (COD) removal efficiency. The reactors were continuously adapting to the changing of operation condition with increase in the removal efficiency or slight decrease until the last load which was more than two times the load, at which the reactor stressed and the removal efficiency decreased to 75% with effluent concentration of 1746 mg COD/L. Other parameters were also monitored such as pH, alkalinity, volatile fatty acid and gas production rate. The UASB reactor was suitable to treat petroleum refinery wastewater and the highest COD removal rate was 83% at 1215 kg/m3·d with COD concentration about 356 mg/L in the effluent.

Keywords: Petroleum refinery wastewater, anaerobic digestion, UASB, organic volumetric loading rate

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2896
2880 Development of Circulating Support Environment of Multilingual Medical Communication using Parallel Texts for Foreign Patients

Authors: Mai Miyabe, Taku Fukushima, Takashi Yoshino, Aguri Shigeno

Abstract:

The need for multilingual communication in Japan has increased due to an increase in the number of foreigners in the country. When people communicate in their nonnative language, the differences in language prevent mutual understanding among the communicating individuals. In the medical field, communication between the hospital staff and patients is a serious problem. Currently, medical translators accompany patients to medical care facilities, and the demand for medical translators is increasing. However, medical translators cannot necessarily provide support, especially in cases in which round-the-clock support is required or in case of emergencies. The medical field has high expectations from information technology. Hence, a system that supports accurate multilingual communication is required. Despite recent advances in machine translation technology, it is very difficult to obtain highly accurate translations. We have developed a support system called M3 for multilingual medical reception. M3 provides support functions that aid foreign patients in the following respects: conversation, questionnaires, reception procedures, and hospital navigation; it also has a Q&A function. Users can operate M3 using a touch screen and receive text-based support. In addition, M3 uses accurate translation tools called parallel texts to facilitate reliable communication through conversations between the hospital staff and the patients. However, if there is no parallel text that expresses what users want to communicate, the users cannot communicate. In this study, we have developed a circulating support environment for multilingual medical communication using parallel texts. The proposed environment can circulate necessary parallel texts through the following procedure: (1) a user provides feedback about the necessary parallel texts, following which (2) these parallel texts are created and evaluated.

Keywords: multilingual medical communication, parallel texts.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1442
2879 GPU-Accelerated Triangle Mesh Simplification Using Parallel Vertex Removal

Authors: Thomas Odaker, Dieter Kranzlmueller, Jens Volkert

Abstract:

We present an approach to triangle mesh simplification designed to be executed on the GPU. We use a quadric error metric to calculate an error value for each vertex of the mesh and order all vertices based on this value. This step is followed by the parallel removal of a number of vertices with the lowest calculated error values. To allow for the parallel removal of multiple vertices we use a set of per-vertex boundaries that prevent mesh foldovers even when simplification operations are performed on neighbouring vertices. We execute multiple iterations of the calculation of the vertex errors, ordering of the error values and removal of vertices until either a desired number of vertices remains in the mesh or a minimum error value is reached. This parallel approach is used to speed up the simplification process while maintaining mesh topology and avoiding foldovers at every step of the simplification.

Keywords: Computer graphics, half edge collapse, mesh simplification, precomputed simplification, topology preserving.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2744
2878 Motion Protection System Design for a Parallel Motion Platform

Authors: Dongsu Wu, Hongbin Gu

Abstract:

A motion protection system is designed for a parallel motion platform with subsided cabin. Due to its complex structure, parallel mechanism is easy to encounter interference problems including link length limits, joints limits and self-collision. Thus a virtual spring algorithm in operational space is developed for the motion protection system to avoid potential damages caused by interference. Simulation results show that the proposed motion protection system can effectively eliminate interference problems and ensure safety of the whole motion platform.

Keywords: Motion protection, motion platform, parallelmechanism, Stewart platform, collision avoidance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1515
2877 Improvement of Parallel Compressor Model in Dealing Outlet Unequal Pressure Distribution

Authors: Kewei Xu, Jens Friedrich, Kevin Dwinger, Wei Fan, Xijin Zhang

Abstract:

Parallel Compressor Model (PCM) is a simplified approach to predict compressor performance with inlet distortions. In PCM calculation, it is assumed that the sub-compressors’ outlet static pressure is uniform and therefore simplifies PCM calculation procedure. However, if the compressor’s outlet duct is not long and straight, such assumption frequently induces error ranging from 10% to 15%. This paper provides a revised calculation method of PCM that can correct the error. The revised method employs energy equation, momentum equation and continuity equation to acquire needed parameters and replace the equal static pressure assumption. Based on the revised method, PCM is applied on two compression system with different blades types. The predictions of their performance in non-uniform inlet conditions are yielded through the revised calculation method and are employed to evaluate the method’s efficiency. Validating the results by experimental data, it is found that although little deviation occurs, calculated result agrees well with experiment data whose error ranges from 0.1% to 3%. Therefore, this proves the revised calculation method of PCM possesses great advantages in predicting the performance of the distorted compressor with limited exhaust duct.

Keywords: Parallel Compressor Model (PCM), Revised Calculation Method, Inlet Distortion, Outlet Unequal Pressure Distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1636
2876 Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm

Authors: Jehad A. H. Hammad, Nur'Aini binti Abdul Rashid

Abstract:

With the rapid development in the field of life sciences and the flooding of genomic information, the need for faster and scalable searching methods has become urgent. One of the approaches that were investigated is indexing. The indexing methods have been categorized into three categories which are the lengthbased index algorithms, transformation-based algorithms and mixed techniques-based algorithms. In this research, we focused on the transformation based methods. We embedded the N-gram method into the transformation-based method to build an inverted index table. We then applied the parallel methods to speed up the index building time and to reduce the overall retrieval time when querying the genomic database. Our experiments show that the use of N-Gram transformation algorithm is an economical solution; it saves time and space too. The result shows that the size of the index is smaller than the size of the dataset when the size of N-Gram is 5 and 6. The parallel N-Gram transformation algorithm-s results indicate that the uses of parallel programming with large dataset are promising which can be improved further.

Keywords: Biological sequence, Database index, N-gram indexing, Parallel computing, Sequence retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2082
2875 Detecting the Edge of Multiple Images in Parallel

Authors: Prakash K. Aithal, U. Dinesh Acharya, Rajesh Gopakumar

Abstract:

Edge is variation of brightness in an image. Edge detection is useful in many application areas such as finding forests, rivers from a satellite image, detecting broken bone in a medical image etc. The paper discusses about finding edge of multiple aerial images in parallel. The proposed work tested on 38 images 37 colored and one monochrome image. The time taken to process N images in parallel is equivalent to time taken to process 1 image in sequential. Message Passing Interface (MPI) and Open Computing Language (OpenCL) is used to achieve task and pixel level parallelism respectively.

Keywords: Edge detection, multicore, GPU, openCL, MPI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2304
2874 Experimental Parallel Architecture for Rendering 3D Model into MPEG-4 Format

Authors: Ajay Joshi, Surya Ismail

Abstract:

This paper will present the initial findings of a research into distributed computer rendering. The goal of the research is to create a distributed computer system capable of rendering a 3D model into an MPEG-4 stream. This paper outlines the initial design, software architecture and hardware setup for the system. Distributed computing means designing and implementing programs that run on two or more interconnected computing systems. Distributed computing is often used to speed up the rendering of graphical imaging. Distributed computing systems are used to generate images for movies, games and simulations. A topic of interest is the application of distributed computing to the MPEG-4 standard. During the course of the research, a distributed system will be created that can render a 3D model into an MPEG-4 stream. It is expected that applying distributed computing principals will speed up rendering, thus improving the usefulness and efficiency of the MPEG-4 standard

Keywords: Cluster, parallel architecture, rendering, MPEG-4.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1422
2873 Singularity Loci of Actuation Schemes for 3RRR Planar Parallel Manipulator

Authors: S. Ramana Babu, V. Ramachandra Raju, K. Ramji

Abstract:

This paper presents the effect of actuation schemes on the performance of parallel manipulators and also how the singularity loci have been changed in the reachable workspace of the manipulator with the choice of actuation scheme to drive the manipulator. The performance of the eight possible actuation schemes of 3RRR planar parallel manipulator is compared with each other. The optimal design problem is formulated to find the manipulator geometry that maximizes the singularity free conditioned workspace for all the eight actuation cases, the optimization problem is solved by using genetic algorithms.

Keywords: Actuation schemes, GCI, genetic algorithms.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1562
2872 Unrelated Parallel Machines Scheduling Problem Using an Ant Colony Optimization Approach

Authors: Y. K. Lin, H. T. Hsieh, F. Y. Hsieh

Abstract:

Total weighted tardiness is a measure of customer satisfaction. Minimizing it represents satisfying the general requirement of on-time delivery. In this research, we consider an ant colony optimization (ACO) algorithm to solve the problem of scheduling unrelated parallel machines to minimize total weighted tardiness. The problem is NP-hard in the strong sense. Computational results show that the proposed ACO algorithm is giving promising results compared to other existing algorithms.

Keywords: ant colony optimization, total weighted tardiness, unrelated parallel machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1857
2871 Bounds on Reliability of Parallel Computer Interconnection Systems

Authors: Ranjan Kumar Dash, Chita Ranjan Tripathy

Abstract:

The evaluation of residual reliability of large sized parallel computer interconnection systems is not practicable with the existing methods. Under such conditions, one must go for approximation techniques which provide the upper bound and lower bound on this reliability. In this context, a new approximation method for providing bounds on residual reliability is proposed here. The proposed method is well supported by two algorithms for simulation purpose. The bounds on residual reliability of three different categories of interconnection topologies are efficiently found by using the proposed method

Keywords: Parallel computer network, reliability, probabilisticgraph, interconnection networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1299
2870 Minimizing Makespan Subject to Budget Limitation in Parallel Flow Shop

Authors: Amin Sahraeian

Abstract:

One of the criteria in production scheduling is Make Span, minimizing this criteria causes more efficiently use of the resources specially machinery and manpower. By assigning some budget to some of the operations the operation time of these activities reduces and affects the total completion time of all the operations (Make Span). In this paper this issue is practiced in parallel flow shops. At first we convert parallel flow shop to a network model and by using a linear programming approach it is identified in order to minimize make span (the completion time of the network) which activities (operations) are better to absorb the predetermined and limited budget. Minimizing the total completion time of all the activities in the network is equivalent to minimizing make span in production scheduling.

Keywords: parallel flow shop, make span, linear programming, budget

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1742
2869 Design and Implementation of Shared Memory based Parallel File System Logging Method for High Performance Computing

Authors: Hyeyoung Cho, Sungho Kim, SangDong Lee

Abstract:

I/O workload is a critical and important factor to analyze I/O pattern and file system performance. However tracing I/O operations on the fly distributed parallel file system is non-trivial due to collection overhead and a large volume of data. In this paper, we design and implement a parallel file system logging method for high performance computing using shared memory-based multi-layer scheme. It minimizes the overhead with reduced logging operation response time and provides efficient post-processing scheme through shared memory. Separated logging server can collect sequential logs from multiple clients in a cluster through packet communication. Implementation and evaluation result shows low overhead and high scalability of this architecture for high performance parallel logging analysis.

Keywords: I/O workload, PVFS, I/O Trace.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523
2868 Novel Ridge Orientation Based Approach for Fingerprint Identification Using Co-Occurrence Matrix

Authors: Mehran Yazdi, Zahra Adelpour, Batoul Bahraini, Yasaman Keshtkar Jahromi

Abstract:

In this paper we use the property of co-occurrence matrix in finding parallel lines in binary pictures for fingerprint identification. In our proposed algorithm, we reduce the noise by filtering the fingerprint images and then transfer the fingerprint images to binary images using a proper threshold. Next, we divide the binary images into some regions having parallel lines in the same direction. The lines in each region have a specific angle that can be used for comparison. This method is simple, performs the comparison step quickly and has a good resistance in the presence of the noise.

Keywords: Parallel lines detection, co-occurrence matrix, fingerprint identification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1319
2867 An Output Oriented Super-Efficiency Model for Considering Time Lag Effect

Authors: Yanshuang Zhang, Byungho Jeong

Abstract:

There exists some time lag between the consumption of inputs and the production of outputs. This time lag effect should be considered in calculating efficiency of decision making units (DMU). Recently, a couple of DEA models were developed for considering time lag effect in efficiency evaluation of research activities. However, these models can’t discriminate efficient DMUs because of the nature of basic DEA model in which efficiency scores are limited to ‘1’. This problem can be resolved a super-efficiency model. However, a super efficiency model sometimes causes infeasibility problem. This paper suggests an output oriented super-efficiency model for efficiency evaluation under the consideration of time lag effect. A case example using a long term research project is given to compare the suggested model with the MpO model.

Keywords: DEA, Super-efficiency, Time Lag.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2639
2866 Power Efficiency Characteristics of Magnetohydrodynamic Thermodynamic Gas Cycle

Authors: Mahmoud Huleihil

Abstract:

In this study, the performance of a thermodynamic gas cycle of magnetohydrodynamic (MHD) power generation is considered and presented in terms of power efficiency curves. The dissipation mechanisms considered include: fluid friction modeled by means of the isentropic efficiency of the compressor, heat transfer leakage directly from the hot reservoir to the cold heat reservoir, and constant velocity of the MHD generator. The study demonstrates that power and efficiency vanish at the extremes of both slow and fast operating conditions. These points are demonstrated on power efficiency curves and the locus of efficiency at maximum power and the locus of maximum efficiency. Qualitatively, the considered loss mechanisms have a similar effect on the efficiency at maximum power operation and on maximum efficiency operation, thus these efficiencies are reduced, even for small values of the loss mechanisms.

Keywords: Magnetohydrodynamic generator, electrical efficiency, maximum power, maximum efficiency, heat engine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 667
2865 Parallel Vector Processing Using Multi Level Orbital DATA

Authors: Nagi Mekhiel

Abstract:

Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.

Keywords: Memory organization, parallel processors, serial code, vector processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1020
2864 Analyzing the Factors that Cause Parallel Performance Degradation in Parallel Graph-Based Computations Using Graph500

Authors: Mustafa Elfituri, Jonathan Cook

Abstract:

Recently, graph-based computations have become more important in large-scale scientific computing as they can provide a methodology to model many types of relations between independent objects. They are being actively used in fields as varied as biology, social networks, cybersecurity, and computer networks. At the same time, graph problems have some properties such as irregularity and poor locality that make their performance different than regular applications performance. Therefore, parallelizing graph algorithms is a hard and challenging task. Initial evidence is that standard computer architectures do not perform very well on graph algorithms. Little is known exactly what causes this. The Graph500 benchmark is a representative application for parallel graph-based computations, which have highly irregular data access and are driven more by traversing connected data than by computation. In this paper, we present results from analyzing the performance of various example implementations of Graph500, including a shared memory (OpenMP) version, a distributed (MPI) version, and a hybrid version. We measured and analyzed all the factors that affect its performance in order to identify possible changes that would improve its performance. Results are discussed in relation to what factors contribute to performance degradation.

Keywords: Graph computation, Graph500 benchmark, parallel architectures, parallel programming, workload characterization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 479
2863 A Survey on Performance Tools for OpenMP

Authors: Mubarak S. Mohsen, Rosni Abdullah, Yong M. Teo

Abstract:

Advances in processors architecture, such as multicore, increase the size of complexity of parallel computer systems. With multi-core architecture there are different parallel languages that can be used to run parallel programs. One of these languages is OpenMP which embedded in C/Cµ or FORTRAN. Because of this new architecture and the complexity, it is very important to evaluate the performance of OpenMP constructs, kernels, and application program on multi-core systems. Performance is the activity of collecting the information about the execution characteristics of a program. Performance tools consists of at least three interfacing software layers, including instrumentation, measurement, and analysis. The instrumentation layer defines the measured performance events. The measurement layer determines what performance event is actually captured and how it is measured by the tool. The analysis layer processes the performance data and summarizes it into a form that can be displayed in performance tools. In this paper, a number of OpenMP performance tools are surveyed, explaining how each is used to collect, analyse, and display data collection.

Keywords: Parallel performance tools, OpenMP, multi-core.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1861
2862 Isotropic Stress Distribution in Cu/(001) Fe Two Sheets

Authors: A. Derardja, L. Baroura, M. Brioua

Abstract:

The nanotechnology based on epitaxial systems includes single or arranged misfit dislocations. In general, whatever is the type of dislocation or the geometry of the array formed by the dislocations; it is important for experimental studies to know exactly the stress distribution for which there is no analytical expression [1, 2]. This work, using a numerical analysis, deals with relaxation of epitaxial layers having at their interface a periodic network of edge misfit dislocations. The stress distribution is estimated by using isotropic elasticity. The results show that the thickness of the two sheets is a crucial parameter in the stress distributions and then in the profile of the two sheets. A comparative study between the case of single dislocation and the case of parallel network shows that the layers relaxed better when the interface is covered by a parallel arrangement of misfit. Consequently, a single dislocation at the interface produces an important stress field which can be reduced by inserting a parallel network of dislocations with suitable periodicity.

Keywords: Parallel array of misfit, interface, isotropic elasticity, single crystalline substrates, coherent interface

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1521
2861 Productive Design and Calculation of Intermittent Mechanisms with Radial Parallel Cams

Authors: Pavel Dostrašil, Petr Jirásko

Abstract:

The paper deals with the kinematics and automated calculation of intermittent mechanisms with radial cams. Currently, electronic cams are increasingly applied in the drives of working link mechanisms. Despite a huge advantage of electronic cams in their reprogrammability or instantaneous change of displacement diagrams, conventional cam mechanisms have an irreplaceable role in production and handling machines. With high frequency of working cycle periods, the dynamic load of the proper servomotor rotor increases and efficiency of electronic cams strongly decreases. Though conventional intermittent mechanisms with radial cams are representatives of fixed automation, they have distinct advantages in their high speed (high dynamics), positional accuracy and relatively easy manufacture. We try to remove the disadvantage of firm displacement diagram by reducing costs for simple design and automated calculation that leads reliably to high-quality and inexpensive manufacture.

Keywords: Cam mechanism, displacement diagram, intermittentmechanism, radial parallel cam

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1798
2860 On Fault Diagnosis of Asynchronous Sequential Machines with Parallel Composition

Authors: Jung-Min Yang

Abstract:

Fault diagnosis of composite asynchronous sequential machines with parallel composition is addressed in this paper. An adversarial input can infiltrate one of two submachines comprising the composite asynchronous machine, causing an unauthorized state transition. The objective is to characterize the condition under which the controller can diagnose any fault occurrence. Two control configurations, state feedback and output feedback, are considered in this paper. In the case of output feedback, the exact estimation of the state is impossible since the current state is inaccessible and the output feedback is given as the form of burst. A simple example is provided to demonstrate the proposed methodology.

Keywords: Asynchronous sequential machines, parallel composition, fault diagnosis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 932