Parallel Vector Processing Using Multi Level Orbital DATA

Nagi Mekhiel

Authors: Nagi Mekhiel

Abstract:

Many applications use vector operations, applying a single instruction to multiple data elements that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth, which reduces the performance gain of vector processing. We present a memory system that makes all of its contents available to processors over time, so that processors need not access the memory: each location is forced to become available to all processors at a specific time. The data move in different orbits and become available to other processors in higher orbits at different times. We use this memory to apply parallel vector operations to data streams at the first orbit level. Data processed in the first level move to the upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation that handles the serial code limitations inherent in all parallel applications, and to interleave it with the lower-level vector operations.
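
The two-level flow described in the abstract can be illustrated with a toy software model. The sketch below is not the author's hardware design. It assumes that each location becomes visible at the cycle equal to its index, that a first-orbit result reaches the upper orbit one cycle later, and that the upper-orbit operation is a running reduction. The names `orbit` and `run_two_level` are hypothetical and chosen only for illustration.

```python
import operator
from typing import Callable, Iterator, List, Tuple


def orbit(data: List[float], start_cycle: int = 0) -> Iterator[Tuple[int, float]]:
    """Yield (cycle, value): element i is visible only at cycle start_cycle + i."""
    for index, value in enumerate(data):
        yield start_cycle + index, value


def run_two_level(
    a: List[float],
    b: List[float],
    elementwise: Callable[[float, float], float],
    combine: Callable[[float, float], float],
    initial: float,
) -> Tuple[List[float], float, int]:
    """First orbit applies `elementwise` to two streams; upper orbit reduces.

    Returns (first-orbit results, upper-orbit result, total cycles used).
    """
    if len(a) != len(b):
        raise ValueError("streams must have equal length")
    level1: List[float] = []
    accumulator = initial
    last_cycle = -1
    for (cycle, x), (_, y) in zip(orbit(a), orbit(b)):
        # First-orbit processor: consumes whatever is visible this cycle,
        # without issuing an address.
        result = elementwise(x, y)
        level1.append(result)
        # Assumption: the result arrives in the upper orbit at cycle + 1,
        # one element at a time, where the serial reduction consumes it.
        accumulator = combine(accumulator, result)
        last_cycle = cycle + 1
    return level1, accumulator, last_cycle + 1


if __name__ == "__main__":
    # Dot product: multiply in the first orbit, accumulate in the upper orbit.
    products, total, cycles = run_two_level(
        [1, 2, 3, 4], [5, 6, 7, 8], operator.mul, operator.add, 0
    )
    print(products, total, cycles)
```

In this model the reduction overlaps with the elementwise stream, so a stream of n elements finishes in n + 1 cycles rather than running the serial step only after the vector step has completed. That overlap is the interleaving the abstract refers to; the specific cycle counts are a consequence of the assumptions above.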

Keywords: Memory organization, parallel processors, serial code, vector processing.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1128965

