BCDR: Data Reuse Framework for Multi-core Systems with Local Memories

PP: 305-315

Author(s)

Jue Wang, YanGang Wang

Abstract

Emerging heterogeneous multi-core systems, such as the IBM Cell BE, are deployed with multiple hardware accelerators
to enhance system performance. In these systems, each accelerator includes its own local memory, and software-controlled
DMA transfers are provided to utilize the memory bandwidth. Two important software-controlled management methods,
direct buffering and the software-controlled cache (SCC), are applied to regular and irregular references, respectively. Run-time
coherence maintenance is performed when the same global memory location is referenced through both the SCC and a buffer. This
paper proposes the BCDR framework to exploit data reuse for buffers and the software cache. The framework includes buffer2buffer data
reuse optimizations, buffer2cache/cache2buffer data reuse optimizations, and buffered array identification. For the buffer2buffer data
reuse optimizations, the Retaining Buffered Data technique and a pipelining optimization are given to optimize the critical region after a
basic data reuse optimization. To make use of the opportunities induced by the buffer2buffer optimizations, the buffer2cache/cache2buffer
data reuse optimizations are presented to improve the performance of applications with irregular accesses. Furthermore, a buffered data
identification algorithm is presented to improve the precision of the global data-flow analysis used for coherence maintenance between
the SCC and buffers. The experimental results show that our optimizations expose many opportunities for both the buffers and the cache.
The amount of data transferred between the local store and global memory is reduced by 16.35% on average across all cases. Our
optimizations further reduce the average execution time by 19.7%. In addition, the run-time coherence maintenance overhead is reduced significantly.