When Cache Blocking of Sparse Matrix Vector Multiply …:当缓存阻塞稀疏矩阵向量乘法….ppt
文本预览下载声明
Bandwidth Avoiding Stencil Computations By Kaushik Datta, Sam Williams, Kathy Yelick, and Jim Demmel, and others Berkeley Benchmarking and Optimization Group UC Berkeley March 13, 2008 kdatta@ Outline Outline What are stencil codes? For a given point, a stencil is a pre-determined set of nearest neighbors (possibly including itself) A stencil code updates every point in a regular grid with a constant weighted subset of its neighbors (“applying a stencil”) Stencil Applications Stencils are critical to many scientific applications: Diffusion, Electromagnetics, Computational Fluid Dynamics Both explicit and implicit iterative methods (e.g. Multigrid) Both uniform and adaptive block-structured meshes Na?ve Stencil Pseudocode (One iteration) void stencil3d(double A[], double B[], int nx, int ny, int nz) { for all grid indices in x-dim { for all grid indices in y-dim { for all grid indices in z-dim { B[center] = S0* A[center] + S1*(A[top] + A[bottom] + A[left] + A[right] + A[front] + A[back]); } } } } 2D Poisson Stencil- Specific Form of SpMV Stencil uses an implicit matrix No indirect array accesses! Stores a single value for each diagonal 3D stencil is analagous (but with 7 nonzero diagonals) Reduce Memory Traffic! Stencil performance usually limited by memory bandwidth Goal: Increase performance by minimizing memory traffic Even more important for multicore! Concentrate on getting reuse both: within an iteration across iterations (Ax, A2x, …, Akx) Only interested in final result Outline Grid Traversal Algorithms Grid Traversal Algorithms Na?ve Algorithm Traverse the 3D grid in the usual way No exploitation of locality Grids that don’t fit in cache will suffer Grid Traversal Algorithms Cache Blocking- Single Iteration At a Time Guarantees reuse within an iteration “Shrinks” each plane so that three source planes fit into cache However, no reuse across iterations In 3D, there is tradeoff be
显示全部