CPL - Chalmers Publication Library

Efficient stream compaction on wide SIMD many-core architectures

Markus Billeter (Department of Computer Science and Engineering, Computer Engineering (Chalmers)) ; Ola Olsson (Department of Computer Science and Engineering, Computer Engineering (Chalmers)) ; Ulf Assarsson (Department of Computer Science and Engineering, Computer Engineering (Chalmers))
Proceedings of the Conference on High Performance Graphics Vol. 2009 (2009), p. 159-166.
[Conference paper, peer-reviewed]

Stream compaction is a common parallel primitive used to remove unwanted elements from sparse data. This allows highly parallel algorithms to maintain performance over several processing steps and reduces overall memory usage. For wide SIMD many-core architectures, we present a novel stream compaction algorithm and explore several variations thereof. Our algorithm is designed to maximize concurrent execution, with minimal use of synchronization. Bandwidth and auxiliary storage requirements are reduced significantly, which allows for substantially better performance. We have tested our algorithms using CUDA on a PC with an NVIDIA GeForce GTX280 GPU. On this hardware, our reference implementation provides a 3x speedup over previously published algorithms.
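
To make the primitive concrete, the following is a minimal sequential sketch of a generic count / prefix-sum / scatter compaction, written in Python for readability. It is not the algorithm from the paper, which the abstract does not describe in detail. The function name `compact`, the `keep` predicate, and the `num_chunks` parameter are assumptions introduced for illustration; each chunk stands in for the work that one processor would do independently.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def compact(data: Sequence[T], keep: Callable[[T], bool], num_chunks: int = 4) -> List[T]:
    """Return the elements of `data` for which `keep` is true, order preserved.

    Hypothetical illustration: the input is split into chunks, and each chunk
    is handled as an independent unit of work, as one processor would do.
    """
    n = len(data)
    if n == 0:
        return []

    chunk = -(-n // num_chunks)  # ceiling division: elements per chunk
    bounds = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    # Phase 1: each chunk counts its own valid elements.
    counts = [sum(1 for x in data[lo:hi] if keep(x)) for lo, hi in bounds]

    # Phase 2: an exclusive prefix sum over the counts gives each chunk
    # the output position where its first valid element belongs.
    offsets = [0] + list(accumulate(counts))[:-1]
    total = offsets[-1] + counts[-1]

    # Phase 3: each chunk writes its valid elements starting at its offset.
    # Chunks write to disjoint output ranges, so they need no coordination.
    out: list = [None] * total
    for (lo, hi), offset in zip(bounds, offsets):
        pos = offset
        for x in data[lo:hi]:
            if keep(x):
                out[pos] = x
                pos += 1
    return out


sparse = [3, 0, 0, 7, 0, 1, 0, 0, 9]
dense = compact(sparse, lambda x: x != 0)
```

In this sketch only the prefix sum in phase 2 needs information from all chunks; the counting and scattering phases touch disjoint data, which is the property that makes the primitive attractive on parallel hardware.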

Keywords: CUDA, GPGPU, parallel sorting, prefix sum, stream compaction



This record was created 2009-12-15. Last modified 2017-10-03.
CPL Pubid: 103709