CPL - Chalmers Publication Library
| Utbildning | Forskning | Styrkeområden | Om Chalmers | In English In English Ej inloggad.

A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs

Alexandra Angerd (Institutionen för data- och informationsteknik, Datorteknik (Chalmers)) ; Erik Sintorn (Institutionen för data- och informationsteknik, Datorteknik (Chalmers)) ; Per Stenström (Institutionen för data- och informationsteknik, Datorteknik (Chalmers))
ACM Transactions on Architecture and Code Optimization (TACO) (1544-3566). Vol. 14 (2017), 4,
[Artikel, refereegranskad vetenskaplig]

Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other, applications. However, reducing the precision level of floating-point values in a controlled fashion needs support both at the compiler and at the microarchitecture level. At the compiler level, a method is needed to automate the reduction of precision of each floating-point value. At the microarchitecture level, a lower precision of each floating-point register can allow more floating-point values to be packed into a register file. This, however, calls for new register file organizations. This article proposes an automated precision-selection method and a novel GPU register file organization that can store floating-point register values at arbitrary precisions densely. The automated precision-selection method uses a data-driven approach for setting the precision level of floating-point values, given a quality threshold and a representative set of input data. By allowing a small, but acceptable, degradation in output quality, our method can remove a significant amount of the bits needed to represent floating-point values in the investigated kernels (between 28% and 60%). Our proposed register file organization exploits these lower-precision floating-point values by packing several of them into the same physical register. This reduces the register pressure per thread by up to 48%, and by 27% on average, for a negligible output-quality degradation. This can enable GPUs to keep up to twice as many threads in flight simultaneously.

Nyckelord: approximate computing, floating-point precision, gpu, llvm, register file

Denna post skapades 2017-12-18.
CPL Pubid: 253839


Läs direkt!

Länk till annan sajt (kan kräva inloggning)

Institutioner (Chalmers)

Institutionen för data- och informationsteknik, Datorteknik (Chalmers)



Chalmers infrastruktur