Performance Impact of Batching Web Application Requests using Hot-spot Processing on GPUs

Tobias Fjälling ; Per Stenström (Department of Computer Science and Engineering, Computer Engineering (Chalmers))
29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015, Hyderabad, India, 25-29 May 2015 (1530-2075). p. 989-999. (2015)
Web applications are a good fit for many-core servers because of their inherently high degree of request-level parallelism. Yet processing-intensive web-server requests can lead to low quality of service due to hot-spots, which calls for methods that improve single-thread performance. This paper explores how to use off-chip GPUs to speed up web application hot-spots written in productivity-friendly environments (e.g. C#). First, we apply a number of straightforward optimizations by refactoring commercial-strength web application code. This yields a speedup of 7.6x in a multi-threaded, multi-core CPU test. Second, we gather similar requests from different threads of the optimized code, applying a technique called batching, to exploit the SIMD parallelism provided by GPUs. Surprisingly, there is ample parallelism to be exploited from the already optimized code, yielding a speedup of between 2x and 3x over the best optimized CPU version.
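The batching idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class `Batcher`, its methods, and the `kernel` callback are hypothetical names chosen for illustration, and the kernel here is an ordinary CPU function standing in for a GPU SIMD kernel. The sketch only shows the control flow: requests arriving on different threads are collected, and the hot-spot is executed once per batch rather than once per request.

```python
import threading
from concurrent.futures import Future


class Batcher:
    """Collects hot-spot inputs from many request threads and runs them as one batch.

    Hypothetical illustration: `kernel` maps a list of inputs to a list of
    outputs and stands in for a data-parallel GPU kernel.
    """

    def __init__(self, batch_size, kernel):
        self.batch_size = batch_size
        self.kernel = kernel
        self.lock = threading.Lock()
        self.pending = []  # list of (input, Future) pairs awaiting execution

    def submit(self, x):
        """Queue one request; returns a Future that resolves when its batch runs."""
        fut = Future()
        batch = None
        with self.lock:
            self.pending.append((x, fut))
            if len(self.pending) >= self.batch_size:
                # Take ownership of the full batch while holding the lock.
                batch, self.pending = self.pending, []
        if batch is not None:
            self._run(batch)  # run outside the lock so other threads can enqueue
        return fut

    def flush(self):
        """Run whatever is queued, even if the batch is not full."""
        with self.lock:
            batch, self.pending = self.pending, []
        if batch:
            self._run(batch)

    def _run(self, batch):
        inputs = [x for x, _ in batch]
        outputs = self.kernel(inputs)  # one call covers every request in the batch
        for (_, fut), y in zip(batch, outputs):
            fut.set_result(y)


if __name__ == "__main__":
    batcher = Batcher(batch_size=4, kernel=lambda xs: [x * x for x in xs])
    futures = [batcher.submit(i) for i in range(6)]
    batcher.flush()  # drain the partial final batch
    print([f.result() for f in futures])
```

A real deployment would also need a timeout-based flush so that a partially filled batch does not delay requests indefinitely; the sketch leaves that to an explicit `flush()` call.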

Keywords: Cloud computing, data parallelism, code optimization

