CPL - Chalmers Publication Library

Techniques to Reduce Thread-Level Speculation Overhead

Fredrik Warg (Department of Computer Science and Engineering, Computer Engineering (Chalmers))
Göteborg: Chalmers University of Technology, 2006. ISBN: 91-7291-803-9. Full text: 845 kB.

Traditional single-core processors are being replaced by chip multiprocessors (CMPs), in which several processor cores are integrated on a single chip. While this benefits multithreaded applications and multiprogrammed workloads, CMPs do not improve the performance of single-threaded applications. Thread-level speculation (TLS) has been proposed as a way to improve single-thread performance on such systems. With TLS, programs are aggressively parallelized at run-time: threads speculate on data and control dependences, and a thread must be squashed and restarted if a dependence violation occurs. Unfortunately, various sources of overhead create a major performance problem for TLS. This thesis quantifies the impact of overheads on the performance of TLS systems and proposes remedies in the form of several overhead-reduction techniques. These techniques target run-time parallelization that does not require recompilation of sequential binaries. The main source of parallelism investigated in this work is module continuations, i.e., a function or method is run in parallel with the code following its call instruction; loops are another source. Run-length prediction, a technique aimed at reducing the number of short threads, is introduced. An accurate predictor that avoids short threads, or dynamically unrolls loops to increase thread lengths, is shown to improve speedup for most of the benchmark applications. Another novel technique is misspeculation prediction, which can remove most of the TLS overhead by reducing the number of misspeculations. The interaction between thread-level parallelism and instruction-level parallelism is also studied: in many cases both sources can be exploited for additional performance gains, but in some cases there is a trade-off. Communication overhead and memory-level parallelism are found to play an important role.
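The abstract does not spell out how run-length prediction works internally. As a rough illustration only, the sketch below assumes a last-value predictor indexed by spawn-point address: a spawn point whose last thread was shorter than a minimum length stops spawning, and for loops the predicted iteration length is used to pick how many iterations to merge into one thread (dynamic unrolling). The class name, `MIN_THREAD_LENGTH = 200` instructions and `MAX_UNROLL = 8` are hypothetical choices, not values from the thesis.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical tuning parameters, chosen for illustration only.
MIN_THREAD_LENGTH = 200  # instructions; shorter threads are not worth spawning
MAX_UNROLL = 8           # upper bound on iterations merged into one loop thread


@dataclass
class RunLengthPredictor:
    """Last-value run-length predictor indexed by spawn-point address (an assumption)."""

    min_length: int = MIN_THREAD_LENGTH
    table: Dict[int, int] = field(default_factory=dict)

    def predict(self, spawn_pc: int) -> Optional[int]:
        # Last observed run length for this spawn point, or None if never seen.
        return self.table.get(spawn_pc)

    def update(self, spawn_pc: int, observed_length: int) -> None:
        # Called when a thread ends: remember how many instructions it executed.
        self.table[spawn_pc] = observed_length

    def should_spawn(self, spawn_pc: int) -> bool:
        # Spawn optimistically the first time; afterwards only if the thread is
        # predicted to be long enough to amortize spawn and commit overhead.
        predicted = self.predict(spawn_pc)
        return predicted is None or predicted >= self.min_length

    def unroll_factor(self, loop_pc: int) -> int:
        # For loops, merge enough iterations into one thread to reach min_length.
        predicted = self.predict(loop_pc)
        if predicted is None or predicted <= 0:
            return 1
        return min(MAX_UNROLL, max(1, math.ceil(self.min_length / predicted)))


if __name__ == "__main__":
    p = RunLengthPredictor()
    p.update(0x4000, 50)     # short module continuation
    p.update(0x5000, 1200)   # long module continuation
    p.update(0x6000, 60)     # one loop iteration
    print(p.should_spawn(0x4000))   # False
    print(p.should_spawn(0x5000))   # True
    print(p.unroll_factor(0x6000))  # 4
```

A last-value table is the simplest predictor that fits the description; a hardware design could just as well use averaged or confidence-tagged lengths.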
For some applications, prefetching by threads that are eventually squashed contributes more to speedup than parallel execution does. Finally, faster inter-thread communication is found to give simultaneous multithreaded (SMT) processors an advantage as the basis for TLS machines.
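The abstract names misspeculation prediction but does not describe the predictor itself. As a rough illustration only, the sketch below assumes a table of 2-bit saturating counters indexed by spawn-point address, in the style of a branch predictor: spawn points whose threads were recently squashed stop spawning. The class name, the counter encoding and the `probe_interval` retraining rule are hypothetical, not details taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MisspeculationPredictor:
    """Per-spawn-point 2-bit saturating counters (a hypothetical design).

    Counter values 0-1 predict that the thread will commit; 2-3 predict that
    it will be squashed by a dependence violation.
    """

    probe_interval: int = 16  # hypothetical: spawn every Nth suppressed thread anyway
    counters: Dict[int, int] = field(default_factory=dict)
    suppressed: Dict[int, int] = field(default_factory=dict)

    def predicts_squash(self, spawn_pc: int) -> bool:
        return self.counters.get(spawn_pc, 0) >= 2

    def should_spawn(self, spawn_pc: int) -> bool:
        if not self.predicts_squash(spawn_pc):
            return True
        # Suppress the spawn, but probe occasionally so that the counter can
        # relearn if the dependence behavior at this spawn point changes.
        n = self.suppressed.get(spawn_pc, 0) + 1
        self.suppressed[spawn_pc] = n
        return n % self.probe_interval == 0

    def update(self, spawn_pc: int, was_squashed: bool) -> None:
        # Called when a spawned thread commits or is squashed.
        c = self.counters.get(spawn_pc, 0)
        self.counters[spawn_pc] = min(3, c + 1) if was_squashed else max(0, c - 1)


if __name__ == "__main__":
    m = MisspeculationPredictor()
    m.update(0x10, was_squashed=True)
    m.update(0x10, was_squashed=True)  # counter reaches the "squash" range
    print(m.should_spawn(0x10))         # False
    print(m.should_spawn(0x20))         # True (no history)
```

The periodic probe matters because a suppressed spawn point otherwise never produces new commit or squash outcomes, so its counter could never move back toward "commit".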

Keywords: Computer architecture, thread-level speculation, chip multiprocessors, multithreaded processors, speculation overhead, performance evaluation

This record was created 2006-05-15. Last modified 2013-09-25.
CPL Pubid: 20527


Departments (Chalmers)

Department of Computer Science and Engineering, Computer Engineering (Chalmers)

Date: 2006-06-13
Time: 13.15
Venue: HC2
Opponent: Professor Antonio González, Computer Architecture Department, Universitat Politècnica de Catalunya, Spain

Part of series

Doktorsavhandlingar vid Chalmers tekniska högskola. Ny serie 2485 (Doctoral theses at Chalmers University of Technology, new series, no. 2485)

Technical report D - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University 18D