CPL - Chalmers Publication Library

Analysis and Optimization of Communication Overheads in Multi-core Architectures

Madhavan Manivannan (Department of Computer Science and Engineering, Computer Engineering (Chalmers))
Göteborg: Chalmers University of Technology, 2013. 57 pp.

The transition to multi-core architectures can be attributed mainly to fundamental limitations in clock frequency scaling, coupled with slow growth in uniprocessor performance caused by the challenges of exploiting instruction-level parallelism. Consequently, programmers can no longer realize significant performance gains without investing effort in parallelizing applications. The shared memory paradigm offers a natural advantage in this context because it obviates the need to manage communication explicitly in applications and instead lets programmers focus on identifying and expressing parallelism. Communication, established by performing loads and stores to shared memory locations, is an inherent aspect of the shared memory model. With increasing core counts and the growing dominance of wire delay, the cost of establishing communication is bound to increase. The goals of this thesis are twofold: (1) to analyze the impact of communication overheads on the scalability of applications and the implications of such overheads for CMP design choices, and (2) to devise new approaches to reduce communication overheads and their impact on scalability in the light of modern task-based runtime systems. The first study analyzes the impact of merging phases on the scalability of data-mining applications. The merging phase assembles partial results from multiple threads and has an inherently serial component that grows with the number of cores. The results establish that the scalability of such applications is much lower than what Amdahl’s law predicts. They also show that such applications favor designs with fewer large cores over designs with several small cores. The second study proposes architectural support for data forwarding to mitigate communication overheads associated with producer-consumer sharing. Existing forwarding approaches, which proactively forward data from the producer to the consumer, are shown to have limited applicability for task-based applications. An alternative technique is proposed in which producers track the identity of the updated blocks and initiate forwarding after receiving an initial request from the consumer. The technique leverages the observation that producer-consumer data exhibits spatial locality to simplify the hardware changes needed for tracking updates. The proposed forwarding scheme is shown to mitigate communication overheads due to producer-consumer sharing. Finally, the third study investigates the potential of using the inter-task dependency and mapping information available to the runtime system to facilitate coherence optimizations. This study shows that, by conveying this information to the underlying cache coherence substrate, coherence optimizations can be triggered which in turn can significantly reduce the communication overheads associated with prominent sharing patterns.
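The first study's claim, that a serial merging phase growing with the core count pushes scalability below Amdahl's prediction, can be illustrated with a small numerical sketch. The model below is an assumption made for illustration only and is not taken from the thesis: single-core execution time is normalized to 1, and each additional core is assumed to add a fixed serial merge cost. The function names and the parameter `merge_cost_per_core` are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Classic Amdahl's law: the serial fraction is constant as cores are added."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def speedup_with_merge(serial_fraction: float,
                       merge_cost_per_core: float,
                       cores: int) -> float:
    """Speedup when a serial merging phase grows linearly with the core count.

    Assumption (not from the thesis): merging the partial result of each
    additional core adds merge_cost_per_core units of serial time, with the
    single-core execution time normalized to 1.
    """
    parallel_time = (1.0 - serial_fraction) / cores
    merge_time = merge_cost_per_core * (cores - 1)
    return 1.0 / (serial_fraction + parallel_time + merge_time)


if __name__ == "__main__":
    # Compare the two models for a few core counts (illustrative parameters).
    for n in (1, 4, 16, 64, 256):
        print(n,
              round(amdahl_speedup(0.01, n), 2),
              round(speedup_with_merge(0.01, 0.001, n), 2))
```

Under this assumed linear-growth model the speedup peaks at a finite core count and then declines, whereas the classic formula keeps increasing toward its asymptote. That is one simple way to see why a growing merge phase could favor fewer, larger cores.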

Keywords: multi-core, cache coherence, task parallelism, sharing patterns, runtime systems, Amdahl’s Law

This publication is part of the following Chalmers Areas of Advance:


This record was created 2013-11-11. Last modified 2013-11-15.
CPL Pubid: 186434


Departments (Chalmers)

Department of Computer Science and Engineering, Computer Engineering (Chalmers)


Information and Communication Technology


Related publications

Included papers:

Implications of Merging Phases on Scalability of Multi-core Architectures

Efficient Forwarding of Producer-Consumer Data in Task-based Programs

Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures


Date: 2013-12-06
Time: 13:15
Venue: Room EB, EDIT Building, Rännvägen 6
Opponent: Dr. Pedro Trancoso, University of Cyprus