CPL - Chalmers Publication Library
| Utbildning | Forskning | Styrkeområden | Om Chalmers | In English In English Ej inloggad.

Efficient Implementation of Concurrent Data Structures on Multi-core and Many-core Architectures

Bapi Chatterjee (Institutionen för data- och informationsteknik, Nätverk och system, Datakommunikation och distribuerade system (Chalmers))
Göteborg : Chalmers University of Technology, 2015. - 106 s.

Synchronization of concurrent threads is the central problem in order to design efficient concurrent data-structures. The compute systems widely available in market are increasingly becoming heterogeneous involving multi-core Central Processing Units (CPUs) and many-core Graphics Processing Units (GPUs). This thesis contributes to the research of efficient synchronization in concurrent data-structures in more than one way. It is divided into two parts. In the first part, a novel design of a Set Abstract Data Type (ADT) based on an efficient lock-free Binary Search Tree (BST) with improved amortized bounds of the time complexity of set operations - Add, Remove and Contains, is presented. In the second part, a comprehensive evaluation of concurrent Queue implementations on multi-core CPUs as well as many-core GPUs are presented. Efficient Lock-free BST -To the best of our knowledge, the lock-free BST presented in this thesis is the first to achieve an amortized complexity of O(H(n)+c) for all Set operations where H(n) is the height of a BST on n nodes and c is the contention measure. Also, the presented lock-free algorithm of BST comes with an improved disjoint-access-parallelism compared to the previously existing concurrent BST algorithms. This algorithm uses single-word compare-and-swap (CAS) primitives. The presented algorithm is linearizable. We implemented the algorithm in Java and it shows good scalability. Evaluation of concurrent data-structures - We have evaluated the performance of a number of concurrent FIFO Queue algorithms on multi-core CPUs and many-core GPUs. We studied the portability of existing design of concurrent Queues from CPUs to GPUs which are inherently designed for SIMD programs. We observed that in general concurrent queues offer them to efficient implementation on GPUs with faster cache memory and better performance support for atomic synchronization primitives such as CAS. To the best of our knowledge, this is the first attempt to evaluate a concurrent data-structure on GPUs.

Nyckelord: Lock-free, Concurrent Data Structures, Lock-free Binary search tree, Synchronization Primitives

Den här publikationen ingår i följande styrkeområden:

Läs mer om Chalmers styrkeområden  

Denna post skapades 2015-03-02. Senast ändrad 2015-03-02.
CPL Pubid: 213279


Läs direkt!

Länk till annan sajt (kan kräva inloggning)

Institutioner (Chalmers)

Institutionen för data- och informationsteknik, Nätverk och system, Datakommunikation och distribuerade system (Chalmers)


Informations- och kommunikationsteknik
Data- och systemvetenskap

Chalmers infrastruktur

Relaterade publikationer

Inkluderade delarbeten:

Understanding the Performance of Concurrent Data Structures on Graphics Processors

Efficient lock-free binary search trees


Datum: 2015-03-13
Tid: 13:15
Lokal: Room EA, EDIT
Opponent: Dr. Neeraj Mittal Department of Computer Science, The University of Texas at Dallas, USA