PD_ParallelDecomposition PD_ParallelPerformance PD_ParallelArchitecture TCPP_Algorithms K_12 CS0" CS1 CS2 DSA accessible touch visual

Originally described by Gregory F. Bachelis, David A. James, Bruce R. Maxim and Quentin F. Stout (Bachelis1994). Also alluded to by Garcia and (Moore2000).

Variation using a deck of playing cards by (Bogaerts2014 and Ghafoor2019):

iPDC Modules

Similiar Exercises:

Sorting: oddEvenTranspositionSort, ParallelRadixSort, PipelineSort, sortingNetwork

Other activities by authors

(Bachelis1994): ParallelAddition, FindSmallestCard, SieveOfErastothenes, OddEvenTranspositionSort, PipelineSort

(Moore2000): SieveOfErastothenes, OddEvenTranspositionSort, MatrixAddition

(Ghafoor2019): ArraysInParallel, CandySorting, FindYoungestStudentInClass, PBJinParalell,


From (Bachelis1994): The goal is to sort a deck of 16 cards (it is assumed that there are unique numbers) using parallel selection sort. The notion of a “two-card comparator” is discussed as part of the process (e.g. input is two cards, outputs the smaller of the two cards). The processor therefore employs a series of “two-card comparators” to sort the set of cards. (Bechelis1994) recommends that students are first introduced to the FindSmallestCard activity.

Serial case:

Parallel Case

Two Students:

Four Students:

Eight students

The class then should asked to calculate the speedup of the parallel cases over the serial cases. They should be able to recognize that as more processors are added, the merge step starts to require the majority of the time.

Playing Card Variation (Bogaerts2014)

Steve Bogaerts describes a variation using playing cards rather than the 16 numbered cards that (Bachelis1994) recommends. Student volunteers are separated into three groups (A, B, and C) containing 1 student, 3 students and 2 students respectively.

The remaining students in the class are asked to watch the students as they each attempt to sort their playing cards. Common observations include that while the student in group A had no overhead, it took them a long time to sort the cards since they were working alone. The students in groups B and C generally take less time, but there are different issues. Group B takes a longer time to communicate, owing to the distance between them. While the two students in Group C have an easier time communicating, there is less space and there is a chance that they will “mess up” by working in a shared space. This latter scenario is foreshadowing of race conditions.

Other Variations

(Ghafoor2019) describes a variation using two standard decks of playing cards. There is also no specified sorting algorithm in this variant.

The concept of card sorting is described by (Moore2000). In a parallel processing lab, students form groups to develop card sorting algorithms to illustrate data parallelism in shared memory, data parallelism in message passing, and task parallelism for message passing. However, additional details on how exactly this is accomplished is not provided.

CS2013 Knowledge Unit Coverage

Parallel Decomposition (Core Tier 1)

  1. Explain why synchronization is necessary in a specific parallel program. [Usage]

  2. Identify opportunities to partition a serial program into independent parallel modules. [Familiarity]

PD/Parallel Algorithms, Analysis & Programming - Core Tier 2

3. Define “speed-up” and explain the notion of an algorithm’s scalability in this regard. [Familiarity]

4. Identify independent tasks in a program that may be parallelized. [Usage]

6. Implement a parallel divide-and-conquer (and/or graph algorithm) and empirically measure its performance relative to its sequential analog.

Parallel Architecture

1. Explain the differences between shared and distributed memory. [Familiarity] [Core-Tier2]

8. Describe the key performance challenges in different memory and distributed system topologies. [Familiarity]

Parallel Performance (Elective)

  1. Detect and correct a load imbalance. [Usage]
  2. Calculate the implications of Amdahl’s law for a particular parallel algorithm (cross-reference SF/Evaluation for Amdahl’s Law). [Usage]
  3. Describe how data distribution/layout can affect an algorithm’s communication costs. [Familiarity]

TCPP Topics Coverage

Algorithms Topics


Yes. This activity should be accessible to a wide range of students. The activity can easily be done at a table. Blind students can also participate in the exercise if Braille playing cards were used.


(Bogaerts2014) used sorting playing cards an activity as part of a larger unit in parallelism in a CS1 course. He mentions that the total amount of time spent on parallelism was larger in the section that used analogies and hands-on activities compared to the one that presented the topics in a traditional lecture-style format (4 hours vs 90 minutes). However, the section that used analogies and hands-on activities performed better than those who received the information in a traditional lecture-format. Bogaerts argues that it is much better to spend more time on fewer parallel concepts in a hands-on way in an introductory course, rather than covering a variety of parallel concepts in a non-hands-on way. The final conclusion drawn is that analogies and hands-on activities enabled students to learn better and stimulated greater interest in the subject than a course that delivered the material in a typical lecture-style fashion. (Bogaerts2017) extends the assessment of the original paper, but found that while student interest increased, the desire to learn more decreased. The authors theorize that this is because most of the students in the course were non-majors who will not be pursuing computing in the future.

(Moore2000) discusses the impact of the parallel computing labs in her course. In general, students responded to the labs positively, and felt that labs increased their knowledge significantly.