Sorting

Github Classroom
You can clone the starter files for this lab at this GitHub Classroom link. Be sure to open the assignment in VSCode and setup up your virtual environment following these instructions.
Introduction
You are helping implement Courses at Brown! You are given a list of search results, and your job is to sort them by name lexicographically!
Each of the search results represents a course offering. Each offering is associated with the course subject, course code, and total enrollment in course-data.json
.
elements = [
{"code": {"subject": "MCM", "number": "1204K"}, "enrollment": 40},
{"code": {"subject": "TAPS", "number": "1281W"}, "enrollment": 23},
{"code": {"subject": "ENGN", "number": "2172"}, "enrollment": 12},
{"code": {"subject": "ENGN", "number": "0032"}, "enrollment": 100}
]
Since we’ll want to sort each object in lexicographic order based on their course code, we need to transform our input data to extract the course code. The built-in Python sort
and sorted
functions take in an optional function key
. The key
function acts as a transform function for elements of the input list into the elements used for comparison-based sorting. For example:
arr = [{"a": 3}, {"a": 1}, {"a": 2}]
sorted_arr = sorted(arr, key=lambda d: d["a"])
print(sorted_arr) # [{"a": 1}, {"a": 2}, {"a": 3}]
The registrar has kindly provided a key
function to extract the course code from a course!
def key(course: Course) -> str:
'''This function is already defined in the stencil code.'''
# in Python, the builtin < will compare strings in lexicographic order
return course1['code']['subject'] + course1['code']['number']
Sorting with respect to this key
function gives:
elements = [
{"code": {"subject": "ENGN", "number": "0032"}, "enrollment": 100},
{"code": {"subject": "ENGN", "number": "2172"}, "enrollment": 12},
{"code": {"subject": "MCM", "number": "1204K"}, "enrollment": 40},
{"code": {"subject": "TAPS", "number": "1281W"}, "enrollment": 23}
]
Quicksort
Quicksort is a fast, general purpose, comparison sorting algorithm. Consider the following high level pseudocode for quicksort:
function quicksort(lst):
1. Check whether the input has fewer than two elements.
These lists are returned unmodified, since they are already sorted.
2. Select a pivot element p.
(The simplest choice is to choose the first element.)
3. Partition the list into two lists: elements less than p,
and elements greater than or equal to p (excluding p).
4. Recursively sort the two partitions.
5. Return the sorted concatenation of the sorted partitions and p.
Task: Implement quicksort based on the provided pseudocode. Your implementation will be the initial_quicksort()
function inside sorts.py
. Implement the pseudocode, do not apply any optimizations.
Tip: You may reference the implementation of insertion_sort()
to see how to use the key
function.
Tip: A common mistake is recursing on a partition that includes the pivot element itself, or not including other keys equal to the pivot value in the constructed list. (Can you think of why each of these might cause a problem?)
Self Checkpoint: Run pytest
in your terminal to run the tests located in test_sorts.py
.
Since you haven’t yet implemented the optimizations to quicksort, at this point you only want the tests with the label initial_quicksort
to pass.
Make sure your initial quicksort implementation passes all of the necessary tests!
Task: Benchmark the performance of your implementation by running benchmark.py
, which will generate a chart of the runtime of your implementation over different input sizes.
The runtime chart’s y axis has a logarithmic scale. This allows you to contrast the runtime of the sort implementations you write.
You should use python3 benchmark.py
, py benchmark.py
, or python benchmark.py
, depending on your operating system. Make sure you are running it from root of your lab repository (the directory containing course-data.json
).
Note: In the generated plot, you may see lines that stop suddenly. This means that the sorting implementation reached its maximum recursion depth (this is a limit imposed by Python to prevent infinite recursion).
This is expected for certain unoptimized implementations for specifically adversarial datasets.
TA Checkpoint: Show a TA the generated chart, and answer the following questions:
- How well does the initial quicksort work on
sorted_reversed_concatenated
dataset? Why do you think this is?
Random Pivot Quicksort
To improve the performance of quicksort on certain datasets, we can change the way that we choose the pivot. One of these ways is to select a random pivot element from the list, instead of always choosing the first element.
Task: Implement random_pivot_quicksort
by randomly selecting the pivot from the input array.
Hint: You may find the randrange
function helpful for your implementation!
Hint: Your implementation should be similar to initial_quicksort
.
Self Checkpoint: Run pytest
in your terminal to run the tests located in test_sorts.py
, and make sure that the tests for random_pivot_quicksort
now pass!
Small Sort Quicksort
One possible optimization to quicksort is to use a different sorting algorithm when the input is small. The idea is that even though the asymptotic runtime of an algorithm may be worse than quicksort, it may be faster for small inputs.
Task: Implement small_sort_quicksort
by using the provided insertion_sort
when the input size is smaller than some threshold (say 10), and otherwise just the normal quicksort algorithm.
Hint: Your implementation should be similar to initial_quicksort
.
Self Checkpoint: Run pytest
in your terminal to run the tests located in test_sorts.py
, and make sure that the tests for small_sort_quicksort
now pass!
Semi-In Place Quicksort
One real problem with the naive quicksort implementation is that in order to construct sub-problems and recurse, we need to allocate new lists for elements that are left and right of the pivot.
In practice, we want to avoid this, because memory allocation is costly. One way to get around this is to use only a single auxiliary array to store temporary elements, and reuse this throughout our recursive calls.
Consider the following helper pseudocode for semi-in-place quicksort that takes an input array and also an auxiliary array and sorts the section in the input array:
function semi_in_place_helper(lo, hi, input, aux):
1. If the range is a single element or less, do nothing.
2. Select the first input element as the pivot p.
3. From left to right in aux:
a. Place all elements less than p.
b. Place the pivot p.
c. Place all elements greater than or equal to p.
6. Recuse on the upper and lower partitions of aux,
this time passing in the input array as the aux array.
7. Copy the sorted aux array back into the input array.
Notice here that in the recursive calls, the input
for one becomes the aux
for the next, so we are only using these two arrays.
Task: Implement semi_in_place_quicksort
by using the provided semi_in_place_quicksort_helper
function as described above.
Self Checkpoint: Run pytest
in your terminal to run the tests located in test_sorts.py
, and make sure that the tests for semi_in_place_quicksort
now pass!
Final TA Checkpoint: Run benchmark.py
now that you have all of your implementations, and show it to a TA!
Congratulations! You have finished the sorting lab!
Just for Fun!
(Optional) Fully In-Place Quicksort
It is possible to implement quicksort without using any auxiliary arrays at all. We’ll call this fully in-place quicksort.
The idea is that we can remove the external memory dependency by using a portion of the input array as the auxiliary array. A basic pseudocode for the algorithm is given below:
function quicksort_in_place(input, lo, hi):
1. If the range is a single element or less, do nothing.
2. Select element hi - 1 as the pivot p.
3. Sweep the range [lo, hi), and swap elements with
keys less than the pivot key to the indices starting with lo and
moving to the right as you swap elements.
4. Recurse on the upper and lower partitions of the array.
Task: Implement in_place_helper
by using the provided in_place_helper
function as described above.
You should manipulate elements
through swapping, which can be done easily in Python: elements[i], elements[j] = elements[j], elements[i]
.