I implemented this as my final assignment for CS 15-418: Parallel Computer Architecture and Programming. This is actually a project I've been wanting to do for a while, as the concept of garbage collection has interested me ever since I first learned about it in Computer Systems, but it wasn't until this semester that I was able to find the time/motivation to finally do it.
My approach started with a simple mark and sweep algorithm. The idea behind this algorithm is that memory can be viewed as a graph where blocks of memory are vertices and references from one block to another can be considered directed edges. Root nodes of the graph would be memory on the stack, registers, and global variables. The first step in the algorithm "marks" every block that can be reached by performing a DFS starting from each of the root nodes. The second step then iterates through all the heap allocated blocks of memory and then frees all blocks that have not been marked; this is the "sweep" phase. The mark phase was parallelized using pthreads: each thread was given an approximately equal number of root nodes to start the marking DFS from. Additionally, I utilized fine-grained locks in the data structures to ensure thread safety. This also allowed the sweep phase to be accelerated, as multiple frees could happen simultaneously.
The biggest limitation of mark and sweep garbage collection is that the algorithm needs to stop execution of the program in order to run the algorithm. I imagined that thread safety could eliminate this bottleneck, as multiple threads should theoretically be able to modify the list of blocks safely. I implemented this feature by just spawning a thread to run the garbage collector in parallel to the executing program, but I'm not going to claim that it works 100% correctly because I wasn't able to create a test suite comprehensive enough to test this and it seems too simple to work this way.