In this assignment, you will explore the effects of parallelism by taking a single-threaded program and converting it to be multi-threaded. You will measure the performance gain but also measure the problems introduced by potential race conditions.
You will then modify your solution from lab assignment #2 to introduce parallelism and synchronization.
As in the previous labs, you may work with one other person on this assignment. In some parts of this assignment, you will need to write C code; in others, you will need to provide written solutions. Each student should do their own writeup, even if they are working with someone else. Also, please follow the naming conventions suggested in the assignment, as it will help the TAs with grading.
The program serial.c is a single-threaded simulation of a bank which performs a parameterized number of account transfers on a parameterized number of accounts with a parameterized amount of "think time" at the end of each transfer. All parameterized values are set via the command-line arguments to "main". Review and make sure you understand this code before proceeding.
Note that you may need to use the -lpthread flag when compiling your code.
First, estimate the running time of serial.c for one million (1,000,000) account transfers. You should run the program multiple times to get a good idea of the running time.
You should be using the Linux lab machines (or remotely connect to an eniac machine) for all the experiments in this assignment. If, for whatever reason, you're using a different machine, in your writeup, please indicate what type of computer you are using for your experiment (operating system, number of CPUs/cores, and CPU speed; look here if you don't know how to find that info), and be sure to use the same computer for subsequent parts of this assignment!
Second, create a program called parallel.c that is a multi-threaded version of serial.c. This program should accept an additional parameter "-n <number-of-threads>"; the default number of threads should be 2. Modify the rest of the code so that the desired number of threads is created in "main", and each thread then runs the "transact" function, executing the loop the appropriate number of times.
Note: you may need to make other changes to the code as well in order to make the multi-threaded program work correctly. However, this program should not use any synchronization (like mutexes); you will do that later.
Third, estimate the performance gain that you now achieve by introducing parallelism. Run parallel.c with two threads and 1,000,000 transactions; how much faster is the program now?
Increase the number of threads to 3 and then to 4. Are you seeing "linear" performance gain? That is, does the program run twice as fast with two threads as it does with one? And twice as fast with four threads as it does with two? Why or why not?
Fourth, estimate the likelihood of race conditions in the code, now that you have introduced parallelism. Run parallel.c several times with two threads and 1,000,000 transactions; how many of these test runs result in an inconsistency? Repeat this experiment with three threads and with four threads. What does this tell you about the likelihood of race conditions occurring in multi-threaded programs?
Fifth, create a program called synch.c that is a synchronized version of parallel.c. Use a mutex such that the critical section in the "transact" function is synchronized, i.e. only one thread can be executing it at a time. Note that if your code is running correctly, you should never observe any race conditions (inconsistencies) anymore, since the critical section is synchronized. If you see any race conditions, you haven't properly used the mutex!
Last, estimate the performance gain (compared to the serialized version) that we achieve by using parallelism and synchronization. Run synch.c with two threads and 1,000,000 transactions; how much faster is the program now? Repeat this experiment with three threads and with four threads. Compare the results to those with the unsynchronized code. What do you observe, and why?
Before you can do that, you must first modify your hashtable implementation so that it is threadsafe: the skeleton code we provided made no guarantees as to race conditions within any of the functions. If you want multiple threads to access the same hashtable concurrently (which you probably do!), you must make changes to that code to make it threadsafe before proceeding. Modify the different functions as needed, using mutex locks in the appropriate places so that the hashtable is threadsafe, and name this file "hashtable-thread.c".
Now modify the rest of your solution to Part 5 of lab assignment #2 and add parallelization by using POSIX threads. Name your new program "comparedocs-thread.c". Before you begin coding, think about what parts of the program can be parallelized, how many threads you want to run, and how to avoid race conditions. Note also that the file reading code that we provided is not threadsafe either, if you are using that.
Your solution to this part of the assignment should still report that the files have 9307 words in common. Run your program a few times to make sure you get consistent results. If you get different results each time you run the program, you have a race condition!
You must use Helgrind to check that you do not have any concurrency errors. We will be running Helgrind against your program on the eniac cluster during grading, and it should pass with no problems reported. Hint: Rather than waiting to use Helgrind at the end, once you've finished adding parallelization, use it as you go along. Helgrind is a bit "verbose" and it can be difficult to fix problems if there are lots and lots of them being reported.
Important note about Helgrind! For whatever reason, Helgrind sometimes reports errors within itself (oops!). If you see something like this (where it appears that the race condition occurs in vgpreload_helgrind-amd64-linux.so), you can ignore it:
==14219== Possible data race during read of size 8 at 0x7fefffa50 by thread #1
==14219==    at 0x4C288AD: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==14219==    by 0x4C2893C: pthread_create@* (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==14219==    by 0x401027: main (main.c:240)
==14219== This conflicts with a previous write of size 8 by thread #3
==14219==    at 0x4C289B9: ??? (in /usr/lib64/valgrind/vgpreload_helgrind-amd64-linux.so)
==14219==    by 0x4E3265C: start_thread (pthread_create.c:297)
==14219==    by 0x5118ECC: clone (clone.S:112)
Once you are done with your implementation, measure how long it takes to run the program using parallelization. Comparing the parallelized C solution to the serial C solution from the previous assignment, are you getting the expected speed-up? Why do you think that might be? Discuss your findings in your write-up, describing your observations in terms of Amdahl's Law, and also considering the effect of the number of threads and the manner in which you made the code threadsafe. Also, consider the two "tuning parameters" of your program: the initial capacity (number of buckets) in the hashtable, and the number of threads. Which one has a bigger impact on performance? Why?
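For the write-up, recall the standard statement of Amdahl's Law: if a fraction $p$ of the program's running time can be parallelized and the rest is serial, the best speed-up on $n$ threads is

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
```

For example, if 90% of the work parallelizes ($p = 0.9$), four threads give at most $S(4) = 1/(0.1 + 0.9/4) \approx 3.08$, not 4; and no number of threads can ever exceed $1/(1-p) = 10$. Time spent waiting on mutexes effectively shrinks $p$, which is one lens for explaining your measurements.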
Make sure your code runs on eniac! The TAs will grade this assignment (and all others) on the eniac machines, so make sure it is possible to compile and run your code there, even if you did your experiments on a different platform.
Your submission should include your code for Part 1 (parallel.c and synch.c), your code for Part 2 (hashtable-thread.c, comparedocs-thread.c, and any other .h or .c files needed to compile your program) and an electronic version of the writeups. The writeups for both Parts 1 and 2 should be combined into a single .txt or .pdf document. You should also include Makefiles for Parts 1 and 2.
Homeworks are to be submitted via Blackboard, as described on the course overview page. Please tar and/or zip all of your files into a single submission file.
Updated: Thu, Jan 26, 10:27am