All of the Above
David Colarusso
Co-director, Suffolk's Legal Innovation & Tech Lab
This is the 17th post in my series 50 Days of LIT Prompts.
I began my professional career as a high school teacher, and I'm a big fan of formative assessment. So, when I returned to the classroom as a law student, I found law school's nearly exclusive reliance on summative assessments not only frustrating but perplexing. Such an imbalance goes against nearly everything I learned during my Master's in Education. Now, I don't think a prompt template that generates reading questions from the text of an assignment is by any means a magic fix, but I am intrigued by the idea that we can use technology to help improve instruction by making it easier for teachers to provide more formative assessments. Obviously, I'm a fan of simulations in part because of the promise they offer in this arena. But it's important not to overlook the "classics." Today's template lets one create a set of multiple-choice questions based on the text of a reading assignment.
Below you'll find the output of today's template: five multiple-choice questions based on the text of Hawkins v. McGee, the focus of yesterday's template. Give them a look, or jump to the instructions below.
1. What was the main issue in the case of Hawkins v. McGee?
   a) The duration of the hospital stay
   b) The pain and suffering endured by the plaintiff
   c) The guarantee made by the doctor to make the plaintiff's hand "a hundred per cent perfect hand"
   d) The doctor's refusal to perform a further operation
   Answer: c) The guarantee made by the doctor to make the plaintiff's hand "a hundred per cent perfect hand"

2. What was the measure of damages as determined by the court in the case of Hawkins v. McGee?
   a) The plaintiff's pain and suffering
   b) The difference between the value of a perfect hand and the value of the plaintiff's hand in its current condition
   c) The cost of a further operation
   d) The change for the worse in the condition of the plaintiff's hand resulting from the operation
   Answer: b) The difference between the value of a perfect hand and the value of the plaintiff's hand in its current condition

3. What was the court's response to the defendant's argument that no reasonable man would understand his words as a contractual agreement?
   a) The court agreed with the defendant
   b) The court disagreed, stating that the defendant's words could be taken at face value
   c) The court stated that the defendant's words were merely an expression of strong language
   d) The court stated that the defendant's words were merely an expression of his belief and expectation
   Answer: b) The court disagreed, stating that the defendant's words could be taken at face value

4. What was the court's decision regarding the defendant's requests for instructions?
   a) The court agreed with all of the defendant's requests
   b) The court disagreed with all of the defendant's requests
   c) The court agreed with some of the defendant's requests and disagreed with others
   d) The court did not consider the defendant's requests
   Answer: b) The court disagreed with all of the defendant's requests

5. What was the final decision of the court in the case of Hawkins v. McGee?
   a) The court ruled in favor of the plaintiff
   b) The court ruled in favor of the defendant
   c) The court ordered a new trial
   d) The court set aside the verdict
   Answer: c) The court ordered a new trial
It's worth noting that the quality of these questions is very closely tied to the model used. My first attempts made use of gpt-3.5-turbo-16k, but I found issues with 2 out of the 5 questions. gpt-4 produced the above questions, and though slower and more expensive, it does do a better job. (We have since updated this page to use gpt-4o-mini.) If you've been with us from the start, I don't need to tell you this, but an LLM's output should always start, not end, the discussion. You need to check the questions as you would any secondary source. The idea here, beyond getting you to think about the promise and limits of LLMs, is to provide you with a tool that can make it easier to conduct formative assessments. Get a draft, go from there.
Let's build something!
We'll do our building in the LIT Prompts extension. If you aren't familiar with the LIT Prompts extension, don't worry. We'll walk you through setting things up before we start building. If you have used the LIT Prompts extension before, skip to The Prompt Pattern (Template).
Questions or comments? I'm on Mastodon @Colarusso@mastodon.social
Setup LIT Prompts
LIT Prompts is a browser extension built at Suffolk University Law School's Legal Innovation and Technology Lab to help folks explore the use of Large Language Models (LLMs) and prompt engineering. LLMs are sentence completion machines, and prompts are the text upon which they build. Feed an LLM a prompt, and it will return a plausible-sounding follow-up (e.g., "Four score and seven..." might return "years ago our fathers brought forth..."). LIT Prompts lets users create and save prompt templates based on data from an active browser window (e.g., selected text or the whole text of a webpage) along with text from a user. Below we'll walk through a specific example.
To get started, follow the first four minutes of the intro video or the steps outlined below. Note: The video only shows Firefox, but once you've installed the extension, the steps are the same.
Install the extension
Follow the links for your browser.
- Firefox: (1) visit the extension's add-ons page; (2) click "Add to Firefox;" and (3) grant permissions.
- Chrome: (1) visit the extension's web store page; (2) click "Add to Chrome;" and (3) review permissions / "Add extension."
If you don't have Firefox, you can download it here. Would you rather use Chrome? Download it here.
Point it at an API
Here we'll walk through how to use an LLM provided by OpenAI, but you don't have to use their offering. If you're interested in alternatives, you can find them here. You can even run an LLM locally, avoiding the need to share your prompts with a third party. If you need an OpenAI account, you can create one here. Note: When you create a new OpenAI account, you are given a limited amount of free API credits. If you created an account some time ago, however, these may have expired. If your credits have expired, you will need to enter a billing method before you can use the API. You can check the state of any credits here.
Log in to OpenAI, and navigate to the API documentation.
Once you are looking at the API docs, follow the steps outlined in the image above. That is:
- Select "API keys" from the left menu
- Click "+ Create new secret key"
On LIT Prompts' Templates & Settings screen, set your API Base to https://api.openai.com/v1/chat/completions and your API Key equal to the value you got above after clicking "+ Create new secret key". You get there by clicking the Templates & Settings button in the extension's popup:
- open the extension
- click on Templates & Settings
- enter the API Base and Key (under the section OpenAI-Compatible API Integration)
Once those two bits of information (the API Base and Key) are in place, you're good to go. Now you can edit, create, and run prompt templates. Just open the LIT Prompts extension, and click one of the options. I suggest, however, that you read through the Templates and Settings screen to get oriented. You might even try out a few of the preloaded prompt templates. This will let you jump right in and get your hands dirty in the next section.
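Under the hood, the extension is essentially POSTing your completed prompt to the API Base you just entered, authenticating with your Key. If you'd like to sanity-check your Base and Key outside the extension, here's a minimal Python sketch of such a request (an illustration of the API call, not the extension's own code; the model name is just an example):

```python
import requests  # pip install requests

API_BASE = "https://api.openai.com/v1/chat/completions"
API_KEY = "sk-..."  # the secret key you created above

response = requests.post(
    API_BASE,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hello in five words."}],
    },
    timeout=60,
)
response.raise_for_status()  # an error here often means a bad key or expired credits
print(response.json()["choices"][0]["message"]["content"])
```

If this prints a short greeting, your Base and Key are working, and the extension should be able to use them too.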
If you receive an error when trying to run a template after entering your Base and Key, and you are using OpenAI, make sure to check the state of any credits here. If you don't have any credits, you will need a billing method on file.
If you found this hard to follow, consider following along with the first four minutes of the video above. It covers the same content. It focuses on Firefox, but once you've installed the extension, the steps are the same.
The Prompt Pattern (Template)
When crafting a LIT Prompts template, we use a mix of plain language and variable placeholders. Specifically, you can use double curly brackets to encase predefined variables. If the text between the brackets matches one of our predefined variable names, that section of text will be replaced with the variable's value. Today we'll be using the {{scratch}} variable. See the extension's documentation.

The {{scratch}} variable contains the text in your Scratch Pad. Remember, the Scratch Pad is accessible from the extension's popup window. The button is to the right of the Settings & Templates button that you have used before. Your Scratch Pad is just a place where you can hold a bunch of text. The idea here is that you'll cut and paste the text of your reading into the Scratch Pad and run the template.
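If you're curious how this kind of substitution works in general, here's a rough Python sketch of the double-curly-bracket idea (my own illustration of the general technique, not the extension's actual implementation):

```python
import re

def fill_template(template: str, variables: dict[str, str]) -> str:
    """Swap {{name}} placeholders for their values; leave unknown names alone."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

template = (
    "{{scratch}}\n\n"
    "For the above reading assignment produce five multiple-choice questions..."
)
print(fill_template(template, {"scratch": "Text pasted into the Scratch Pad."}))
```

Running it prints the completed template, with the Scratch Pad text swapped in for {{scratch}}.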
Here's the template's title.
Draft multiple choice reading comprehension questions
Here's the template's text.
{{scratch}}
For the above reading assignment produce five multiple-choice questions aimed at gauging whether or not someone read the article. Focus on important points and indicate the correct answer below each question.
And here are the template's parameters:
- Output Type: LLM. This choice means that we'll "run" the template through an LLM (i.e., this will ping an LLM and return a result). Alternatively, we could have chosen "Prompt," in which case the extension would return the text of the completed template.
- Model: gpt-4. This input specifies what model we should use when running the prompt. Available models differ based on your API provider. See, e.g., OpenAI's list of models.
- Temperature: 0. Temperature runs from 0 to 1 and specifies how "random" the answer should be. Since we're seeking fidelity to a text, I went with the least "creative" setting, 0.
- Max Tokens: 1000. This number specifies how long the reply can be. Tokens are chunks of text the model uses to do its thing. They don't quite match up with words, but they're close; 1 token is something like 3/4 of a word. Smaller token limits run faster.
- JSON: No. This asks the model to output its answer in something called JSON. We don't need to worry about that here, hence the selection of "No."
- Output To: Screen + clipboard. We can output the first reply from the LLM to a number of places: the screen, the clipboard, and so on. Here, I've chosen the screen and clipboard so the results will be ready to paste where we like.
- Post-run Behavior: FULL STOP. Like the choice of output, we can decide what to do after a template runs. To keep things simple, I went with "FULL STOP."
- Hide Button: unchecked. This determines if a button is displayed for this template in the extension's popup window.
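To tie the template and its parameters together, here's a hedged Python sketch of roughly what a single run amounts to: the completed template (your Scratch Pad text plus the instruction) sent to the chat completions endpoint with the settings above. Again, this illustrates the underlying API call; it isn't the extension's source code:

```python
import requests  # pip install requests

API_BASE = "https://api.openai.com/v1/chat/completions"
API_KEY = "sk-..."  # your OpenAI API key

scratch = "Paste the text of your reading assignment here."
prompt = (
    f"{scratch}\n\n"
    "For the above reading assignment produce five multiple-choice questions "
    "aimed at gauging whether or not someone read the article. Focus on "
    "important points and indicate the correct answer below each question."
)

response = requests.post(
    API_BASE,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4",      # Model
        "temperature": 0,      # Temperature: the least "creative" setting
        "max_tokens": 1000,    # Max Tokens: cap on the reply's length
        "messages": [{"role": "user", "content": prompt}],
    },
    timeout=120,
)
questions = response.json()["choices"][0]["message"]["content"]
print(questions)  # Output To: here, just the screen
```

The extension handles the clipboard, post-run behavior, and the rest; the sketch just shows how the parameters above map onto a request.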
Working with the above template
To work with the above template, you could copy it and its parameters into LIT Prompts one by one, or you could download a single prompts file and upload it from the extension's Templates & Settings screen. This will replace your existing prompts.
Kick the Tires
It's one thing to read about something and another to put what you've learned into practice. Let's see how this template performs.
- Make it your own. Paste in some text you know well. See how the template performs. Consider tweaking the prompt to ask for questions around a particular theme.
TL;DR References
ICYMI, if you didn't click through above, you might want to give this a look now.
- Unsupervised Machine Scoring of Free Response Answers—Validated Against Law School Final Exams by David Colarusso. This paper presents a novel method for unsupervised machine scoring of short answer and essay question responses, relying solely on a sufficiently large set of responses to a common prompt, absent the need for pre-labeled sample answers—given said prompt is of a particular character. That is, for questions where “good” answers look similar, “wrong” answers are likely to be “wrong” in different ways. Consequently, when a collection of text embeddings for responses to a common prompt are placed in an appropriate feature space, the centroid of their placements can stand in for a model answer, providing a lodestar against which to measure individual responses. This paper examines the efficacy of this method and discusses potential applications.