Sadly Not, Havoc Dinosaur

2.B, or Not 2.B?

Have an AI answer multiple-choice and true-or-false questions based on a reading assignment

Headshot of the author, David Colarusso.

This is the 20th post in my series 50 Days of LIT Prompts.

Yesterday, inspired by GPT-4 passing the bar exam, we tried getting an LLM to answer some reading questions about Hawkins v. McGee. The LLMs didn't do so hot, getting only 40-47% of the questions right. So, what was missing? Remember how, way back in the first post, we said, "If you want an LLM, or the man on the street, to summarize a text, have them read it first"? Well, it turns out, if you want an LLM to answer questions about a text, you should have them read it first. 😜

Yesterday's template was divorced from context. For example, one of the questions was "What was the court's decision regarding the defendant's requests for instructions?" In isolation, there's no way of knowing what decision the question is asking about. So, today we'll include that context by making the case text part of our prompt. For the record, using gpt-3.5-turbo-16k and the template below, I was able to get 4 out of 5 on Tuesday's questions and 8 out of 10 on Wednesday's questions. That's 12 of 15, a solid 80%. When I switched the model to gpt-4, it scored 5/5 and 9/10 respectively. That's 14 of 15, or roughly 93%! Just to remind you, yesterday's template scored only 40% and 47%.

To be entirely clear, the gpt-4 we used isn't quite the same GPT-4 used in the MBE paper. The "GPT-4" moniker has pointed to different models over time. Also, I counted an answer as "correct" when it agreed with the GPT-generated answer provided in the prior posts. As I noted, I wouldn't take those answers as gospel. It did, however, seem reasonable here to measure the LLMs' performance against answers drafted by an LLM. That is, we are measuring how well the LLMs did on a test written by LLMs. Regardless, I think we've established there's something to be said for context. Knowing this, let's build something!
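If you want to see the scoring math spelled out, here it is as a few lines of Python. The answers below are hypothetical stand-ins; the real questions and the LLM-drafted answer key live in Tuesday's and Wednesday's posts.

answer_key = ["B", "True", "C", "A", "False"]   # GPT-generated answers from the prior posts
llm_answers = ["B", "True", "C", "D", "False"]  # what today's template returned (hypothetical)

# An answer counts as "correct" when it matches the LLM-drafted key.
correct = sum(ours == key for ours, key in zip(llm_answers, answer_key))
print(f"{correct}/{len(answer_key)} = {correct / len(answer_key):.0%}")  # prints "4/5 = 80%"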

We'll do our building in the LIT Prompts extension. If you aren't familiar with the LIT Prompts extension, don't worry. We'll walk you through setting things up before we start building. If you have used the LIT Prompts extension before, skip to The Prompt Pattern (Template).

Questions or comments? I'm on Mastodon @Colarusso@mastodon.social


Setup LIT Prompts

7-minute intro video

LIT Prompts is a browser extension built at Suffolk University Law School's Legal Innovation and Technology Lab to help folks explore the use of Large Language Models (LLMs) and prompt engineering. LLMs are sentence completion machines, and prompts are the text upon which they build. Feed an LLM a prompt, and it will return a plausible-sounding follow-up (e.g., "Four score and seven..." might return "years ago our fathers brought forth..."). LIT Prompts lets users create and save prompt templates based on data from an active browser window (e.g., selected text or the whole text of a webpage) along with text from a user. Below we'll walk through a specific example.
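If you're curious what "feed an LLM a prompt" looks like one level down, here's a minimal sketch of the kind of request the extension ends up making. It assumes an OpenAI API key stored in the OPENAI_API_KEY environment variable (see the setup steps below); the endpoint and request shape come from OpenAI's chat completions API.

import os
import requests

# POST a prompt to OpenAI's chat completions endpoint and print the reply.
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo-16k",
        "messages": [{"role": "user", "content": "Four score and seven..."}],
    },
)
print(response.json()["choices"][0]["message"]["content"])
# A plausible-sounding follow-up, e.g., "years ago our fathers brought forth..."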

To get started, follow the first four minutes of the intro video or the steps outlined below. Note: The video only shows Firefox, but once you've installed the extension, the steps are the same.

Install the extension

Follow the links for your browser.

  • Firefox: (1) visit the extension's add-ons page; (2) click "Add to Firefox;" and (3) grant permissions.
  • Chrome: (1) visit the extension's web store page; (2) click "Add to Chrome;" and (3) review permissions / "Add extension."

If you don't have Firefox, you can download it here. Would you rather use Chrome? Download it here.

Point it at an API

Here we'll walk through how to use an LLM provided by OpenAI, but you don't have to use their offering. If you're interested in alternatives, you can find them here. You can even run an LLM locally, avoiding the need to share your prompts with a third party. If you need an OpenAI account, you can create one here. Note: when you create a new OpenAI account, you are given a limited number of free API credits. If you created an account some time ago, however, these may have expired. If your credits have expired, you will need to enter a billing method before you can use the API. You can check the state of any credits here.

Log in to OpenAI, and navigate to the API documentation.

Once you are looking at the API docs, follow these steps:

  1. Select "API keys" from the left menu
  2. Click "+ Create new secret key"

On LIT Prompts' Templates & Settings screen, set your API Base to https://api.openai.com/v1/chat/completions and your API Key to the value you got above after clicking "+ Create new secret key." You get to that screen from the extension's popup:

  1. open the extension
  2. click on Templates & Settings
  3. enter the API Base and Key (under the section OpenAI-Compatible API Integration)

Once those two bits of information (the API Base and Key) are in place, you're good to go. Now you can edit, create, and run prompt templates. Just open the LIT Prompts extension, and click one of the options. I suggest, however, that you read through the Templates & Settings screen to get oriented. You might even try out a few of the preloaded prompt templates. This will let you jump right in and get your hands dirty in the next section.

If you receive an error when trying to run a template after entering your Base and Key, and you are using OpenAI, make sure to check the state of any credits here. If you don't have any credits, you will need a billing method on file.

If you found this hard to follow, consider following along with the first four minutes of the video above. It covers the same content. It focuses on Firefox, but once you've installed the extension, the steps are the same.


The Prompt Pattern (Template)

When crafting a LIT Prompts template, we use a mix of plain language and variable placeholders. Specifically, you can use double curly brackets to enclose predefined variables. If the text between the brackets matches one of our predefined variable names, that section of text will be replaced with the variable's value. Today we'll be using two of our favorites, {{highlighted}} and {{scratch}}. See the extension's documentation.

The {{highlighted}} variable contains any text you have highlighted/selected in the active browser tab when you open the extension. The {{scratch}} variable contains the text in your Scratch Pad. Remember, the Scratch Pad is accessible from the extension's popup window. The button is to the right of the Templates & Settings button that you have used before.

To run the template, make sure the text of the reading is in your Scratch Pad, highlight the text of your question (not including the answer), and trigger the extension. If all goes well, the LLM will "read" the text along with your question and return an answer.
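Under the hood, filling in a template is little more than string substitution. Here's an illustrative sketch, not the extension's actual source, of how {{scratch}} and {{highlighted}} might get swapped into a prompt before it's sent to the LLM (the full template we'll use appears below):

def fill_template(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

# Hypothetical inputs: the Scratch Pad holds the reading; the selection holds the question.
prompt = fill_template(
    "READING:\n\n{{scratch}}\n\nQUESTION:\n\n{{highlighted}}\n\nNow, provide the correct answer:",
    {
        "scratch": "Full text of Hawkins v. McGee...",
        "highlighted": "What was the court's decision regarding...?",
    },
)
print(prompt)  # the finished prompt, ready to send to the LLM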

Here's the template's title.

Answer the selected question

Here's the template's text.

I'm going to show you the text of a reading assignment followed by a "multiple choice" or "true or false" question for that reading. Then I'm going to ask you to correctly answer the question.

READING:

{{scratch}}

QUESTION: 

{{highlighted}}

Now, provide the correct answer:

And here are the template's parameters:

Working with the above template

To work with the above template, you could copy it and its parameters into LIT Prompts one by one, or you could download a single prompts file and upload it from the extension's Templates & Settings screen. This will replace your existing prompts.

You can download a prompts file (the above template and its parameters) suitable for upload by clicking this button:


Kick the Tires

It's one thing to read about something and another to put what you've learned into practice. Let's see how this template performs.


TL;DR References

ICYMI, here are blurbs for a selection of works I linked to in this post. If you didn't click through above, you might want to give them a look now.

Figure 1 from GPT-4 Passes the Bar Exam, showing the performance of various LLMs on the multistate bar exam.