Translate Legalese

David Colarusso
Co-director, Suffolk's Legal Innovation & Tech Lab
This is the 29th post in my series 50 Days of LIT Prompts.
Instead of feeding 5 words to a large language model (LLM) and expecting 500 (e.g., write me an essay discussing the lessons of the French Revolution), more folks should be feeding their AIs 500 words and asking them to generate 5. This funneling approach tends to mitigate hallucinations and the biases that creep in when LLMs free associate. If you're feeding in more than you expect to get out, you're looking through the wrong end of the telescope. Consequently, we've created a good number of templates that work with summarization and entity extraction. Restructuring existing text is one of the things LLMs excel at. So, today we'll ask our prompt to help turn complicated texts into plain language.
This is actually a use case I have some experience with, as my lab has experimented with getting LLMs to make such rewrites when summarizing court forms. Generally speaking, LLMs do a pretty good job of turning legalese into something more people can understand. That being said, a word of caution: I don't actually think these tools are at the stage where they can be trusted to rewrite complex text in plain language absent human supervision.
There's a quote I've included alongside every prompt pattern, from the statistician George Box: "[A]ll models are wrong, but some are useful." It reminds us not to confuse the map for the territory. Remember, maps are models, and they are to some extent wrong. They don't show everything, but they can be useful. I tend to draw two actionable insights from the Box quote: "Because models are wrong, their output should start, not end, discussion. To determine if a model is useful, one must ask 'compared to what?'" So, the plain language write-up produced by these tools should start the discussion. It's a first draft, and I think it could be helpful, especially when you consider the alternative is rewriting the text from scratch. In the hands of someone who can properly evaluate and edit the output before sharing it widely, it's a great first step, but don't fall asleep at the wheel.
Let's see what today's template can do. Here's the current OpenAI Business Terms (the terms governing their API).
Here's what I see when I run the above through the tools at WordCounter: 4,646 words; Reading level: college graduate; Reading time: 17 minutes.
And here's the output from today's prompt with the above as input.
Here's what I see when I run the above through the tools at WordCounter: 1,138 words; Reading level: 9th-10th grade; Reading time: 4 minutes.
I'm particularly tickled by the rewritten Force Majeure:
Stuff happens: If something totally outside our control happens and we can't do what we promised, we won't be in trouble for it.
That being said...
Let's build something!
We'll do our building in the LIT Prompts extension. If you aren't familiar with the LIT Prompts extension, don't worry. We'll walk you through setting things up before we start building. If you have used the LIT Prompts extension before, skip to The Prompt Pattern (Template).
Up Next
Questions or comments? I'm on Mastodon @Colarusso@mastodon.social
Setup LIT Prompts
The Prompt Patterns (Templates)

When crafting a LIT Prompts template, we use a mix of plain language and variable placeholders. Specifically, you can use double curly brackets to encase predefined variables. If the text between the brackets matches one of our predefined variable names, that section of text will be replaced with the variable's value. Today we'll be using {{highlighted}}. See the extension's documentation.

The {{highlighted}} variable contains any text you have highlighted/selected in the active browser tab when you open the extension. To use this template, select the text you wish to decomplexify and run the template. Note: we've set the Post-run Behavior to CHAT so you can reshape or question the text after it provides your rewrite.
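If you're curious how this sort of placeholder substitution works under the hood, here's a minimal Python sketch. To be clear, this is not the extension's actual code, just an illustration of the general pattern of swapping {{name}} markers for variable values:

```python
import re

def fill_template(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders with values from `variables`.

    Placeholders with no matching variable are left untouched, so
    unrecognized names simply pass through unchanged.
    """
    def substitute(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)

# Example: fill a plain-language prompt with selected text.
template = "Rewrite the following in plain language:\n\n{{highlighted}}"
prompt = fill_template(template, {"highlighted": "The party of the first part..."})
```

The extension handles this for you; the point is simply that the template text plus your selection become one combined prompt before anything is sent to the model.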
Here's the template's title.
Translate into plain language
Here's the template's text.
And here are the template's parameters:
- Output Type: LLM. This choice means that we'll "run" the template through an LLM (i.e., this will ping an LLM and return a result). Alternatively, we could have chosen "Prompt," in which case the extension would return the text of the completed template.
- Model: gpt-4o-mini. This input specifies what model we should use when running the prompt. Available models differ based on your API provider. See e.g., OpenAI's list of models.
- Temperature: 0. Temperature runs from 0 to 1 and specifies how "random" the answer should be. Since we're seeking fidelity to a text, I went with the least "creative" setting: 0.
- Max Tokens: 1000. This number specifies how long the reply can be. Tokens are chunks of text the model uses to do its thing. They don't quite match up with words, but they're close; 1 token is something like 3/4 of a word. Smaller token limits run faster.
- JSON: No. This asks the model to output its answer in something called JSON. We don't need to worry about that here, hence the selection of "No."
- Output To: Screen + clipboard. We can output the first reply from the LLM to a number of places: the screen, the clipboard... Here, I've chosen the screen and clipboard so the results will be ready to paste where we like.
- Post-run Behavior: CHAT. Like the choice of output, we can decide what to do after a template runs. Here we want to be able to follow up with additional prompts. So, "CHAT" it is.
- Hide Button: unchecked. This determines whether a button is displayed for this template in the extension's popup window.
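For readers who want to see what these settings look like outside the extension, here's a hedged Python sketch that assembles an equivalent OpenAI chat-completion request. The model, temperature, and max-tokens values mirror the parameters above; the system prompt is a placeholder of my own, not the template's actual wording:

```python
def build_request(selected_text: str) -> dict:
    """Assemble a chat-completion payload mirroring the template's settings.

    The system message below is illustrative only, not the template's text.
    """
    return {
        "model": "gpt-4o-mini",  # Model parameter from the template
        "temperature": 0,        # least "creative": stay faithful to the source
        "max_tokens": 1000,      # cap on reply length (roughly 750 words)
        "messages": [
            {"role": "system",
             "content": "Rewrite the user's text in plain language."},
            {"role": "user", "content": selected_text},
        ],
    }

payload = build_request("The party of the first part shall indemnify...")
# With the openai package, you'd send this via:
#   client.chat.completions.create(**payload)
```

Again, the extension does all of this behind the scenes; this is just to demystify what the parameter list actually controls.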
Working with the above template
To work with the above template, you could copy it and its parameters into LIT Prompts one by one, or you could download a single prompts file and upload it from the extension's Templates & Settings screen. Note that uploading a prompts file will replace your existing prompts.
You can download a prompts file (the above template and its parameters) suitable for upload by clicking this button:
Kick the Tires
It's one thing to read about something and another to put what you've learned into practice. Let's see how this template performs.
- Decomplexify. Find your favorite confusing text (it should be something you know rather well) and see how the template does. Are there bits where it's just a little off?
- Role Play. Edit the prompt to tailor your output. Yes, you can have it explain the text like a pirate, but maybe there are other differences you'd like to focus on.
TL;DR References
ICYMI, here are blurbs for a selection of works I linked to in this post. If you didn't click through above, you might want to give them a look now.
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. There's a lot of history behind this paper. It was part of a chain of events that forced Timnit Gebru to leave Google where she was the co-lead of their ethical AI team, but more than that, it's one of the foundational papers in AI ethics, not to be confused with the field of "AI safety," which we will discuss later. It discusses several risks associated with large language models, including environmental/financial costs, biased language, lack of cultural nuance, misdirection of research, and potential for misinformation. If you want to engage critically with LLMs, this paper is a must read.
- Beyond Readability with RateMyPDF: A Combined Rule-based and Machine Learning Approach to Improving Court Forms by Quinten Steenhuis, Bryce Willey, and David Colarusso. In this paper, we describe RateMyPDF, a web application that helps authors measure and improve the usability of court forms. It offers a score together with automated suggestions to improve the form drawn from both traditional machine learning approaches and the general purpose GPT-3 large language model. We worked with form authors and usability experts to determine the set of features we measure and validated them by gathering a dataset of approximately 24,000 PDF forms from 46 U.S. States and the District of Columbia. Our tool and automated measures allow a form author or court tasked with improving a large library of forms to work at scale. This paper describes the features that we find improve form usability, the results from our analysis of the large form dataset, details of the tool, and the implications of our tool on access to justice for self-represented litigants. We found that the RateMyPDF score significantly correlates to the score of expert reviewers. While the current version of the tool allows automated analysis of Microsoft Word and PDF court forms, the findings of our research apply equally to the growing number of automated wizard-driven interactive legal applications that replace paper forms with interactive websites.
- Unsupervised Machine Scoring of Free Response Answers—Validated Against Law School Final Exams by David Colarusso. This paper presents a novel method for unsupervised machine scoring of short answer and essay question responses, relying solely on a sufficiently large set of responses to a common prompt, absent the need for pre-labeled sample answers—given said prompt is of a particular character. That is, for questions where “good” answers look similar, “wrong” answers are likely to be “wrong” in different ways. Consequently, when a collection of text embeddings for responses to a common prompt are placed in an appropriate feature space, the centroid of their placements can stand in for a model answer, providing a lodestar against which to measure individual responses. This paper examines the efficacy of this method and discusses potential applications.