Magic Copy-and-Paste

Use AI to pluck out and copy only what you want AKA entity extraction

A scissors and glue on a table in front of a wizard's hat

Magic Copy-and-Paste, latent space "photography" by Colarusso

David Colaursso
Co-director, Suffolk's Legal Innovation & Tech Lab

This is the 38th post in my series 50 Days of LIT Prompts.

Today I'll show you how to add magic copy-and-paste to the LIT Prompts extension. What is magic copy-and-paste you ask? Well, its the name I'm giving to text-driven entity extraction. Which is a long-winded way of saying, you type what you want to copy from a page, and that content is "magiclly" copied into your clipboard.

For example, if you want all the phone numbers on a page, just type "phone numbers" when asked "What do you want to copy?" Wait a bit, and presto. There's a list of phone numbers in your clipboard. Conducting such a search is actually something I've taught my students in the past.

Alas, magic copy-and-paste was never an option. Instead we had to rely on something called regular expressions. Regular expressions are a wonderfully useful tool, but I must confess it takes longer to teach someone how to use them than magic copy-and-paste. Their also rather exacting. If you don't think of every eventuality when defining the pattern you are looking for, you'll miss things. Magic copy-and-paste is more forgiving in this respect. Here are some examples of magic copy-and-paste in action.

Screen capture of someone using magic copy-and-paste to find phone numbers and then SSNs on a page. If you want to see the rest of the comic shown in the background above, click here.

The page in the above screen capture is the same one I use to teach my students regular expressions. As you might have noticed it has a big block of numbers, and as you might have guessed it is here that we find the phone and Social Security numbers asked for in the above screen capture. Here's the block.

01110100 555.867.5309 01101000 1.4142135623 01101001 987-01-6661 01110011 202.555.9355 00100000 01101001 3.1415926535897932384626433832795 01110011 00100000 666-12-4895 01100001 202-555-9355 00100000 01101000 (555) 867-5309 01101001 01100100 2.718281828459 01100100 01100101 01101110 00100000 01101101 555-867-5309 01100101 01110011 01110011 555/867-5309 01100001 01100111 01100101

And here's the output today's template produced when asked to find "phone numbers."

555.867.5309,202.555.9355,555-867-5309,(555) 867-5309,202-555-9355,555-867-5309,555/867-5309

And here's the output today's template produced when asked to find "SSNs."

987-01-6661,666-12-4895

Don't worry these are made up.

Now stop to think about what's going on here. If you look at the templates below you'll see that no where do I tell it what a phone number or SSN is. Our LLM "knows" what to look for based only our simple descriptions ("phone numbers" and "SSNs"). But magic copy-and-paste isn't limited to looking for phone and Social Security numbers. Here's what I got back when I asked for "proper nouns."

Coding the Law,SUFFOLK LAW SCHOOL,LegalTech Adventure,Regular Expressions,Slack,XKCD,Law Technology Now,Legal Information Institute,Tom Bruce,Weapons of Math Destruction,How Not to Be Wrong,Bestlaw,Westlaw,Ars Technica,CFAA,Driverless Cars,GitHub,MassCases,Massachusetts General Laws,US Code,MA General Laws

Remember, LLMs can hallucinate. So, you shouldn't blindly trust output from today's template, but given that the full text of the page is right there while you're using it, you don't have to. That is, you're in the perfect position to check the quality of the output in real time. As with all choices, deciding to use magic copy-and-paste is an exercise in tradeoffs. It's answers may be a bit "fuzzy," but it can deal with "fuzzy" inputs. That being said...

Let's build something!

We'll do our building in the LIT Prompts extension. If you aren't familiar with the LIT Prompts extension, don't worry. We'll walk you through setting things up before we start building. If you have used the LIT Prompts extension before, skip to The Prompt Pattern (Template).

Up Next

Setup LIT Prompts

Questions or comments? I'm on Mastodon @Colarusso@mastodon.social

Setup LIT Prompts

▼ Collapse

7 min intro video

LIT Prompts is a browser extension built at Suffolk University Law School's Legal Innovation and Technology Lab to help folks explore the use of Large Language Models (LLMs) and prompt engineering. LLMs are sentence completion machines, and prompts are the text upon which they build. Feed an LLM a prompt, and it will return a plausible-sounding follow-up (e.g., "Four score and seven..." might return "years ago our fathers brought forth..."). LIT Prompts lets users create and save prompt templates based on data from an active browser window (e.g., selected text or the whole text of a webpage) along with text from a user. Below we'll walk through a specific example.

To get started, follow the first four minutes of the intro video or the steps outlined below. Note: The video only shows Firefox, but once you've installed the extension, the steps are the same.

Install the extension

Follow the links for your browser.

Firefox: (1) visit the extension's add-ons page; (2) click "Add to Firefox;" and (3) grant permissions.
Chrome: (1) visit the extension's web store page; (2) click "Add to Chrome;" and (3) review permissions / "Add extension."

If you don't have Firefox, you can download it here. Would you rather use Chrome? Download it here.

Point it at an API

Here we'll walk through how to use an LLM provided by OpenAI, but you don't have to use their offering. If you're interested in alternatives, you can find them here. You can even run your LLM locally, avoiding the need to share your prompts with a third-party. If you need an OpenAI account, you can create one here. Note: when you create a new OpenAI account you are given a limited amount of free API credits. If you created an account some time ago, however, these may have expired. If your credits have expired, you will need to enter a billing method before you can use the API. You can check the state of any credits here.

Screenshot of the OpenAI API Keys page showing where to click to create a new key.

Once you are looking at the API docs, follow the steps outlined in the image above. That is:

Select "API keys" from the left menu
Click "+ Create new secret key"

On LIT Prompt's Templates & Settings screen, set your API Base to https://api.openai.com/v1/chat/completions and your API Key equal to the value you got above after clicking "+ Create new secret key". You get there by clicking the Templates & Settings button in the extension's popup:

open the extension
click on Templates & Settings
enter the API Base and Key (under the section OpenAI-Compatible API Integration)

Once those two bits of information (the API Base and Key) are in place, you're good to go. Now you can edit, create, and run prompt templates. Just open the LIT Prompts extension, and click one of the options. I suggest, however, that you read through the Templates and Settings screen to get oriented. You might even try out a few of the preloaded prompt templates. This will let you jump right in and get your hands dirty in the next section.

If you receive an error when trying to run a template after entering your Base and Key, and you are using OpenAI, make sure to check the state of any credits here. If you don't have any credits, you will need a billing method on file.

If you found this hard to follow, consider following along with the first four minutes of the video above. It covers the same content. It focuses on Firefox, but once you've installed the extension, the steps are the same.

The Prompt Patterns (Templates)

A slide showing the George Box quote: All models are wrong, but some models are useful.

Maps are models; they don't show everything. That's okay as long as you don't confuse the map for the territory.

When crafting a LIT Prompts template, we use a mix of plain language and variable placeholders. Specifically, you can use double curly brackets to encase predefined variables. If the text between the brackets matches one of our predefined variable names, that section of text will be replaced with the variable's value. Today we'll be using {{innerText}}. See the extension's documentation.

The {{innerText}} variable will be replaced by the innerText of your current page (roughly speaking the hard-coded text of a page). In the template below, you'll see {{innerText}} on line 7. It is this text from which the prompt is directed to pull its output.

If the text within brackets is not the name of a predefined variable, like {{What do you want to copy?}}, it will trigger a prompt for your user that echo's the placeholder (e.g., a text bubble containing, "What do you want to copy?"). After the user answers, their reply will replace this placeholder. A list of predefined variables can be found in the extension's documentation.

So here we're using {{What do you want to copy?}} to specify what we want to plucj from {{innerText}}. To help contain the output text, we make use of JSON mode and pass our output to the sceen and clipboad via a seond template. We can then access this via the {{passThrough}} variable.

Here's the first template's title.

Magic Copy-and-Paste

Here's the template's text.

Your job is to create a JSON object from the following Source Text. It should have a single key-value pair. The key should be "extracted" and the value should contain "{{What do you want to copy?}}" found in the Source Text. If providing the value calls for a list, separate entries with commas followed by a space, unless the items contain commas, in which case, use semicolons. 

---

SOURCE TEXT

{{innerText}}

---

Now provide the JSON object.

And here are the template's parameters:

Output Type: LLM. This choice means that we'll "run" the template through an LLM (i.e., this will ping an LLM and return a result). Alternatively, we could have chosen "Prompt," in which case the extension would return the text of the completed template.
Model: gpt-3.5-turbo-1106. This input specifies what model we should use when running the prompt. Available models differ based on your API provider. See e.g., OpenAI's list of models.
Temperature: 0. Temperature runs from 0 to 1 and specifies how "random" the answer should be. Since we're seeking fidelity to a text, I went with the least "creative" setting—0.
Max Tokens: 250. This number specifies how long the reply can be. Tokens are chunks of text the model uses to do its thing. They don't quite match up with words but are close. 1 token is something like 3/4 of a word. Smaller token limits run faster.
JSON: Yes. This asks the model to output its answer in something called JSON, which is a nice machine-readable way to structure data. See https://en.wikipedia.org/wiki/JSON
Output To: Hidden. We can output the first reply from the LLM to a number of places, the screen, the clipboard... Here, I've chosen the hide the output entirely. This is uesful when passing output to another template.
Post-run Behavior: Copy to clipboard. Like the choice of output, we can decide what to do after a template runs. Here we will we send the output to the template named "Copy to clipboard."
Hide Button: unchecked. This determines if a button is displayed for this template in the extension's popup window.

Here's the second template's title.

Copy to clipboard

Here's the template's text.

{{passThrough["extracted"]}}

And here are the template's parameters:

Output Type: Prompt. By choosing "Prompt" the template runs without being submitted to an LLM. It's output is just the template after slotting in variable values.
Model: gpt-4o-mini. This input specifies what model we should use when running the prompt. Available models differ based on your API provider. See e.g., OpenAI's list of models.
Temperature: 0.7. Temperature runs from 0 to 1 and specifies how "random" the answer should be. Here I'm using 0.7 because I'm happy to have the text be a little "creative."
Max Tokens: 250. This number specifies how long the reply can be. Tokens are chunks of text the model uses to do its thing. They don't quite match up with words but are close. 1 token is something like 3/4 of a word. Smaller token limits run faster.
JSON: No. This asks the model to output its answer in something called JSON. We don't need to worry about that here, hence the selection of "No."
Output To: Screen + clipboard. We can output the first reply from the LLM to a number of places, the screen, the clipboard... Here, I've chosen the screen and clipboard so the results will be ready to paste where we like.
Post-run Behavior: FULL STOP. Like the choice of output, we can decide what to do after a template runs. To keep things simple, I went with "FULL STOP."
Hide Button: checked. This determines if a button is displayed for this template in the extension's popup window. We've checked the option because this template shouldn't be triggered by the user directly. Rather, it needs to be triggered by another template so that there's something in the {{passThrough}} variable.

Working with the above templates

To work with the above templates, you could copy them and their parameters into LIT Prompts one by one, or you could download a single prompts file and upload it from the extension's Templates & Settings screen. This will replace your existing prompts.

Screenshot of the LIT Prompts Templates and Settings page showing where to upload prompts files.

You can download a prompts file (the above template and its parameters) suitable for upload by clicking this button:

Download prompts file

Kick the Tires

It's one thing to read about something and another to put what you've learned into practice. Let's see how this template performs.

Create your own. Come up with new magic copy-and-paste targets. Test them out, and share the results with me @Colarusso@mastodon.social