The AI Ouroboros: Can 2024's AI Build 2014's AI?
David Colarusso
Co-director, Suffolk's Legal Innovation & Tech Lab
This is the 48th post in my series 50 Days of LIT Prompts.
Today if someone mentions "AI," it's a safe bet they're talking about large language models (LLMs). In 2014, however, the moniker most often referenced machine learning (ML). Below we will use today's AI (LLMs) to help make some good old-fashioned AI (ML). It's long been a joke to call statistical methods like linear and logistic regression machine learning/"AI." It's funny, however, because it's true. For nearly a decade, I've used regressions as a way to introduce AI to my students. Consequently, I'm always on the lookout for better ways to teach the subject. Recently, I made a website where students can create and train their own regression models—My Toy Models—and today we'll get an LLM to build a model we can upload there for training. But first, let me tell you about a paper.
The Robust Beauty of Improper Linear Models in Decision Making lives rent free in my head. I think about this paper from 1979 ALL. THE. TIME!
TL;DR: Experts can make robust linear models just by picking a few salient features based on their experience. From the paper (emphasis added):
Proper linear models are those in which predictor variables are given weights in such a way that the resulting linear composite optimally predicts some criterion of interest; examples of proper linear models are standard regression analysis... proper linear models outperform clinical intuition. Improper linear models are those in which the weights of the predictor variables are obtained by some nonoptimal method; for example, they may be obtained on the basis of intuition, derived from simulating a clinical judge's predictions, or set to be equal. This article presents evidence that even such improper linear models are superior to clinical intuition when predicting a numerical criterion from numerical predictors. In fact, unit (i.e., equal) weighting is quite robust for making such predictions.
In today's parlance the TL;DR would read "feature selection is really important." You want to choose features that actually have some causal effect on the thing you're trying to predict, and clinical intuition is pretty good at this. Additionally, mathematical models tend to be a lot less noisy than clinical intuition when used on individual cases. Even if they're not perfect, their predictions bounce around a lot less. So, I got to thinking, if simply choosing the right features can get you "most of the way" to a good model, maybe LLMs could help introduce the feature selection process for those learning how to build their own models.
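To make Dawes's point concrete, here's a minimal sketch in Python with made-up data and feature names (nothing here comes from the paper itself). It compares a "proper" least-squares model with an "improper" unit-weighted one; when the predictors are well chosen and standardized, the two tend to track the criterion about equally well.

```
import numpy as np

# Toy data: two standardized predictors and an outcome (entirely made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # e.g., predicted snowfall, wind speed (z-scored)
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

# "Proper" linear model: weights chosen to optimally fit the data (least squares).
proper_w, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Improper" unit-weighted model: every standardized predictor gets a weight of 1.
unit_w = np.ones(X.shape[1])

def r(pred, actual):
    """Correlation between a model's predictions and the criterion."""
    return np.corrcoef(pred, actual)[0, 1]

print("proper weights:", proper_w, "correlation:", round(r(X @ proper_w, y), 3))
print("unit weights:  ", unit_w,   "correlation:", round(r(X @ unit_w, y), 3))
```

Run it and the unit-weighted composite's correlation with the outcome comes out close to the fitted model's, which is the "robust beauty" in the paper's title.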
Getting students to think about potential features (inputs for their models) is a good place to start when building a model. The trick is to take what you know about some process and identify things you can measure that help to determine the thing you're trying to predict.
Today's template prompts the user for a question the model should answer (what they want to predict) and returns a list of features. For example, when fed "Will there be a snow day tomorrow (i.e., will they cancel school)?" it suggested the following features: predicted snowfall, temperature, and wind speed. That's not bad. When asked, "Will I get a good night's sleep?" it suggested tracking my bedtime routine (i.e., reading, meditation, watching TV, listening to music, none), exercise time, and caffeine consumption. Again, not a bad first pass.
Here's the actual template output, structured to be read by My Toy Models (don't feel bad about quickly scrolling past this).
{
"question": "Will I get a good night's sleep?",
"type": "categorical",
"target": [
"yes",
"no"
],
"features": [
{
"name": "bedtime routine",
"type": "categorical",
"categories": [
"reading",
"meditation",
"watching TV",
"listening to music",
"none"
]
},
{
"name": "exercise time",
"type": "continuous",
"units": "minutes",
"lower": 0,
"upper": 120,
"mean": 30
},
{
"name": "caffeine consumption",
"type": "continuous",
"units": "mg",
"lower": 0,
"upper": 500,
"mean": 50
}
],
"training": {
"headers": [
"recorded_on",
"bedtime_routine",
"exercise_time",
"caffeine_consumption",
"target",
"note"
],
"observations": []
},
"trained_on": 0,
"coefficients": {
"bedtime_routine": 0,
"exercise_time": 0,
"caffeine_consumption": 0,
"intercept": 0
},
"performance": {}
}
Not only is this a good introduction for students to feature selection, but it also helps introduce the data structure used by My Toy Models. I'm not convinced that the template produces the perfect feature set, but it's certainly good enough to get the idea across. So, in answer to the question, "Can 2024's AI Build 2014's AI?" the answer seems to be "in part." Of course, if we want to move beyond improper models to their proper counterparts we'll need to collect data and train the models. Luckily, that's what My Toy Models is for.
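For readers curious what "train the model" looks like once observations have been collected, here's a hedged sketch. It assumes a JSON file shaped like the example above, with rows appended to "observations" in the order given by "headers"; the filename and the training code are mine for illustration, not the actual My Toy Models implementation.

```
import json
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Load a My Toy Models-style file (structure assumed from the example above).
with open("sleep_model.json") as f:        # hypothetical filename
    model = json.load(f)

cols = model["training"]["headers"]
df = pd.DataFrame(model["training"]["observations"], columns=cols)

# One-hot encode categorical features; continuous features pass through as-is.
feature_cols = [c for c in cols if c not in ("recorded_on", "target", "note")]
X = pd.get_dummies(df[feature_cols])
y = df["target"]                           # "yes" / "no"

# Logistic regression because the example's target is categorical.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(dict(zip(X.columns, clf.coef_[0])), clf.intercept_[0])
```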
If you'd like to take this tool for a spin and you haven't installed the LIT Prompts extension, you can access a standalone version of the tool here.
Let's build something!
We'll do our building in the LIT Prompts extension. If you aren't familiar with the LIT Prompts extension, don't worry. We'll walk you through setting things up before we start building. If you have used the LIT Prompts extension before, skip to The Prompt Patterns (Templates).
Up Next
Questions or comments? I'm on Mastodon @Colarusso@mastodon.social
Setup LIT Prompts
LIT Prompts is a browser extension built at Suffolk University Law School's Legal Innovation and Technology Lab to help folks explore the use of Large Language Models (LLMs) and prompt engineering. LLMs are sentence completion machines, and prompts are the text upon which they build. Feed an LLM a prompt, and it will return a plausible-sounding follow-up (e.g., "Four score and seven..." might return "years ago our fathers brought forth..."). LIT Prompts lets users create and save prompt templates based on data from an active browser window (e.g., selected text or the whole text of a webpage) along with text from a user. Below we'll walk through a specific example.
To get started, follow the first four minutes of the intro video or the steps outlined below. Note: The video only shows Firefox, but once you've installed the extension, the steps are the same.
Install the extension
Follow the links for your browser.
- Firefox: (1) visit the extension's add-ons page; (2) click "Add to Firefox;" and (3) grant permissions.
- Chrome: (1) visit the extension's web store page; (2) click "Add to Chrome;" and (3) review permissions / "Add extension."
If you don't have Firefox, you can download it here. Would you rather use Chrome? Download it here.
Point it at an API
Here we'll walk through how to use an LLM provided by OpenAI, but you don't have to use their offering. If you're interested in alternatives, you can find them here. You can even run your LLM locally, avoiding the need to share your prompts with a third party. If you need an OpenAI account, you can create one here. Note: when you create a new OpenAI account you are given a limited amount of free API credits. If you created an account some time ago, however, these may have expired. If your credits have expired, you will need to enter a billing method before you can use the API. You can check the state of any credits here.
Log in to OpenAI and navigate to the API documentation.
Once you are looking at the API docs, follow the steps outlined in the image above. That is:
- Select "API keys" from the left menu
- Click "+ Create new secret key"
On LIT Prompts' Templates & Settings screen, set your API Base to https://api.openai.com/v1/chat/completions and your API Key equal to the value you got above after clicking "+ Create new secret key". You get there by clicking the Templates & Settings button in the extension's popup:
- open the extension
- click on Templates & Settings
- enter the API Base and Key (under the section OpenAI-Compatible API Integration)
Once those two bits of information (the API Base and Key) are in place, you're good to go. Now you can edit, create, and run prompt templates. Just open the LIT Prompts extension, and click one of the options. I suggest, however, that you read through the Templates and Settings screen to get oriented. You might even try out a few of the preloaded prompt templates. This will let you jump right in and get your hands dirty in the next section.
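If you're curious what those two settings actually do, running a template boils down to a standard OpenAI-style chat completions request built from the API Base and Key. Here's a rough sketch in Python; the endpoint and payload follow OpenAI's public API, the prompt text is just an example, and the extension's internals may differ.

```
import os
import requests

API_BASE = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]   # the secret key you created above

payload = {
    "model": "gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens": 1000,
    "messages": [
        {"role": "user", "content": "Four score and seven..."}
    ],
}

# Send the completed prompt to the API and print the model's reply.
resp = requests.post(
    API_BASE,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```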
If you receive an error when trying to run a template after entering your Base and Key, and you are using OpenAI, make sure to check the state of any credits here. If you don't have any credits, you will need a billing method on file.
If you found this hard to follow, consider following along with the first four minutes of the video above. It covers the same content. It focuses on Firefox, but once you've installed the extension, the steps are the same.
The Prompt Patterns (Templates)
When crafting a LIT Prompts template, we use a mix of plain language and variable placeholders. Specifically, you can use double curly brackets to encase predefined variables. If the text between the brackets matches one of our predefined variable names, that section of text will be replaced with the variable's value. Today we'll be using {{scratch}}. See the extension's documentation.
If the text within brackets is not the name of a predefined variable, like {{What question should your model answer?}}, it will trigger a prompt for your user that echoes the placeholder (e.g., a text bubble containing, "What question should your model answer?"). After the user answers, their reply will replace this placeholder. A list of predefined variables can be found in the extension's documentation.
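To illustrate the idea (this is a toy version, not the extension's actual implementation), here's roughly how that substitution works: predefined variables are filled from a dictionary, and any other double-curly-bracket text becomes a question for the user.

```
import re

def fill_template(template: str, predefined: dict) -> str:
    """Replace {{...}} placeholders; unknown names become questions for the user."""
    def replace(match):
        name = match.group(1).strip()
        if name in predefined:
            return predefined[name]
        return input(f"{name} ")        # echo the placeholder as a question
    return re.sub(r"\{\{(.*?)\}\}", replace, template)

filled = fill_template(
    "Q: {{What question should your model answer?}}\n\nNotes: {{scratch}}",
    {"scratch": "contents of the Scratch Pad"},
)
print(filled)
```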
There are two templates below. The first suggests features in the My Toy Models file format, placing them in the Scratch Pad, and the second downloads the file.
Here's the first template's title.
Create a toy model
Here's the template's text.
You are a data scientist building a mathematical model to answer the following question:
Q: {{What question should your model answer?}}
State whether the model's answer/target is continuous or categorical. Now, provide the shortest list of ex-ante features a group of experts would agree to include as inputs for your model, formatted as a JSON object of the following form:
```model = {
"question":"the question the model is trying to answer",
"type":"continuous",
"features" :[
{ "name":"mode of travel", "type":"categorical", "categories":["on foot","car","train","plane","boat"] },
{ "name":"height", "type":"continuous", "units":"inches", "lower":0, "upper":100, "mean":70 }
],
"training": {
"headers": [
"recorded_on",
"mode_of_travel",
"height",
"target",
"note"
],
"observations": []
},
"trained_on": 0,
"coefficients": {
"mode_of_travel": 0,
"height": 0,
"intercept": 0
},
"performance": {}
}```
OR
```model = {
"question":"the question the model is trying to answer",
"type":"categorical",
"target": { ["yes","no"] },
"features" :[
{ "name":"mode of travel", "type":"categorical", "categories":["on foot","car","train","plane","boat"] },
{ "name":"height", "type":"continuous", "units":"inches", "lower":0, "upper":100, "mean":70 }
],
"training": {
"headers": [
"recorded_on",
"mode_of_travel",
"height",
"target",
"note"
],
"observations": []
},
"trained_on": 0,
"coefficients": {
"mode_of_travel": 0,
"height": 0,
"intercept": 0
},
"performance": {}
}```
Be sure to use appropriate units and ranges. That is, make sure any units used are those in which the feature is commonly measured and make sure that the range is large enough to capture the expected variability. Also, choose a sensible value for the mean. Under "headers" always include "recorded_on", "target", and "note". Under "coefficients" always include "intercept" and set all coefficients equal to 0 like above.
If the question asks for an answer other than a continuous or categorical value, respond with output that looks like this:
```model = {"question":"the question the model is trying to answer", "type":"uncomputable"}```
And here are the template's parameters:
- Output Type: LLM. This choice means that we'll "run" the template through an LLM (i.e., this will ping an LLM and return a result). Alternatively, we could have chosen "Prompt," in which case the extension would return the text of the completed template.
- Model: gpt-4o-mini. This input specifies what model we should use when running the prompt. Available models differ based on your API provider. See e.g., OpenAI's list of models.
- Temperature: 0.7. Temperature runs from 0 to 1 and specifies how "random" the answer should be. Here I'm using 0.7 because I'm happy to have the text be a little "creative."
- Max Tokens: 1000. This number specifies how long the reply can be. Tokens are chunks of text the model uses to do its thing. They don't quite match up with words, but they're close; 1 token is something like 3/4 of a word. Smaller token limits run faster.
- JSON: Yes. This asks the model to output its answer in something called JSON, which is a nice machine-readable way to structure data. See https://en.wikipedia.org/wiki/JSON (a quick sanity check for that output appears just after this list).
- Output To: Screen + replace scratch pad. We can output the first reply from the LLM to a number of places, the screen, the clipboard... Here, I've chosen the screen and replacing the current text of the Scratch Pad with this output.
- Post-run Behavior: FULL STOP. Like the choice of output, we can decide what to do after a template runs. To keep things simple, I went with "FULL STOP."
- Hide Button: unchecked. This determines if a button is displayed for this template in the extension's popup window.
Here's the second template's title.
Download MyToyModel file
Here's the template's text.
{{scratch}}
And here are the template's parameters:
- Output Type: Prompt. By choosing "Prompt" the template runs without being submitted to an LLM. Its output is just the template after slotting in variable values.
- Model: n/a. Since Output Type is set to Prompt, we don't have to set LLM-specific parameters.
- Temperature: n/a. Since Output Type is set to Prompt, we don't have to set LLM-specific parameters.
- Max Tokens: n/a. Since Output Type is set to Prompt, we don't have to set LLM-specific parameters.
- JSON: No. This asks the model to output its answer in something called JSON. We don't need to worry about that here, hence the selection of "No."
- Output To: Screen Only. We can output the first reply from the LLM to a number of places, the screen, the clipboard... Here, we're content just to have it go to the screen.
- Post-run Behavior: SAVE TO FILE. Like the choice of output, we can decide what to do after a template runs. Here we will save the output to a file. This will trigger your browser's download feature.
- Hide Button: unchecked. This determines if a button is displayed for this template in the extension's popup window.
Working with the above templates
To work with the above templates, you could copy them and their parameters into LIT Prompts one by one, or you could download a single prompts file and upload it from the extension's Templates & Settings screen. This will replace your existing prompts.
You can download a prompts file (the above template and its parameters) suitable for upload by clicking this button:
Kick the Tires
It's one thing to read about something and another to put what you've learned into practice. Let's see how this template performs.
- You're the expert. Use the template to construct a model for something you know pretty well. See how its suggested features stack up.
Export and Share
After you've made the templates your own and they're behaving the way you like, you can export and share them with others. This will produce an HTML file you can share. This file should work on any internet-connected device. To create your file, click the Export Scratch Pad & Interactions Page button. The contents of the textarea above the button will be appended to the top of your exported file. Importantly, if you don't want to share your API key, you should temporarily remove it from your settings before exporting.
If you want to see what an exported file looks like without having to make one yourself, you can use the buttons below. View export in browser will open the file in your browser, and Download export will download a file. In either case the following custom header will be inserted into your file. It will NOT include an API key. So, you'll have to enter one when asked if you want to see things work. This information is saved in your browser. If you've provided it before, you won't be asked again. It is not shared with me. To remove this information for this site (and only this site, not individual files), you can follow the instructions found on my privacy page. Remember, when you export your own file, whether or not it contains an API key depends on whether you have one defined at the time of output.
Custom header:
<h2>Get Help Building a Toy Model</h2>
<p>
Use this tool to suggest features for use in a linear or logistic regression model in the form of a <a href="https://mytoymodels.org/" target="_blank">My Toy Models</a> data file. For context check out <a href="https://sadlynothavocdinosaur.com/posts/data-science" target="_blank">The AI Ouroboros: Can 2024's AI Build 2014's AI? Using LLMs to suggest features for linear and logistic regression models</a>.
</p>
<hr style="border: solid 0px; border-bottom: solid 1px #555;margin: 5px 0 15px 0"/>
Not sure what's up with all those greater than and less than signs? Looking for tips on how to style your HTML? Check out this general HTML tutorial.
TL;DR References
ICYMI, here are blurbs for a selection of works I linked to in this post. If you didn't click through above, you might want to give them a look now.
- The Robust Beauty of Improper Linear Models in Decision Making by Robyn M. Dawes.
ABSTRACT: Proper linear models are those in which predictor variables are given weights in such a way that the resulting linear composite optimally predicts some criterion of interest; examples of proper linear models are standard regression analysis, discriminant function analysis, and ridge regression analysis. Research summarized in Paul Meehl's book on clinical versus statistical prediction—and a plethora of research stimulated in part by that book—all indicates that when a numerical criterion variable (e.g., graduate grade point average) is to be predicted from numerical predictor variables, proper linear models outperform clinical intuition. Improper linear models are those in which the weights of the predictor variables are obtained by some nonoptimal method; for example, they may be obtained on the basis of intuition, derived from simulating a clinical judge's predictions, or set to be equal. This article presents evidence that even such improper linear models are superior to clinical intuition when predicting a numerical criterion from numerical predictors. In fact, unit (i.e., equal) weighting is quite robust for making such predictions. The article discusses, in some detail, the application of unit weights to decide what bullet the Denver Police Department should use. Finally, the article considers commonly raised technical, psychological, and ethical resistances to using linear models to make important social decisions and presents arguments that could weaken these resistances.