Chapter 1:
Develop your prompt

We’re going to use py.space to develop the code to extract flight details from an email. We’ll try out the code to call an LLM (Large Language Model) API and check that the results are what we expect.

Py.space is an online environment for building small, useful things with Python, so it’s ideal for tinkering with the prompt you send to the LLM. Until you start experimenting, it’s tough to know which prompt will produce the best results – LLMs are highly non-deterministic and it’s not always intuitive what to ask for. Using py.space, you can easily try out variations of the prompt and get rapid feedback on your script. Let’s get started.

Step 1: Set up the py.space environment

Open up py.space and create a new script. We’ll start by adding a couple of packages to our environment: openai and pydantic. Later, we’ll use openai to call an LLM and pydantic to define our data schema. Add these from the Packages tab. Leave the version number blank to use the latest version.

Adding packages to py.space

In this tutorial, we’ll be using an LLM from OpenAI because it offers one particular feature: the option to only return outputs that fit a schema we define. However, depending on your use case you may want to use Claude from Anthropic, Gemini from Google or another LLM. It’s good practice to try out different models to see which one performs best on your use case.

Step 2: Add your API key to py.space

To use OpenAI’s API, we’ll need an API key. You can obtain one by creating an account with OpenAI and providing your payment details. It’s a good idea to set billing limits so that you don’t get unexpectedly large bills. For context, developing this tutorial in mid-2024 cost approximately $0.03.

We’ll use secrets to store the API key. It’s important to store your API key securely so that other people can’t use it and incur charges on your account.

Click on the ‘Secrets’ tab, create a new secret and call it openai_api_key. Click on the pencil icon to set its value to your API key.

Adding the API key to py.space

Step 3: Define the output data schema with Pydantic

LLMs generally produce unstructured text, which isn’t helpful when a program needs to consume the output. You can explicitly ask them to produce outputs in JSON format, but they will often miss keys or fail to match the exact shape of data we want. To avoid this, there are some techniques we can use to force LLMs to produce data in the shape we want.

The OpenAI SDK for Python allows us to specify what shape of data we want (a schema) with Pydantic. Pydantic is a very handy tool for data validation. It allows us to define schemas by creating Python classes with type annotations, and it turns any given JSON data into instances of those classes, raising an error if the data doesn’t match the structure those classes describe. When we pass a Pydantic schema to the OpenAI SDK, the API constrains the model’s output so that what comes back matches the structure we defined.

In this step, we’re going to use the Pydantic library to define a schema that we can later use with the OpenAI API.

We start by importing BaseModel from pydantic. BaseModel is our starting point for defining the schema; the classes we create will inherit from it:

from pydantic import BaseModel

Given some email text, we want our model to extract some flight information: a flight number, an origin, a destination, a departure time and an arrival time. So we create our classes accordingly:

from pydantic import BaseModel

# We need a nested schema because one email can contain many flights.
class Flights(BaseModel):

    class FlightDetails(BaseModel):
        flight_number: str
        origin: str
        destination: str
        departure_time: str
        arrival_time: str

    # The parsed output will be an object containing a list of separate flights.
    flights: list[FlightDetails]
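Before wiring this up to the API, it’s worth seeing what the schema does on its own. Here’s a minimal sketch that validates a hand-written JSON string (with made-up values) against the schema – the class definitions are repeated so the snippet runs standalone:

```python
from pydantic import BaseModel


class Flights(BaseModel):

    class FlightDetails(BaseModel):
        flight_number: str
        origin: str
        destination: str
        departure_time: str
        arrival_time: str

    flights: list[FlightDetails]


# Made-up JSON, shaped like the data we expect the LLM to return
sample_json = """
{
  "flights": [
    {
      "flight_number": "BA0049",
      "origin": "LHR",
      "destination": "SEA",
      "departure_time": "2024/06/09 15:25",
      "arrival_time": "2024/06/09 17:15"
    }
  ]
}
"""

parsed = Flights.model_validate_json(sample_json)
print(parsed.flights[0].flight_number)
```

If the JSON doesn’t match the schema – say, a missing flight_number – Pydantic raises a ValidationError instead of silently handing back malformed data.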

Step 4: Call an LLM API from py.space

The next step is to call the LLM. Start by importing OpenAI and anvil.secrets at the top of your script. Then, create an instance of OpenAI and pass in your API key by using anvil.secrets.get_secret('openai_api_key').

from openai import OpenAI
import anvil.secrets

openai_client = OpenAI(api_key=anvil.secrets.get_secret('openai_api_key'))

Now, let’s make our request. We want to generate a response from some input text – in this case, the contents of an email – much as we would if we were using ChatGPT. Because of this, we’ll be using the Chat API.

In the body of our request, we’re going to include:

  • The model we want to use to generate the response. We’ll be using gpt-4o-mini – a relatively small, cheap model that will be powerful enough for our use case.
  • Some input messages from which the model will generate the response. The first one will provide our prompt and the second will be our email text.
  • The response_format that we want our outputs to follow.

Add the following code to your script:

response = openai_client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": flight_extraction_prompt},
        {"role": "user", "content": email_text},
    ],
    response_format=Flights,
)

# Since we're using the Chat API, the response will also be a message
result = response.choices[0].message.parsed

print(result)

Before we run this, we need to add the prompt (flight_extraction_prompt) and a sample email (email_text). We’ll do this in the next step.

Step 5: Add the prompt and test the LLM call

It’s time to add in a prompt. When writing your prompt, be clear and specific about what you want the model to do. Take a look at this guide from OpenAI for more information on prompting best practices and strategies.

Here’s the prompt we’ll use to extract flight details from an email:

flight_extraction_prompt = """
You will extract flight details from an email and return them as JSON with the following fields: 
flight_number, origin, destination, departure_time, arrival_time. 
Provide details for each separate flight.
Provide origin and destination as standard 3 letter airport codes.
Provide arrival and departure time in this format: %Y/%m/%d %H:%M using standard Python datetime formatting.
If there are no flights in the email, return each element in the JSON as an empty string.
"""

We also need to pass in a sample email with which to test our prompt:

email_text = """
Your e-ticket receipt
Thank you for booking with British Airways.
Ticket Type: e-ticket
This is your e-ticket receipt. Your ticket is held in our systems, you will not receive a paper ticket for your booking.
BA0049
British Airways | World Traveller | Confirmed
9 Jun 2024
15:25
Heathrow (London)
Terminal 5
9 Jun 2024 
17:15
Seattle-Tacoma International (WA)
Baggage allowances
Hand and checked baggage allowances
"""

Add both the flight_extraction_prompt and the email_text to the script, above the API request. Here’s the full py.space widget we’ve just created:


Now we can run the code and get a response from the LLM:

Running the py.space script to get a response from the LLM

Hooray! The LLM has extracted the information we wanted from the text.
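Because we set response_format=Flights, the result is an instance of our Flights class rather than raw text, so we can work with it like any other Python object. Here’s a sketch using hand-constructed values standing in for the LLM’s output (the schema is repeated so the snippet runs standalone):

```python
from pydantic import BaseModel


class Flights(BaseModel):

    class FlightDetails(BaseModel):
        flight_number: str
        origin: str
        destination: str
        departure_time: str
        arrival_time: str

    flights: list[FlightDetails]


# Hand-constructed stand-in for response.choices[0].message.parsed
result = Flights(
    flights=[
        Flights.FlightDetails(
            flight_number="BA0049",
            origin="LHR",
            destination="SEA",
            departure_time="2024/06/09 15:25",
            arrival_time="2024/06/09 17:15",
        )
    ]
)

for flight in result.flights:
    print(f"{flight.flight_number}: {flight.origin} -> {flight.destination}")
```

This is the kind of structured access we’ll rely on when we build the app in Chapter 2.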

In Chapter 2, we’ll move to Anvil and build an app that uses this code.

Chapter complete

Great! We now have a prompt and the code that calls the LLM.

Let’s move to Anvil and build an app! On to Chapter 2!