Building a Data Retrieval Workflow for AI with Structured Output Libraries like Marvin and Instructor

Previously, I have shown how to use LangGraph to build an agentic RAG workflow that can answer user’s questions about data by checking if the required data is available in the database, planning what data has to be retrieved, generating the query, executing it, and generating a human-readable answer. If the data wasn’t available, the workflow responded apologizing for the inconvenience.

Table of Contents

  1. Building a Data Retrieval Workflow with Marvin
    1. Defining a Marvin Function with a Pydantic Model as the Structured Output
    2. Defining a Marvin Function with a Single Value as the Structured Output
    3. Defining a Workflow with Marvin Functions
  2. What Else Marvin Can Do?
  3. Building a Data Retrieval Workflow with Instructor
  4. Why I Wouldn’t Use Marvin or Instructor in Production?

We can streamline the code and use only Python functions with structure output libraries such as Marving or Instructor. I build the same data retrieval workflow in this article with Marvin and Instructor. Ultimately, I will explain why I wouldn’t use them in production and where they work best.

Building a Data Retrieval Workflow with Marvin

The workflow I built previously was running with the Llama3 model. Right now, Marvin works only with OpenAI, so I have to modify all of the prompts to eliminate the XML tags used by Llama 3. I will also pass the database description as a parameter of the functions instead of including the description in the prompt because I must define the prompt as a docstring of the function, and I would have to copy the entire database description to every prompt using it.

Data preparation and the database description variable (DB_DESCRIPTION) remain the same as in the previous article. Here is the modified query_db function with added conversion to Markdown:

def query_db(query):
 conn = sqlite3.connect('telco.db')
  try:
    return pd.read_sql_query(query, conn).to_markdown()
  finally:
 conn.close()

Before we start, we have to pass the OpenAI API key to Marvin:

import marvin


marvin.settings.openai.api_key=OPENAI_API_KEY

Defining a Marvin Function with a Pydantic Model as the Structured Output

Marvin uses Pydantic models to define the output structure if we want more than one value returned. Primitive types can be returned without specifying a model.

First, we need a function to check if the user’s question can be answered using our data and plan the query. The function returns the reasoning (that we later use as the query plan) and a boolean value indicating if the data is available.

from pydantic import BaseModel


class CanAnswerQuestion(BaseModel):
 reasoning: str
 can_answer: bool

Marvin functions are Python functions with a decorator. The library also offers other ways to define functions, but I will show them later. For now, we use decorators. Note that all parts of the function matter and become a part of the prompt: the name, the argument names and types, the return type, and the docstring.

@marvin.fn
def check_if_question_can_be_answered(database_description: str, question: str) -> CanAnswerQuestion:
  """You are a database reading bot that can answer users' questions using information from a database.

 Given the user's question, decide whether the question can be answered using the information in the database.

 Examples:
 reasoning: I can find the average revenue of customers with tenure over 24 months by averaging the Total Revenue column in the Billing table filtered by Tenure in Months > 24, can_answer: true
 reasoning: I can find customers who signed up during the last 12 months using the Tenure in Months column in the Billing table, can_answer: true
 reasoning: I can't answer how many customers churned previous year because the Churn table doesn't contain a year, can_answer: false"""

When I call the function, I get the CanAnswerQuestion object with LLMs output parsed into the fields:

check_if_question_can_be_answered(DB_DESCRIPTION, "How many married seniors have the revenue higher than the average value for all users?")
CanAnswerQuestion(reasoning="I can find how many married senior citizens have revenue higher than the average value for all users by filtering the Customer table for 'Married' equal to 'Yes' and 'Senior_Citizen' equal to 'Yes' and then comparing their 'Total_Revenue' from the Billing table to the average 'Total_Revenue' of all users.", can_answer=True)

Defining a Marvin Function with a Single Value as the Structured Output

The next function we need is the one generating the SQL query. Since we need only one returned value, we don’t have to define a Pydantic model and can get a string directly.

@marvin.fn
def write_sql_query(database_description: str, question: str, plan: str) -> str:
  """You are a database reading bot that can answer users' questions using information from a database.

 In the previous step, you have prepared a query plan.

 Return an SQL query with no preamble or explanation. Don't include any markdown characters or quotation marks around the query."""

When I call the function, I get the SQL query:

write_sql_query(
    DB_DESCRIPTION,
    "How many married seniors have the revenue higher than the average value for all users?",
    "I can find how many married senior citizens have revenue higher than the average value for all users by filtering the Customer table for 'Married' equal to 'Yes' and 'Senior_Citizen' equal to 'Yes' and then comparing their 'Total_Revenue' from the Billing table to the average 'Total_Revenue' of all users."
)
SELECT COUNT(*) FROM Customer c JOIN Billing b ON c.Customer_ID = b.Customer_ID WHERE c.Married = 'Yes' AND c.Senior_Citizen = 'Yes' AND b.Total_Revenue > (SELECT AVG(Total_Revenue) FROM Billing)

And we can pass the query to the query_db function to get the result:

query_db("SELECT COUNT(*) FROM Customer c JOIN Billing b ON c.Customer_ID = b.Customer_ID WHERE c.Married = 'Yes' AND c.Senior_Citizen = 'Yes' AND b.Total_Revenue > (SELECT AVG(Total_Revenue) FROM Billing)")
|    |   COUNT(*) |
|---:|-----------:|
|  0 |        353 |

Next, we need two functions to write the answer. The first function generates the answer when everything went well, and we have a response:

@marvin.fn
def write_answer(question: str, plan: str, sql_query: str, sql_result: str) -> str:
  """You are a database reading bot that can answer users' questions using information from a database.

 In the previous step, you have prepared a query plan, generated a query, and retrieved the data from the database.

 Return a text answering the user's question using the provided data."""

The second function generates the apology when the data isn’t available:

@marvin.fn
def explain_why_cannot_answer(question: str, problem_explanation: str) -> str:
  """You are a database reading bot that can answer users' questions using information from a database.

 You cannot answer the user's questions because of the problem with data.

 Explain the issue to the user and apologize for the inconvenience."""

Defining a Workflow with Marvin Functions

In the LangGraph article, we defined the workflow as a graph with nodes representing the functions and edges representing the data flow. Using Marvin, we define the workflow as a Python function that calls other functions.

def answer_user_question(question: str) -> str:
 can_answer = check_if_question_can_be_answered(DB_DESCRIPTION, question)

  if can_answer.can_answer:
 plan = can_answer.reasoning
 sql_query = write_sql_query(DB_DESCRIPTION, question, plan)
 query_result = query_db(sql_query)
    return write_answer(question, plan, sql_query, query_result)
  else:
    return explain_why_cannot_answer(question, can_answer.reasoning)

When we call the workflow function:

answer_user_question("Count customers by zip code. Return the 5 most common zip codes")

We get the answer:

The 5 most common zip codes among customers are as follows:
1. Zip Code: 92028, Customer Count: 43
2. Zip Code: 92027, Customer Count: 38
3. Zip Code: 92122, Customer Count: 36
4. Zip Code: 92117, Customer Count: 34
5. Zip Code: 92126, Customer Count: 32

What Else Marvin Can Do?

Function decorators are not Marvin’s main feature. It offers a few other ways to access the OpenAI API.

With Marvin, we can generate data:

marvin.generate(n=5, instructions="Generate subjects for an email about using structured outputs in LLMs")
['Harnessing Structured Outputs in LLMs: The Future of AI Communication',
 "Boost Your AI's Performance with Structured Outputs",
 'Mastering LLMs: The Importance of Structured Data',
 'Revolutionize Your AI Workflow with Structured Output Implementation',
 'Why Structured Outputs are Essential for Efficient LLMs']

We can also classify text:

marvin.classify(
    "Revolutionize Your AI Workflow with Structured Output Implementation", labels=["clickbait", "not_clickbait"]
)
# clickbait

Or extract information from text:

marvin.extract("Great hotel, nice and clean room, good location, and convenient parking. Staff were very nice, friendly and helpful too. We stayed here for a night and will definitely stay again.", instructions="get any amenities mentioned in the review")

# ['clean room', 'good location', 'convenient parking']

Marvin can also create images, generate captions for images, or classify them. The library works with audio and video and can use OpenAI Assistants for building interactive applications and giving AI access to tools.

Building a Data Retrieval Workflow with Instructor

The Instructor library is simpler than Marvin. All Instruct does is ensure the LLMs output adheres to the expected structure. However, the instructor works with multiple models, not only OpenAI.

Before using it, we must import the library, create an OpenAI client, and pass the client to Instructor. Instructor will generate a wrapper for the client to ensure the output structure.

import instructor
from openai import OpenAI


openai_client = OpenAI(api_key=OPENAI_API_KEY)
client = instructor.from_openai(openai_client)

We can use the client in the same way as the OpenAI client, but the decorated client has one additional parameter: response_model. We can pass a Pydantic model (or a primitive type) to ensure the output structure.

In the Instructor version, the check_if_question_can_be_answered function may look like this:

def check_if_question_can_be_answered(question: str) -> CanAnswerQuestion:
 prompt = f"""You are a database reading bot that can answer users' questions using information from a database. \n

 Given the user's question, decide whether the question can be answered using the information in the database. \n\n

    {DB_DESCRIPTION}

 Examples:

 reasoning: I can find the average revenue of customers with tenure over 24 months by averaging the Total Revenue column in the Billing table filtered by Tenure in Months > 24, can_answer: true
 reasoning: I can find customers who signed up during the last 12 month using the Tenure in Months column in the Billing table, can_answer: true
 reasoning: I can't answer how many customers churned last year because the Churn table doesn't contain a year, can_answer: false"""

  return client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=CanAnswerQuestion,
    messages=[
 {"role": "system", "content": prompt},
 {"role": "user", "content": question}
 ],
)

When called, the function returns the CanAnswerQuestion object with the LLMs output parsed into the fields:

check_if_question_can_be_answered("How many married seniors have the revenue higher than the average value for all users?")
CanAnswerQuestion(reasoning='To answer this question, we would need to calculate the average revenue for all users and then filter the data to find the number of married seniors whose revenue is higher than the average value.', can_answer=True)

Similarly, we can implement the write_sql_query function:

def write_sql_query(question: str, plan: str) -> str:
 prompt = f"""You are a database reading bot that can answer users' questions using information from a database. \n

  {DB_DESCRIPTION}

 In the previous step, you have prepared a query plan.

 Return an SQL query with no preamble or explanation. Don't include any markdown characters or quotation marks around the query."""

  return client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=str,
    messages=[
 {"role": "system", "content": prompt},
 {"role": "user", "content": "Query plan: " + plan}
 ],
 )

The workflow defined with Instruct would be similar to the one defined with Marvin. Because I define prompts in the function body (not a docstring), I can use f-strings to pass the database description as the prompt instead of a parameter.

Why I Wouldn’t Use Marvin or Instructor in Production?

Marvin and Instructor are great for building prototypes or experimenting with OpenAI models. Both libraries simplify the code and allow us to use only Python functions. That’s great for Jupiter Notebooks during data analysis or for building a quick proof of concept, but I wouldn’t use them in a production environment.

Marvin hides too many details. Imagine debugging a workflow where all AI interactions are hidden in wrappers. The wrappers generate prompts and output parsers. You can turn on the debug logging but have no control over the prompt. It would be a nightmare. Instructor is simpler, but it still hides the details of output parsing (and validation if you use the validation feature).

You could persuade me to deploy Instructor in production. After all, Instruct doesn’t do much, and we have to write our prompts. But Marvin is a no-go for me. It’s too complex and hides too many details. (In production, of course. Marving is perfect for prototyping and experimenting.)


Do you need help building AI-powered applications for your business?
You can hire me!

Older post

Zusammenarbeit mit einer KI-Automatisierungsagentur

Maximieren Sie die Effizienz und Genauigkeit in Ihrem Unternehmen mit KI-Automatisierung. Erfahren Sie, wie KI-Automatisierungsagenturen Ihre Abläufe revolutionieren können, von der Rationalisierung sich wiederholender Aufgaben bis zur Verbesserung der Datenanalyse und Entscheidungsfindung.

Newer post

How can we measure improvement in information retrieval quality in RAG systems?

Every RAG system starts with retrieval. How do you know if your retrieval code is good enough? You measure it. The article shows how to use the ir_measures library to calculate Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG) to quantify the performance of your retrieval code.

Are you looking for an experienced AI consultant? Do you need assistance with your RAG or Agentic Workflow?
Schedule a call, send me a message on LinkedIn. Schedule a call or send me a message on LinkedIn

>