Recently, while working on a personal project, I learned a pretty neat trick.

The Challenge

General-purpose LLMs these days are pretty smart: they can answer all sorts of questions and perform various tasks without any fine-tuning or task-specific training.

But what if I want to use the output of the LLM in my program or script? Since the output is just text, it could be anything. I would have to parse it into something that my program can understand, which might not be so straightforward.

Here is an example:

Say I ask the LLM: “Here is a Twitter post, please list all the animals mentioned in the post”. The LLM’s answer might contain irrelevant sentences like “Sure, here are all the animals: Cat, Dog…”. It might output the animals separated by commas or newlines, add a bullet point on each line, and so on. So parsing the information is not so simple.

We can try to give the LLM stricter instructions, like: “Output only the data, be terse and concise. Output the data in the form of a JSON array.” This usually helps, but the LLM doesn’t always follow all the instructions, and it’s hard to cover every possible edge case this way.

Giving the LLM a Schema!

So after messing around with that for a bit, I found that modern LLMs actually support JSON Schema! I can specify the schema of the output I want, and the LLM will try its best to adhere to that schema. This is not bulletproof, but it’s much, much better than what I tried before.

So I wrote a little function that takes a Zod schema, converts it to JSON schema, gives it to the LLM, and then takes the LLM output and parses it with the Zod schema. The result is that I can now talk to the LLM and get back structured type-safe data!

Here is the code:

import { GoogleGenAI, GenerateContentParameters } from '@google/genai';
import { z, ZodTypeAny } from 'zod';
import { merge } from 'ts-deepmerge';
import { toGeminiSchema } from 'gemini-zod';

type ModelExecutor = ReturnType<typeof createModelExecutor>;

function createModelExecutor(apiKey: string, model: string) {
  // Initialize the connection to Gemini
  const ai = new GoogleGenAI({ apiKey });
  const defaultParams: GenerateContentParameters = { model, contents: '' };

  const modelExecutor = {
    generateContentZod: async <T extends ZodTypeAny>(params: Partial<GenerateContentParameters>, zodSchema: T): Promise<z.infer<T>> => {
      // Converting the Zod schema to a format that the LLM understands
      const schemaParams = {
        config: {
          responseMimeType: 'application/json',
          responseSchema: toGeminiSchema(zodSchema),
        },
      };
      // Merging the default params, the caller's params, and the schema config
      const mergedParams = merge(defaultParams, params, schemaParams) as GenerateContentParameters;

      // Getting the result from the LLM
      const response = await ai.models.generateContent(mergedParams);
      const text: string = response.text ?? '';

      try {
        // Parsing the result into a type-safe object
        const parsed = JSON.parse(text);
        return zodSchema.parse(parsed);
      } catch (err) {
        throw new Error(`Failed to parse or validate response: ${(err as Error).message}`);
      }
    },
  };
  return modelExecutor;
}

Now I can call the LLM like this:

// Assuming the executor was created earlier with an API key and model, e.g.:
// const modelExecutor = createModelExecutor(GEMINI_API_KEY, 'gemini-2.0-flash');
const animals: string[] = await modelExecutor.generateContentZod(
  {
    config: {
      systemInstruction: {
        parts: [{ text: `Here is a Twitter post, please list all the animals mentioned in the post` }],
      },
    },
    contents: [
      {
        role: 'user',
        parts: [{ text: 'The Twitter post' }],
      },
    ],
  },
  z.array(z.string()), // <---------- The Zod schema
);

And animals is guaranteed to be an array of strings! Pretty cool, right?

How good is it really?

I found that the LLM handles even pretty complex Zod schemas quite well. With complex schemas, it’s good to give the AI some “help”:

  • Make some attributes optional and nullable if they are not strictly required.
  • Add a description for each attribute to help the LLM understand exactly what it represents.

Example:

z.object({
    animals: z.array(z.string()).describe('All the animals mentioned in the post'),
    cutest: z.string().optional().nullable().describe('The cutest animal mentioned in the post'),
})

I really like this because I can now write very simple scripts that would have been very difficult to write before. For example, I used this method to go over posts in a Facebook group for rental apartments, parse the price, number of rooms, etc. from each post, and then keep only the posts that are relevant to me. The script was quite easy to write, and it saved me a bunch of time looking through irrelevant posts.
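
For illustration, here is a rough sketch of what that apartment script could look like with the helper above (the field names, prompts, and thresholds here are made up, not the actual script):

// Hypothetical schema for a rental post (illustrative only)
const apartmentSchema = z.object({
  isRental: z.boolean().describe('Whether the post actually offers an apartment for rent'),
  price: z.number().optional().nullable().describe('Monthly rent, if mentioned'),
  rooms: z.number().optional().nullable().describe('Number of rooms, if mentioned'),
});

const post = await modelExecutor.generateContentZod(
  {
    config: {
      systemInstruction: { parts: [{ text: 'Extract the apartment details from this Facebook post' }] },
    },
    contents: [{ role: 'user', parts: [{ text: 'The Facebook post text' }] }],
  },
  apartmentSchema,
);

// Once the data is structured, filtering is plain TypeScript (example thresholds)
const relevant = post.isRental && (post.price ?? Infinity) <= 6000 && (post.rooms ?? 0) >= 3;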

Some more tricks

Here are some ideas I’ve been experimenting with since I started using LLMs with structured output:

  • Specify a confidence attribute to let the LLM tell us how confident it is in the answer it gave us. We can use this confidence score in our logic to filter out irrelevant answers, or to find places where the LLM was not sure about its answer and try to figure out why.
  • Specify a reasoning attribute to let the LLM expose the reasoning behind its answer. It can be used to debug the AI’s answers and understand why the model got confused and produced a wrong answer.
z.object({
    cutest: z.object({
        reasoning: z.string().describe('Explain the reasoning behind your answer'),
        confidence: z.number().describe('Confidence score'),
        value: z.string().optional().nullable().describe('The cutest animal mentioned in the post'),
    }),
})
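
For example, here is a small sketch of how the confidence score could feed into ordinary application logic (the 0.7 threshold is arbitrary and assumes a 0–1 score; cutestSchema refers to the schema above):

// Assuming `cutestSchema` is the Zod schema above and `params` holds the prompt
const result = await modelExecutor.generateContentZod(params, cutestSchema);

if (result.cutest.confidence < 0.7) {
  // Don't trust shaky answers blindly; log the reasoning to see where the model struggled
  console.warn('Low confidence:', result.cutest.value, '-', result.cutest.reasoning);
}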

The reasoning attribute can also be used to make a self-improving loop for difficult tasks:

  • Ask the LLM a question, and get an answer
  • Give the question and the answer to another LLM and ask it if the answer is correct. Tell it to explain the reasoning and specify what is missing or wrong.
  • Give this analysis to the original LLM and tell it to refine its answer based on the feedback.
  • Repeat this until the answer is satisfactory.

This method simulates how a person would iterate with an LLM, but does it automatically: instead of a person, we have an AI critic that evaluates the answer until it is good enough.
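
Here is a rough sketch of what such a loop could look like using the generateContentZod helper from above (the schemas, prompts, iteration cap, and the use of the same executor for both the answering and critic roles are all illustrative assumptions):

const answerSchema = z.object({
  reasoning: z.string().describe('Explain the reasoning behind your answer'),
  answer: z.string().describe('The answer to the question'),
});

const critiqueSchema = z.object({
  isCorrect: z.boolean().describe('Is the answer correct and complete?'),
  feedback: z.string().describe('What is missing or wrong, and how to fix it'),
});

async function answerWithCritic(question: string, maxIterations = 3) {
  // Get an initial answer
  let answer = await modelExecutor.generateContentZod(
    { contents: [{ role: 'user', parts: [{ text: question }] }] },
    answerSchema,
  );

  for (let i = 0; i < maxIterations; i++) {
    // Ask a "critic" call to evaluate the answer and explain what is missing or wrong
    const critique = await modelExecutor.generateContentZod(
      {
        contents: [{
          role: 'user',
          parts: [{ text: `Question: ${question}\nAnswer: ${answer.answer}\nIs this answer correct? Explain your reasoning and specify what is missing or wrong.` }],
        }],
      },
      critiqueSchema,
    );
    if (critique.isCorrect) break;

    // Feed the critique back and ask for a refined answer
    answer = await modelExecutor.generateContentZod(
      {
        contents: [{
          role: 'user',
          parts: [{ text: `Question: ${question}\nPrevious answer: ${answer.answer}\nFeedback: ${critique.feedback}\nPlease refine your answer based on this feedback.` }],
        }],
      },
      answerSchema,
    );
  }
  return answer;
}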

If you’re interested in using a reasoning field with your LLM, I recommend reading this article: Order of fields in Structured output can hurt LLMs output.

P.S.

I am using Gemini for my personal projects because, at the time of writing this post, Gemini offers a free tier. But I am pretty sure you can apply the ideas in this blog post to any other major LLM provider, like OpenAI.