AI is machine learning. Sometimes we forget that. The problem is that machine learning isn't deterministic. You can't force it to follow your plan.

When you train a machine learning model, it won't converge to the optimal solution on your schedule. You may allocate 100 hours for training, but that doesn't mean you'll get a good enough model after those 100 hours. You may get the best model after 10 hours, or it may take 1000. But we don't train AI models ourselves, do we? We use pre-trained models.

In an AI project, we tweak prompts. Tweaking a prompt requires an evaluation dataset, code that calculates metrics, and the AI application code itself. We can produce those parts in a predictable amount of time. You can even accurately estimate how long a single experiment takes to run. But how many experiments do you need to find the best prompt? That we don't know.
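To make those parts concrete, here is a minimal sketch of such a harness in Python. Everything in it is illustrative: call_model() is a hypothetical stand-in for your AI application, and the exact-match check is just one example of a metric.

```python
# A minimal sketch of the "constant parts" of a prompt-tweaking project.
# call_model() is a hypothetical stand-in for your AI application.

def call_model(prompt_template: str, example: dict) -> str:
    """Fill the template with the example's input and call the model."""
    raise NotImplementedError  # replace with your LLM client call

def evaluate(prompt_template: str, eval_dataset: list[dict]) -> float:
    """Run one experiment: score the prompt against every example."""
    correct = 0
    for example in eval_dataset:
        answer = call_model(prompt_template, example)
        if answer.strip() == example["expected"]:  # exact-match metric
            correct += 1
    return correct / len(eval_dataset)  # accuracy of this prompt version
```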


I recommend estimating AI projects differently.

First, we estimate the constant parts of the project: the application code, the evaluation dataset, the testing code, and so on. This is data engineering plus a manual review of the example datasets, and we can estimate the time quite accurately. Then, we estimate the time required to run a single experiment. We know how many examples a single test contains and how much time a single AI call takes, so we can estimate the experiment duration, too.
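The experiment-duration estimate is back-of-the-envelope arithmetic. A sketch, with made-up numbers standing in for your measured dataset size and call latency:

```python
# Rough duration of a single experiment, assuming the AI calls run
# sequentially. All numbers are made-up placeholders.

examples_per_test = 200      # size of the evaluation dataset
seconds_per_ai_call = 4.0    # measured average latency of one model call
overhead_factor = 1.2        # assumed buffer for retries and rate limits

experiment_seconds = examples_per_test * seconds_per_ai_call * overhead_factor
print(f"One experiment takes ~{experiment_seconds / 60:.0f} minutes")  # ~16 minutes
```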

Now, the fun part: how many experiments do we need? There is no way to tell. So, instead, I suggest allocating a time block for each experiment. Say you decide you can spend 20 hours on each part of the AI workflow. You run your first experiment to get a baseline, and you research better ideas while it is still running. Then you review the test cases where the AI model performs poorly, apply the changes you think will improve the results, and rerun the experiment. You finish when you are satisfied with the results, run out of ideas, or run out of time.
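In code, the time block turns prompt tuning into a bounded loop. This sketch reuses the evaluate() harness from above; propose_improvement() is a hypothetical stand-in for the manual step of reviewing failures and editing the prompt, and the budget and target score are placeholders, not numbers from any real project.

```python
import time

# A time-boxed tuning loop for one part of the AI workflow.

TIME_BUDGET_SECONDS = 20 * 3600  # the 20-hour block allocated to this step
TARGET_SCORE = 0.95              # placeholder for "satisfied with the results"

def propose_improvement(prompt: str, eval_dataset: list[dict]) -> str | None:
    """Review the failing cases and edit the prompt; return None when out of ideas."""
    raise NotImplementedError  # this step is manual in practice

def tune(prompt: str, eval_dataset: list[dict]) -> str:
    deadline = time.monotonic() + TIME_BUDGET_SECONDS
    best_prompt, best_score = prompt, evaluate(prompt, eval_dataset)  # baseline run
    while best_score < TARGET_SCORE and time.monotonic() < deadline:  # out of time
        candidate = propose_improvement(best_prompt, eval_dataset)
        if candidate is None:  # out of ideas
            break
        score = evaluate(candidate, eval_dataset)
        if score > best_score:  # keep only changes that actually help
            best_prompt, best_score = candidate, score
    return best_prompt
```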

Can we do better? Yes. A lot of research has been done on prompt engineering, and we don't have to repeat the same mistakes. Finding a research paper that describes a solution to your problem is not only possible but will also save you a lot of time. Will the solution work in your case? Nobody knows. You have to test.

