AI is machine learning. Sometimes we forget that. The problem is that machine learning isn't deterministic. You can't force it to follow your plan.

When you train a machine learning model, it won't converge to the optimal solution on your schedule. You may allocate 100 hours for training, but that doesn't mean you'll get a good enough model after those 100 hours. You may get the best model after 10 hours, or it may take 1000. But we don't train AI models ourselves, do we? We use pre-trained models.

In an AI project, we tweak prompts. Tweaking a prompt requires an evaluation dataset, code that calculates metrics, and the AI application code itself. We can produce those parts in a predictable amount of time. You can even accurately estimate how long a single experiment takes to run. But how many experiments do you need to find the best prompt? That we don't know.
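To make those parts concrete, here is a minimal sketch of such a harness in Python. Everything in it is illustrative: call_model() is a hypothetical stand-in for your AI application, and the exact-match check is just one example of a metric.

```python
# A minimal sketch of the "constant parts" of a prompt-tweaking project.
# call_model() is a hypothetical stand-in for your AI application.

def call_model(prompt_template: str, example: dict) -> str:
    """Fill the template with the example's input and call the model."""
    raise NotImplementedError  # replace with your LLM client call

def evaluate(prompt_template: str, eval_dataset: list[dict]) -> float:
    """Run one experiment: score the prompt against every example."""
    correct = 0
    for example in eval_dataset:
        answer = call_model(prompt_template, example)
        if answer.strip() == example["expected"]:  # exact-match metric
            correct += 1
    return correct / len(eval_dataset)  # accuracy of this prompt version
```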


I recommend estimating AI projects differently.

First, we estimate the constant parts of the project: the application code, the evaluation dataset, the testing code, and so on. This is data engineering plus a manual review of the example datasets, and we can estimate the time quite accurately. Then, we estimate the time required to run a single experiment. We know how many examples a single test contains and how much time a single AI call takes, so we can estimate the experiment duration, too.
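The experiment-duration estimate is back-of-the-envelope arithmetic. A sketch, with made-up numbers standing in for your measured dataset size and call latency:

```python
# Rough duration of a single experiment, assuming the AI calls run
# sequentially. All numbers are made-up placeholders.

examples_per_test = 200      # size of the evaluation dataset
seconds_per_ai_call = 4.0    # measured average latency of one model call
overhead_factor = 1.2        # assumed buffer for retries and rate limits

experiment_seconds = examples_per_test * seconds_per_ai_call * overhead_factor
print(f"One experiment takes ~{experiment_seconds / 60:.0f} minutes")  # ~16 minutes
```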

Now, the fun part: how many experiments do we need? There is no way to tell. So, instead, I suggest allocating a time block for each experiment. Say you decide you can spend 20 hours on each part of the AI workflow. You run your first experiment to get a baseline, and you research better ideas while it is still running. Then you review the test cases where the AI model performs poorly, apply the changes you think will improve the results, and rerun the experiment. You finish when you are satisfied with the results, run out of ideas, or run out of time.
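In code, the time block turns prompt tuning into a bounded loop. This sketch reuses the evaluate() harness from above; propose_improvement() is a hypothetical stand-in for the manual step of reviewing failures and editing the prompt, and the budget and target score are placeholders, not numbers from any real project.

```python
import time

# A time-boxed tuning loop for one part of the AI workflow.

TIME_BUDGET_SECONDS = 20 * 3600  # the 20-hour block allocated to this step
TARGET_SCORE = 0.95              # placeholder for "satisfied with the results"

def propose_improvement(prompt: str, eval_dataset: list[dict]) -> str | None:
    """Review the failing cases and edit the prompt; return None when out of ideas."""
    raise NotImplementedError  # this step is manual in practice

def tune(prompt: str, eval_dataset: list[dict]) -> str:
    deadline = time.monotonic() + TIME_BUDGET_SECONDS
    best_prompt, best_score = prompt, evaluate(prompt, eval_dataset)  # baseline run
    while best_score < TARGET_SCORE and time.monotonic() < deadline:  # out of time
        candidate = propose_improvement(best_prompt, eval_dataset)
        if candidate is None:  # out of ideas
            break
        score = evaluate(candidate, eval_dataset)
        if score > best_score:  # keep only changes that actually help
            best_prompt, best_score = candidate, score
    return best_prompt
```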

Can we do better? Yes. A lot of research has been done on prompt engineering, and we don't have to repeat the same mistakes. Finding a research paper that describes a solution to your problem is not only possible but will also save you a lot of time. Will the solution work in your case? Nobody knows. You have to test.

