---
title: "Alternatives to OpenAI GPT model: using an open-source Cerebras model with LangChain"
description: "Discover how to leverage the powerful open-source Cerebras model with LangChain in this comprehensive guide, featuring step-by-step instructions for loading the model with HuggingFace Transformers, creating prompt templates, and integrating it with LangChain Agents."
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/alternatives-to-open-ai-gpt-using-open-source-models-with-langchain
---

In this article, I will show you how to use an open-source Cerebras model with LangChain. Cerebras-GPT is a family of models with a GPT-3-style architecture, released in several sizes with different numbers of parameters. For building agentic workflows with open-source models like Llama 3, see my article on [building an agentic AI workflow with LangGraph](https://mikulskibartosz.name/agentic-workflow-with-opensource-llms).

I will use the [cerebras/Cerebras-GPT-2.7B](https://huggingface.co/cerebras/Cerebras-GPT-2.7B) model, which is the largest model I managed to load on the Google Colab Pro+ platform. The larger models in the family are too big to fit, even with 50GB of RAM available. All Cerebras-GPT models are [available on HuggingFace](https://huggingface.co/cerebras).

## Required libraries

I will show you how to use the model with prompt templates and LangChain agents. We will need the Transformers library to download the model, LangChain to build the chains and agents, and SerpAPI as an example tool for the agent.

Let's install them first:

```
pip install transformers langchain google-search-results
```
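
Note that Transformers also needs a deep-learning backend to load the model weights. Google Colab ships with PyTorch preinstalled; if you work in a different environment, you may need to install it as well:

```
pip install torch
```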

## Loading the Cerebras Model with Transformers

Because the Cerebras model is available on HuggingFace, we can load both the model and the text tokenizer using the Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "cerebras/Cerebras-GPT-2.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
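
If the model doesn't fit into memory, a common workaround (not used in the original setup on Colab Pro+) is to load the weights in half precision, which roughly halves the memory footprint:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the weights as float16 instead of the default float32
model = AutoModelForCausalLM.from_pretrained(
    "cerebras/Cerebras-GPT-2.7B", torch_dtype=torch.float16
)
```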

Now, we need to create a HuggingFace text-generation pipeline to turn the input text into tokens, pass the tokens to the model, and convert the output tokens back into text. The pipeline also accepts the text-generation parameters described in [the HuggingFace documentation](https://huggingface.co/docs/transformers/main_classes/text_generation).

In this case, we set the `max_new_tokens` parameter, which limits the maximum number of tokens the model generates. We also set `early_stopping` to `True`, so beam-based generation stops as soon as enough complete candidates are found instead of searching for better ones, and `no_repeat_ngram_size` to `2`, so the model never repeats the same sequence of two tokens.

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer,
    max_new_tokens=100, early_stopping=True, no_repeat_ngram_size=2
)
```
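
Before wiring the pipeline into LangChain, we can sanity-check it with a direct call (the prompt below is just an example):

```python
# The text-generation pipeline returns a list of dicts with a "generated_text" key
result = pipe("The capital of Poland is")
print(result[0]["generated_text"])
```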

## Using a Model from HuggingFace with LangChain

In the next step, we import the `HuggingFacePipeline` class from LangChain and use it as the model implementation. If you follow any other LangChain tutorial, swapping `OpenAI` for a `HuggingFacePipeline` is the only change you need to make to use a model from HuggingFace.

```python
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=pipe)
```

## Creating a Prompt Template

Let's use the model. To start, we will create a prompt template that doesn't add any additional text to the prompt. Instead, the template will pass the input verbatim to the model.

```python
from langchain import PromptTemplate, LLMChain

template = """
{input}
"""

prompt = PromptTemplate(
    input_variables=["input"],
    template=template,
)

chain = LLMChain(
    llm=llm,
    verbose=True,
    prompt=prompt
)
```

When I ran the chain with the input "When I opened the door, I saw a", it generated:

```python
response = chain.run("""When I opened the door, I saw a""")
print(response)
```

```
woman in a white dress,
with a black veil over her face.
She was holding a baby in her arms.
```

It's a decent start, and we know the model is working. Let's make something more useful by adding a prompt template instructing the model to extract the topic from a tweet:

```python
template = """
Given a tweet:
---
{input}
---
The topic of the tweet is:
"""

prompt = PromptTemplate(
    input_variables=["input"],
    template=template,
)

chain = LLMChain(
    llm=llm,
    verbose=True,
    prompt=prompt
)
```

Let's test it with one of my tweets:

```python
response = chain.run("""After writing the same client code in two languages:
Perhaps, SDKs should be thin clients for a backend SDK service that dispatches the requests to actual backend services.""")
print(response)
```

The output isn't perfect, but it is a good start:

```
- What is the difference between a thin client and a thick client?
- What are the advantages and disadvantages of each?
- How do you write a client in one language and another in another? (e.g. in Java)
```

## Smaller Open-Source Models vs. GPT-3 or GPT-4

As we can see, we must write the prompt carefully to coax the model into generating exactly what we want. With more advanced models, we can ask multiple questions at once or ask the model about abstract concepts. Simpler models have trouble generating text that is not directly related to the input. Therefore, it helps when a part of the answer is already written in the prompt, and the model only needs to fill in the gaps.
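
For example, a classification prompt works better when the template already frames the answer and the model only has to complete it. Here is a hypothetical template (not from the original article) following the same pattern as the tweet example above:

```python
# A hypothetical fill-in-the-gaps template: the answer is pre-framed,
# so the model only needs to complete it with one or two words
template = """
Given a product review:
---
{input}
---
The sentiment of the review (positive or negative) is:
"""
```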

In the next example, I will show you why `cerebras/Cerebras-GPT-2.7B` isn't good enough to be used with LangChain agents. I later also tried `cerebras/Cerebras-GPT-13B`, but it didn't help much. Neither model is good enough to power a LangChain agent.

## Using a Cerebras model with LangChain Agents

Our LangChain agent will retrieve the current weather forecast from Google results. GPT-3 (and, obviously, GPT-4) can easily handle the task when we provide a tool they can use to retrieve the results. However, the `Cerebras-GPT-2.7B` model is insufficient to handle the job. Let's see why.

```python
from langchain import SerpAPIWrapper
from langchain.agents import Tool
from langchain.agents import initialize_agent

serpapi = SerpAPIWrapper(serpapi_api_key='...')
tools = [
    Tool(
        name = "Search",
        func=serpapi.run,
        description="useful for when you need to get a weather forecast"
    )
]

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
```

Now, we ask the agent to retrieve the weather forecast for Poznan, Poland:

```python
agent.run(input="What is the weather forecast for Poznan, Poland")
```

We get an exception from LangChain and a weird-looking answer:

```
ValueError: Could not parse LLM output: ` The weather is warm and sunny.
Answer: Warm and Sunny

What is a thought?
A thought is an idea or a feeling. It is not a fact. A thought can be a
conclusion, a question, or an action.
```

What happened? Why did it generate something like this? LangChain agents use a technique called MRKL (Modular Reasoning, Knowledge, and Language). When we use tools, the model [receives a prompt that looks like this](https://github.com/hwchase17/langchain/blob/6c66f51fb864a67529d427af97e555d718029fe6/langchain/agents/mrkl/prompt.py):

```
Answer the following questions as best you can. You have access to the following tools:

// Here is a list of tools

Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!
Question: {input}
Thought:{agent_scratchpad}
```

The model iteratively goes through at least three steps:

* First, it generates the Thought, Action, and Action Input. In the Thought, the model breaks the query down into a plan. After the initial thought, the model chooses a tool and provides a text input for the tool.
* At this point, LangChain interrupts the model and runs the tool.
* The tool returns an observation, and the model continues to generate the next Thought. After the thought, it may choose another tool or generate the Final Answer, as the sketch below illustrates.
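
To make the failure mode concrete, here is a minimal sketch of the parsing step, assuming a simplified agent loop (this is not LangChain's actual implementation, just the idea):

```python
import re

def agent_step(llm_output: str):
    # If the model produced a final answer, the loop stops here
    if "Final Answer:" in llm_output:
        return "finish", llm_output.split("Final Answer:")[-1].strip()
    # Otherwise, the agent expects the Action/Action Input format
    match = re.search(r"Action: (.*)\nAction Input: (.*)", llm_output)
    if match is None:
        # This is the "Could not parse LLM output" error we saw above
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
    return "act", (tool_name, tool_input)
```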

In the output of the model I used, we see that the model was confused by the given format, but at least it tried to do something.

