Deploy Llama 3 for free using Cloudflare AI in 3 minutes

Aman Kumar
4 min read · May 13, 2024
Created using DALL·E

In this guide, we’ll explore how to deploy and host LLaMA 3, a powerful language model, for free using Cloudflare Workers.

Llama 3 takes data and scale to new heights. It was trained on two of Meta’s recently announced custom-built 24K-GPU clusters, on over 15T tokens of data: a training dataset 7x larger than the one used for Llama 2, including 4x more code.

Llama3 Model

LLMs (Large Language Models) and AI technologies are advancing rapidly, and with Cloudflare’s generous free tier, you’re perfectly positioned to start developing your own AI applications.

Cloudflare Workers pricing

Follow these steps to set up your application:

1. Create a Cloudflare Account

Start by signing up or logging into your Cloudflare account.

2. Navigate to Workers & Pages

Find the “Workers & Pages” section in your dashboard to begin setting up your new application.

3. Create Your Application

Click on the “Create Application” button to initiate the setup process.

4. Create a Worker

Choose “LLM App” from the templates under the Workers tab. This choice serves as the foundation of your application, enabling JavaScript execution on Cloudflare’s servers. Selecting the LLM App template provides a head start with the necessary packages for running such an application.

Workers Tab

5. Deploy Your Worker

After creating your worker, click on “Deploy”. Don’t worry; we can update your worker’s code later.

Hurrah! Our worker is now live. Send a request like the one below to see the magic happen, replacing the URL with your worker’s URL.

curl --location 'https://llm-app-damp-sun.2000-aman-sinha.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
  "prompt": "say a joke"
}'

6. Edit Worker’s Code

Now it’s time to customize your worker. Click “Edit code” to start coding; until you change it, your worker runs the template’s default code.

Edit index.js so that it handles incoming requests and forwards them to the Llama 3 model.
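A minimal index.js for this app might look like the sketch below. The binding name `AI` and the model ID `@cf/meta/llama-3-8b-instruct` follow Cloudflare Workers AI conventions but are assumptions here; confirm both against the template you deployed.

```javascript
// index.js: a minimal sketch of the worker, assuming the Workers AI
// binding is named `AI` and the Llama 3 8B instruct model is available
// on your account (confirm both in your Cloudflare dashboard).
const worker = {
  async fetch(request, env) {
    if (request.method !== "POST") {
      return new Response("Send a POST request with a JSON body.", {
        status: 405,
      });
    }
    const body = await request.json();
    // Accept either a plain prompt or a chat-style messages array.
    const input = body.messages
      ? { messages: body.messages }
      : { prompt: body.prompt };
    const answer = await env.AI.run("@cf/meta/llama-3-8b-instruct", input);
    return new Response(JSON.stringify(answer), {
      headers: { "Content-Type": "application/json" },
    });
  },
};

export default worker;
```

With this in place, the worker accepts both the plain `prompt` body and the chat-style `messages` body used later in this guide.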


7. Click “Save and Deploy”

Click the “Deploy” button at the top right, then “Save and Deploy”.

Our LLM is now up and running and accepting our requests and prompts.

Let’s try it by making the request below to get a joke. Change the URL to your own worker’s URL for it to work.

curl --location 'https://llm-app-damp-sun.2000-aman-sinha.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
  "prompt": "say a joke"
}'

Well done! Our app is live, and we can now make use of Llama 3.

You can make chat-style API calls, too, by sending requests in the following format:

curl --location 'https://llm-app-damp-sun.2000-aman-sinha.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "How many World Cups has France won?" }
  ]
}'
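The same chat call can also be made from JavaScript with fetch. This is a small sketch; the helper name `askLlama` and the placeholder URL are illustrative, so substitute your own worker’s URL.

```javascript
// Calling the deployed worker from JavaScript instead of curl.
// WORKER_URL is a placeholder; replace it with your own worker's URL.
const WORKER_URL = "https://YOUR-WORKER.YOUR-SUBDOMAIN.workers.dev/";

async function askLlama(messages) {
  const res = await fetch(WORKER_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}

// Example usage:
// const reply = await askLlama([
//   { role: "system", content: "You are a helpful assistant." },
//   { role: "user", content: "How many World Cups has France won?" },
// ]);
```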

You can explore the various models available on Cloudflare Workers AI in the Workers AI model catalog.

Try multiple models, see what fits best for your use case, and move forward.

If this article was helpful, give it some claps. I’m deeply involved with AI and LLMs. Follow me on Medium for more insights.
Feel free to say hi or connect via Twitter and LinkedIn.
