Deploy Llama 3 for free using Cloudflare AI in 3 minutes
In this guide, we’ll explore how to deploy and host Llama 3, a powerful language model, for free using Cloudflare Workers.
Llama 3 models take data and scale to new heights. They were trained on Meta’s two recently announced custom-built 24K GPU clusters, on over 15T tokens of data, a training dataset 7x larger than that used for Llama 2, including 4x more code.
LLMs (Large Language Models) and AI technologies are rapidly advancing, and with Cloudflare’s generous pricing model, you’re perfectly positioned to start developing your own AI applications.
Follow these steps to set up your application:
1. Create a Cloudflare Account
Start by signing up or logging into your Cloudflare account.
2. Navigate to Workers & Pages
Find the “Workers & Pages” section in your dashboard to begin setting up your new application.
3. Create Your Application
Click on the “Create Application” button to initiate the setup process.
4. Create a Worker
Choose “LLM App” from the templates under the Workers tab. This choice serves as the foundation of your application, enabling JavaScript execution on Cloudflare’s servers. Selecting the LLM App template provides a head start with the necessary packages for running such an application.
5. Deploy Your Worker
After creating your worker, click on “Deploy”. Don’t worry; we can update your worker’s code later.
Hurrah! Our worker is now live. Make a request like the one below to see the magic happen, replacing the URL with your worker’s URL.
curl --location 'https://llm-app-damp-sun.2000-aman-sinha.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "say a joke"
}'
6. Edit Worker’s Code
Now it’s time to customize your worker. Click on “Edit code” to start coding; initially, your worker is running the template code. Edit index.js so that it forwards incoming requests to the model.
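A minimal sketch of such a worker is shown below. It assumes the Workers AI binding is named AI (which the LLM App template sets up for you; otherwise configure it in wrangler.toml) and uses the @cf/meta/llama-3-8b-instruct model; your template’s generated code may look somewhat different.

```javascript
// index.js: minimal Cloudflare Worker that forwards requests to Llama 3.
// Assumes a Workers AI binding named "AI".
const worker = {
  async fetch(request, env) {
    const body = await request.json();

    // Support both a plain prompt and chat-style messages,
    // matching the two curl examples in this guide.
    const input = body.messages
      ? { messages: body.messages }
      : { prompt: body.prompt };

    const answer = await env.AI.run('@cf/meta/llama-3-8b-instruct', input);
    return Response.json(answer);
  },
};

export default worker;
```

The `body.messages ? … : …` branch is what lets the same endpoint serve both the simple prompt requests and the chat-format requests shown later in this guide.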
7. Click “Save and Deploy”
Click the “Deploy” button on the top right, then “Save and Deploy”.
Our LLM is now up and running, accepting our requests and prompts.
Let’s try making the request below to get a joke. Change the URL to your worker’s URL for it to work.
curl --location 'https://llm-app-damp-sun.2000-aman-sinha.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "say a joke"
}'
Well done! Our app is live, and we can now make use of Llama 3.
You can also make chat-style API calls by sending requests in the following format:
curl --location 'https://llm-app-damp-sun.2000-aman-sinha.workers.dev/' \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "How many world cups has france won?" }
]
}'
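If you are calling the worker from JavaScript rather than curl, the same chat request can be wrapped in a small helper. This is an illustrative sketch, not part of the worker itself; askChat is a hypothetical name, and workerUrl should be your own worker’s URL.

```javascript
// Hypothetical helper (sketch): POSTs chat-style messages to your
// deployed worker and returns the parsed JSON response.
async function askChat(workerUrl, messages) {
  const res = await fetch(workerUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages }),
  });
  return res.json();
}
```

For example: `await askChat('https://<your-worker>.workers.dev/', [{ role: 'user', content: 'How many world cups has France won?' }])`.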
You can check the various models available in the Cloudflare Workers AI model catalog.
Try multiple models, see what fits your use case best, and move forward.