Automating Form Filling with AI — Part 2

Aman Kumar

In our previous article, we explored how to automate form filling with Playwright for Python, eliminating the tedium of repetitive data entry. Now we're going a step further by letting AI agents control the browser autonomously. By combining Gemini/LLM with Browser Use, we not only fill out forms with dynamic, realistic data but also lay the groundwork for tackling far more complex tasks.

The Power of a Simple Task

At first glance, automating a form fill might seem like a basic exercise. However, this simplicity is exactly what makes it a powerful demonstration of how AI can autonomously complete tasks. Our current example focuses on one specific task: using an AI agent to navigate to a form and fill it with genuine-looking data. We introduce a variable, person_name, so you can easily change the persona and see how the AI adapts its output.

Yet, this simple example is just the tip of the iceberg.

Scaling Up with More Tools

Imagine an AI agent equipped with a suite of tools instead of just one. With additional capabilities, such as data extraction, multi-tab management, custom action handling, and error recovery, the range of tasks you can automate expands well beyond single-page forms. In a more advanced setup, an AI agent could:

  • Perform Multi-Step Workflows: Navigate between multiple pages, extract data, analyze it, and then perform actions based on that analysis.
  • Integrate with Databases and APIs: Automatically fetch, process, and store data from various sources.
  • Handle Dynamic Web Elements: Interact with websites that require real-time decision-making, such as booking systems or complex multi-stage forms.
  • Self-Correct Errors: Identify when something goes wrong, adjust its approach, and retry actions without human intervention.
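The "self-correct" idea can be illustrated, in its simplest form, as a retry loop around any async browser action. This is a minimal, hypothetical sketch: `retry_async`, `attempts`, and `base_delay` are names introduced here, not part of Browser Use. A real agent would also feed the error back to the LLM so it can change its approach instead of just repeating it.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_async(
    action: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await `action()`, retrying failures with exponential backoff (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await action()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Back off 0.5s, 1s, 2s, ... (with the default base_delay).
                await asyncio.sleep(base_delay * 2 ** attempt)
    # Every attempt failed: surface the most recent error to the caller.
    assert last_error is not None
    raise last_error
```

With Playwright's async API this could wrap a flaky click, for example `await retry_async(lambda: page.click("text=Submit"))`, where `page` is an already-open Playwright page.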

With Browser Use's control over web interactions and the data generation of Gemini/LLM, far more intricate tasks come within reach. This simple form-filling example is our starting point: a demonstration of what's possible when AI can take autonomous action in a controlled environment.

Project Setup: Getting Started

Before diving into more complex integrations, let’s review our project setup for this form-filling task:

  1. Install Required Packages:
    Create a virtual environment and install the necessary libraries:
python -m venv venv && source venv/bin/activate
# On Windows, use `venv\Scripts\activate`
pip install playwright langchain_google_genai browser_use aiofiles

  2. Install Playwright Browser Binaries:
    Then download the browsers that Playwright drives:
playwright install

  3. Configure Your Environment:
    Create a .env file in your project root with your Gemini API key, which you can obtain from Google AI Studio:
GEMINI_API_KEY=your_gemini_api_key_here
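The script reads this key at startup. Libraries such as python-dotenv are the usual way to do that; as a dependency-free illustration of what the step involves, here is a minimal sketch that parses simple `KEY=value` lines. `load_env_file` and `get_gemini_key` are hypothetical helpers, and the parser deliberately ignores quoting, `export` prefixes, and multi-line values that real loaders handle.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Parse simple KEY=value lines from `path` and copy them into os.environ."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    for key, value in values.items():
        # By default, variables already set in the shell take precedence.
        if override or key not in os.environ:
            os.environ[key] = value
    return values


def get_gemini_key() -> str:
    """Return the Gemini key, failing loudly if it was never configured."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message is cheaper than letting the agent start a browser and then fail on its first model call.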

Understanding Browser Use:
Browser Use lets AI agents interact with web pages by extracting the page's interactive elements (inputs, buttons, links) and executing the browser actions the model chooses. Whether it's simple form filling or a complex multi-step workflow, the tool is designed to enable AI-driven automation.
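To make "extracting interactive elements" concrete: the agent needs a compact, numbered list of things it can click or type into, so the LLM can reply with an action like "type into element 0". The sketch below builds such a list from raw HTML using the standard library's `html.parser`. It only illustrates the idea and is not Browser Use's implementation, which works on the live rendered page; `extract_interactive` and the output format are made up for this example.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

INTERACTIVE_TAGS = {"input", "textarea", "select", "button", "a"}
KEPT_ATTRS = ("type", "name", "placeholder", "aria-label", "href")


class InteractiveElementParser(HTMLParser):
    """Collect start tags of elements a user (or an agent) could interact with."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("type") == "hidden":
            return  # hidden inputs cannot be filled by a user
        if tag == "a" and not attr_map.get("href"):
            return  # anchors without an href are not navigable links
        # Keep only the attributes that help the LLM identify the element.
        parts = [tag] + [f'{k}="{attr_map[k]}"' for k in KEPT_ATTRS if attr_map.get(k)]
        self.elements.append(f"[{len(self.elements)}]<{' '.join(parts)}>")


def extract_interactive(html: str) -> List[str]:
    """Return a numbered, LLM-friendly list of interactive elements in `html`."""
    parser = InteractiveElementParser()
    parser.feed(html)
    parser.close()
    return parser.elements
```

The numeric index is the important design choice: it gives the model a short, unambiguous handle for each element instead of asking it to write CSS selectors.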

Integrating AI and Browser Use: The Code

Below is our streamlined script that demonstrates autonomous form filling. The code integrates Gemini/LLM with Browser Use, and introduces a customizable variable person_name:

Code Link: https://gist.github.com/onlyoneaman/3bdfe85a87002734f6e6b73f72742634
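For readers who want the shape of the script without opening the gist, here is a minimal sketch of the same idea. It is not the gist's code: the form URL is a placeholder, the model name `gemini-2.0-flash` is an assumption, and `build_task`/`fill_form` are hypothetical helpers. It targets the `Agent(task=..., llm=...)` interface that Browser Use offered when it accepted LangChain chat models such as `ChatGoogleGenerativeAI`; newer releases may expect a different LLM wrapper.

```python
import os


def build_task(form_url: str, person_name: str) -> str:
    """Compose the natural-language instruction the agent will follow."""
    return (
        f"Go to {form_url} and fill out every field of the form as if you were "
        f"{person_name}. Invent realistic, internally consistent details for "
        "this person, then submit the form."
    )


async def fill_form(form_url: str, person_name: str, model: str = "gemini-2.0-flash"):
    """Let a Browser Use agent fill the form; returns the agent's run history."""
    # Imported inside the function so build_task() works without these packages.
    from browser_use import Agent
    from langchain_google_genai import ChatGoogleGenerativeAI

    llm = ChatGoogleGenerativeAI(
        model=model,  # assumed model name; use whichever Gemini model you have access to
        google_api_key=os.environ["GEMINI_API_KEY"],  # must be loaded from .env beforehand
    )
    agent = Agent(task=build_task(form_url, person_name), llm=llm)
    return await agent.run()
```

Usage would look like `asyncio.run(fill_form("https://example.com/your-form", "Priya Sharma"))`; changing the second argument is all it takes to switch personas.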

You can check the responses the agent submitted in this sheet: https://docs.google.com/spreadsheets/d/1pdNsoW2yAUKwNdaR9xZ2cbzHwyvXfNX6rJQFHaSS4qE/edit?usp=sharing

How It Works:

  1. Environment Configuration:
    The script loads your Gemini API key from the .env file and initializes the AI with the specified model.
  2. Customizing the Persona:
    By simply changing the person_name variable, you can simulate different user identities. This small detail enhances the realism of your tests.
  3. Autonomous Browser Interaction:
    Using Browser Use’s Agent, the AI autonomously navigates to the form URL, extracts interactive elements, and fills in the form with data generated by the Gemini model.
  4. Foundation for Complexity:
    While this example automates a simple form fill, the same architecture can be extended to handle much more complex tasks. With additional tools, the AI agent can integrate multiple steps, interact with various APIs, and perform intricate workflows with minimal human intervention.
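Because the persona lives in a single variable, exercising several identities is just a loop. The sketch below runs one task per persona concurrently with `asyncio.gather` and caps parallelism with a semaphore, since each agent drives its own browser. `run_for_personas` and its `runner` parameter are hypothetical names: in practice `runner` would build the task for that person and await the agent, and it is injected here so the logic can be exercised without a browser.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, TypeVar

R = TypeVar("R")


async def run_for_personas(
    personas: List[str],
    runner: Callable[[str], Awaitable[R]],
    max_concurrency: int = 2,
) -> Dict[str, R]:
    """Run `runner(person_name)` for each persona, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(person_name: str) -> R:
        # Each agent opens its own browser, so limit how many run at once.
        async with semaphore:
            return await runner(person_name)

    results = await asyncio.gather(*(run_one(p) for p in personas))
    # gather preserves input order, so zip pairs each persona with its result.
    return dict(zip(personas, results))
```

If one run fails, `asyncio.gather` raises that error by default; passing `return_exceptions=True` instead would collect failures alongside successful results.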

Final Thoughts

This automation setup is more than just a script to fill out forms; it's a glimpse of autonomous task management. The simplicity of our example belies its potential: as AI agents are equipped with more tools and capabilities, they'll be able to tackle increasingly complex problems and execute multi-step workflows, from simple form submissions to elaborate, multi-stage processes.

By integrating Gemini/LLM with Browser Use, we’re not just automating tasks — we’re empowering AI agents to control browsers, analyze data, and even make decisions on their own. The future of automation is here, and it’s only going to get smarter.

Happy automating, and may your AI agents evolve to handle every challenge with finesse!

If this article was helpful, give it some claps. I’m deeply involved with AI and LLMs. Follow me on Medium for more insights.
Feel free to say hi or connect via Twitter and LinkedIn.
