AutoGPT Forge: Interacting with your Agent

Craig Swift
8 min read · Sep 29


After taking your first steps into the realm of AutoGPT Forge and unraveling the blueprint of an AI agent, you’re now poised to delve deeper. This next phase of our guide is centered around the hands-on aspect of the journey, emphasizing the practicalities of agent interaction using the user interface.

In this tutorial, we’ll guide you step-by-step on how to efficiently communicate with your agent through the UI, ensuring that you can not only create tasks but also monitor and analyze them seamlessly. Moreover, the significance of benchmarks in the AI ecosystem cannot be overstated. Thus, we’ll walk you through the process of running benchmarks, understanding your agent’s performance, and, critically, showcasing how to submit those all-important scores to the leaderboard.

This guide is designed to transition your theoretical knowledge into tangible skills. By the end, you should feel confident in navigating the AutoGPT Forge UI, conducting benchmarks, and participating competitively on the leaderboard.

So, with the foundational understanding in place from our previous tutorials, let’s venture further and master the interactive world of AutoGPT Forge. Let’s get started!

Initiating the Agent

Before we dive into the UI, ensure your agent is up and running.

In the terminal, navigate to the root of the repository you cloned during our introductory tutorial. If you missed this step or need a refresher, you can follow along with our initial guide here.

Execute the following command to start your agent:

./run agent start agent_name

Make sure to replace agent_name with the name of your specific agent.
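If you want to confirm the agent actually came up before opening the browser, a quick request to its port will tell you. This is a minimal sketch; port 8000 is assumed as the Forge default, so adjust `AGENT_PORT` if your setup differs.

```shell
# Probe the port the agent is assumed to serve on (8000 by default).
AGENT_PORT="${AGENT_PORT:-8000}"
STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 2 \
  "http://localhost:$AGENT_PORT" || true)

if [ "$STATUS" = "000" ] || [ -z "$STATUS" ]; then
  echo "agent not reachable on port $AGENT_PORT"
else
  echo "agent responded with HTTP $STATUS"
fi
```

A status of `000` simply means nothing is listening yet; re-run the start command and try again.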

Accessing the AutoGPT UI

AutoGPT Login Screen

Once your agent is running, open your preferred web browser and navigate to the AutoGPT UI at http://localhost:8000. You’ll be greeted by the screen shown above: the login screen for the AutoGPT UI.

To access the UI, authenticate using either your GitHub or Google account. These platforms ensure a secure login process.

The Task Screen

Upon successful authentication, you’ll be directed to the main hub of the UI: the task screen.

The Task Screen and Landing page after logging in

The design is reminiscent of ChatGPT, providing a chat-like interface. Those familiar with ChatGPT will find this layout intuitive.

Let’s break down the components of this screen:

Side Navigation: On the top-left, you’ll notice three primary buttons:

  • Tasks: This is your current screen, where you interact with and monitor tasks.
  • Benchmarking: Transition to this section when you’re ready to run benchmarks on your agent.
  • Settings: Here, you can adjust preferences, manage agent configurations, and customize the UI experience.

The main screen area is split into two sections:

  • Tasks List: Just to the right of the Side Navigation, this section lists all the tasks or benchmark test suites you’ve run. It offers a concise view of previous interactions, making it easier to revisit or analyze specific tasks.
  • Task Interface: On the right side of the screen, you’ll find the main interaction space. Input tasks, view agent responses, or monitor ongoing tasks in real-time from this panel.

Creating Tasks with AutoGPT Forge UI

Now that you’re familiar with the layout of the AutoGPT Forge UI, let’s delve into the process of creating and managing tasks.

Initiating a New Task

Begin by locating the New Task button at the top of the Tasks List section and click on it.

Inputting Task Details

After initiating a new task, you can input the desired task in the Task Interface. This is where you’ll be interacting with your agent.

Sending Tasks to the Agent

Once you’ve keyed in your task, you’ll see two buttons to the right of the input box. Click on the first button (Send) to transmit the task to the agent. This action executes the initial step of the task as demonstrated below:

Continued Interaction

The dialogue doesn’t end there! Feel free to keep conversing with your agent by typing in more messages and pressing the send button, as seen here:

Leveraging the Continuous Mode

The AutoGPT Forge UI also offers Continuous Mode. In this mode, the agent executes steps in a loop until the task reaches its conclusion.

However, a word of caution: when you click on the second button in the task input box, a warning message appears. This warning matters, because agents can get trapped in endless loops, especially if the task’s completion conditions aren’t explicit.

For those not entirely sure about the agent’s behavior or the task’s specifics, it’s advisable to advance through the task one step at a time. You can do this by repeatedly hitting the send button and inputting “y” until the task completes.
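Under the hood, the UI drives the agent over the Agent Protocol HTTP API, so the same step-by-step flow can be scripted. The sketch below assumes the standard Agent Protocol endpoints (`/ap/v1/agent/tasks` and `/ap/v1/agent/tasks/{task_id}/steps`) served on port 8000; treat the paths and payload shapes as assumptions to verify against your own agent.

```shell
# Sketch of the UI's task flow via the Agent Protocol HTTP API.
# Endpoint paths and payload shapes are assumptions taken from the
# Agent Protocol spec; verify them against your running agent.
AGENT_URL="${AGENT_URL:-http://localhost:8000}"

# Create a task -- the equivalent of typing a task and pressing Send.
create_task() {
  curl -s -X POST "$AGENT_URL/ap/v1/agent/tasks" \
    -H "Content-Type: application/json" \
    -d "{\"input\": \"$1\"}"
}

# Execute one step -- the equivalent of replying "y" in the UI.
execute_step() {
  curl -s -X POST "$AGENT_URL/ap/v1/agent/tasks/$1/steps" \
    -H "Content-Type: application/json" \
    -d '{"input": "y"}'
}

# Only talk to the agent if one is actually listening, so this sketch
# is safe to run anywhere.
if curl -s -o /dev/null --max-time 2 "$AGENT_URL"; then
  create_task "Write the word Washington to a .txt file"
else
  echo "No agent reachable at $AGENT_URL"
fi
```

Calling `execute_step` repeatedly with the returned task ID mirrors hitting Send with “y” until the agent reports the task is complete.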

Benchmarking: Gauging Your Agent’s Proficiency

Benchmarking is at the heart of agent development. It’s the yardstick against which the agent’s prowess and capabilities are measured. Benchmarks serve as structured challenges, pushing the bounds of what agents can do. Surpass them all, and you’re looking at one of the world’s most powerful generalist agents!

Here’s a guided walkthrough to understanding and executing benchmarks in AutoGPT Forge UI:

Accessing the Benchmarking Page

Kick things off by clicking on the Trophy symbol located on the left of the screen.

The benchmarking screen

This screen presents a tree structure, showcasing the myriad of challenges available for your agent to tackle.

Navigating Challenge Categories

At the top-left corner, there’s a dropdown menu. It lets you peruse and select from specific challenge categories. The General category is an umbrella that encompasses all challenges.

Selecting Challenge Trees

Dive deeper into the categories:

Data Challenges

Data Challenges Tree

Scrape and Synthesise Challenges

Scrape and Synthesise Tree

Coding Challenges

Coding Challenges Tree

Launching a Benchmark Suite

Back on the General page, click on any node. This action reveals a pathway of escalating challenges leading up to your selected node. This progressive set of tests constitutes what we term a suite.

Selecting a Node

To initiate the suite, click the Initiate Test Suite button.

Monitoring the Benchmark Run

Once the suite starts, the status of each individual challenge becomes visible. It offers a real-time update, showing:

  • Passed Challenges: Denoted by a green color.
  • Failed Challenges: Marked in red.
  • In-progress Challenge: Indicated by a rotating circle.

Running Benchmarks

The right side of the screen actively displays the currently executing task, letting you keep tabs on the agent’s real-time performance.

Completed Benchmark

Here’s a glimpse of a more comprehensive test suite:

Failed Benchmark Mid-run

It’s crucial to note: if a challenge fails mid-suite, subsequent challenges halt, and the run terminates.

Benchmarking is both a measure and a motivator. It’s a testament to how far your agent has come and a signpost pointing towards the future strides it can make. So, challenge your agent, learn from the results, iterate, and push those boundaries!

Submitting Your Agent’s Prowess to the Leaderboard

With your benchmarking complete, it’s time to share your achievements and see how your agent stacks up against others! Here’s how you can submit your scores to the leaderboard:

Initiating Submission

After successfully running a test suite, look for the Submit to Leaderboard button, which turns green to indicate it’s active.

Clicking it takes you to the submission form:

Submission Form

Filling Out the Submission Form

Here, you’ll need to input a few crucial details:

- Team Name: Input your team’s name. Participants of the AutoGPT Arena Hacks should ensure this name precisely matches the team name registered on LabLab to avoid any discrepancies.

- Github Repo URL: This is the web address of the forked repository you created. If you’re drawing a blank, head to your GitHub profile page; it will list all your repositories.

- Commit SHA: The backbone of your submission. Make sure to commit all changes you’ve made to your repository. Navigate to the root of your repo and run the command git rev-parse HEAD. This will yield a unique git hash. Copy this hash and paste it into the designated field. This step is paramount as it provides a snapshot of your work, ensuring traceability of changes made to your agent.
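Those last two fields come straight from git. The commands below demonstrate the flow in a throwaway repository so the sketch is runnable anywhere; in your actual fork you would only need the `add`, `commit`, and `rev-parse` steps, and the file name and commit message here are purely illustrative.

```shell
# Demo in a temporary repo; in practice, run the last three commands
# inside your forked AutoGPT repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"   # illustrative identity
git config user.name "Your Name"
echo "agent changes" > agent.txt          # stand-in for your real edits

git add -A                                # stage every change you made
git commit -q -m "Snapshot for leaderboard submission"
SHA=$(git rev-parse HEAD)                 # the value for the Commit SHA field
echo "$SHA"
```

The printed 40-character hash pins your submission to an exact state of your code, which is what makes the leaderboard entry traceable.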

Submit and Compare

Once the form’s complete, click Submit. Your scores are now on their way to the leaderboard. To see how you fare in comparison to others, visit the leaderboard.

AutoGPT Leaderboard

Congratulations! By submitting to the leaderboard, you’ve not only showcased your agent’s capabilities but also taken a leap in the community-driven evolution of AI. Check back frequently to see where you stand and keep refining your agent to climb those ranks!

Wrap Up

And there you have it — a comprehensive tutorial on navigating the AutoGPT Forge UI, interacting with your agent, and benchmarking its capabilities. From initiating tasks to submitting your scores on the leaderboard, you are now well-versed in harnessing the power of the Forge. But, as with all great stories, this is just one chapter.

As you might have guessed, there’s more to come. Our next tutorial will focus on the heart of what makes an agent truly intelligent — the logic behind it. In “Crafting Intelligent Agent Logic”, we’re shifting gears from interface interactions to diving deep into the code, allowing you to witness the beauty of an LLM operating as the cognitive center of our agent. Through a simple yet captivating task, you’ll see firsthand how our agent, with just a broad instruction, can smartly deduce steps and execute commands seamlessly.

Stay tuned, as we’ll embark on this coding adventure, unraveling the potential of LLMs in action. Trust us, you won’t want to miss it. Until then, keep experimenting and exploring the vast horizons of AI with AutoGPT!



Craig Swift

AI expert & former CTO. Firm believer in open-source. Founding AI Engineer at AutoGPT. Passionate about the future of work!