Agent Battle: Mine the most diamonds in 45 minutes

Head-to-head agent build-off on a shared game harness. Build an agent, submit runs, watch scores and a live game feed stream to the leaderboard on the big screen. Can you build the top-performing agent in the room?

May 23, 2026 · 8m · Watch on YouTube ↗

CHAPTERS

  1. Workshop kickoff: Agent battle premise and goal

    Ben opens the session by framing the workshop as an “agent battle” where participants build agents that mine diamonds in Minecraft. The focus is on moving quickly due to a tight timebox and using agents (not humans) to play the game.

  2. What participants will learn: managed agents, configuration, and improvement loops

    Ben outlines three learning objectives: deploying a managed agent, understanding how configuration choices affect behavior, and iterating via eval-driven hill climbing. The chapter sets expectations that this is both a game and a practical agent-engineering exercise.

  3. Competition rules: runs, timing, leaderboard, and token-efficiency tiebreakers

    The rules are explained: participants have a limited build window, can submit multiple attempts but only their best counts, and each scored run is a short mining session. If participants are tied on diamonds, token efficiency decides the ranking, which pushes them to optimize prompts and model selection.

  4. Harness overview: Minecraft clone + Mineflayer bot (no visuals)

    Jeff describes the technical harness: a Minecraft-like environment connected to a Mineflayer bot controlled through tools rather than a visual interface. Participants mainly optimize agent behavior using provided capabilities rather than manual play.

  5. Fair starting conditions: fixed seed and consistent reset kit

    To keep the competition fair, everyone starts from the same world seed and receives the same starting kit on each reset. This removes randomness and emphasizes agent tuning and decision-making.

  6. Where to edit: repo structure and the main agent file

    Jeff points participants to the code entry point: the included repository and the `my_agent.py` file where key settings live. This is where participants adjust model selection, system prompt, and skills/integrations to improve mining outcomes.

  7. Extensibility options: skills and MCP server customization

    Participants can go beyond prompt/model tuning by swapping skills and optionally adjusting the MCP server setup. Jeff frames these as advanced levers for changing the agent’s capabilities and behavior.

  8. Iteration workflow: quick evals and rapid experimentation

    Jeff encourages iterative development: run evals, adjust parameters, and repeat to improve performance. A faster eval set is mentioned as a way to iterate without spending a full run each time.

  9. Countdown begins: participants start runs and leaderboard activity

    The session transitions into active competition as the timer is reset and participants begin. The leaderboard starts populating with early results, indicating live progress tracking.

  10. Troubleshooting during the battle: connectivity and setup support

    Jeff notes that some attendees are experiencing connection issues (e.g., Cloudflare), likely due to conference Wi‑Fi load. He references a command workaround and a Claude Code skill in the repo to help with setup problems.

  11. Final minutes drama: ties, suspicious token reporting, and a record score

    As time winds down, the leaderboard shows ties and an anomaly where a top participant appears to have zero tokens, raising suspicion. Another participant breaks the apparent ceiling (19 diamonds) near the end, creating a clear winner.

  12. Time’s up: winner announced and post-game verification

    The battle ends with a clear winner, but second and third place require investigation due to the token anomaly. Facilitators ask top finishers to come up and discuss techniques while they sort out rankings.
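
The scoring rule from chapter 3 (only each participant's best run counts, and ties on diamonds are broken by token efficiency) can be sketched as a small ranking function. This is a minimal illustration, not the workshop's actual leaderboard code: the `Run` fields, the sample data, and the reading of "token efficiency" as "fewer total tokens wins" are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # Hypothetical record shape; the real harness may report different fields.
    participant: str
    diamonds: int
    tokens: int


def sort_key(run: Run) -> tuple[int, int]:
    # More diamonds first; on a tie, fewer tokens first (assumed tiebreaker).
    return (-run.diamonds, run.tokens)


def rank(runs: list[Run]) -> list[Run]:
    """Keep each participant's best run, then order the leaderboard."""
    best: dict[str, Run] = {}
    for run in runs:
        current = best.get(run.participant)
        if current is None or sort_key(run) < sort_key(current):
            best[run.participant] = run
    return sorted(best.values(), key=sort_key)


# Made-up sample data for illustration only.
runs = [
    Run("ada", diamonds=12, tokens=90_000),
    Run("ada", diamonds=15, tokens=120_000),
    Run("lin", diamonds=15, tokens=80_000),
    Run("sam", diamonds=9, tokens=40_000),
]

for place, run in enumerate(rank(runs), start=1):
    print(place, run.participant, run.diamonds, run.tokens)
```

Under these assumptions, a run reporting zero tokens would win every tie, which is why the anomaly in chapter 11 needed manual verification.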
