AI Learns to Cheat at Hide and Seek #OpenAI #HideandSeek #MachineLearning #ArtificialIntelligence #ReinforcementLearning @OpenAI

Three children playing “hide and seek” in a forest. Signed “Meyerheim”, probably by Friedrich Eduard Meyerheim (1808-1879). This work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer.


OpenAI recently posted on Twitter about teaching computer agents to play ‘hide and seek’. The agents learned at least six distinct strategies for playing the game and eventually discovered a few cheats:

We’ve observed AIs discovering complex tool use while competing in a simple game of hide-and-seek. They develop a series of six distinct strategies and counter strategies, ultimately using tools in the environment to break our simulated physics.

In the simulations, seekers are incentivized to maintain line of sight to hiders, and hiders are incentivized to avoid it. The agents’ environment contains various shelters, including cubicles, movable partitions, blocks, and ramps. That said, there is no built-in incentive for agents to interact with the objects around them. In the first stages of training, both hiders and seekers move randomly. After many rounds, six distinct strategies and counter-strategies emerged.

As agents train against each other in hide-and-seek, as many as six distinct strategies emerge. Each new strategy creates a previously nonexistent pressure for agents to progress to the next stage. Note that there are no direct incentives for agents to interact with objects or to explore; rather, the emergent strategies shown below are a result of the autocurriculum induced by multi-agent competition and the simple dynamics of hide-and-seek.


  1. Chasing – The seekers learned to chase the hiders to keep them in line of sight.
  2. Finding or Building Shelter – The hiders learned to block doors or build shelters to prevent line of sight from seekers. This is akin to locking yourself in a closet when playing in real life: smart, but not necessarily in the spirit of the game.
  3. Ramp Usage – To level the playing field for the seekers, the simulation also contains ramps. Seekers learned to move a ramp up against a cubicle or partition wall and climb over it.
  4. Ramp Defense – Hiders learned to bring the ramp into their hidey-hole with them so seekers couldn’t use it. In some cases they would lock the ramp in place instead, which brings us to #5… where the AI breaks physics and learns to cheat.
  5. Cube Surfing – If a ramp is locked in place, the seekers can’t move it to a partition wall to climb over. The solution? Cheat… hard, or SURF A CUBE. The seekers moved an unlocked cube over to the locked ramp, used the ramp to climb on top of the cube, then ‘surfed’ the cube over to the partition wall and climbed over. The simulation physics somewhat accidentally allowed agents to create magic carpet cubes…
  6. Cube Surfing Defense – Hiders countered the cube-surfing offensive by locking all boxes, walls, and ramps in place so that they could not be used to climb over the partition walls.
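The incentive structure driving all six strategies above is simple and team-based: hiders are rewarded when none of them are visible to a seeker, and seekers are rewarded when any hider is seen. Here is a minimal sketch of that zero-sum reward; the function names are made up for illustration, and the visibility test is reduced to a range check where the real environment casts rays and accounts for occlusion by walls and boxes:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    team: str   # "hider" or "seeker"
    x: float
    y: float

def in_line_of_sight(seeker: Agent, hider: Agent, max_range: float = 10.0) -> bool:
    """Stand-in visibility test: a simple range check. The real environment
    checks view cones and occlusion by walls, boxes, and ramps."""
    return (seeker.x - hider.x) ** 2 + (seeker.y - hider.y) ** 2 <= max_range ** 2

def team_rewards(agents: list[Agent]) -> dict[str, float]:
    """Hiders get +1 if no hider is seen by any seeker, else -1;
    seekers get the opposite (team-based, zero-sum)."""
    hiders = [a for a in agents if a.team == "hider"]
    seekers = [a for a in agents if a.team == "seeker"]
    any_seen = any(in_line_of_sight(s, h) for s in seekers for h in hiders)
    hider_reward = -1.0 if any_seen else 1.0
    return {"hider": hider_reward, "seeker": -hider_reward}
```

Because nothing in this reward mentions boxes, ramps, or cubes, every tool-use behavior in the list above had to emerge indirectly from the competition itself.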


OpenAI used the same training infrastructure and algorithms that trained Dactyl and OpenAI Five. The agent policies were trained with self-play and Proximal Policy Optimization, and each agent acts independently in the simulation environment, with its own observations and hidden memory state.

We use the same training infrastructure and algorithms used to train OpenAI Five and Dactyl. However, in our environment each agent acts independently, using its own observations and hidden memory state. Agents use an entity-centric state-based representation of the world, which is permutation invariant with respect to objects and other agents.
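The “permutation invariant” representation mentioned above means the policy’s output doesn’t change if you reorder the list of objects and agents it observes. OpenAI’s paper achieves this with masked residual self-attention; the toy sketch below gets the same property a simpler way, by embedding each entity with shared weights and pooling with a symmetric mean (all names and dimensions here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared embedding weights applied to every entity (box, ramp, other agent).
ENTITY_DIM, EMBED_DIM = 6, 16
W = rng.standard_normal((ENTITY_DIM, EMBED_DIM)) * 0.1

def embed_entities(entities: np.ndarray) -> np.ndarray:
    """entities: (n_entities, ENTITY_DIM) array of per-entity features
    (position, velocity, type flags, ...). Returns a fixed-size vector
    that is identical for any ordering of the rows."""
    h = np.tanh(entities @ W)   # same weights for each entity
    return h.mean(axis=0)       # symmetric pooling -> permutation invariance
```

Shuffling the rows of `entities` leaves the output unchanged, so the policy treats “box A, box B” and “box B, box A” as the same world state.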

…Agent policies are trained with self-play and Proximal Policy Optimization. During optimization, agents can use privileged information about obscured objects and other agents in their value function.
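Proximal Policy Optimization, named in the quote, keeps each policy update small by clipping the probability ratio between the new and old policies. The clipped surrogate objective at the heart of PPO can be sketched in a few lines; the arrays here are stand-ins for a real rollout batch, and advantages are assumed precomputed (e.g. with GAE):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate loss from PPO (Schulman et al., 2017).
    logp_new/logp_old: per-action log-probabilities under the current
    and rollout policies; advantages: estimated advantages."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    # Take the pessimistic (smaller) objective, negate to get a loss.
    return -np.minimum(unclipped, clipped).mean()
```

The clipping is what makes self-play stable enough to run for the millions of episodes these strategies needed to emerge: no single update can move the policy far from the one that gathered the data.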

In Conclusion:

We’ve provided further evidence that human-relevant strategies and skills, far more complex than the seed game dynamics and environment, can emerge from multi-agent competition and standard reinforcement learning algorithms at scale.

If you’d like to learn more about these hide-and-seek bots, check out OpenAI’s blog post. Here’s the code for the world generation and the environment generation, as well as the publication about this work.
