0

AI Learns to Cheat at Hide and Seek #OpenAI #HideandSeek #MachineLearning #ArtificialIntelligence #ReinforcementLearning @OpenAI

Three children playing “hide and seek” in a forest. Signed “Meyerheim”, probably by Friedrich Eduard Meyerheim (1808-1879). This work is in the public domain in its country of origin and other countries and areas where the copyright term is the author’s life plus 70 years or fewer.

 

OpenAI recently posted on Twitter about teaching computer agents ‘hide and seek’. The agents learned at least six different strategies for playing the game and eventually learned a few cheats:

We’ve observed AIs discovering complex tool use while competing in a simple game of hide-and-seek. They develop a series of six distinct strategies and counter strategies, ultimately using tools in the environment to break our simulated physics.

In the simulations, seekers are incentivized to maintain line of sight of hiders and hiders are incentivized to avoid line of sight from seekers. The agents environments contain various shelters including cubicles, movable partitions, blocks and ramps. That said, there is no built-in incentive for agents to interact with objects around them. The first stages of training see random movement for hiders and seekers. After many rounds, six different strategies and counter strategies emerged.

As agents train against each other in hide-and-seek, as many as six distinct strategies emerge. Each new strategy creates a previously nonexistent pressure for agents to progress to the next stage. Note that there are no direct incentives for agents to interact with objects or to explore; rather, the emergent strategies shown below are a result of the autocurriculum induced by multi-agent competition and the simple dynamics of hide-and-seek.

Strategies:

  1. Chasing – The seekers learned to increase line of sight by chasing after the hiders.
  2. Finding or Building Shelter – the hiders learn to block doors or build shelters to prevent line of sight from seekers. This is akin to locking yourself in a closet when playing in real life. Smart but not necessarily in the spirit of the game.
  3. Ramp Usage – To level the playing field for the seekers there are also ramps in the simulation. These ramps allow agents to go over cubicle or partition walls by moving a ramp to a the shelter and climbing over.
  4. Ramp Defense – Hiders learned to bring the ramp into their hide-y hole with them to prevent seekers from using them. In some cases they would lock them in place instead which brings us to #5…where AI breaks physics and learns to cheat.
  5. Cube Surfing – If a ramp is locked in place, the seekers can’t move it to a partition wall to climb over. The solution? Cheat…hard or SURF A CUBE. The seekers did this by moving an unlocked cube over to the locked ramp which they used to climb on top of the cube. Then, the agents would ‘surf’ the cube over to the partition wall and climb over. The simulation physics somewhat accidentally allowed agents to create magic carpet cubes…
  6. Cube Surfing Defense – Hiders countered the cube surfing offensive by locking all boxes walls and ramps in place so that they could not be used to climb over the partition walls.

Model:

OpenAI utilized the algorithms from Dactyl and OpenAI Five for training. The agent policies were trained with self-play and Proximal Policy Optimization and are independent in the simulation environment. Agents have unique observations and hidden memory state.

We use the same training infrastructure and algorithms used to train OpenAI Five and Dactyl. However, in our environment each agent acts independently, using its own observations and hidden memory state. Agents use an entity-centric state-based representation of the world, which is permutation invariant with respect to objects and other agents.

…Agent policies are trained with self-play and Proximal Policy Optimization. During optimization, agents can use privileged information about obscured objects and other agents in their value function.

In Conclusion:

We’ve provided further evidence that human-relevant strategies and skills, far more complex than the seed game dynamics and environment, can emerge from multi-agent competition and standard reinforcement learning algorithms at scale.

If you’d like to learn more about these Hide and Seek Bots checkout OpenAI’s blog post. Here’s the code for the world generation and the environment generation and the publication about this work.


Stop breadboarding and soldering – start making immediately! Adafruit’s Circuit Playground is jam-packed with LEDs, sensors, buttons, alligator clip pads and more. Build projects with Circuit Playground in a few minutes with the drag-and-drop MakeCode programming site, learn computer science using the CS Discoveries class on code.org, jump into CircuitPython to learn Python and hardware together, TinyGO, or even use the Arduino IDE. Circuit Playground Express is the newest and best Circuit Playground board, with support for CircuitPython, MakeCode, and Arduino. It has a powerful processor, 10 NeoPixels, mini speaker, InfraRed receive and transmit, two buttons, a switch, 14 alligator clip pads, and lots of sensors: capacitive touch, IR proximity, temperature, light, motion and sound. A whole wide world of electronics and coding is waiting for you, and it fits in the palm of your hand.

Join 14,000+ makers on Adafruit’s Discord channels and be part of the community! http://adafru.it/discord

CircuitPython 2019!

Have an amazing project to share? The Electronics Show and Tell is every Wednesday at 7:30pm ET! To join, head over to YouTube and check out the show’s live chat – we’ll post the link there.

Join us every Wednesday night at 8pm ET for Ask an Engineer!

Follow Adafruit on Instagram for top secret new products, behinds the scenes and more https://www.instagram.com/adafruit/


Maker Business — Re-evaluating Amazon after its acquisition of Whole Foods

Wearables — It’s not a deal breaker

Electronics — Higher isn’t always better

Biohacking — Vitamin-C + Gelatin for Accelerated Recovery

Python for Microcontrollers — Supercharged Supercon with CircuitPython, Quoth the Raven MOAR PYTHON! #Python #Adafruit #CircuitPython #PythonHardware @circuitpython @micropython @ThePSF @Adafruit

Adafruit IoT Monthly — The S in IoT is for Security, Amazon announces Sidewalk and more!

Microsoft MakeCode — Xenomorph candy bucket and spooky workshops with MakeCode!

Get the only spam-free daily newsletter about wearables, running a "maker business", electronic tips and more! Subscribe at AdafruitDaily.com !



No Comments

No comments yet.

Sorry, the comment form is closed at this time.