AI Learns to Cheat at Hide and Seek #OpenAI #HideandSeek #MachineLearning #ArtificialIntelligence #ReinforcementLearning @OpenAI
OpenAI recently posted on Twitter about teaching computer agents to play ‘hide and seek’. The agents learned six distinct strategies for playing the game and eventually discovered a few cheats:
We’ve observed AIs discovering complex tool use while competing in a simple game of hide-and-seek. They develop a series of six distinct strategies and counter strategies, ultimately using tools in the environment to break our simulated physics.
In the simulations, seekers are incentivized to maintain line of sight to hiders, and hiders are incentivized to avoid the seekers’ line of sight. The agents’ environments contain various shelters, including cubicles, movable partitions, blocks, and ramps. That said, there is no built-in incentive for agents to interact with the objects around them. In the first stages of training, hiders and seekers move randomly. After many rounds, six different strategies and counter strategies emerged.
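To make the line-of-sight incentive concrete, here is a minimal sketch of how such a reward could be computed in a flat, top-down 2D world. It assumes walls are straight line segments, that a hider counts as "seen" whenever the straight line from a seeker to it crosses no wall, and that rewards are team-based: hiders get +1 per step if every hider is hidden and -1 otherwise, with seekers receiving the opposite. The function names and the ±1 values are hypothetical illustrations, not OpenAI's code.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area of triangle abc: >0 counter-clockwise, <0 clockwise, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 properly crosses segment q1-q2 (grazing an endpoint is ignored)."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def has_line_of_sight(seeker: Point, hider: Point, walls: List[Segment]) -> bool:
    """Hypothetical visibility test: visible unless some wall blocks the straight line."""
    return not any(segments_cross(seeker, hider, w0, w1) for (w0, w1) in walls)


def team_rewards(seekers: List[Point], hiders: List[Point],
                 walls: List[Segment]) -> Tuple[float, float]:
    """Assumed zero-sum team reward, returned as (hider_reward, seeker_reward).

    Hiders get +1 only if no seeker can see any hider; otherwise -1.
    Seekers always receive the negation of the hiders' reward.
    """
    any_seen = any(has_line_of_sight(s, h, walls) for s in seekers for h in hiders)
    hider_reward = -1.0 if any_seen else 1.0
    return hider_reward, -hider_reward
```

For example, with a single vertical wall from (1, -1) to (1, 1), a seeker at (0, 0) cannot see a hider at (2, 0), so the hiders score +1 for that step; a hider at (0, 2) is in plain view and the seekers score instead. Note that nothing in this reward mentions boxes or ramps, which matches the point above: any tool use has to emerge on its own.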
As agents train against each other in hide-and-seek, as many as six distinct strategies emerge. Each new strategy creates a previously nonexistent pressure for agents to progress to the next stage. Note that there are no direct incentives for agents to interact with objects or to explore; rather, the emergent strategies shown below are a result of the autocurriculum induced by multi-agent competition and the simple dynamics of hide-and-seek.
Chasing – The seekers learned to increase line of sight by chasing after the hiders.
Finding or Building Shelter – The hiders learned to block doors or build shelters to break the seekers’ line of sight. This is akin to locking yourself in a closet when playing in real life. Smart, but not necessarily in the spirit of the game.
Ramp Usage – To level the playing field for the seekers, there are also ramps in the simulation. Seekers learned to move a ramp up to a shelter and climb it to get over cubicle or partition walls.
Ramp Defense – Hiders learned to bring the ramp into their hidey-hole with them to prevent seekers from using it. In some cases they would lock it in place instead, which brings us to #5…where the AI breaks physics and learns to cheat.
Cube Surfing – If a ramp is locked in place, the seekers can’t move it to a partition wall to climb over. The solution? Cheat hard, or rather, SURF A CUBE. The seekers moved an unlocked cube next to the locked ramp, used the ramp to climb on top of the cube, then ‘surfed’ the cube over to the partition wall and climbed over. The simulation physics somewhat accidentally allowed agents to turn cubes into magic carpets…
Cube Surfing Defense – Hiders countered the cube surfing offensive by locking all boxes, walls, and ramps in place so that none of them could be used to climb over the partition walls.
We use the same training infrastructure and algorithms used to train OpenAI Five and Dactyl. However, in our environment each agent acts independently, using its own observations and hidden memory state. Agents use an entity-centric state-based representation of the world, which is permutation invariant with respect to objects and other agents.
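What does "permutation invariant with respect to objects and other agents" mean in practice? One simple way to get that property is to run every entity (box, ramp, other agent) through the same small embedding function and then combine the results with an order-independent operation such as an element-wise max. The sketch below does exactly that in pure Python. It is a Deep Sets–style illustration under our own assumptions (a single shared linear layer with ReLU, hand-picked hypothetical weights), not necessarily the architecture OpenAI used.

```python
from typing import List, Sequence


def embed_entity(features: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Shared linear layer + ReLU, applied identically to every entity."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]


def pool_entities(entities: List[Sequence[float]], weights: List[List[float]]) -> List[float]:
    """Element-wise max over the per-entity embeddings.

    Because max does not care about order, shuffling the entity list
    (e.g. listing the boxes in a different order) cannot change the output.
    """
    embedded = [embed_entity(e, weights) for e in entities]
    return [max(column) for column in zip(*embedded)]


# Hypothetical weights mapping 2 input features to a 3-dimensional embedding.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
boxes = [[1.0, 2.0], [3.0, 0.5], [0.0, 4.0]]
print(pool_entities(boxes, W))                  # [3.0, 4.0, 2.5]
print(pool_entities(list(reversed(boxes)), W))  # same result, different order
```

A nice side effect of this design is that the same policy can handle scenes with a different number of boxes or ramps, since the pooled output always has the same size.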
…Agent policies are trained with self-play and Proximal Policy Optimization. During optimization, agents can use privileged information about obscured objects and other agents in their value function.
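Proximal Policy Optimization, mentioned in the excerpt above, keeps each policy update small by clipping the probability ratio between the new and old policy. Below is a minimal sketch of the standard clipped surrogate objective (the quantity PPO maximizes), written in pure Python. The inputs are assumed to be per-action log-probabilities under the new and old policies plus advantage estimates, and the 0.2 clip range is a common default rather than a value from OpenAI's post. The privileged value function the excerpt describes is not shown here.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float],
                       logp_old: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)].

    r = pi_new(a|s) / pi_old(a|s), computed from log-probabilities for stability.
    Taking the min makes the objective pessimistic, so the policy gains
    nothing from pushing the ratio far outside [1 - eps, 1 + eps].
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, every ratio is 1 and the objective is just the mean advantage. If an action's probability has jumped by a factor of e (ratio about 2.72) and its advantage is +1, the clipped term caps its contribution at 1.2, which is what stops a single lucky hide-and-seek episode from yanking the policy too far in one update.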
We’ve provided further evidence that human-relevant strategies and skills, far more complex than the seed game dynamics and environment, can emerge from multi-agent competition and standard reinforcement learning algorithms at scale.
We are angry, frustrated, and in pain because of the violence and murder of Black people by the police because of racism. We are in the fight AGAINST RACISM. George Floyd was murdered, his life stolen. The Adafruit teams have specific actions we’ve done, are doing, and will do together as a company and culture. We are asking the Adafruit community to get involved and share what you are doing. The Adafruit teams will not settle for a hash tag, a Tweet, or an icon change. We will work on real change, and that requires real action and real work together. That is what we will do each day, each month, each year – we will hold ourselves accountable and publish our collective efforts, partnerships, activism, donations, openly and publicly. Our blog and social media platforms will be utilized in actionable ways. Join us and the anti-racist efforts working to end police brutality, reform the criminal justice system, and dismantle the many other forms of systemic racism at work in this country, read more @ adafruit.com/blacklivesmatter
Stop breadboarding and soldering – start making immediately! Adafruit’s Circuit Playground is jam-packed with LEDs, sensors, buttons, alligator clip pads and more. Build projects with Circuit Playground in a few minutes with the drag-and-drop MakeCode programming site, learn computer science using the CS Discoveries class on code.org, jump into CircuitPython to learn Python and hardware together, try TinyGo, or even use the Arduino IDE. Circuit Playground Express is the newest and best Circuit Playground board, with support for CircuitPython, MakeCode, and Arduino. It has a powerful processor, 10 NeoPixels, a mini speaker, infrared receive and transmit, two buttons, a switch, 14 alligator clip pads, and lots of sensors: capacitive touch, IR proximity, temperature, light, motion and sound. A whole wide world of electronics and coding is waiting for you, and it fits in the palm of your hand.