Here’s more via 404 Media:
Doom’s E1M1 – Hangar, the iconic first-person shooter’s first level, is often used to showcase how the open source game can run on almost any device you can think of. The video below is novel not because of what device it’s running on, but how it’s running at all. What you’re looking at is not the Doom game engine, but a diffusion model, a type of generative AI model most commonly used to generate media, that’s responding to player input in real time.
This is “GameNGen” (pronounced “game engine”), and it is the work of researchers from Google, DeepMind, and Tel Aviv University. They call it “the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality.” Without getting too deep into the weeds, essentially the way it works is that the diffusion model is trained on gameplay footage of Doom to produce the next frame based on the frames that came before it and player input.
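To make the "next frame from previous frames plus player input" idea concrete, here is a minimal Python sketch of that kind of autoregressive loop. It is not GameNGen's code: `toy_next_frame` is a hypothetical stand-in for the trained diffusion model (it just shifts pixels sideways), and the `context` window size, the integer action encoding, and the tiny list-of-lists frames are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = List[List[int]]  # toy grayscale frame: rows of pixel values


def toy_next_frame(frames: Sequence[Frame], actions: Sequence[int]) -> Frame:
    """Hypothetical stand-in for a trained model.

    A real system would run a diffusion model conditioned on the whole
    history; this toy only shifts the latest frame sideways by the latest
    action so the loop below has something to call.
    """
    last = frames[-1]
    shift = actions[-1]
    width = len(last[0])
    return [[row[(x - shift) % width] for x in range(width)] for row in last]


def rollout(
    model: Callable[[Sequence[Frame], Sequence[int]], Frame],
    seed_frames: Sequence[Frame],
    player_actions: Sequence[int],
    context: int = 4,  # assumed history length, not a value from the paper
) -> List[Frame]:
    """Generate one frame per player action, feeding outputs back as input."""
    if not seed_frames:
        raise ValueError("need at least one seed frame to start from")
    frames = deque(seed_frames, maxlen=context)
    actions = deque([0] * len(seed_frames), maxlen=context)
    generated: List[Frame] = []
    for action in player_actions:
        actions.append(action)
        frame = model(list(frames), list(actions))
        generated.append(frame)
        frames.append(frame)  # the generated frame becomes part of the history
    return generated


if __name__ == "__main__":
    seed = [[[1, 0, 0, 0]]]  # a single 1x4 frame with one lit pixel
    for frame in rollout(toy_next_frame, seed, [1, 1, -1]):
        print(frame)
```

The point of the sketch is the feedback step: each generated frame is appended to the history that conditions the next one, so there is no separate game state anywhere, only past frames and inputs.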
All generative AI models essentially work like this. They are trained on massive amounts of data in order to predict what the next word, frame, or pixel is to automatically generate the desired output. GameNGen has impressively extended this method to a somewhat functioning, real-time interactive video game. At the moment, GameNGen is running at about 20 frames per second, which is incredibly slow, especially for an old video game, but it does look like Doom. According to the GameNGen paper, 10 human raters presented with 130 random short gameplay clips had only a slightly better chance than random of telling the difference between a GameNGen-generated clip and a “real” Doom gameplay clip. I think that I, a Doom scholar, would do a lot better than that, but that’s neither here nor there.