
There are a number of interesting examples of generating text with AI, from sports stats to fun (and ridiculous) costume ideas. In a recent Towards Data Science article, Eugene Hotaj uses language models to generate new Beatles lyrics. Hotaj starts off with a simple bigram model, which predicts the next word in a sequence based on how often it follows the previous word. The resulting lyrics are pretty nonsensical:
A never-before-seen Beatles’ song
Here’s a small snippet of one of the songs the bigram model generates:
She’s so I love her heart.
Well they are; they said I’m so many,
She no surprise
When you’re mine
Sad and the Amsterdam Hilton
they make my way,
Yes I wait a boy been born with a rich man,
all share
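The bigram idea behind these lyrics can be sketched in a few lines of Python: count how often each word follows each other word in a corpus, then sample the next word in proportion to those counts. (The tiny corpus below is a made-up stand-in, not Hotaj's actual Beatles dataset.)

```python
import random
from collections import defaultdict

# Toy corpus standing in for the Beatles lyrics dataset.
corpus = "i love her i love you she loves me".split()

# Count how often each word follows each other word.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to bigram counts."""
    candidates = counts[prev]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights)[0]

def generate(start, length=5):
    """Generate a short word sequence starting from `start`."""
    out = [start]
    for _ in range(length):
        if out[-1] not in counts:  # dead end: word never seen as a "prev"
            break
        out.append(next_word(out[-1]))
    return " ".join(out)

print(generate("i"))
```

Because the model only ever looks one word back, it happily strings together locally plausible pairs into globally incoherent lines, which is exactly what the snippet above exhibits.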
Hotaj discusses the limitations of bigrams, trigrams, and n-grams, highlighting that this approach will eventually overfit and simply return actual Beatles songs. To create genuinely NEW Beatles lyrics, Hotaj turns to OpenAI’s GPT-2, a transformer model that can generate highly realistic text. Although the full-sized model had not been released at the time, a number of smaller versions of the model are available for use. Using transfer learning, Hotaj fine-tuned the pre-trained model on the Beatles lyrics dataset to generate new lyrics. He gives examples of underfit lyrics, overfit lyrics, and some that are just right (after fine-tuning for ~350 batches). The model learns pretty quickly that it needs to give each song a title and attribution (which are included in the dataset). Example lyrics below:
Woman in Black
Lennon & McCartney
I’d make a scene
If you don’t want me to appear
You might as well leave me alone.
I’m near death and I’m in love
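The transfer-learning step described above can be sketched roughly as follows, assuming the Hugging Face `transformers` library (Hotaj's original code may use a different toolkit, and `"lyrics.txt"` is a placeholder path, not the actual dataset file):

```python
# Hedged sketch: fine-tune a small pre-trained GPT-2 on a plain-text
# lyrics file, then sample new lyrics from it. Paths and hyperparameters
# here are illustrative assumptions, not values from the article.
from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    TextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

def fine_tune(train_file="lyrics.txt", model_name="gpt2",
              output_dir="gpt2-beatles"):
    """Fine-tune a pre-trained GPT-2 on a plain-text lyrics corpus."""
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)

    # Chunk the raw text into fixed-size training blocks.
    dataset = TextDataset(tokenizer=tokenizer, file_path=train_file,
                          block_size=128)
    # mlm=False -> causal (next-word) language modeling, as GPT-2 expects.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=4,
    )
    Trainer(model=model, args=args, data_collator=collator,
            train_dataset=dataset).train()
    return model, tokenizer

def generate_lyrics(model, tokenizer, prompt="Woman in Black\n",
                    max_length=60):
    """Sample new lyrics from the fine-tuned model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_length=max_length,
                            do_sample=True, top_k=40)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because the model already knows English from pre-training, even a short fine-tuning run teaches it surface conventions of the dataset, which is why the title-and-attribution pattern shows up so quickly.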
It’s not quite Lennon and McCartney, but it’s a start! Take a look at the article on towardsdatascience.com if you’d like to see more of the NEW GPT-2 Beatles lyrics. If you’d like to look at the code, check out Eugene Hotaj on GitHub. And if you’d like to learn how to use the GPT-2 model for your next language project, check out this post.