Getting Started with Tokenization, Transformers and NLP #NLP #Tokenization #MachineLearning #Transformers @huggingface @MorganFunto

Screenshot of @huggingface Tweet announcing the release of several hands-on tutorials with tokenizers, transformers, and pipelines.

 

Earlier this month @huggingface released a number of notebooks that walk users through some NLP basics. The three-part series, written by @MorganFunto, covers tokenizers, transformers, and pipelines using Hugging Face’s Transformers library. The notebooks cover the basics at a high level and get you working in the code quickly. They are written in Colab, so anyone can run the code in the browser. Here’s the intro from the tokenization notebook:

Before going deep into any Machine Learning or Deep Learning Natural Language Processing models, every practitioner should find a way to map raw input strings to a representation understandable by a trainable model. One very simple approach would be to split inputs over every space and assign an identifier to each word.
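The "very simple approach" from that intro can be sketched in a few lines of plain Python. This is only an illustration of the idea, not code from the notebook: the names `build_vocab`, `encode`, and `unk_id` are hypothetical, and the choice of -1 for unseen words is an assumption made for the example.

```python
def build_vocab(texts):
    """Assign an integer identifier to each distinct space-separated word."""
    vocab = {}
    for text in texts:
        for word in text.split(" "):
            # Skip empty strings produced by repeated spaces.
            if word and word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, unk_id=-1):
    """Map a string to a list of identifiers; unseen words get unk_id (an assumption)."""
    return [vocab.get(word, unk_id) for word in text.split(" ") if word]


corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
ids = encode("the dog sat", vocab)
print(vocab)
print(ids)
```

A scheme like this treats "sat" and "sat." as unrelated words and has no useful representation for words it has never seen, which is part of why the notebook moves on to more capable tokenizers.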

The repo contains official notebooks provided by Hugging Face and also has a call for Transformers notebooks from the community:

…we would like to list here interesting content created by the community. If you wrote some notebook(s) leveraging transformers and would like be listed here, please open a Pull Request and we’ll review it so it can be included here.

In addition to the three-part series described above, there are notebooks on “How to train a language model” and “How to generate text”. You can find more details about the Transformers library in Hugging Face’s repo or paper. You can also use a transformer to generate text in the browser with their “Write With Transformer” tool.

 

Written by Rebecca Minich, Product Analyst, Data Science at Google. Opinions expressed are solely my own and do not express the views or opinions of my employer.
