Getting Started with Tokenization, Transformers and NLP #NLP #Tokenization #MachineLearning #Transformers @huggingface @MorganFunto
Earlier this month @huggingfacereleased a number of notebooks that walk users through some NLP basics. The three-part series, written by @MorganFunto, covers tokenizers, transformers, and pipelines utilizing Hugging Face’s transformer library. The notebooks cover the basics on a high level and get you working in the code quickly. The notebooks written in Colab allows anyone to run the code in the browser. Here’s the intro from the tokenization notebook:
Before going deep into any Machine Learning or Deep Learning Natural Language Processing models, every practitioner should find a way to map raw input strings to a representation understandable by a trainable model. One very simple approach would be to split inputs over every space and assign an identifier to each word.
The repo contains official notebooks provided by hugging face but also has a call for transformer notebooks from the community:
…we would like to list here interesting content created by the community. If you wrote some notebook(s) leveraging transformers and would like be listed here, please open a Pull Request and we’ll review it so it can be included here.
Written by Rebecca Minich, Product Analyst, Data Science at Google. Opinions expressed are solely my own and do not express the views or opinions of my employer.
Adafruit publishes a wide range of writing and video content, including interviews and reporting on the maker market and the wider technology world. Our standards page is intended as a guide to best practices that Adafruit uses, as well as an outline of the ethical standards Adafruit aspires to. While Adafruit is not an independent journalistic institution, Adafruit strives to be a fair, informative, and positive voice within the community – check it out here: adafruit.com/editorialstandards
Stop breadboarding and soldering – start making immediately! Adafruit’s Circuit Playground is jam-packed with LEDs, sensors, buttons, alligator clip pads and more. Build projects with Circuit Playground in a few minutes with the drag-and-drop MakeCode programming site, learn computer science using the CS Discoveries class on code.org, jump into CircuitPython to learn Python and hardware together, TinyGO, or even use the Arduino IDE. Circuit Playground Express is the newest and best Circuit Playground board, with support for CircuitPython, MakeCode, and Arduino. It has a powerful processor, 10 NeoPixels, mini speaker, InfraRed receive and transmit, two buttons, a switch, 14 alligator clip pads, and lots of sensors: capacitive touch, IR proximity, temperature, light, motion and sound. A whole wide world of electronics and coding is waiting for you, and it fits in the palm of your hand.
Have an amazing project to share? The Electronics Show and Tell is every Wednesday at 7pm ET! To join, head over to YouTube and check out the show’s live chat – we’ll post the link there.