Computational linguistics has dramatically changed the way researchers study and understand language. The ability to number-crunch huge amounts of words for the first time has led to entirely new ways of thinking about words and their relationship to one another.
This number-crunching shows exactly how often a word appears close to other words, an important factor in how they are used. So the word Olympics might appear close to words like running, jumping, and throwing but less often next to words like electron or stegosaurus. This set of relationships can be thought of as a multidimensional vector that describes how the word Olympics is used within a language, which itself can be thought of as a vector space.
And therein lies this massive change. This new approach allows languages to be treated like vector spaces with precise mathematical properties. Now the study of language is becoming a problem of vector space mathematics.
Today, Timothy Baldwin at the University of Melbourne in Australia and a few pals explore one of the curious mathematical properties of this vector space: that adding and subtracting vectors produces another vector in the same space.
The question they address is this: what do these composite vectors mean? And in exploring this question they find that the difference between vectors is a powerful tool for studying language and the relationship between words.
First some background. The easiest way to think about words and how they can be added and subtracted like vectors is with an example. The most famous is the following: king – man + woman = queen. In other words, adding the vectors associated with the words king and woman while subtracting man is equal to the vector associated with queen. This describes a gender relationship.
Another example is: paris – france + poland = warsaw. In this case, the vector difference between paris and france captures the concept of capital city.
Baldwin and co ask how reliable this approach can be and how far it can be taken. To do this, they compare how vector relationships change according to the corpus of words studied. For example, do the same vector relationships work in the corpus of words from Wikipedia as in the corpus of words from Google News or Reuters English newswire?
Make a robot friend with Adafruit’s CRICKIT – A Creative Robotics & Interactive Construction Kit. It’s an add-on to our popular Circuit Playground Express, FEATHER and other platforms to make and program robots with CircuitPython, MakeCode, and Arduino. Start controlling motors, servos, solenoids. You also get signal pins, capacitive touch sensors, a NeoPixel driver and amplified speaker output. It complements & extends your boards so you can still use all the goodies on the microcontroller, now you have a robotics playground as well.
Have an amazing project to share? Join the SHOW-AND-TELL every Wednesday night at 7:30pm ET on Google+ Hangouts.
Join us every Wednesday night at 8pm ET for Ask an Engineer!
Maker Business — Fewer startups, and other collateral damage from the 2018 tariffs
Wearables — Light as a Worbla feather
Electronics — How to make your own magnetic field probe!
Biohacking — The State of DNA Analysis in Three Mindmaps
Python for Microcontrollers — One year of CircuitPython weeklies!
No comments yet.
Sorry, the comment form is closed at this time.