How AI discovers content using descriptors and k-nearest neighbors
June 7, 2024
The world is a complicated place, so how can we advance AI to help us solve interesting problems in such a complex environment? For instance:
- How does Amazon decide what we might want to buy next (among millions of choices)?
- How can we find all the photos from our vacation last summer to the Grand Canyon if they are scattered all over the place?
- How can we get suggestions for new music to listen to based on our past choices?
- How can we find research related to a topic we care about (even if it’s in another language)?
The answer is descriptors and nearest neighbors.
What are descriptors and nearest neighbors?
Let’s assume you invent an AI engine. Next, you feed it something (people’s purchasing histories, or images and videos, or music, or research papers) and it kicks out a descriptor: a mathematical expression that represents a compressed description of the object you fed in.
A relatable analogy is saving a photo from your digital camera as a JPEG (something you have been doing for years): the file shrinks, but you can still recognize the picture.
A descriptor, if you create it correctly, captures the essence of the original object in much the same way.
Next, imagine you do a great job of creating descriptors for every piece of music. Then, you decide you would love to hear music that is like Miley Cyrus’ “Flowers.” One way to find similar songs is to go find the descriptor you created for “Flowers” and look for other descriptors that are nearby: we call those its nearest neighbors.
The descriptors nearest to “Flowers” are likely to be very similar songs, while the descriptors further away are likely to be less relevant. Now, we can even sort the results by relevance.
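Here is a minimal sketch of that idea in Python. The song titles other than "Flowers" and all of the numbers are invented for illustration, and real music descriptors would have far more than three values. "Nearby" is measured here with ordinary straight-line (Euclidean) distance, which is one common choice among several.

```python
import math

# Hypothetical 3-number descriptors, invented for illustration only.
# Real descriptors are produced by a model and are much longer.
song_descriptors = {
    "Flowers": [0.80, 0.60, 0.30],
    "Song A": [0.75, 0.65, 0.35],
    "Song B": [0.20, 0.90, 0.70],
    "Song C": [0.85, 0.50, 0.25],
    "Song D": [0.10, 0.10, 0.95],
}


def rank_by_similarity(query_title, descriptors):
    """Return (title, distance) pairs for every other song, nearest first."""
    query = descriptors[query_title]
    others = [
        (title, math.dist(query, vector))
        for title, vector in descriptors.items()
        if title != query_title
    ]
    # A smaller distance means a closer neighbor, so sort ascending.
    return sorted(others, key=lambda pair: pair[1])


ranked = rank_by_similarity("Flowers", song_descriptors)
for title, dist in ranked:
    print(f"{title}: {dist:.3f}")
```

The sorted list is the "sort by relevance" step: the first entries are the nearest neighbors, and the last entries are the least similar songs.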
The nitty-gritty of doing this is a bit complicated, but let’s dive in enough to demystify it.
What is an n-dimensional vector?
Let’s begin thinking about dimensions with a relatable example: RGB colors. You can tell someone how much red, green, and blue the color you are looking at has, and they can enter those RGB numbers to see the same color. That is a solution that only needs three dimensions (like telling UPS the box you need to ship is four inches long, three inches wide and two inches deep).
Humans are comfortable thinking in terms of three dimensions, or a box of space.
Mathematicians have no problem thinking in terms of hundreds or thousands of dimensions. That’s handy when it comes to figuring out descriptors for complicated things you might be interested in.
The current state of the art for descriptors is the n-dimensional vector: a list of n numbers, where n is a specific big number that depends on what you are trying to accomplish.
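To make "n dimensions" less abstract, here is a small Python sketch showing that the same distance calculation works whether a vector has three numbers (an RGB color) or more. The function name and the eight-number vectors are made up for this example, and straight-line distance between RGB values is only a rough stand-in for how similar two colors look.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of the same length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Three dimensions: RGB colors (red, green, blue), each from 0 to 255.
orange = [255, 165, 0]
red = [255, 0, 0]
blue = [0, 0, 255]

print(euclidean_distance(orange, red))   # orange is closer to red...
print(euclidean_distance(orange, blue))  # ...than it is to blue

# Eight dimensions: the same function works unchanged.
# These values are arbitrary and carry no meaning.
v1 = [0.1, 0.4, 0.3, 0.9, 0.0, 0.5, 0.7, 0.2]
v2 = [0.2, 0.4, 0.1, 0.8, 0.1, 0.5, 0.6, 0.2]
print(euclidean_distance(v1, v2))
```

Nothing in the function cares how long the lists are, which is why going from 3 dimensions to hundreds or thousands is easy for a computer even though it is hard to picture.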
Finding relevant content using nearby descriptors
We will take some liberties here to explain this: this is not exactly how descriptors work, but you will get the idea. Let’s assume you want an AI model to know about all dog breeds, and you feed it a large training photo set of every known dog breed.
The model looks at the photos and the breed names a few million times (in perhaps an hour), decides what is most important to do a great job understanding dog breeds, and builds a descriptor (an n-dimensional vector) for each dog breed.
Let’s start with a single point floating in space to visualize n-dimensional vectors for dog breeds.
From that single point, there might be an arrow pointing straight up, and the length of that arrow is set by how pointy the ears of that specific dog are. Another arrow’s length, pointed in a slightly different direction, might be set by how long the dog’s tail is. Another arrow, pointing in another slightly different direction, could indicate whether the dog has long hair or short hair.
In math terms, each arrow points in a different direction and therefore lives in its own dimension. Since we can have as many dimensions as we need to characterize dog breeds, we can do a very comprehensive job categorizing or sorting dogs by breed.
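In the spirit of the simplified picture above, here is a toy Python sketch that turns three traits into a vector, one number per arrow. The class name, the trait names, and the 0-to-1 scores are all invented for illustration. A trained model picks its own dimensions, and they usually do not map onto tidy human-readable traits like these.

```python
from dataclasses import dataclass, astuple


@dataclass
class DogTraits:
    """Hypothetical hand-picked traits, each scored from 0.0 to 1.0."""
    ear_pointiness: float  # 0.0 = floppy, 1.0 = very pointy
    tail_length: float     # 0.0 = very short, 1.0 = very long
    hair_length: float     # 0.0 = short hair, 1.0 = long hair

    def to_vector(self):
        # Each field becomes one dimension of the descriptor.
        return list(astuple(self))


# Invented scores for a poodle, for illustration only.
poodle = DogTraits(ear_pointiness=0.2, tail_length=0.5, hair_length=0.9)
poodle_vector = poodle.to_vector()
print(poodle_vector)
```

Adding a fourth trait would simply add a fourth number to the list, which is all that "adding a dimension" means here.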
You might imagine that a descriptor for a poodle (its n-dimensional vector) might look something like this:
While we can’t draw a picture of all the descriptors of all dog breeds in a multidimensional space, we can simplify it to a three-dimensional space. Here is what descriptors of various dog breeds might look like in a three-dimensional space:
Now, let’s say you were at the dog park this morning and saw a dog that kind of looked like a poodle but was somehow different. We could ask our AI model to find us the nearest neighbors to poodles:
Descriptors near the orange dot (our original poodle descriptor) represent Standard Poodles, Toy Poodles, and Miniature Poodles. Further out, descriptors represent Labradoodles, Goldendoodles, Cockapoos, and Cavapoos. Descriptors even further out represent various other water dogs. It doesn’t get better than that.
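The "k" in k-nearest neighbors is simply how many neighbors you ask for. Below is a rough Python sketch of that search, assuming made-up three-number descriptors for a handful of breeds (none of these values come from a real model). It compares the poodle against every other descriptor, which works for a small example. Systems that search millions of descriptors typically rely on approximate indexes instead of checking every item.

```python
import heapq
import math

# The query point: a hypothetical descriptor for a poodle.
poodle = [0.20, 0.50, 0.90]

# Invented descriptors for other breeds, for illustration only.
breed_descriptors = {
    "Miniature Poodle": [0.20, 0.45, 0.90],
    "Toy Poodle": [0.22, 0.40, 0.88],
    "Labradoodle": [0.15, 0.60, 0.70],
    "Goldendoodle": [0.15, 0.65, 0.75],
    "Portuguese Water Dog": [0.10, 0.70, 0.60],
    "German Shepherd": [0.95, 0.80, 0.40],
}


def k_nearest(query, descriptors, k):
    """Return the names of the k descriptors closest to the query, nearest first."""
    return heapq.nsmallest(
        k,
        descriptors,  # iterating a dict yields its keys (the breed names)
        key=lambda name: math.dist(query, descriptors[name]),
    )


nearest = k_nearest(poodle, breed_descriptors, k=3)
print(nearest)
```

Raising k widens the circle around the query point, so you start to pick up breeds that are related but less similar.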
Remember those difficult questions we posed at the beginning of this article?
The same approach answers them: we could create another AI model where the green dots represent what people like us have purchased, or photos of the Grand Canyon, or musical choices.
Many previously impossible tasks can be done thanks to descriptors and neighbors.