Decoding the future: machine learning vs deep learning
March 22, 2024
|When people talk about AI, they use a puzzling collection of terms. You may have heard about deep learning as a subset of something called machine learning.
What is so deep about deep learning?
Also, how do all these different flavors of AI use data to learn?
A history of machine learning
Pretend you are a data scientist about 20 years ago. You look around for problems to solve with the existing technologies, and those problems generally had a finite set of possible answers. Some examples of problems you could solve included:
- Loan qualification: Understanding whether someone is qualified for a loan they have applied for. This would involve training a machine learning model on decades of history of good and bad loans and feeding in someone’s application. The model would output either yes (make the loan) or no (decline the loan).
- SPAM email identification: Detecting if an email is spam by using data to train a spam detection model. The model would output either yes (the email is SPAM) or no (the email is legitimate).
Both examples are called classifiers because they output one of a finite set of classifications.
Other kinds of models can be trained to answer questions like what someone’s house is worth based on its location, size, and age. The output is a single number (and we have lots of records of comparable houses and what they sold for).
These are all examples of useful machine learning applications, with decent sizes of training data available, and these applications are just limited by the magnitude of the problem we are trying to solve.
Machine learning evolves
Let’s move forward a decade or so, and our ambitions have grown: we want to tackle much bigger problems using machine learning! As a data scientist, you must figure out what it will take to solve this new set of problems.
Turns out, it takes three things:
1. Deep learning
Previously, we got our feet wet using neural networks: arrays of virtual (because they are written in software) nodes that recognize patterns in data and learn from them how to classify different things. Machine learning problems that were tackled back then used shallow models that didn’t have many nodes.
To solve today’s gigantic problems that sometimes can have infinite answers, data scientists realized it was going to take many layers of many of these nodes for models to do the job. That is where the term deep comes from. Deep learning refers to the depth of the nodes in the models we are going to have to create.
Today, we are in the world of deep learning.
But what we need for deep learning to work does not stop with building deep models!
2. Big Data
Any flavor of machine learning needs data. After all, how can there be learning without data to learn from? Previously in the machine learning world, we needed to train our models on data, but now that we are tackling much deeper problems, it follows we need much more data. Here are two examples of ambitious problems to data scientists are now trying to solve:
- Object identification in photography: For a model to identify all the objects in a photo, it must know about all possible objects that might be in the photo.
- Translation between languages: To translate from English to Italian, a model would have to know the full vocabulary in both languages, how their grammars work, and what the idioms of each language are (telling a friend to “break a leg” before a performance might not translate well into other languages).
These models would require a massive data training set to work well (bigger than all the entries in Wikipedia)! Fortunately, we live in the time of big data.
3. Graphics processing units (GPUs)
Last, but not least, to train a model with these enormous sets of data, we need massive computing resources. Traditionally, computers are powered by a CPU (central processing unit). CPUs make computers run fast, but they mostly work on one task at a time and are optimized to handle numbers.
Most deep learning models work with things called vectors, which you can think of as arrows of a specific length pointing in a specific direction. You can add, subtract, and multiply vectors, but doing it on a CPU is incredibly slow.
It could take dozens of years to train a language translator using CPUs. That just isn’t viable.
Luckily, starting about 20 years ago, a market evolved for passionate computer gamers. These were people who demanded ever-improving graphics for their games. They are great for 2D and 3D graphics, and they scale to meet different display sizes well. But a different kind of chip was needed for the unique task for vector computation.
GPUs (graphics processing units) were created and are terrific at vector manipulation. Not just that, but vector operations lend themselves well to parallelization: two GPUs can work together to get a job done twice as fast. GPUs can handle vector operations a thousand times faster than CPUs, and you can run hundreds or thousands of GPUs in parallel.
Powered by GPUs, a language translator can be trained in a reasonable amount of time.
The success of AI today
After decades of disappointment and unfulfilled promises (after all, AI research started in 1956), AI has finally hit the big time. We now have immense stores of data (much of it publicly accessible), clever data scientists, and GPUs for vector manipulation.
Just imagine: many problems that humans have struggled with for centuries might be solved with the help of AI.