Building a Hot Dog/Not Hot Dog Image Classifier in JavaScript
The Silicon Valley TV show joke comes to your browser, courtesy of MediaPipe and Brain.js
Here’s a question to consider: why is AI such a big deal right now? Yes, ChatGPT’s Web UI has put AI on the front page, but that alone shouldn’t be that exciting to software developers. There’s a serious limit to what you can build using prompt engineering and the OpenAI completions API.
Conversely, it’s not like there wasn’t any AI tooling in JavaScript before December 2022. Remember my wine tasting neural network blog post? Tools like brain.js have been around for years, if not decades.
The lesson here is that training a basic AI model isn’t that difficult. The hard parts are creating a reasonable data set and, for a sufficiently complex model, orchestrating the compute resources so you can train your model in a reasonable amount of time. The wine tasting neural network project took an afternoon, but it wouldn’t have been possible without a neatly formatted CSV of wine data.
ChatGPT and other AI APIs handle the compute resources for you, but they can also help you generate data sets. Embeddings, “ChatGPT’s Secret Weapon”, let you convert arbitrary text into a fixed length vector representing the semantics of the text. Get a bunch of embeddings, use them to train a neural network, and watch the magic happen.
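To make "a fixed length vector representing the semantics" a bit more concrete: one common way to compare two embeddings is cosine similarity, where values near 1 mean the vectors point in nearly the same direction. The helper below is a minimal sketch of that idea and is not part of the classifier itself; the function name is my own.

```javascript
// Cosine similarity between two embeddings of equal length.
// Returns a number between -1 and 1; closer to 1 means "more similar".
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Two embeddings of similar inputs should score higher than two embeddings of unrelated inputs, which is the property a neural network trained on embeddings gets to exploit.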
And embeddings aren’t just for text. There are tools out there that generate embeddings for images. Which leads to the topic of this issue: building a neural network that can determine whether a given image is a picture of a hot dog. And doing all the work of training and running the neural network in the browser, no server required.
If you just want to skip to the code, here’s the GitHub repo and here’s the repo running on Netlify.
Using MediaPipe for Image Embeddings
MediaPipe is a new Google product that helps you run machine learning tasks on client devices, including generating image embeddings directly in the browser. I learned how to use MediaPipe from this CodePen.
MediaPipe takes in an HTML image element, and outputs embeddings. Below is an example of how you can use MediaPipe to generate an embedding for an image.
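Here is a rough sketch of what that can look like with the `@mediapipe/tasks-vision` package. The WASM and model URLs are assumptions on my part (check the MediaPipe docs for current ones), the bare `import()` specifier assumes a bundler or import map, and `createEmbedder()` and `embedImage()` are names I made up for illustration.

```javascript
// Assumed URLs; swap in whichever versions you actually use.
const WASM_URL =
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm";
const MODEL_URL =
  "https://storage.googleapis.com/mediapipe-models/image_embedder/mobilenet_v3_small/float32/1/mobilenet_v3_small.tflite";

// Create a MediaPipe ImageEmbedder. Browser only: it downloads the
// WASM runtime and the model the first time it runs.
async function createEmbedder() {
  const { ImageEmbedder, FilesetResolver } = await import(
    "@mediapipe/tasks-vision"
  );
  const vision = await FilesetResolver.forVisionTasks(WASM_URL);
  return ImageEmbedder.createFromOptions(vision, {
    baseOptions: { modelAssetPath: MODEL_URL },
    runningMode: "IMAGE",
  });
}

// Turn a loaded <img> element into a plain array of numbers.
function embedImage(embedder, img) {
  const result = embedder.embed(img);
  return Array.from(result.embeddings[0].floatEmbedding);
}
```

Make sure the image has finished loading before you call `embedImage()`, otherwise there are no pixels to embed yet.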
Now you have the beginnings of a decent data set. Get a few pictures of hot dogs, a few pictures of not hot dogs, and train a neural network using brain.js on the data set. Here are our 5 “hot dog” images, and our 5 “not hot dog” images. The 5 “not hot dog” photos are nothing special: there’s a picture of a burger, a pizza, a person, a car, and a house.
Yes, the training data set is only 10 images. This tool is certainly not an enterprise grade™ hot dog image classifier. But I think you’ll be surprised how well this neural network performs given the tiny training data set.
Below is the code that builds the training data. Assuming that your page has all the training data images loaded in HTML, this code will generate embeddings for every image in the `hot-dog` div and in the `not-hot-dog` div, and add them to a `trainingData` array.
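A minimal sketch of that loop might look like the following. I am assuming `hot-dog` and `not-hot-dog` are element ids (they could just as well be classes), that every image has finished loading, and that `embedder` is a MediaPipe `ImageEmbedder`. The `{ input, output }` shape is what brain.js expects for training data.

```javascript
// Build brain.js training data from images already on the page.
// Hot dogs are labeled [1], everything else is labeled [0].
function buildTrainingData(embedder, doc = document) {
  const groups = [
    { selector: "#hot-dog img", output: [1] }, // assumed id
    { selector: "#not-hot-dog img", output: [0] }, // assumed id
  ];
  const trainingData = [];
  for (const { selector, output } of groups) {
    for (const img of doc.querySelectorAll(selector)) {
      const { embeddings } = embedder.embed(img);
      trainingData.push({
        input: Array.from(embeddings[0].floatEmbedding),
        output,
      });
    }
  }
  return trainingData;
}
```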
Training a Neural Network
A neural network is a common machine learning model. The particulars of how neural networks work are a bit too complex to fit in this tutorial. Suffice it to say, a neural network is a model that takes in an array of numbers, and outputs another array of numbers. You train a neural network by giving it a set of example inputs and outputs, and the training algorithm adjusts the neural network to try to match the training data as closely as possible.
Brain.js is a neural network implementation in JavaScript. Brain.js 2.x now has browser support, so you can train a neural network in the browser too. Pull in brain.js using a script tag as follows.
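If you prefer to do this from JavaScript, the sketch below injects the script tag for you. It is equivalent to putting a static script tag with the same `src` in your HTML. The unpkg URL is an assumption on my part, and `loadBrain()` is a hypothetical helper; the browser build exposes a global `brain` object once it loads.

```javascript
// Inject a <script> tag for brain.js and resolve with the global
// `brain` object once the script has loaded.
function loadBrain(doc = document, src = "https://unpkg.com/brain.js") {
  return new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    script.src = src; // assumed CDN URL, pin a version in real code
    script.onload = () => resolve(globalThis.brain);
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    doc.head.appendChild(script);
  });
}
```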
Then, use `trainAsync()` to train a neural network on the training data from MediaPipe as follows.
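A minimal sketch, assuming `brain` is available as a global and `trainingData` is an array of `{ input, output }` objects. It uses brain.js’ default options, and `trainHotDogNet()` is just a name I picked. `trainAsync()` resolves with the final error and the number of iterations.

```javascript
// Train a brain.js NeuralNetwork on embedding-based training data.
// `brainLib` defaults to the global `brain` from the script tag.
async function trainHotDogNet(trainingData, brainLib = globalThis.brain) {
  const net = new brainLib.NeuralNetwork();
  // Resolves with an object shaped like { error, iterations }.
  const stats = await net.trainAsync(trainingData);
  return { net, stats };
}
```

Logging `stats` after training is a quick way to check whether the network actually reached the error threshold or just ran out of iterations.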
Training the neural network is surprisingly fast. For me, it took only 9 training iterations to get to brain.js’ default error threshold of 0.005. Still, for consistency, you can export the neural network as JSON and re-use it, so you don’t have to retrain your neural network every time the page loads.
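One way to do the export is with brain.js’ `toJSON()` and `fromJSON()`. The sketch below stashes the JSON in `localStorage`, which is my own choice for illustration; the storage key and helper names are hypothetical, and you could just as easily ship the JSON as a static file.

```javascript
const STORAGE_KEY = "hot-dog-net"; // hypothetical key

// Serialize a trained network so it can be restored later.
function saveNet(net, storage = localStorage) {
  storage.setItem(STORAGE_KEY, JSON.stringify(net.toJSON()));
}

// Restore a saved network, or return null if nothing was saved.
function loadNet(brainLib = globalThis.brain, storage = localStorage) {
  const saved = storage.getItem(STORAGE_KEY);
  if (saved === null) return null;
  const net = new brainLib.NeuralNetwork();
  net.fromJSON(JSON.parse(saved));
  return net;
}
```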
Now, in order to classify whether an image is a picture of a hot dog or not, you need to use MediaPipe to generate embeddings for the user-specified image, and run those embeddings through the neural network using `net.run()`. `net.run()` will return a value between 0 and 1, where 0 means “this image is definitely not a hot dog” and 1 means “this image is definitely a hot dog”.
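In code, that can be as small as the sketch below. It assumes the network was trained with single-element outputs like `[1]` and `[0]`, so `net.run()` returns a one-element array. The 0.5 cutoff and the `classifyEmbedding()` name are my own assumptions; tune the threshold to taste.

```javascript
// Run an embedding through the network and turn the score into a verdict.
function classifyEmbedding(net, embedding, threshold = 0.5) {
  const [score] = net.run(embedding);
  return { score, isHotDog: score >= threshold };
}
```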
So now, you need to pull the image from the file input and display it in an img tag. Then, wait for the img tag to load, and pass that image through MediaPipe to get the embedding, and run the embedding through the neural network as follows.
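Putting the pieces together, a sketch of that flow could look like this. The element references, the `resultEl` output element, the 0.5 cutoff, and the `wireUpClassifier()` name are all assumptions for illustration; `embedder` is a MediaPipe `ImageEmbedder` and `net` is the trained brain.js network.

```javascript
// Classify whatever image the user picks in a file input.
function wireUpClassifier({
  fileInput,
  img,
  resultEl,
  embedder,
  net,
  createUrl = (file) => URL.createObjectURL(file),
}) {
  fileInput.addEventListener("change", () => {
    const file = fileInput.files[0];
    if (!file) return;

    // Wait for the <img> to load before embedding it.
    img.onload = () => {
      const { embeddings } = embedder.embed(img);
      const embedding = Array.from(embeddings[0].floatEmbedding);
      const [score] = net.run(embedding);
      resultEl.textContent = score >= 0.5 ? "Hot dog" : "Not hot dog";
    };
    img.src = createUrl(file);
  });
}
```

Setting `onload` before `src` matters here: if the image loads from cache before the handler is attached, the classification never runs.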
And that’s it! Here’s the classifier app running on Netlify. Try it out. At the very least, the tool is capable of quickly identifying that a picture of a bunch of people bored in a meeting is not a picture of a hot dog.
Moving On
Back to the question of “why now?” with AI. Modern AI is not just about selling “47 ChatGPT Prompts to Make You More Productive” PDFs on LinkedIn.
The primary reason this classifier was simple to build, fast to train, and decently good at its task is embeddings. And what makes embeddings so useful is that they represent an intermediary state of a much more sophisticated neural network.
So, instead of having to train a neural network for image processing from scratch, you can leverage an existing neural network that can pre-process very general input into an embedding, like MediaPipe or OpenAI, and build your own simple, highly specialized neural network on top of embeddings to solve your particular problem.
What We’re Reading
What Is ChatGPT Doing … and Why Does It Work? Stephen Wolfram of Wolfram Mathematica fame explains the inner workings of ChatGPT in the style of an approachable college textbook. The description of embeddings as an intermediate state of a neural network came from this post.
Introducing Atlas Vector Search: Build Intelligent Applications with Semantic Search and AI Over Any Type of Data. MongoDB now supports vector search!
Vizzu - Library for animated data visualizations and data stories. Amazing looking lib for animated data visualizations. We still use ChartJS of course, but we’ll be tinkering with this one.