Automated Machine Learning in Stippled Images
There are many ways to attack a complex problem. A quick and dirty approach might yield something viable in a very short time frame, but the solution might not look at all pretty. An empirical approach (say, deciding when to catch a bus today based on when it actually visited your stop during the last three days) might require a bit more background work but can sometimes offer a more reliable, repeatable solution. A purely theoretical approach stands in stark contrast (say, trusting the bus schedule implicitly) and generally works best in an ideal world (it is also known to work well in Switzerland). For those of us living in the real world, the reason there are these and so many more approaches to solving problems is that no one method works best in all situations. Combining approaches is often a great strategy, but here’s another idea that is often overlooked: could we come up with a solution that gets better and better with time?
If you have lived in a colder climate and used public transportation (I have vivid memories of Chicago winters), you probably understand my analogy all too well. In winter, waiting at the bus stop for too long because you were too early or too late is not just bad, it hurts! Looking out the window for the bus and then making a mad dash (the quick and dirty approach) is never pretty. Whatever the initial strategy, a person in a cold climate who regularly waits for a bus will continually tweak and adjust their method for deciding when to step out to the curb. Humans have a talent for this sort of thing. Unfortunately, computers don’t inherently have such a talent, but it is possible to help a computer learn on its own over time and improve its solution to a complex problem.
In a prior post we introduced the need to recognize when the same image appears on more than one website, even if it has been resized, renamed, or converted to a different data format (e.g. JPEG, PNG, GIF). To briefly recap: we need to recognize an image so that the metadata associated with it (which we deliver via Stipple dots) can appear wherever that image appears. Our solution to this complex problem involves what we call image fingerprints. By comparing image fingerprints, we can very quickly determine whether two images of two different sizes from two different sites are in fact the same image.
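To make the idea of comparing fingerprints concrete, here is a minimal sketch in Python. It is not Stipple's fingerprint, whose details are not described in this post. It swaps in a well-known simple technique, an "average hash": shrink the image to a small grid, mark each cell as brighter or darker than the overall mean, and compare two fingerprints by counting differing bits. The function names, the 8x8 grid, and the `max_distance` threshold are all assumptions chosen for illustration.

```python
def average_hash(pixels, size=8):
    """Illustrative fingerprint (NOT Stipple's): pixels is a 2D list of
    grayscale values; returns an integer holding size*size bits."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0 = gy * height // size
        y1 = max((gy + 1) * height // size, y0 + 1)
        for gx in range(size):
            x0 = gx * width // size
            x1 = max((gx + 1) * width // size, x0 + 1)
            # Average the block of pixels that falls into this grid cell.
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: is the cell brighter than the overall mean?
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(fp_a, fp_b):
    """Number of bit positions where two fingerprints differ."""
    return bin(fp_a ^ fp_b).count("1")


def looks_like_same_image(fp_a, fp_b, max_distance=5):
    """Hypothetical threshold: few differing bits means 'probably the same'."""
    return hamming_distance(fp_a, fp_b) <= max_distance


# A synthetic 16x16 gradient, the same picture scaled up to 32x32,
# and an inverted copy that should count as a different image.
small = [[8 * x + 8 * y for x in range(16)] for y in range(16)]
large = [[small[y // 2][x // 2] for x in range(32)] for y in range(32)]
inverted = [[255 - value for value in row] for row in small]

same = looks_like_same_image(average_hash(small), average_hash(large))
different = looks_like_same_image(average_hash(small), average_hash(inverted))
```

Because the fingerprint depends only on coarse brightness patterns, the resized copy produces the same bits as the original, while the inverted copy does not. A production fingerprint would need to survive far more than resizing, which is where a fixed threshold like the one above starts to fall short.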
We’re proud of our solution and of how well it works. Still, there’s always room to improve. Better yet, our algorithm for comparing image fingerprints learns from each new image it sees, so it keeps getting better (smarter?) over time.
First fingerprints, taken 1859/60 by William James Herschel (1833-1917). Public domain image, hosted courtesy of Wikipedia.
Imagine comparing two thumbprints from the same person, one taken right after the other. They’re never perfectly identical (one might be a little smudged in places, or the thumb might have rolled a bit farther from side to side), yet despite these irregularities there is a reliable science to matching them up, which can be described as fuzzy matching. The same is true of our image fingerprints. In Stipple’s case, the especially fun twist is that we have taught our fuzzy matching algorithm to learn as it goes. With each new image it encounters, it learns which combinations of details in our image fingerprints more reliably indicate a match or non-match and which details are less indicative. Put another way: when we generate an image fingerprint from an image, we always do it in the same systematic way, but the meaningful information that we can extract from those fingerprints is constantly growing. This is because our algorithm uses machine learning to discover new patterns hidden within the fingerprints. Additionally, the feedback from our learning algorithm has allowed us to develop new theoretical models of how our image fingerprints store “hidden” information. In this way, we are combining theoretical approaches with empirical approaches in our automated learning solution.
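As a rough illustration of a matcher that "learns as it goes", the Python sketch below keeps running counts of how often each fingerprint bit agrees in known matches versus known non-matches, and turns those counts into per-bit weights (a naive-Bayes-style log-likelihood ratio). This is an assumption made for illustration, not a description of Stipple's algorithm. The class `OnlineBitMatcher` and its methods are hypothetical names, and the sketch treats every bit independently, so it learns which details are indicative but not the combinations of details mentioned above.

```python
import math


class OnlineBitMatcher:
    """Hypothetical online matcher: learns how informative each bit is."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        # Laplace-smoothed counts so no probability is ever exactly 0 or 1.
        self.agree = {True: [1] * n_bits, False: [1] * n_bits}
        self.total = {True: 2, False: 2}

    def _agreements(self, fp_a, fp_b):
        diff = fp_a ^ fp_b
        return [((diff >> i) & 1) == 0 for i in range(self.n_bits)]

    def observe(self, fp_a, fp_b, is_match):
        """Update the counts with one labeled pair of fingerprints."""
        label = bool(is_match)
        for i, agrees in enumerate(self._agreements(fp_a, fp_b)):
            if agrees:
                self.agree[label][i] += 1
        self.total[label] += 1

    def _agree_probs(self, i):
        return (self.agree[True][i] / self.total[True],
                self.agree[False][i] / self.total[False])

    def bit_weight(self, i):
        """Evidence for 'match' contributed when bit i agrees."""
        p_match, p_non = self._agree_probs(i)
        return math.log(p_match / p_non)

    def score(self, fp_a, fp_b):
        """Positive score leans 'match', negative leans 'non-match'."""
        total = 0.0
        for i, agrees in enumerate(self._agreements(fp_a, fp_b)):
            p_match, p_non = self._agree_probs(i)
            if agrees:
                total += math.log(p_match / p_non)
            else:
                total += math.log((1 - p_match) / (1 - p_non))
        return total


# Toy 4-bit fingerprints: bits 0 and 1 are reliable, bits 2 and 3 are noise.
matcher = OnlineBitMatcher(n_bits=4)
for other in (0b0011, 0b0111, 0b1011, 0b1111):   # known matches
    matcher.observe(0b0011, other, is_match=True)
for other in (0b0000, 0b0100, 0b1000, 0b1100):   # known non-matches
    matcher.observe(0b0011, other, is_match=False)

reliable_weight = matcher.bit_weight(0)   # clearly positive
noisy_weight = matcher.bit_weight(2)      # essentially zero
```

After these eight examples the matcher has already learned to trust the two reliable bits and to ignore the noisy ones, and every further call to `observe` refines those weights. That is the sense in which the fingerprint itself can stay fixed while the information extracted from it keeps growing.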
What else can we learn from these images as we go?
- This post was written by Davin Potts, Stipple’s Chief Science Officer.