History of product matching – part #4 (2020 – 2024): The age of Machine Learning and AI

As we already know, Automatch has been possible ever since 2014, but only in a limited set of cases. Industries like electronics and car parts were standardized enough to enable automated product matching, but for industries like fashion and outdoors, any kind of automation seemed miles away.

The helping hand came from a brand-new direction: these were also the years of massive breakthroughs in Machine Learning algorithms and applications, and what had seemed impossible only a few years earlier became a new reality. Face recognition, handwriting processing, and labeling of scattered data sets all slowly became industry standards.

Several e-commerce companies, as well as marketplaces and price comparison platforms, recognized this big opportunity and started working on AI product matching. Doing so required highly skilled engineers and data scientists, who were hard to find (mind you, in those days machine learning was very new, and very few engineers had the skill sets needed).

Such machine learning projects also required huge sets of learning data, but for most price monitoring and price comparison companies that was not a problem, as they already had big datasets of manually matched products, gathered over the previous decade.

Walmart – back in 2017 this US retail giant assembled a team to do something that no one had done before: establish a machine learning model applicable to product matching. Although the results were far from perfect, this research was a great foundation, both for future Walmart efforts and for other companies who wanted to benchmark their results. This post by David A. Abiola (Senior Data Engineer at Walmart) gives a comprehensive overview of the different machine learning algorithms applied, and their results.
Price2Spy – this Serbian company joined the quest for a machine learning solution to the product matching problem in 2019. According to the published documentation, their method was very different from Walmart's. Unfortunately, Price2Spy did not disclose the performance of its machine learning model, but the post by Misha Krunic (dated 2020) is worth studying, since Price2Spy claims that its product matching solution has been in production ever since.
Shopee – in 2021 this marketplace leader from South-East Asia sponsored a Kaggle competition for product matching based on images. The total budget was 30K EUR, and the winning team achieved a product matching accuracy of 78%. Judging by the score, we doubt that the solution was put into production, but it shows great potential, since it demonstrates that image similarity alone is an important matching factor.

And how does one evaluate product matching solutions, exactly? Two parameters stand out as the most important ones:

Product matching sensitivity – the share of true matches that the solution actually finds. When it is low, many proper matches are being missed. In the solutions listed above, product matching sensitivity ranged between 80% and 90%.
Product matching accuracy – the share of proposed matches that are correct. We have already discussed how dangerous it is to match a wrong pair of products (far more dangerous than not matching them at all). Even when product matching accuracy gets to 95%, this still means that 5% of your matches are wrong (and you don't have a way of knowing which 5% these are). The solutions mentioned above had accuracy between 90% and 95%, and this was recognized as the greatest downside of the newly devised methods.
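
To make the two parameters concrete, here is a minimal sketch of how they could be computed against a manually verified set of matches. It assumes that "sensitivity" means the share of true matches that were found (what machine learning literature calls recall) and that "accuracy" means the share of proposed matches that are correct (usually called precision). The function name and the product IDs are hypothetical, made up purely for illustration.

```python
def evaluate_matches(predicted, actual):
    """Compare proposed product pairs against manually verified pairs.

    Assumption: both arguments are collections of (our_id, competitor_id) pairs.
    Returns (sensitivity, accuracy) as fractions between 0 and 1.
    """
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual  # proposed pairs that are truly matches

    # Sensitivity: how many of the true matches did we find?
    sensitivity = len(correct) / len(actual) if actual else 0.0
    # Accuracy (as used in this post): how many proposed matches are right?
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, accuracy


# Hypothetical data: five verified matches, four proposed by a model.
verified = {("A1", "X1"), ("A2", "X2"), ("A3", "X3"), ("A4", "X4"), ("A5", "X5")}
proposed = {("A1", "X1"), ("A2", "X2"), ("A3", "X3"), ("A4", "X9")}

sensitivity, accuracy = evaluate_matches(proposed, verified)
print(f"sensitivity: {sensitivity:.0%}, accuracy: {accuracy:.0%}")
# prints: sensitivity: 60%, accuracy: 75%
```

In this toy example the model misses two of the five real matches (sensitivity 60%) and one of its four proposals is wrong (accuracy 75%). Note that the wrong pair can only be identified because a verified set exists, which is exactly what is missing in production.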

Finally, the above experiments raised another important question: whether, and to what extent, machine learning models for product matching are generic. Can we build a single model for any industry and any language, or do we need to build specific models for each language / industry combination: one for English electronics, another for German car parts, then Arabic baby equipment, etc.?

As you can see, the number of such combinations quickly explodes (approximately 20 languages x 40 industries, or roughly 800 separate models), so a universal solution, both language-agnostic and industry-agnostic, would be needed.

Fortunately, technology has advanced in both text processing (semantics, translation) and image processing, so a universal solution should be possible, though with somewhat lower sensitivity and accuracy than language- or industry-specific models. But I guess that’s a fair price to pay. Our next and final post in this product matching history series will discuss future solutions, which is where we think we belong 😊

<<< Part 3: Product Matching History (2016-2019)

>>> Final Part: Product Matching Future

History of product matching – part #4 (2020 – 2024): The age of Machine Learning and AI
