This Research in Brief summarizes various projects carried out by co-authors Yaniv Sheena and Oren Sar Shalom, along with their colleagues on the Relevance Foundations team at Meta.
In 2020, we launched Shops on Facebook and Instagram to make it easy for businesses to set up a digital storefront and sell online. Today, Shops holds a massive inventory of products spanning many verticals and diverse sellers, and the data sellers provide tend to be unstructured, multilingual, and, in some cases, missing crucial information.
Understanding these products’ core characteristics and encoding their relationships can help unlock a variety of e-commerce experiences, whether that’s recommending similar or complementary products on the product page or diversifying shopping feeds to avoid showing the same product multiple times. To pursue these opportunities, we established a team of researchers and engineers in Tel Aviv with the goal of creating a product graph that accommodates different product relations. The team has already launched capabilities that are integrated into various products across Meta.
Our research is focused on capturing and embedding different notions of relationships between products. These methods are based on signals from the products’ content (text, image, etc.) as well as past user interactions (e.g., collaborative filtering).
First, we tackle the problem of product deduplication, where we cluster together duplicates or variants of the same product. Finding duplicates or near-duplicate products among billions of items is like finding a needle in a haystack. For instance, if a local store in Israel and a big brand in Australia sell the exact same shirt or variants of the same shirt (e.g., different colors), we cluster these products together. This is challenging at a scale of billions of products with different images (some of low quality), descriptions, and languages.
Next, we introduce Frequently Bought Together (FBT), an approach for product recommendation based on products people tend to jointly buy or interact with.
We developed a clustering platform that groups similar items in real time. For every new item listed in the Shops catalog, our algorithm either assigns it to an existing cluster or creates a new one.
This process takes the following steps:

1. Retrieve candidate matches for the new item from the catalog, using our embedding-based retrieval mechanisms.
2. Score each (item, candidate) pair with a pairwise similarity model.
3. Assign the item to the cluster of its best-scoring candidate if the score clears a threshold; otherwise, open a new cluster.
We specify two types of clustering spaces, based on business objectives:

- Exact duplicates: listings of exactly the same product.
- Product variants: listings that differ only in attributes such as color or size.
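To make the flow concrete, here is a minimal sketch of the real-time assignment loop, assuming each product is represented by a precomputed embedding. The threshold, the retrieval size, and the use of raw cosine similarity in place of the pairwise model (described next) are illustrative simplifications, not the production setup:

```python
import numpy as np

SAME_CLUSTER_THRESHOLD = 0.9  # illustrative, precision-oriented cutoff


class ClusterIndex:
    """Toy in-memory index; production systems would use approximate
    nearest-neighbor search over billions of items."""

    def __init__(self):
        self.embeddings = []   # embeddings of products seen so far
        self.cluster_ids = []  # cluster id of each stored product
        self.next_cluster = 0

    def assign(self, emb, k=50, threshold=SAME_CLUSTER_THRESHOLD):
        """Assign a new product embedding to an existing cluster or a new one."""
        if self.embeddings:
            mat = np.vstack(self.embeddings)
            # Step 1: retrieve the k nearest neighbors by cosine similarity.
            sims = (mat @ emb) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(emb))
            top = np.argsort(sims)[::-1][:k]
            # Step 2: in the full system, a pairwise similarity model rescores
            # the candidates; raw cosine similarity stands in for it here.
            best = top[0]
            # Step 3: join the best cluster only if the score clears the threshold.
            if sims[best] >= threshold:
                cluster = self.cluster_ids[best]
                self._store(emb, cluster)
                return cluster
        cluster = self.next_cluster
        self.next_cluster += 1
        self._store(emb, cluster)
        return cluster

    def _store(self, emb, cluster):
        self.embeddings.append(emb)
        self.cluster_ids.append(cluster)
```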
For each clustering type, we train a model tailored to the specific task. The model is based on gradient-boosted decision trees (GBDT) with a binary loss and uses both dense and sparse features. Among the features, we use the cosine distance between GrokNet embeddings (image distance), the distance between LASER embeddings (a cross-language textual representation), textual features such as the Jaccard index, and a tree-based distance between the products’ taxonomies. This allows us to capture both visual and textual similarities while also leveraging signals like brand and category. We also experimented with SparseNN, a deep model originally developed at Meta for personalization, which combines dense and sparse features to train a network end to end, learning semantic representations for the sparse features. However, it did not outperform the GBDT model, which is much lighter in training time and resources.
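As a rough illustration of the pairwise model’s inputs, the sketch below assembles the feature vector described above and trains a GBDT on labeled pairs. The helper functions are hypothetical, the embeddings are assumed precomputed, and scikit-learn’s GradientBoostingClassifier stands in for the internal GBDT implementation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def jaccard_distance(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    if not (a | b):
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def taxonomy_distance(path_a, path_b):
    # Tree-based distance: one minus the fraction of the shared prefix of the
    # category paths, e.g., ("clothing", "tops", "t-shirts").
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return 1.0 - shared / max(len(path_a), len(path_b), 1)

def pair_features(p, q):
    """Dense feature vector for a product pair (embeddings precomputed)."""
    return [
        cosine_distance(p["image_emb"], q["image_emb"]),  # GrokNet-style image distance
        cosine_distance(p["text_emb"], q["text_emb"]),    # LASER-style multilingual text distance
        jaccard_distance(p["title_tokens"], q["title_tokens"]),
        taxonomy_distance(p["taxonomy"], q["taxonomy"]),
        float(p.get("brand") == q.get("brand")),          # brand match signal
    ]

# X = [pair_features(p, q) for p, q in labeled_pairs]
# y = [...]  # 1 = duplicate/variant, 0 = different (binary loss, as above)
# model = GradientBoostingClassifier().fit(X, y)
```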
Our models require training data sets for both clustering tasks: we send pairs of products to human raters to compose training, validation, and evaluation sets. In addition, to obtain more relevant pairs with hard negatives, we use an active learning approach based on our existing retrieval mechanisms, followed by sampling by uncertainty and density (SUD).
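The exact SUD criterion isn’t spelled out here, but a common formulation scores each unlabeled pair by the product of model uncertainty and local density, so that raters see examples that are both ambiguous and representative. A minimal sketch under that assumption:

```python
import numpy as np

def sud_scores(probs, features, beta=1.0):
    """Rank unlabeled pairs by uncertainty x density.

    probs:    model probability that each pair is a match, shape (n,).
    features: feature vectors of the unlabeled pairs, shape (n, d).
    """
    # Uncertainty peaks where the model is least sure (p close to 0.5).
    uncertainty = 1.0 - 2.0 * np.abs(probs - 0.5)
    # Density: mean cosine similarity to the other unlabeled pairs, preferring
    # representative examples over isolated outliers.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    density = (normed @ normed.T).mean(axis=1)
    return uncertainty * density ** beta

# Send the top-scoring pairs to human raters:
# to_label = np.argsort(sud_scores(probs, X_unlabeled))[::-1][:1000]
```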
To evaluate our approach, we formed a set of ~100K product pairs from the Clothing & Accessories, Health & Beauty, and Home verticals. Each pair was annotated by human raters who marked whether the two products were different, exact duplicates, or variants. We then measured precision and recall by inferring, per the steps above, whether the two products would reside in the same cluster. Final results are broken down by vertical, since verticals tend to have different traits.
Figure: Pairwise similarity model performance, GBDT vs. SparseNN.
Figure: Clustering system-level performance by vertical.
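In code, this pair-level evaluation reduces to comparing cluster assignments against rater labels; a minimal sketch:

```python
def pairwise_precision_recall(pairs, labels, cluster_of):
    """Evaluate clustering against human-annotated product pairs.

    pairs:      list of (product_a, product_b) ids.
    labels:     True if raters marked the pair a duplicate/variant, else False.
    cluster_of: mapping from product id to assigned cluster id.
    """
    tp = fp = fn = 0
    for (a, b), is_match in zip(pairs, labels):
        same_cluster = cluster_of[a] == cluster_of[b]
        if same_cluster and is_match:
            tp += 1
        elif same_cluster and not is_match:
            fp += 1
        elif not same_cluster and is_match:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```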
Since grouping together different products may cause an unsatisfactory user experience, we tuned our models to be precision-oriented. The results suggest that we can solve a large portion of the problem, though we still need to improve recall. We also found that Health & Beauty products were more challenging, requiring better text understanding.
Analysis of past purchases shows that customers often look for multiple items within a short period of time, such that together they have a synergistic utility. A notable example is a pair of jeans, together with a belt and possibly a matching shirt. When a customer is viewing a certain product (dubbed the seed product), our task is to help them find complementary products.
Arguably, the most standard way to find products that go together is simply to count co-purchases. That is, we observe the (normalized) number of customers who purchased the seed item and, shortly afterward, a candidate product. If this count exceeds some threshold, we say the candidate makes a good FBT recommendation for the seed product. However, with the ever-increasing variety of products available on Shops on Facebook and Instagram, there is always an abundance of new products that haven’t yet been purchased in large numbers. Lowering the recommendation threshold introduces an overwhelming amount of noise, in particular substitute items tangled with complementary ones.
To remedy this, we apply a two-step solution. First, we work at the category level (rather than the product level) to identify pairs of categories that go together. This aggregation solves the purchase-sparsity problem, and its output was further verified by expert taxonomists. It then lets us fall back on a simple count-based approach, setting a low threshold but considering only product pairs whose categories go together.
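A sketch of this two-step computation, with an illustrative time window and threshold (the production values and the normalization applied to the counts are not specified here):

```python
from collections import Counter
from itertools import combinations

WINDOW_DAYS = 7       # assumed window for "shortly afterward"
MIN_COPURCHASES = 3   # assumed low threshold

def fbt_candidates(purchases, category_of, category_pairs_that_go_together):
    """purchases: iterable of (user_id, product_id, day) tuples, day an int index."""
    by_user = {}
    for user, product, day in purchases:
        by_user.setdefault(user, []).append((day, product))

    # Count ordered co-purchases that fall within the time window.
    co_counts = Counter()
    for items in by_user.values():
        items.sort()
        for (d1, p1), (d2, p2) in combinations(items, 2):
            if p1 != p2 and d2 - d1 <= WINDOW_DAYS:
                co_counts[(p1, p2)] += 1

    # Keep only pairs whose categories go together (e.g., jeans -> belts),
    # which filters substitutes out of the low-threshold counts.
    return {
        (seed, cand): n
        for (seed, cand), n in co_counts.items()
        if n >= MIN_COPURCHASES
        and (category_of[seed], category_of[cand]) in category_pairs_that_go_together
    }
```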
Yet even with a low threshold, many products aren’t covered by this method. To increase coverage, we additionally train a model that predicts whether a pair of products is complementary, and we use it to score candidate pairs directly.
As a training set for this model, we need a list of products that go together. To this end, we go over fashion images and extract the products that appear in them, on the assumption that products appearing in the same image make a good FBT recommendation.
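Assuming a mapping from each fashion image to the products detected in it (the detection step is outside this sketch), the training examples can be generated roughly as follows, with random pairs serving as noisy negatives:

```python
import random
from itertools import combinations

def fbt_training_pairs(image_to_products, negatives_per_positive=1, seed=0):
    """Build (product_a, product_b, label) examples from fashion images."""
    rng = random.Random(seed)
    all_products = sorted({p for ps in image_to_products.values() for p in ps})

    examples = []
    for products in image_to_products.values():
        for a, b in combinations(sorted(set(products)), 2):
            examples.append((a, b, 1))  # co-appearance: positive pair
            for _ in range(negatives_per_positive):
                # Random negatives are noisy: a sampled pair may actually go
                # together, but this is rare at catalog scale.
                examples.append((a, rng.choice(all_products), 0))
    return examples
```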
To assess the performance of our approach, we ran an experiment (an A/B test) in which we suggested a set of complementary items to buyers viewing a product page. We compared our approach with a baseline (control) consisting of suggestions hand-picked by sellers. FBT recommendations led to a 12 percent relative improvement in click-through rate, demonstrating the viability and effectiveness of the approach.
Our methods for capturing product similarities have improved various consumer-facing applications in Shops. First, we launched clustering-based post-ranking logic that diversifies product search results. We also showed that similarities based on intentful user actions lead to better recommendations than suggestions chosen by sellers. Finally, we continually collaborate with teams across Shops to leverage our signals and improve relevance. Through extensive A/B testing, we’ve learned that capturing relationships between products is a significant step toward unlocking better user experiences.
We’re currently developing a holistic model that simultaneously considers behavioral data, such as co-views and co-purchases (distinct users viewing or buying the same product) and the preferences of the users who interacted with each item, together with product information like image, textual description, price, and brand. These two modalities, buyer engagement and product information, are learned in a mutually reinforcing manner, where one modality acts as the label for the other. Concretely, given a seed product, the behavioral modality lets us find two products such that one makes a better recommendation than the other, allowing the side information to be learned with a triplet loss. Likewise, the side-information modality generates triplets that improve the behavioral features.
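A sketch of the behavioral-to-content direction of this mutual reinforcement, in PyTorch. Here content_encoder is a hypothetical network over a product’s image, text, price, and brand features, and the triplet is chosen by behavioral signals (e.g., the product with more co-purchases alongside the seed is the positive):

```python
import torch.nn.functional as F

def behavioral_triplet_loss(content_encoder, seed, better, worse, margin=0.2):
    """Push the content encoder to agree with a behaviorally derived ranking:
    `better` is the product that engagement data ranks above `worse` as a
    recommendation for `seed`."""
    s = F.normalize(content_encoder(seed), dim=-1)
    p = F.normalize(content_encoder(better), dim=-1)
    n = F.normalize(content_encoder(worse), dim=-1)
    return F.triplet_margin_loss(s, p, n, margin=margin)
```

The symmetric direction, where content similarity generates triplets to refine the behavioral representation, would follow the same pattern with the roles of the two modalities swapped.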