Real-World Examples of Recommender Systems

Posted at 2022-01-04

Purpose

There is a huge number of papers on recommender systems, but what developers at companies actually care about is, "can we really use this?" Of course, the fact that a technique works in one application does not guarantee it will work in another, but real-world examples are extremely instructive.

Examples

Youtube

Google Play

Wide & Deep Learning for Recommender Systems

Pinterest

  • An update on Pixie, Pinterest’s recommendation system
    • Build a graph, run random walks over it, and recommend the nodes with the highest visit counts
    • Pinterest object graph (the graph between Pins and boards)
      • how those Pins are organized based on the context people add as they save and the Pinner’s interests.
    • Pixie then finds the Pins most relevant to the user by applying a random walk algorithm for 100,000 steps.
      • At each step, it selects a random neighbor and visits the node, incrementing node visit counts as it visits more random neighbors.
      • We also have a probability Alpha, set at 0.5, to restart at node Q so our walks do not stray too far.
      • We continue randomly sampling the neighboring boards and nodes for 100,000 steps.
      • Once the random walks are complete, we know the nodes which have been visited most frequently are the ones most closely related to the query node.
    • Optimizing Pixie
      • Early Stopping
        • we keep walking until the rank 1,000 candidate gets at least 20 visits.
      • Graph Pruning
        • we can remove some of those edges to make Pixie suit our needs
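The walk described above is easy to sketch. Below is a minimal, self-contained version assuming a plain adjacency-list graph; the toy graph, function name, and parameter defaults (aside from the Alpha = 0.5 restart probability and the step budget quoted above) are illustrative, not Pinterest's actual implementation.

```python
import random
from collections import Counter

def pixie_random_walk(graph, query_node, num_steps=100_000, alpha=0.5, seed=0):
    """Random walk with restarts, as sketched in the Pixie write-up.

    graph: dict mapping node -> list of neighbor nodes (Pins <-> boards).
    alpha: probability of restarting at the query node at each step,
           so walks do not stray too far (0.5 in the blog post).
    Returns visit counts; the most-visited nodes are taken as the ones
    most closely related to the query node.
    """
    rng = random.Random(seed)
    visits = Counter()
    current = query_node
    for _ in range(num_steps):
        if rng.random() < alpha:
            current = query_node          # restart at the query node Q
        neighbors = graph.get(current)
        if not neighbors:                 # dead end: restart and retry
            current = query_node
            continue
        current = rng.choice(neighbors)   # visit a random neighbor
        visits[current] += 1
    return visits

# Toy bipartite Pin/board graph (made-up data, for illustration only).
graph = {
    "pin_a": ["board_1"], "pin_b": ["board_1", "board_2"],
    "pin_c": ["board_2"], "board_1": ["pin_a", "pin_b"],
    "board_2": ["pin_b", "pin_c"],
}
counts = pixie_random_walk(graph, "pin_a", num_steps=10_000)
top = [node for node, _ in counts.most_common(3)]
```

Early stopping would replace the fixed step budget with a check on the visit count of a candidate at some rank, as in the "rank 1,000 candidate gets at least 20 visits" criterion quoted above.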

Amazon

  • Two Decades of Recommender Systems at Amazon.com
    • Various refinements seem necessary, but the paper reports that item-based collaborative filtering is powerful
    • E_XY and N_XY appear partway through; they are used to compute a relatedness score. Item-based collaborative filtering, then, apparently means scoring items via an item-to-item similarity matrix built from such scores (the paper does not prescribe how the similarities themselves are computed).
    • We found that, given enough data and a robust metric for the relatedness of items, compatibility can emerge
      from people’s behavior, with the false signals falling away and the truly appropriate items surfacing.
    • Relatedness scores need to strike a balance between popularity on one end and the power law distribution of unpopular items on the other. The chi-square score, (N_XY − E_XY) / √E_XY, is an example that strikes such a balance.
    • (the item-based collaborative filtering) (略) has been adapted to improve diversity and discovery, recency, time-sensitive or sequential items, and many other problems.
    • Because of its simplicity, scalability, explainability, adaptability, and relatively high-quality recommendations, item-based collaborative filtering remains one of the most popular recommendation algorithms today.
  • Hierarchical Temporal-Contextual Recommenders
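As a concrete reading of the chi-square score above: N_XY is the observed number of customers who interacted with both items, and E_XY is the count expected if the two purchases were independent. A minimal sketch, with made-up counts for illustration:

```python
import math

def relatedness(n_x, n_y, n_xy, n_total):
    """Chi-square-style relatedness score from the Amazon paper:
    (N_XY - E_XY) / sqrt(E_XY), where E_XY is the number of customers
    expected to buy both X and Y if the two purchases were independent.
    """
    e_xy = n_x * n_y / n_total        # expected co-purchases under independence
    if e_xy == 0:
        return 0.0
    return (n_xy - e_xy) / math.sqrt(e_xy)

# Hypothetical counts: 10,000 customers; 200 bought X, 300 bought Y,
# and 60 bought both. E_XY = 200 * 300 / 10,000 = 6.
score = relatedness(200, 300, 60, 10_000)
```

Dividing by √E_XY is what balances the two extremes noted above: raw co-purchase counts N_XY would favor blockbusters, while a pure ratio would over-reward rarely bought items.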

Instagram

  • Powered by AI: Instagram’s Explore recommender system
    • First, coarsely narrow down accounts using word2vec-style account embeddings together with the user's activity (likes, saves, etc.). The posts from those narrowed-down accounts are then ranked by three ranking models, one of which is produced by distillation, ordered by the value [w_like * P(Like) + w_save * P(Save) - w_negative_action * P(Negative Action)].
    • Account embeddings for personalized ranking inventory
      • As a result, content-based models have difficulty grasping such a variety of interest-based communities.
      • By applying the same techniques from word2vec, we can predict accounts with which a person is likely to interact in a given session within the Instagram app. If an individual interacts with a sequence of accounts in the same session, it’s more likely to be topically coherent compared with a random sequence of accounts from the diverse range of Instagram accounts.
      • Retrieving accounts that are similar to those that a particular person previously expressed interest in helps us narrow down to a smaller, personalized ranking inventory for each person in a simple yet effective way.
    • Preselecting relevant candidates by using model distillation
      • Use the distilled model to narrow down the media that the more complex models then evaluate
    • How we built Explore
      • Explore is split into the candidate generation stage and the ranking stage
    • Candidate Generation
      • Then, we use account embeddings techniques to identify accounts similar to the seed accounts. Finally, based on these accounts, we’re able to find the media that these accounts posted or engaged with.
      • In addition to blocking likely policy-violating content and misinformation, we leverage ML systems that help detect and filter content like spam.
      • Then, for every ranking request, we identify thousands of eligible media for an average person, sample 500 candidates from the eligible inventory, and then send the candidates downstream to the ranking stage.
    • Ranking candidates
      1. First pass: the distillation model mimics the combination of the other two stages, with minimal features; picks the 150 highest-quality and most relevant candidates out of 500.
      2. Second pass: a lightweight neural network model with full set of dense features; picks the 50 highest-quality and most relevant candidates.
      3. Final pass: a deep neural network model with full set of dense and sparse features. Picks the 25 highest-quality and most relevant candidates (for the first page of Explore grid).
      • We downrank posts from the same author or same seed account by adding a penalty factor, so you don’t see multiple posts from the same person or the same seed account in Explore.
      • We rank the most relevant content based on the final value model score of each ranking candidate, in descending order.
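The value model quoted above is simply a weighted sum of predicted engagement probabilities. A minimal sketch; the weights and probabilities below are placeholders, since the post does not publish the actual values:

```python
def value_model_score(p_like, p_save, p_negative,
                      w_like=1.0, w_save=1.0, w_negative=1.0):
    """Final value model from the Explore post:
    w_like * P(Like) + w_save * P(Save) - w_negative_action * P(Negative Action).
    Weights here are placeholder values, not Instagram's actual tuning.
    """
    return w_like * p_like + w_save * p_save - w_negative * p_negative

# Rank hypothetical candidates by value model score, in descending order.
candidates = {
    "media_1": value_model_score(0.30, 0.05, 0.01),   # 0.34
    "media_2": value_model_score(0.10, 0.20, 0.02),   # 0.28
    "media_3": value_model_score(0.05, 0.01, 0.20),   # -0.14
}
ranked = sorted(candidates, key=candidates.get, reverse=True)
```

The negative-action term lets the model demote content a person is predicted to engage with for the wrong reasons, even when its like/save probabilities are high.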

Netflix

  • Netflix Technology Blog

  • Deep Dive into Netflix’s Recommender System

  • Netflix Recommendations: Beyond the 5 stars (Part 1)

  • Netflix Recommendations: Beyond the 5 stars (Part 2)

  • Learning a Personalized Homepage

    • Currently, the Netflix homepage on most devices is structured with videos (movies and TV shows) organized into thematically coherent rows presented in a two-dimensional layout.
    • Why Rows Anyway?
      • This allows members to either dive deeper and look for more videos in the theme or to skip them and look at another row.
      • Thus, each row can offer a unique and personalized slice of the catalog for a member to navigate.
    • Page Generation Flow
      1. we start by finding candidate groupings that are likely relevant for a member based on the information we know about them.
      2. This also involves coming up with the evidence (or explanations) to support the presentation of a row, for example the movies that the member has previously watched in a genre.
      3. we filter each group to handle concerns like maturity rating or to remove some previously watched videos.
      4. we rank the videos in each group according to a row-appropriate ranking algorithm, which produces an ordering of videos such that the most relevant videos for the member in a group are at the front of the row.
      5. From this set of row candidates we can then apply a row selection algorithm to assemble the full page.
      6. we do additional filtering like deduplication to remove repeat videos and format rows to the appropriate size for the device.
    • Page-level algorithmic challenge
      • we need to balance several factors that often compete for precious screen real estate.
        • accurate
          • We want our recommendations to be accurate in that they are relevant to the tastes of our members
        • diverse
          • they also need to be diverse so that we can address the spectrum of a member’s interests versus only focusing on one.
        • depth vs breadth
          • We want to be able to highlight the depth in the catalog we have in those interests and also the breadth we have across other areas to help our members explore and even find new interests.
        • fresh
          • We want our recommendations to be fresh and responsive to the actions a member takes, such as watching a show, adding to their list, or rating
        • stability
          • we also want some stability so that people are familiar with their homepage and can easily find videos they’ve been recommended in the recent past.
        • task
          • we need to be able to place task-oriented rows, such as “My List,” in amongst the more discovery-oriented rows.
      • However, by presenting a two-dimensional navigation layout, a member can scroll vertically to easily skip over entire groups of content that may not match their current intent and then find a more relevant set, which they can then scroll horizontally to see more recommendations in that set
      • This allows for coherent, meaningful individual rows to be selected while maintaining the diversity of the videos shown on the whole page, and thus lets the member have both relevance and diversity.
    • Building a page algorithmically
      • rule-based approach
        • Here a set of rules define a template that dictates for all members what types of rows can go in certain positions on the page.
        • it ignored many aspects we consider important for the quality of the page, such as the quality of the videos in the row, the amount of diversity on the page, the affinity of members for specific kinds of rows, and the quality of the evidence we can surface for each video.
        • It also made it hard to add new types of rows, because for a new row to succeed it would need to not only contain a relevant set of videos in a good order but also be placed appropriately in the template.
      • row-ranking approach
        • we could leverage a lot of existing recommendation or learning-to-rank approaches by developing a scoring function for rows, applying it to all the candidate rows independently, sorting by that function, and then picking the top ones to fill the page.
        • However, doing this would lack any notion of diversity, so someone could easily get a page full of slight variations of their interests, such as many rows each with different variants of comedies: late-night, family, romantic, action, etc.
      • stage-wise approach
        • a scoring function that considers both a row as well as its relationship to both the previous rows and the previous videos already chosen for the page.
        • In this case, one can take a simple greedy approach and pick the row that maximizes this function as the next row to use and then re-score all the rows for the next position taking that selection into account.
        • increased computational cost
      • page-wise approach
        • defining a full-page scoring function, we can try to optimize it by choosing rows and videos appropriately to fill the page.
        • Since a page layout is defined in a discrete space, directly optimizing a function that defines the quality of the whole page is a computationally prohibitive integer programming problem.
    • Machine Learning for page generation
      • While we could use heuristics or intuition for building such a scoring function and tune it using A/B testing, we prefer to learn a good function from the data so that we can easily incorporate new data sources and balance the various different aspects of a homepage.
      • To do this, we can use a machine learning approach to create the scoring function by training it using historical information of which homepages we have created for our members, what they actually see, how they interact, and what they play.
      • There is a large set of features that we could potentially use to represent a row for our learning algorithms.
        • we can use any features of those videos in the row representation, either by aggregating across the row or indexing them by position.
        • we have many different recommendation approaches, so we can include them as different features to learn an ensemble of them at the page level.
        • Diversity can also be additionally incorporated into the scoring model when considering the features of a row compared to the rest of the page by looking at how similar the row is to the rest of the rows or the videos in the row to the videos on the rest of the page.
      • presentation bias
        • a member can only play from a row on the homepage that we’ve chosen to display, which can have a huge impact on the training data.
      • position biases
        • the position of a row on the page can greatly affect whether a member actually sees the row and then chooses to play from it.
    • Page-level metrics
      • Of fundamental importance in page generation is how to evaluate the quality of the pages produced by a specific algorithm during offline experimentation.
      • With such page-level metrics defined, we can use them to evaluate changes in any of the algorithmic approaches used to generate the page, not just the algorithms for ordering the rows, but also the selection, filtering, and ranking algorithms, or any of the input data that they use.
  • Personalized Page Generation for Browsing Recommendations

    • A page is built from two rankings: the ranking within each row and the ranking of the rows themselves
    • Approach to Recommendation (within a row)
      • Top Picks
      • Personalized Genres
        • How are they generated?
          • Implicit
            • Based on recent plays, ratings, and other interactions
          • Explicit
            • Taste preference
      • Similarity
        • Recommend Videos similar to one you've liked
        • Collaborative filtering
      • Learning to rank performs better than predicting ratings
    • Balancing a Personalized Page
      • Accurate vs Diverse
      • Discovery vs Continuation
      • Depth vs Coverage
      • Freshness vs Stability
      • Recommendations vs Tasks
    • Personalized Page Generation
      • Rule-based
      • Row-independent
      • Stage-wise
      • Page-wise
    • Constraints to obey
      • Device
        • Screen Size
        • Processing limitation
      • Certain rows may be required
        • Continue Watching
        • My List
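The stage-wise approach described in the Netflix posts (score each candidate row against what is already on the page, pick greedily, then re-score for the next position) can be sketched as follows. The scoring function here is a toy stand-in that uses genre overlap as the diversity penalty; Netflix's actual function is learned from data, and the row data below is made up.

```python
def stage_wise_select(rows, num_positions, diversity_weight=0.5):
    """Greedy stage-wise page construction: at each position, pick the
    row maximizing a score that combines the row's own relevance with a
    penalty for similarity to rows already chosen, then re-score the
    remaining rows for the next position.

    rows: dict of row name -> (relevance, set of genres).
    """
    page, used_genres = [], set()
    remaining = dict(rows)
    for _ in range(min(num_positions, len(remaining))):
        def score(name):
            relevance, genres = remaining[name]
            overlap = len(genres & used_genres)   # similarity to the page so far
            return relevance - diversity_weight * overlap
        best = max(remaining, key=score)
        page.append(best)
        used_genres |= remaining.pop(best)[1]
    return page

# Hypothetical candidate rows: two comedy variants and one sci-fi row.
rows = {
    "Comedies":          (0.90, {"comedy"}),
    "Late-Night Comedy": (0.85, {"comedy"}),
    "Sci-Fi Thrillers":  (0.60, {"sci-fi"}),
}
page = stage_wise_select(rows, num_positions=3)
```

Note how the diversity penalty pushes "Sci-Fi Thrillers" above the second comedy row, avoiding the page-full-of-comedy-variants failure mode the row-ranking approach suffers from.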

Spotify

Gunosy

How Gunosy's news recommendation system works, and the AWS architecture behind it

Wantedly

Machine-learning-based recommendation in Wantedly Visit

Mercari

Interpretable recommendation using knowledge graphs
