Here’s the way I have seen it work best in the past:

![[Pasted image 20230618141851.png]]

These models can start off very simple, but just tagging a user with term X and a piece of content with term X doesn’t work well, for a few reasons:

  1. Usually, you have far more taxonomy terms than you want to expose to users. For example, Netflix has about 3,000 very specific categories that it creates by combining multiple individual tags, and it then recommends those categories to you based on your viewing history. These categories are things like “Action & Adventure based on real life from the 1980s.” It isn’t tenable to make an end user OR an author grapple with thousands of terms to figure out which precise one they’re interested in or which one applies to a given document. The domain of technical skills is very broad and overlapping, and our content set is very large; that’s a recipe for bad tagging. For reference, Netflix has those 3,000 categories for about 6,000 pieces of content. Their taxonomy is designed to make their content set seem larger and have fewer visible edges than it really does. We need a taxonomy that helps users sift through lots of similar content to find the right thing for them. Our ratio of categories to content should be much lower than Netflix’s 1:2, but given how much more content we have, we should give up on the idea of having only 100 (or some similarly small number of) skills if we’re doing this for real.
  2. Direct User=Term=Content associations mean that the only kind of recommendation relationship available to us is a match: this person likes X, therefore we show them a thing about X. That prevents us from experimenting and iterating on how the recommendations work, because the relationships are so simple and direct. If instead we say “This user is X; we hypothesize that people who are X will benefit from Y; this content is Y,” we can test whether that’s true and improve on it by changing the hypothesis rather than retagging everyone and everything.
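A minimal Python sketch of the indirect User → hypothesis → Content model, assuming a hypothetical in-memory data model. The `User`, `Content`, and `recommend` names, the trait and attribute labels, and the two hypothesis versions are all illustrative, not an existing system. Users and content are described independently, and a separate hypothesis table links a user trait to the content attributes we expect to help.

```python
from dataclasses import dataclass, field

# Hypothetical data model: users and content are described independently.
@dataclass
class User:
    name: str
    traits: set[str] = field(default_factory=set)

@dataclass
class Content:
    title: str
    attributes: set[str] = field(default_factory=set)

# A hypothesis table: "people with trait X will benefit from content with attribute Y".
Hypotheses = dict[str, set[str]]

def recommend(user: User, catalog: list[Content], hypotheses: Hypotheses) -> list[str]:
    """Return titles of content carrying any attribute the hypotheses predict will help this user."""
    wanted: set[str] = set()
    for trait in user.traits:
        wanted |= hypotheses.get(trait, set())
    return [c.title for c in catalog if c.attributes & wanted]

catalog = [
    Content("Intro to SQL", {"sql-basics"}),
    Content("Query tuning deep dive", {"sql-performance"}),
]
user = User("ana", {"new-data-analyst"})

# Two versions of the hypothesis; neither the user nor the content is retagged.
v1 = {"new-data-analyst": {"sql-basics"}}
v2 = {"new-data-analyst": {"sql-basics", "sql-performance"}}

print(recommend(user, catalog, v1))  # ['Intro to SQL']
print(recommend(user, catalog, v2))  # ['Intro to SQL', 'Query tuning deep dive']
```

The only thing that changes between the two runs is the hypothesis table, which is the experiment-and-iterate loop described in point 2.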

Thinking about the space in terms of these kinds of models is also essential if we want to bring in other data sources for better recommendations. It’s one thing to define what a skill is and how skills should relate to each other, but the strategy doc and our discussion of goals in this space also call for ingesting job information from LinkedIn. To do that, we need to explain to our systems what a job is and how skills relate to jobs, so they can ingest that information and relate it back to content and people. By building these models instead of blunt-force tagging everything, we can make our strategy modular and composable.
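A rough sketch of how a job model could compose with the skill model, assuming that jobs are described by the skills they require. `Job`, `Content`, `skill_gap`, and `content_for_job` are hypothetical names, and the job data is hard-coded here; this does not reflect any real LinkedIn data format or API.

```python
from dataclasses import dataclass

# Hypothetical model: a job is described only by the skills it requires.
@dataclass(frozen=True)
class Job:
    title: str
    required_skills: frozenset[str]

# Content is described by the skills it teaches, independently of jobs.
@dataclass(frozen=True)
class Content:
    title: str
    teaches: frozenset[str]

def skill_gap(user_skills: set[str], job: Job) -> set[str]:
    """Skills the job requires that the user does not have yet."""
    return set(job.required_skills) - user_skills

def content_for_job(user_skills: set[str], job: Job, catalog: list[Content]) -> list[str]:
    """Content that teaches at least one skill in the user's gap for this job."""
    gap = skill_gap(user_skills, job)
    return [c.title for c in catalog if c.teaches & gap]

job = Job("Data engineer", frozenset({"sql", "python", "airflow"}))
catalog = [
    Content("Python for analysts", frozenset({"python"})),
    Content("Orchestrating pipelines", frozenset({"airflow"})),
    Content("SQL fundamentals", frozenset({"sql"})),
]

print(sorted(skill_gap({"sql"}, job)))          # ['airflow', 'python']
print(content_for_job({"sql"}, job, catalog))   # ['Python for analysts', 'Orchestrating pipelines']
```

Because jobs only point at skills, a new job source can be added without touching how content or people are described.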

Ideally, you invest in personalization because you want to see behavior change. (Otherwise, it’s not really worth how expensive and time-consuming it is.) Separating the description of the user from the description of the resource means we can use different content to nudge the same user in different directions, depending on the context. To build on the coupon example from the slide, offers can be designed to acquire a new customer, maximize an existing customer’s purchases, or retain a customer who may be at risk of no longer purchasing a product. Similarly, we are probably investing in this because we want to give a user the right content at the right moment: helping them feel confident acquiring a new product, prompting them to keep pursuing a certification instead of giving up, and so on.
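A small sketch of context-dependent nudging, assuming each piece of content is labeled with the purpose it serves as well as its topics. The goal names, purpose labels, and the `nudge` function are hypothetical. The user’s description stays fixed; only the goal for the current interaction changes which content is selected.

```python
from dataclasses import dataclass

@dataclass
class Content:
    title: str
    topics: set[str]
    purpose: str  # illustrative labels, e.g. "onboarding" or "encouragement"

# Hypothetical mapping from the goal of this interaction to the kind of
# content that serves it (compare acquire / grow / retain for coupons).
GOAL_TO_PURPOSE = {
    "adopt_new_product": "onboarding",
    "keep_pursuing_certification": "encouragement",
}

def nudge(user_interests: set[str], goal: str, catalog: list[Content]) -> list[str]:
    """Same user description, different goal -> different content."""
    purpose = GOAL_TO_PURPOSE[goal]
    return [
        c.title
        for c in catalog
        if c.purpose == purpose and c.topics & user_interests
    ]

catalog = [
    Content("Getting started with the new cloud console", {"cloud"}, "onboarding"),
    Content("Halfway to your cloud certification", {"cloud"}, "encouragement"),
    Content("Kubernetes quickstart", {"containers"}, "onboarding"),
]
interests = {"cloud"}

print(nudge(interests, "adopt_new_product", catalog))
# ['Getting started with the new cloud console']
print(nudge(interests, "keep_pursuing_certification", catalog))
# ['Halfway to your cloud certification']
```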