Autotagging

Created on June 16, 2023. Last updated on January 6, 2026.

Permanent Note

Maturity: done

We use a taxonomy management tool called PoolParty for content classification. Its “extractor” module processes the text we give it and matches strings in that text against concepts in our taxonomy. For autotagging, we send it the title and body text of each article on the site, and it returns the one or two products that score highest. Overall, it performs very well (the median agreement between a taxonomist’s tags and PoolParty’s tags is 90%), but it isn’t perfect. Most mistakes involve generically named products such as “Azure Files,” which gets applied to articles where it isn’t relevant simply because the article mentions “Azure” and “files” repeatedly.
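As a rough illustration of the selection step, here is a minimal Python sketch. It does not call PoolParty: the `scored` list is a made-up stand-in for extractor output, the function names are hypothetical, and the agreement measure (share of the taxonomist’s tags that the autotagger also applied) is an assumption, since this note doesn’t record how agreement is actually calculated.

```python
def pick_top_products(scored_concepts, limit=2):
    """Return up to `limit` product names, highest score first.

    `scored_concepts` is a list of (product_name, score) pairs; this shape
    is an assumption for illustration, not PoolParty's response format.
    """
    ranked = sorted(scored_concepts, key=lambda pair: pair[1], reverse=True)
    return [name for name, _score in ranked[:limit]]


def agreement(human_tags, machine_tags):
    """Share of human-applied tags that the machine also applied (assumed metric)."""
    human = set(human_tags)
    if not human:
        return 0.0
    return len(human & set(machine_tags)) / len(human)


# Made-up scores for one article. "Azure Files" ranks second here only because
# the article mentions "Azure" and "files" a lot, which is the failure mode
# described above.
scored = [
    ("Azure Files", 71.0),
    ("Azure Blob Storage", 88.5),
    ("Azure Functions", 12.0),
]

auto_tags = pick_top_products(scored)
taxonomist_tags = ["Azure Blob Storage"]

print(auto_tags)
print(agreement(taxonomist_tags, auto_tags))
```

Under this assumed measure, the spurious “Azure Files” tag doesn’t lower the score, because the taxonomist’s one tag is still matched. A stricter measure, such as one that penalizes extra tags, would catch it.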

Notes mentioning this note

Metadata

Generally defined as “data about data.” Metadata can be human-applied data, such as a title, description, content tags; or machine-applied,...

Content classification

The process of systematically describing, organizing, and providing access to an information object according to established criteria.