Autotagging

We use a taxonomy management tool called PoolParty for content classification. Its “extractor” module processes the text we give it and matches strings in that text against concepts in our taxonomy. For autotagging, we send it the title and body text of each article on the site, and it returns the one or two products that score highest. Overall it is very successful: the median agreement between a taxonomist’s tags and PoolParty’s tags for the same articles is 90%. It isn’t perfect, though. Most of its mistakes involve very generically named products such as “Azure Files”, which can be applied to an article where the product isn’t relevant simply because the article mentions “Azure” and “files” repeatedly.
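
To make the “Azure Files” failure mode concrete, here is a toy sketch in Python. It is not PoolParty’s algorithm or API: the product list, the word-count scoring, and the function names (`score_products`, `top_products`) are all assumptions made up for illustration. It only shows how a label built from common words can outscore the product an article is really about.

```python
import re
from collections import Counter

# Hypothetical product labels; a real taxonomy holds many more concepts.
PRODUCTS = ["Azure Files", "Azure Blob Storage", "Azure Functions"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score_products(text, products=PRODUCTS):
    """Toy score (an assumption, not PoolParty's scoring): for each
    label, add up how often each of its words occurs in the text."""
    counts = Counter(tokenize(text))
    return {label: sum(counts[w] for w in tokenize(label)) for label in products}


def top_products(scores, limit=2):
    """Return up to `limit` labels with a non-zero score, highest first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, score in ranked[:limit] if score > 0]


# An article about Blob Storage that happens to say "Azure" and "files" a lot.
article = (
    "Upload files to Azure Blob Storage. "
    "Azure lets you copy files between containers, "
    "and Azure keeps the files encrypted."
)

scores = score_products(article)
tags = top_products(scores)
print(tags)  # ['Azure Files', 'Azure Blob Storage']
```

Under this naive scoring, “Azure Files” ranks first (three mentions each of “Azure” and “files”) even though the article never discusses that product. A real extractor is more sophisticated than this, but the same pressure applies whenever a product name is made of words that are common in the corpus.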