The complexity and accuracy of today’s search engine algorithms are a testament to the science of information retrieval. The size of the search engine databases and the speed with which they return results are engineering marvels. Yet amid all this impressive engineering, huge challenges remain. “Thin content” and the malicious propagation of false information and fake news threaten the very basis of the search ecosystem, and weeding them out is a very difficult task.
Google’s Panda and more recent Fred updates have tried to address the problem of “thin content.” Defined as pages that have little or no value, thin content pages are typically:
- Automatically generated content
- Thin affiliate pages
- Content scraped from other sources or low-quality guest blog posts
- Doorway pages
Additional examples of “thin content” include e-commerce product pages with little or no descriptive content, or only a few generic lines that could be found on any site selling the product. All of these produce a less-than-satisfactory user experience; however, they do not spread offensive or disturbing false information. The Google updates directed at algorithmically rooting out thin content have been somewhat successful in attacking the problem.
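The signals behind thin-content detection are proprietary, but the categories above suggest the kinds of simple heuristics an auditor might apply to their own pages: very low word counts and heavy overlap with text found elsewhere. The sketch below is a hypothetical illustration of those two signals only; the function names, thresholds and shingle size are assumptions, not Google’s actual algorithm.

```python
# Hypothetical thin-content heuristics: flag pages that are very short
# or whose text heavily duplicates other pages. Thresholds are illustrative.

def word_count(text: str) -> int:
    return len(text.split())

def shingle_set(text: str, k: int = 5) -> set:
    """k-word shingles, used to estimate textual overlap between pages."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def duplication_ratio(page: str, corpus_pages: list) -> float:
    """Fraction of this page's shingles that also appear in other pages."""
    shingles = shingle_set(page)
    if not shingles:
        return 0.0
    seen = set()
    for other in corpus_pages:
        seen |= shingle_set(other)
    return len(shingles & seen) / len(shingles)

def looks_thin(page: str, corpus_pages: list,
               min_words: int = 150, max_dup: float = 0.6) -> bool:
    """Flag pages that are very short or mostly copied from elsewhere."""
    return (word_count(page) < min_words
            or duplication_ratio(page, corpus_pages) > max_dup)
```

A real quality system would weigh many more signals (user behavior, links, page layout), but even this crude check would catch a two-line product page or a wholesale scrape of another site’s description.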
“Thin content” is just the pointy tip of another huge iceberg. The development of “fake news” and the willful propagation of false information into search engines is a threat to the whole search marketing ecosystem. Add to this that some of the information is not just false but offensive (contains racial and ethnic slurs) and often disturbing (advocates violence or provides how-to information on bomb-making and the like).
How Does This Threaten the Ecosystem?
Research studies have shown that naïve consumers often treat the sites and pages that appear at the top of the search results as authoritative, the best vendor or the correct answer to their query. When the top result promulgates a fake story or contains offensive material, the searcher is in a quandary: is the information accurate, or is the search engine not to be trusted? Huge volumes of false information can crowd out quality, accurate information. And when advertisements run alongside disturbing or offensive information, advertisers pull their ads, threatening to choke off the engine’s economic lifeblood — its advertising revenues.
Send in the Troops
Google has for several years maintained a human task force, some 10,000 contract employees strong, whose members review and rate sites according to a very specific 160-page set of guidelines.
These guidelines provide interesting reading for those who want to understand how content and pages are viewed. Raters must pass a rigorous test to ensure they understand the 160 pages of guidelines. Once on task, reviewing the results of actual searches, the raters do not directly influence specific site rankings; rather, they provide data that feeds into ongoing algorithm development.
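That last point — raters supplying data rather than directly adjusting rankings — can be sketched in code. The snippet below is a hypothetical illustration of how per-page judgments from many raters might be aggregated into coarse labels for a learning pipeline; the 1–5 scale, thresholds and function names are assumptions, not Google’s internal format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregation of rater judgments into training labels.
# Averaging across raters means no single rater determines a page's label.

def aggregate_ratings(ratings):
    """ratings: iterable of (url, score) pairs, score on an assumed 1-5
    quality scale. Returns {url: mean score across raters}."""
    by_url = defaultdict(list)
    for url, score in ratings:
        by_url[url].append(score)
    return {url: mean(scores) for url, scores in by_url.items()}

def to_training_labels(mean_scores, low=2.0, high=4.0):
    """Map averaged scores to coarse labels a learning pipeline consumes."""
    labels = {}
    for url, s in mean_scores.items():
        if s <= low:
            labels[url] = "low_quality"
        elif s >= high:
            labels[url] = "high_quality"
        else:
            labels[url] = "medium"
    return labels
```

The design point is that rater output becomes training data: a model learns what the averaged human judgments look like and generalizes to the billions of pages no rater will ever see.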
Google has used its army of raters for several years, and over time the guidelines have changed. For example, when Google placed increased emphasis on the quality of pages that affect your money or your life (YMYL), the guidelines were enhanced with instructions for evaluating this type of page. With the growth of mobile, the guidelines were expanded to cover reviewing mobile content. It should therefore come as no surprise that Google has now issued specific guidelines for judging offensive or disturbing content. A fine line must be walked between providing accurate general-interest information on disturbing topics such as genocide, the Holocaust and human trafficking, and providing malicious or dubious information. The results from these human raters will ultimately be used to train the search engine to present quality information. The future is in their human hands, not in the magic of the algorithm, for they are its guidance system.