Search Needs Computational Linguistics to Solve Its Problems

The increased use of mobile devices means search must learn to answer questions posed in natural language. Google's research and development in natural language processing is filtering into the search results, so SEOs need to step beyond the keyword and into computational linguistics.

As users have become increasingly dependent on their digital devices, they expect to phrase their queries in more natural language. Search is deeply embedded in the fabric of our lives, and we expect more from it than we used to.

We spend hours on our mobile devices every day, and we have devices in our homes that rely on natural language processing to turn the television on or entertain us. Every search is a quest, and users are constantly looking for answers and expect to find them.

The terrain and contours of most e-commerce quests are reasonably easy to interpret, and SEOs have carefully developed methods for identifying keywords and concepts that apply to the most important quests that buyers/searchers will undertake for the products on offer.

Does this extend far enough? Not hardly.

We must stay with our consumers and develop an understanding of the challenges of search and how they are being addressed by those who build and operate search technology.

What’s Going On?

Each day, Google processes billions of searches and has publicly noted that 15% of those queries are ones it has never seen before. This means that Google has no history of which pages are most relevant to deliver for these queries. They represent unfamiliar terrain, and Google has built ways to navigate this space.

What Needs to Happen?

Mobile devices encourage people to search in natural language, so search must learn to answer questions posed that way. Current research and technology development at Google on natural language processing is filtering into the search results. SEOs need to step beyond the keyword into (are you ready?) the arcane science of computational linguistics.

Computational linguistics is an interdisciplinary field that studies language from a computational perspective. Computational linguists build statistical or rule-based models and approaches to linguistic problems, such as understanding natural language queries in search. The huge computational power available today has opened the door for rapid advances in the last five years. It is time for SEOs to integrate some of these lessons into their practice.

Improving Natural Language Search

In October 2019, Google announced that it would be rolling out the BERT algorithm worldwide. BERT, short for Bidirectional Encoder Representations from Transformers, is a neural network-based technique for natural language processing (NLP) pre-training. Training and tuning are very important steps in developing working search algorithms. (For more on the science, see this Google blog.)

Google expects this improved model to impact 10% of all searches. It will be particularly helpful for improving the results for queries written or spoken in natural, conversational language.

Some savvy searchers search in keyword-ese, putting long strings of disconnected words together in their queries. By keyword-stuffing their queries, they hope to get better results.

My own research has shown that the most frequent queries are multiple nouns strung together with an occasional adjective for refinement — long (adjective) house (noun) coat (noun). This is a simple example, but queries that are questions are much more difficult to parse. BERT will go a long way toward eliminating the need to use keyword-ese.
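To make the difference between keyword-ese and a natural language question concrete, here is a minimal Python sketch. It is only an illustration: the function names, the short list of function words, and the 0.2 threshold are all assumptions made for this example, and it is not how BERT or any Google system parses queries. It simply measures how many of a query's words are function words (articles, pronouns, question words), which strings of nouns and adjectives tend to lack.

```python
import re

# Hypothetical, hand-picked list of English function words (an assumption
# for this sketch, not a standard linguistic resource).
FUNCTION_WORDS = {
    "a", "an", "the", "is", "are", "do", "does", "can", "should",
    "i", "my", "you", "to", "for", "of", "in", "on", "with",
    "what", "where", "when", "why", "how", "which", "who",
}


def function_word_share(query: str) -> float:
    """Return the fraction of words in the query that are function words."""
    tokens = re.findall(r"[a-z']+", query.lower())
    if not tokens:
        return 0.0
    matches = sum(1 for token in tokens if token in FUNCTION_WORDS)
    return matches / len(tokens)


def looks_like_keywordese(query: str, threshold: float = 0.2) -> bool:
    """Guess whether a query is a bare string of content words.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    if not query.strip():
        return False
    return function_word_share(query) < threshold


if __name__ == "__main__":
    for q in ["long house coat", "Where can I buy a long house coat?"]:
        print(q, "->", looks_like_keywordese(q))
```

Run over your own query reports, a rough check like this could help separate the noun-string queries from the conversational ones, which are the kind BERT is meant to handle better.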

BERT is not to be confused with neural matching, Google’s improved AI-based system for understanding how words and concepts relate to one another, which works as a kind of super-synonym system. Combine BERT with the other advances, and we can surely expect better quality results.

Search, as a Study, Is Not Static

SEOs need to learn as much as they can about these technologies. Although it seems, at first blush, that we cannot optimize for them, we can create better content that responds well to the new technology, watch our performance metrics to see whether and by how much we are improving, and then make further changes as needed. Now, isn’t that optimizing?