How Google Search (Probably) Uses Machine Learning
November 14, 2019
Google is 21 years old, and it’s safe to say that the web’s biggest and best search engine looks nothing like it did back in 1998; even the iconic logo has a different font and color scheme. However, the most important changes to Google have been the updates to crawling, indexing and ranking, the three processes the site relies on to deliver high-quality search results when web users type in queries.
Since so many updates occur every year, it is difficult to choose just one that rises above the rest, unless you count Oct. 26, 2015, when RankBrain, Google’s machine learning system for search, was publicly revealed.
Google has always looked ahead to take advantage of cutting-edge technologies in its myriad web services, including search, so it should hardly be surprising that it was among the first web companies to integrate machine learning into its algorithms. RankBrain is now a key component of Google’s core algorithm, and it helps analyze the data gathered from crawling and page indexing so that search results appear in the best possible order for each searcher.
A Primer on Machine Learning
Machine learning is a complex concept that many people pair with artificial intelligence (AI). In truth, machine learning is one aspect of AI, but it isn’t by any means a method for machines to gain anything close to sentience. Instead, machine learning is a way to analyze data by automating model building. In other words, it allows a program to improve as it sees new data, without requiring programmers to alter the code themselves or supervise the program at every step.
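To make "automating model building" concrete, here is a minimal sketch in Python. It fits a one-parameter model, y = w * x, by least squares. The function name and the numbers are made up for illustration and have nothing to do with Google's systems; the point is only that the same code yields a different model when the data changes.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the model y = w * x (no intercept)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


# Two hypothetical data sets with the same inputs but different outcomes.
inputs = [1.0, 2.0, 3.0]
outcomes_a = [2.0, 4.0, 6.0]
outcomes_b = [3.0, 6.0, 9.0]

# Nobody edits fit_slope between these two calls; only the data differs.
slope_a = fit_slope(inputs, outcomes_a)
slope_b = fit_slope(inputs, outcomes_b)
print(slope_a, slope_b)
```

The program's behavior changed because the data changed, not because a programmer rewrote a rule, which is the basic idea behind the paragraph above.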
There are many variations of machine learning, and more machine learning tools emerge every day. Today, nearly all major web services take advantage of machine learning, but as a web pioneer, Google was among the first to adopt it and has pushed the boundaries of the art and science.
A Primer on Crawling, Indexing, and Ranking
The search process begins with crawling, a practice that entails discovering new and updated content around the web. Google’s crawling programs, commonly called spiders, follow links and review pages to identify new information worthy of offering in search results. Crawling requires substantial resources, so Google typically assigns a crawl budget to each website, limiting how frequently and thoroughly the site is revisited by spiders. The budget is determined by speed and importance; a site on a fast server with thousands or millions of backlinks will receive more attention from Google than a site that loads slowly and only has a dozen or so links.
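Google does not publish a crawl budget formula, so the following Python sketch is purely a toy. It assumes two made-up inputs, average server response time and backlink count, as stand-ins for the "speed" and "importance" described above, and combines them with invented weights. Every name and number here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    avg_response_ms: float  # how quickly the server answers (assumed input)
    backlinks: int          # rough proxy for importance (assumed input)


def crawl_budget(site: Site, base_pages: int = 100) -> int:
    """Return a hypothetical number of pages to fetch per day."""
    # Faster servers get a larger multiplier, clamped to the range 0.1 to 2.0.
    response_ms = max(site.avg_response_ms, 1.0)  # avoid dividing by zero
    speed_factor = max(0.1, min(2.0, 500.0 / response_ms))
    # Importance grows with the number of digits in the backlink count,
    # so millions of links help, but not a million times more than one link.
    importance_factor = 1 + len(str(max(site.backlinks, 1)))
    return round(base_pages * speed_factor * importance_factor)


fast_site = Site("fast.example", avg_response_ms=200.0, backlinks=2_000_000)
slow_site = Site("slow.example", avg_response_ms=2500.0, backlinks=12)

print(fast_site.name, crawl_budget(fast_site))
print(slow_site.name, crawl_budget(slow_site))
```

Under these assumptions, the fast, heavily linked site receives a much larger budget than the slow, rarely linked one, which mirrors the behavior described in the paragraph above.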
Once a website is crawled, it is indexed, meaning its content is analyzed and catalogued so Google can better understand what each page is about. In the past, website owners were expected to register their websites with Google or another listing service before they would be indexed, but these days, Google’s spiders are fast and smart enough to find new content relatively soon after it is published. Still, there are ways to improve page indexing, such as using descriptive text to give content context and submitting an XML sitemap through Google’s Search Console.
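For readers who have never seen an XML sitemap, here is a short Python sketch that builds one with the standard library. The element names and the namespace come from the sitemaps.org protocol; the example.com URLs, the dates and the build_sitemap helper are placeholders invented for this post.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages; a real site would list its own URLs here.
pages = [
    ("https://www.example.com/", "2019-11-14"),
    ("https://www.example.com/blog/", "2019-11-10"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting text would typically be saved as sitemap.xml on the site and then submitted through Search Console.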
Finally, after indexing, Google serves search results, ranking returned webpages by their relevancy to the search query. This is where RankBrain applies its machine learning power.
RankBrain
RankBrain is Google’s best-known machine learning tool, and it helps Google better understand the connections between different concepts and entities so that users see the best possible search results. In the beginning, RankBrain was given a fundamental understanding of entities. An entity is a concept that is singular, well-defined and distinguishable, like a movie name or a date. Then, the program was tasked with training itself to recognize unknown entities on the web and to understand relationships between entities and search requests, so it could scour Google’s index for the best results.
A good example of how RankBrain uses machine learning is how the program learns synonyms. Humans intuitively know when two words are synonymous and when they are not, but RankBrain is continually learning the nuances of synonyms to improve search. For instance, “fix” and “replace” are synonyms when discussing a part of a machine but not when discussing the entire machine. “How to replace laptop screen” is a repair question, while “How to replace car” is a buying question, and RankBrain appears to recognize the difference. Thus, Google users who make these different searches will receive different types of results, one for DIY fixing and one with steps for selling and shopping.
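The distinction can be illustrated with a deliberately simple Python sketch. This is not how RankBrain works; it is a hand-written lookup table standing in for knowledge that a learned system would infer from data. The sets, the function name and the labels are all invented for this example.

```python
# Hypothetical, hand-written knowledge. A learned system would infer
# this from data instead of having it typed in by a programmer.
REPAIRABLE_PARTS = {"laptop screen", "phone battery", "bike chain"}
WHOLE_PRODUCTS = {"car", "laptop", "phone"}


def interpret(query: str) -> str:
    """Guess the intent behind a 'how to replace X' query."""
    prefix = "how to replace "
    normalized = query.lower().strip()
    if not normalized.startswith(prefix):
        return "unknown"
    target = normalized[len(prefix):]
    if target in REPAIRABLE_PARTS:
        return "repair"    # here "replace" behaves like "fix"
    if target in WHOLE_PRODUCTS:
        return "shopping"  # here "replace" means getting a new one
    return "unknown"


print(interpret("How to replace laptop screen"))
print(interpret("How to replace car"))
```

The limitation is obvious: someone has to maintain those sets by hand. The appeal of machine learning is that the system picks up such distinctions on its own as it sees more queries and results.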
What RankBrain Means
The beauty of RankBrain, and machine learning in general, is that the algorithm is improving search results every hour of every day. That is especially important for the average web user, who wants nothing more or less than exactly what they are looking for. However, RankBrain is also useful for website owners and creators, who can relax on keywords and metadata and focus more on producing high-quality, valuable content that users are searching for.
RankBrain was Google’s first foray into using machine learning for search, and it certainly won’t be the last. Google has positioned itself as a machine-learning-first company, and in the myriad changes coming to Google in the next decade, machine learning tools are sure to play a growing role.