[Confirmed] Google's Massive Data Leak, AI Overview Improvements & Algorithm Recovery Advice
SEO TL;DR #26 - 03/06/2024
[Confirmed] Google’s Biggest Data Leak
Rand Fishkin and Mike King unveiled a significant data leak revealing Google Search's internal ranking features and signals.
While Google has confirmed the leak, they caution against:
Making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information.
While this could be the biggest SEO story of all time, the advice coming off the back of it is mostly unchanged. Many of the ranking features in this leak have already been theorised, tested and put into practice.
Below are the most significant themes that have come out of the leak and how to use the information to get ahead.
NavBoost
NavBoost is a re-ranking system used by Google Search that adjusts the ranking of search results based on user click behaviour (something Google has openly denied it does). NavBoost operates by analysing click logs to determine which search results users are likelier to click on and then re-ranks the search results accordingly. Essentially, if your result is being clicked on more times than the result above, Google will reward that.
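To make the idea of click-based re-ranking concrete, here is a minimal Python sketch. It is not Google's implementation: the `Result` fields, the `weight` parameter and the simple "base score plus weighted CTR" blend are all assumptions chosen for illustration. The leak describes the existence of the system, not its formula.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_score: float  # hypothetical relevance score before click data
    impressions: int   # times the result was shown
    clicks: int        # times the result was clicked


def rerank_by_clicks(results: list[Result], weight: float = 0.5) -> list[Result]:
    """Re-order results by blending the base score with observed CTR.

    The blend (base_score + weight * CTR) is an illustrative assumption.
    """
    def blended(r: Result) -> float:
        ctr = r.clicks / r.impressions if r.impressions else 0.0
        return r.base_score + weight * ctr

    return sorted(results, key=blended, reverse=True)


results = [
    Result("a.example", base_score=0.80, impressions=1000, clicks=50),
    Result("b.example", base_score=0.78, impressions=1000, clicks=300),
]
reranked = rerank_by_clicks(results)
print([r.url for r in reranked])
```

In this toy data, b.example starts slightly below a.example but attracts far more clicks, so it moves above it after re-ranking.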
NavBoost scores queries based on user intent, meaning it evaluates how well a search result matches the user's underlying intent behind their query. This involves analysing the language and context of the query to predict what the user is looking for, beyond just the keywords used.
💡 Takeaway: Understanding and aligning your content with user intent is critical for SEO. For example, if video content performs well in your vertical, anything else you produce will likely receive less engagement in the SERPs.
Conducting thorough keyword intent analysis can help you design content briefs that directly address what users are looking for.
Focus on creating engaging content that keeps users on the page longer. NavBoost may interpret this as a sign of high relevance and satisfaction, which can boost rankings.
Cookie History
Google utilises cookie history, logged-in Chrome data, and pattern detection (referred to in the leak as “unsquashed” clicks versus “squashed” clicks) as effective means for fighting manual & automated click spam.
Squashed clicks are click signals that have undergone a process called "squashing," which likely involves normalising or adjusting the data to mitigate the impact of anomalies. This helps provide a more accurate representation of user interaction by reducing the influence of exceptionally high or low click rates.
Unsquashed clicks are raw click data that have not been processed. They represent the actual number of clicks a search result receives without any adjustments.
The leak emphasises that Google's use of these click signals, including squashed and unsquashed clicks, plays a crucial role in the ranking algorithms, particularly within the NavBoost system.
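The leak does not say how squashing is calculated, so the following Python sketch is only one plausible reading of it. It assumes a logarithmic transform (`math.log1p`) as the squashing function; the function name `squash` and the page names are hypothetical.

```python
import math


def squash(clicks: int) -> float:
    """Compress a raw click count so extreme values carry less weight.

    A log transform is an assumption; the leak does not name the function.
    """
    return math.log1p(clicks)


# Unsquashed: raw click counts, exactly as logged.
raw_clicks = {"page-a": 10, "page-b": 100, "page-c": 100_000}

# Squashed: the same counts after compression.
squashed = {page: squash(n) for page, n in raw_clicks.items()}

for page in raw_clicks:
    print(page, raw_clicks[page], round(squashed[page], 2))
```

The ordering of pages is preserved, but the gap between a normal page and an outlier shrinks dramatically, which is why a burst of artificial clicks would have far less effect on a squashed signal than on a raw one.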
💡 Takeaway: NavBoost backs the best practices SEOs have, or should have, been doing for years - emphasising the importance of user engagement with search results. Here are a few ways you can leverage the information:
Improve Click-Through Rates (CTR): Focus on creating compelling titles and meta descriptions to increase the likelihood of users clicking on your search results. High CTR signals that your content is relevant and engaging.
Optimise for User Experience: Ensure your content meets user expectations and needs. A positive user experience, indicated by longer time spent on the page and lower bounce rates, can improve rankings.
A/B Testing: Experiment with different headlines, descriptions, and page layouts to determine which variations yield higher engagement. Continuous optimisation based on user behaviour data can enhance performance in search rankings.
Quality Content: Consistently produce high-quality, relevant content that answers users' queries effectively.
Post-Click Behavior
Google examines clicks and engagement on searches both during and after the main query (referred to as a “NavBoost query”). For example:
Someone searches for “Elon Musk” but doesn’t see the Tesla website.
They immediately change their query to “Tesla” and click through to tesla.com.
tesla.com (and websites mentioning Tesla) will receive a boost in the search results for the keyword “Elon Musk”.
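The Elon Musk example above can be sketched in Python as a simple session log. This is an illustration only: the function `credit_refinements`, the session format and the rule "credit the click to every unclicked query that preceded it" are assumptions, and the sketch ignores timing (the "immediately" part).

```python
from collections import defaultdict
from typing import Optional


def credit_refinements(session: list[tuple[str, Optional[str]]]) -> dict:
    """Credit each click to the clicked query and to any unclicked
    queries that came just before it in the same session.

    session: ordered (query, clicked_url or None) pairs.
    """
    credits: dict = defaultdict(lambda: defaultdict(int))
    pending: list[str] = []  # queries that ended without a click
    for query, clicked in session:
        if clicked is None:
            pending.append(query)
            continue
        for q in pending + [query]:
            credits[q][clicked] += 1
        pending = []
    return credits


session = [("elon musk", None), ("tesla", "tesla.com")]
credits = credit_refinements(session)
print(dict(credits["elon musk"]))
```

Here the click on tesla.com is counted for "tesla" and also for the abandoned "elon musk" query, mirroring the boost described in the example.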
Site Authority
Site authority is something else Google has denied using, yet it is the reason tools such as Ahrefs, Semrush and Moz use metrics like Domain Authority to rank and compare websites.
Again, just because site authority is mentioned in the leak doesn’t mean it is an active factor and, if it is, it’s impossible to tell how much weight it carries. Still, there is no denying that websites that gain links from other relevant, trustworthy sites typically do better than sites with few links.
Panda
The Panda algorithm, developed under the direction of Amit Singhal, is designed to evaluate site quality based on user behaviour and external links.
Scoring modifier: Panda primarily functions as a scoring modifier that can be applied at the domain, subdomain, or subdirectory level. This modifier is based on distributed signals related to user behaviour and external links.
Modification factor: The system generates a modification factor (M) for groups of resources. This factor is calculated using a ratio of the number of independent links (IL) to the number of reference queries (RQ).
Panda focuses on driving successful clicks using broader queries and earning more link diversity. Quality content that attracts diverse links and queries will help maintain or improve rankings.
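The modification factor described above is a plain ratio, M = IL / RQ. The short Python sketch below works through it with made-up numbers. The function name, the example figures and the handling of a zero denominator are assumptions for illustration, and nothing here shows how Google applies M afterwards.

```python
def modification_factor(independent_links: int, reference_queries: int) -> float:
    """Return M = IL / RQ for a group of resources.

    Raising on a non-positive RQ is an assumption made for this sketch.
    """
    if reference_queries <= 0:
        raise ValueError("reference_queries must be positive")
    return independent_links / reference_queries


# Two hypothetical sites with the same number of reference queries.
site_a = modification_factor(independent_links=200, reference_queries=1000)
site_b = modification_factor(independent_links=50, reference_queries=1000)
print(site_a, site_b)
```

With the same number of reference queries, the site with more independent links ends up with the higher factor, which fits the advice to earn a more diverse set of links.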
💡 Takeaway: Focus on building your site’s Domain Authority. Create high-quality, informative content that people would naturally link to, AKA organic backlinks. Google says you should ask this of your content when assessing its helpfulness, which could hint at some truth to this:
Is this the sort of page you'd want to bookmark, share with a friend, or recommend?
Consistently producing relevant content on your site helps maintain and grow your domain's perceived authority.
Prioritise acquiring high-quality backlinks from authoritative sites rather than focusing solely on the number of links.
Other factors
Minor penalties apply to domain names that exactly match unbranded search queries, e.g. mens-luxury-watches[.com] or seo-agency-newcastle[.co.uk]. A newer “BabyPanda” score and spam signals are also considered during the quality evaluation process.
NavBoost geo-fences click data, considering country and state/region levels and mobile versus desktop usage. However, if Google lacks data for specific regions or user agents, they may apply the process universally to the query results.
During the Covid-19 pandemic, Google employed whitelists for websites that could appear high in the results for Covid-related searches.
Similarly, during democratic elections, Google employed whitelists for sites that should be shown (or demoted) for election-related information.
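The geo-fencing point in the list above (segmented click data with a universal fallback) can be pictured as a lookup that goes from the most specific segment to the least specific. This Python sketch is hypothetical: the key layout, the "*" wildcard and the exact fallback order are assumptions, as the leak only indicates that a universal fallback may be applied when segment data is missing.

```python
from typing import Optional

# Hypothetical click counts keyed by (country, region, device); "*" means "all".
ClickData = dict[tuple[str, str, str], int]


def lookup_clicks(data: ClickData, country: str, region: str, device: str) -> Optional[int]:
    """Return click data for the most specific segment available."""
    candidates = [
        (country, region, device),  # exact segment
        (country, "*", device),     # country-wide for this device
        ("*", "*", "*"),            # universal fallback
    ]
    for key in candidates:
        if key in data:
            return data[key]
    return None


data: ClickData = {
    ("US", "CA", "mobile"): 120,
    ("US", "*", "mobile"): 900,
    ("*", "*", "*"): 5000,
}
print(lookup_clicks(data, "US", "CA", "mobile"))
print(lookup_clicks(data, "US", "TX", "mobile"))
print(lookup_clicks(data, "FR", "IDF", "desktop"))
```

The first lookup finds an exact match, the second falls back to country level, and the third falls back to the universal figure because no French data exists in this toy dataset.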
For far more in-depth coverage, read Mike King’s article: Secrets from the Algorithm.
AI
Google’s AI Overviews: here to stay
Google's Head of Search, Liz Reid, reassured everyone that AI Overviews are here to stay. AI-generated summaries have been met with mixed reactions, with some weird and wonderful examples, prompting Google to update its help documentation and make adjustments.
Reid addressed these issues, acknowledging that while AI Overviews mostly work well, they have encountered some problems. She said these rare instances occur in less than one in every 7 million queries, and that many of the screenshots being shared were fake.
Barry Schwartz summarised what she said over at Search Engine Roundtable:
Searchers like the AI Overviews and are engaging more with them and the publishers referenced in them.
AI Overviews work very differently than chatbots and other LLM products.
AI Overviews are integrated into core search and only show information backed by top web results.
AI Overviews generally don't “hallucinate” or make things up in the ways that other LLM products might.
When AI Overviews get it wrong, it’s usually for other reasons: misinterpreting queries, misinterpreting a nuance of language on the web, or not having much great information available.
AI Overview accuracy rate is as good as featured snippets.
There can be "data voids" and "information gaps" where Google might cite pages it should not, like satire documents (as in the case of "How many rocks should I eat?").
The “Ray Update”
The “Ray Update”, a name coined by Mike King on X, refers to the improvements made to AI Overviews, influenced mainly by Lily Ray’s critiques. Ray highlighted numerous flaws, pushing Google to enhance the reliability of these summaries by doing the following:
Google won't individually fix each AI Overview that goes bad; it updates its models to improve what went wrong, so it works for other queries too.
Google built a better detection mechanism for nonsensical queries that shouldn’t show an AI Overview and limited the inclusion of satire and humour content.
Google said it already has strong guardrails in place for topics like news and health. For example, Google said it aims to not show AI Overviews for hard news topics, where freshness and factuality are essential.
Content SEO
Advice on recovering from a core update
Google Search Analyst John Mueller recently responded to a site owner who experienced a significant drop in traffic in September 2023 and saw no improvement despite hiring an SEO expert.
Mueller's advice was straightforward: there’s no single fix, and sometimes, moving on to a new project might be best. To paraphrase John’s response:
Server downtime most likely has nothing to do with the changes you're seeing.
Long downtime is terrible for search, since a lot of pages will fall out of the search results during that time. It takes time for them to be reindexed again.
There will be more core updates, so there's room to grow again, but you really need to rethink your site's strategy to get into a good spot.
Recovering is not about dialing back the ads from infinity to infinity-1
or disavowing 5 links
or buying 5 links
or switching to another SEO plugin
He ended by saying that sometimes, it might be best to recognise when a site has had its good run and consider applying the lessons learned to a new project or professional work with clients. Basically, let it die 😬
This approach can help maintain peace of mind and ensure continued growth and learning in the field of SEO.