Google announced it is rolling out a new search algorithm change that helps make search results “fresher.” The big news here is that, besides making results fresher, the change will alter the results for about 35% of all searches.

Caffeine Was Infrastructure, This Is Algorithmic

Fresher results can make for more relevant results, which is why Google moved over to the Caffeine infrastructure last year. That was only an infrastructure change, to make sure Google can crawl, index and return results faster. Now Google has changed its search algorithm to show fresher results, fresher than ever before.

Google said:

We completed our Caffeine web indexing system last year, which allows us to crawl and index the web for fresh content quickly on an enormous scale. Building upon the momentum from Caffeine, today we’re making a significant improvement to our ranking algorithm that impacts roughly 35 percent of searches and better determines when to give you more up-to-date relevant results for these varying degrees of freshness.

35% Of The Searches Are Impacted

That is a larger share than the Panda update, which impacted 12% of the searches conducted.

What types of searches does it impact? Google said:

  • Recent events or hot topics. For recent events or hot topics that begin trending on the web, you want to find the latest information immediately. Now when you search for current events like [occupy oakland protest], or for the latest news about the [nba lockout], you’ll see more high-quality pages that might only be minutes old.
  • Regularly recurring events. Some events take place on a regularly recurring basis, such as annual conferences like [ICALP] or an event like the [presidential election]. Without specifying with your keywords, it’s implied that you expect to see the most recent event, and not one from 50 years ago. There are also things that recur more frequently, so now when you’re searching for the latest [NFL scores], [dancing with the stars] results or [exxon earnings], you’ll see the latest information.
  • Frequent updates. There are also searches for information that changes often, but isn’t really a hot topic or a recurring event. For example, if you’re researching the [best slr cameras], or you’re in the market for a new car and want [subaru impreza reviews], you probably want the most up to date information.

Postscript From Danny Sullivan: I’ve now had a chance to get some questions answered by Google, plus some additional issues to raise, below.

Freshness Ranking Not New, Just Apparently Improved

It’s not new for Google to boost fresh content. “Query Deserves Freshness” is a content ranking factor that dates back to 2007. The Caffeine update of last year made it possible, Google said, to gather content even faster, which in turn could potentially be ranked better.

So what’s different now? Apparently, freshness is being rewarded even more, with an impact on roughly one out of three searches. That’s huge, though it’s unclear what the figure was before. For all we know, 35% of searches were already being impacted by freshness ranking. The previous number was never stated (and yes, we’re checking with Google on this).

Postscript: Google says the change is providing “fresh” content for twice as many queries as before. In other words, the old “freshness” algorithm had an impact on about 17.5% of queries. Now it impacts double that figure, 35%.

Potential For “Freshness” Spam

There are potential downsides. Sometimes you do want to reward fresh content. But what’s fresh? If someone simply makes a small change to a page, does that give it a fresh boost? If someone reposts exactly the same content on a new page a day or two after initially posting it, is that fresh? Does the date a page was first found define freshness, or is the date it was modified used?

Does this open Google up to an even worse situation than what can already happen with Google News, where publishers file and refile stories in an effort to win the freshness race, since the latest versions of stories often get top billing?

Rewarding freshness potentially introduces huge decreases in relevancy and new avenues for spamming or for getting “light” content in. Most likely, Google’s going to use a combination of search ranking factors to help qualify when it wants to trust that something is both fresh and good.

Google wouldn’t say how “freshness” is being determined, but it did tell us in response to questions that being fresh wasn’t the only thing being rewarded:

Freshness is one component, but we also look at the content of the result, including topicality and quality.

Postscript: Google now tells us that one of the freshness factors — the way they determine if content is fresh or not — is the time when they first crawled a page. So if you publish a page, and then change that page, it doesn’t suddenly become “fresh.”
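
To make that distinction concrete, here is a minimal sketch of a freshness check keyed only to first-crawl time. Everything in it is an assumption for illustration: the Page fields, the is_fresh function and the one-day window are hypothetical, and Google has said first-crawl time is only one of several factors it has not disclosed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Page:
    url: str
    first_crawled: datetime  # when the crawler first saw the page
    last_modified: datetime  # when the content last changed


def is_fresh(page: Page, now: datetime,
             window: timedelta = timedelta(days=1)) -> bool:
    """Hypothetical rule: only the first-crawl time counts toward freshness.

    The one-day window is an arbitrary assumption, not a Google figure.
    """
    return now - page.first_crawled <= window


now = datetime(2011, 11, 3, 12, 0)

# An old page that was edited five minutes ago.
old_but_edited = Page(
    url="example.com/old",
    first_crawled=now - timedelta(days=30),
    last_modified=now - timedelta(minutes=5),
)

# A page first seen ten minutes ago.
brand_new = Page(
    url="example.com/new",
    first_crawled=now - timedelta(minutes=10),
    last_modified=now - timedelta(minutes=10),
)

for page in (old_but_edited, brand_new):
    print(page.url, is_fresh(page, now))
```

Under this rule the recently edited old page is not treated as fresh, while the newly discovered page is, which matches what Google described. Reposting the same content at a new URL would still pass this check, which is presumably why other signals are needed.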

Freshest Info Still Missing: Twitter

Also unclear is the situation with Twitter. The largest source of “fresh” information on the web is tweets. Despite the growth of Google+, the volume of tweets far eclipses the content there.

Google has been without timely access to tweets since July. It simply cannot crawl Twitter fast enough without receiving the “firehose” of Twitter data to keep up. Today’s announcement does nothing to solve this. Google is only introducing a ranking change, not an indexing change that brings in more tweets.

I asked about this issue, namely how Google still lacks the Twitter firehose, and was told:

Often times when there’s breaking news, microblogs are the first to publish. We’re able to show results for recent events or hot topics within minutes of the page being indexed, but we’re always looking for ways we can serve you relevant information faster and will work to continue improving

35% Change Doesn’t Mean 35% Improvement

A final but important caveat: don’t misinterpret the percentage Google gave out, a 35% change to its results, to mean the results are 35% improved.

I saw this the first time Google started talking about a percentage change to its search results, when the Panda Update was said to have a 12% impact. Some assumed that meant a 12% improvement. It didn’t.

We have no commonly accepted way of rating search engine result quality in a numeric fashion. No third party measures whether Google’s or Bing’s results are “90%” good, for example. This means there’s no way to say whether something has improved by a particular percentage.

Google is clear about what it means when it puts these percentages out. I’ve never seen the company say that they’re to be interpreted as some type of improvement metric. But people do make that mistake, and shouldn’t.