


This topic is locked, you cannot edit posts or make further replies.  [ 38 posts ]  Go to page 1, 2  Next
 Google uses several algorithms? 
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post Google uses several algorithms?
One of Google's top people (I think it was Matt Cutts) said not too long ago that Google uses several different algorithms at random. I always found that a bit far-fetched and thought it was a smoke screen - until last night.

My rankings for 'search engine optimization' differ across the datacenters: #22 in one group of datacenters, #27 in another group, and #40/41 in another. Those rankings vary slightly from time to time, but there are several distinct groups.

So last night I checked the allinanchor:, allintitle:, and allintext: results for a sample from each group and I was surprised by what I found.

The #22 group
allinanchor: #39
allintitle: not in the top 1000
allintext: not in the top 1000

The #27 group
allinanchor: #37
allintitle: >300 (didn't check down to 1000)
allintext: #32

The #40/41 group
allinanchor: #61
allintitle: #57
allintext: #58

The #22 group ranks my page higher than the #27 group does, even though its allinanchor: position is lower and the page isn't even in the top 1000 for allintitle: and allintext:. The allinanchor:, allintitle: and allintext: rankings also differ so much across the groups that it doesn't make sense unless different algorithms are being used.

I'm now inclined to think that Google really does use different algorithms randomly - the random part being which datacenter provides the surfer with the results at any given time, and they change all the time.
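
A minimal sketch of that comparison in Python, using only the positions listed above. The dictionary layout and function names are illustrative assumptions, and None stands for "not found in the range that was checked". It simply tests whether each operator orders the three groups the same way the normal ranking does.

```python
# Observed positions for 'search engine optimization' in each datacenter group.
# None means "not found in the range that was checked".
observations = {
    "#22 group": {"main": 22, "allinanchor": 39, "allintitle": None, "allintext": None},
    "#27 group": {"main": 27, "allinanchor": 37, "allintitle": None, "allintext": 32},
    "#40/41 group": {"main": 40, "allinanchor": 61, "allintitle": 57, "allintext": 58},
}

def order_by(signal):
    """Return group names sorted best-first on one signal; missing ranks sort last."""
    def key(name):
        rank = observations[name][signal]
        return (rank is None, rank if rank is not None else 0)
    return sorted(observations, key=key)

def disagreeing_signals():
    """List the operator signals whose group ordering differs from the main ranking."""
    main_order = order_by("main")
    return [s for s in ("allinanchor", "allintitle", "allintext")
            if order_by(s) != main_order]

for signal in ("main", "allinanchor", "allintitle", "allintext"):
    print(signal, order_by(signal))
print("disagree with main ranking:", disagreeing_signals())
```

With these figures none of the three operators orders the groups the way the normal results do, which is the inconsistency described above. It says nothing about the cause.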

_________________
PhilC
Search Engine Optimization articles and tools :: PageRank explained


Thu Feb 17, 2005 1:25 pm
Professional / Mod
Professional / Mod

Joined: Wed Sep 03, 2003 7:18 am
Posts: 8366
Location: Malaysia
Post 
That would be a strange way to run a search engine in that it would be throwing relevancy out the window most of the time.

It could also be a temporary condition, given that the recent update has dragged on for some time.

I am more inclined to believe that the allinanchor: search has been "fixed" just like the Google link: search, and I posted to that effect some time before the recent update. The same thing may be true of the allintitle: and allintext: searches; after all, like the link: search, they are of no real use to anyone besides webmasters.

The same results you have reported would also be true if a different index was being used at each group of data centers.

_________________
Expert SEO services

Buy Cheap Used Cars

Kids bedding & Baby bedding sets


Thu Feb 17, 2005 2:33 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
Mel wrote:
The same results you have reported would also be true if a different index was being used at each group of data centers.

True, but the top sites are the same through all the datacenters and I think they are pretty much the same basic index, but I can examine them all.

My inclination (imagination) is towards the idea that Google uses the various datacenter groups to try out different algorithms. To my imagination, the #40 group looks like normal Google, and the #22 group looks a bit like a trial - that's assuming that the allinanchor: etc. are working as normal.

Mel wrote:
That would be a strange way to run a search engine in that it would be throwing relevancy out the window most of the time.

Not if the different algos produced decent relevancy. As it is, we do get results from the various datacenters, so, whatever the reason for the differences (different algos, mid-update, whatever), we are receiving them all whatever their relevancy is like.

Incidentally, I saw these big differences (#23 to #30+ to #40+) before the recent big changes (update). I didn't check all the datacenters at the time, but the differences were there.


Last edited by PhilC on Thu Feb 17, 2005 3:23 pm, edited 1 time in total.



Thu Feb 17, 2005 2:44 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
I'm just thinking back a bit.

Google have burned themselves in the past by rolling out overall algo changes, e.g. Florida, and that wasn't the only occasion. Since then they've added a load more datacenters, and it could be that they are avoiding burning themselves again by testing significant algo changes 'live' in datacenter groups, rather than installing them on all datacenters simultaneously. Just a thought.


Thu Feb 17, 2005 2:49 pm
Intermediate member

Joined: Thu Sep 23, 2004 1:46 pm
Posts: 51
Location: KY
Post 
Wouldn't these differences exist even with the same algo but a different timeslice? Example: datacenter A does a full spider/update, then datacenter B begins its update. Even a 12-hour lag could result in the subtle differences you have observed. Just a thought, no supporting evidence.

_________________
Adult Toys & Video


Thu Feb 17, 2005 8:19 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
I can't imagine that each datacenter, or group of datacenters, has its own spiders. The indexes would get way off if they did.


Thu Feb 17, 2005 9:45 pm
Full member

Joined: Fri Jan 28, 2005 12:25 am
Posts: 185
Post 
Phil, I'm not sure I fully understand what you mean above, but I certainly get Google bots from different IPs visiting my main sites in the same cycle, and these are in different geo locations according to sitestats.


Thu Feb 17, 2005 11:51 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
Yes, it uses quite a few different IPs. What I meant was that each datacenter, or group of datacenters, can't really run its own independent crawling, parsing and storing system, or the various indexes would tend to become too dissimilar. I can't see Google doing that.


Thu Feb 17, 2005 11:58 pm
Contributor
Contributor

Joined: Mon Jul 14, 2003 8:29 pm
Posts: 1405
Post 
Your findings are plausible, mainly because of what the Google chap said. He wouldn't just make stuff up for the sake of it... So I guess there's a lot of mileage in this idea.

_________________
Royton


Fri Feb 18, 2005 3:45 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
The #22 and #27 groups merged yesterday, and became one larger #27 group. Also 3 datacenters crossed over to a different group. It's not compatible with my first thoughts - that different datacenters simply employ different algorithms.

If different algorithms are used, then more than one of them appears to reside in each datacenter. I do think that different algos are used though, because of the very distinct ranking groups. Ranking them like that must mean different algos or different indexes, and I don't care for the idea of different, autonomous indexes - especially when 2 groups merged and 3 datacenters crossed over.

It could be thought that the #22 group's indexes were simply updated to be the same as the (newer?) #27 group's indexes, but that doesn't appear to be the case because 1 datacenter moved the other way - from the #27 group to the ~#40 group.
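
One way to keep track of moves like these is to compare two snapshots of group membership. The Python below is only a sketch: the datacenter labels and the two snapshots are hypothetical placeholders, not the real observations, and `group_changes` is an illustrative helper name.

```python
def group_changes(before, after):
    """Map each datacenter whose group changed to an (old_group, new_group) pair."""
    return {dc: (before[dc], after[dc])
            for dc in before
            if dc in after and before[dc] != after[dc]}

# Hypothetical snapshots: datacenter labels and memberships are placeholders.
day1 = {"dc-a": "#22", "dc-b": "#22", "dc-c": "#27", "dc-d": "#27", "dc-e": "#40"}
day2 = {"dc-a": "#27", "dc-b": "#27", "dc-c": "#27", "dc-d": "#40", "dc-e": "#40"}

changes = group_changes(day1, day2)
for dc, (old, new) in sorted(changes.items()):
    print(f"{dc}: {old} -> {new}")

# More than one kind of move is what argues against a simple refresh
# of the #22 group's index.
directions = set(changes.values())
print("distinct moves:", sorted(directions))
```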


Fri Feb 18, 2005 3:57 pm
Full member

Joined: Sun Mar 14, 2004 4:44 pm
Posts: 245
Post 
Hello,

A little off-topic. PhilC, what do you use to monitor so many datacenters? I have a tool that monitors at most ten. :)

_________________
| O-Zone | Travel to Romania |


Fri Feb 18, 2005 7:29 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
I do it myself.


Fri Feb 18, 2005 11:24 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
I've been watching 45 datacenters in the last few days and I've found that the results from some of them change. But they are not updating in the sense that they are having a new index uploaded to them, because they can return one set of results one minute, another set a minute later, and then back to the original set a minute after that.

I can think of 2 reasons for those changes. One is multiple algorithms. The other is that we don't always receive the results from the datacenter that we request them from - we are sometimes switched to a different datacenter.

To my mind, the second one has a ring of truth to it - and it makes some sense. Google normally shares the load of searches between datacenters, and we often see that in action between one page of results and the next. It also makes sense to share the load when a specific datacenter is being searched. Bearing in mind that Google is directing searches to the various DCs, at the time that we search a particular DC, it could well be fully loaded, and it might pass us on to another DC.

If that's happening, it means that, when we use tools to search specific DCs, we can't just assume that the results they show are from the DCs that we think they are from. And when a few DCs appear to be dancing they may not be dancing at all.
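
The "one set, then another, then back to the original" pattern can be spotted mechanically if each check of a DC is reduced to a fingerprint of its ordered result list. The sketch below is Python with made-up example URLs; `fingerprint` and `flips_back` are illustrative names, and fetching the results is deliberately left out.

```python
import hashlib

def fingerprint(urls):
    """Hash an ordered list of result URLs so two result sets compare cheaply."""
    joined = "\n".join(urls).encode("utf-8")
    return hashlib.sha1(joined).hexdigest()[:12]

def flips_back(history):
    """True if a fingerprint reappears after a different one (A, B, A pattern)."""
    seen_then_left = set()
    previous = None
    for fp in history:
        if fp in seen_then_left:
            return True
        if previous is not None and fp != previous:
            seen_then_left.add(previous)
        previous = fp
    return False

# Placeholder result lists for three consecutive checks of the same DC.
set_a = ["http://example.com/1", "http://example.com/2"]
set_b = ["http://example.com/2", "http://example.com/1"]
samples = [fingerprint(s) for s in (set_a, set_b, set_a)]
print(flips_back(samples))
```

A DC that flips back looks more like a temporary switch to another DC than a newly uploaded index, though the sketch cannot tell the two explanations apart on its own.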


Mon Feb 21, 2005 1:32 am
Professional / Mod
Professional / Mod

Joined: Wed Sep 03, 2003 7:18 am
Posts: 8366
Location: Malaysia
Post 
Yep, the load balancing idea seems to make sense, Phil, and of course that has to be one of the reasons for having so many data centers.

But I think that Google is also still tweaking the results of the last update, which further complicates things, and they just might be doing one thing on one data center and another somewhere else.


Mon Feb 21, 2005 4:10 am
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
Switching us to another DC when we search a particular DC could also be because the one we search is stood down very briefly while some sort of update is being uploaded.

There is quite a bit of activity in the DCs, some of which I am putting down to being switched to other DCs, and some of it appears to be changes/updates.


Mon Feb 21, 2005 1:03 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
The DCs continue to change but there are still 2 distinct groups. Since the last post, the #27 group has gone to #28 - #29 - #30 - #28 - #31 and now to #28 again. The #40 group has made similar moves but all its DCs are now suddenly at #31.

Somebody at WPW made an interesting observation - the 2 groups return greatly different numbers of results. For instance, a search on the word 'the' returns ~3 billion results in the A group (#27), and 8 billion results in the B group (which contains some sub-groups).

It could be indicative of them having different indexes, or it could mean that a different algo is being used, although it's hard to imagine what the difference in algo could be that would produce such a huge difference in the number of results for the single word 'the'. I suppose an algo that sort of pre-selects pages could account for it. For instance, an expert system sort of pre-selects pages in that only those pages that are linked to by certain other pages are considered. All other pages are not considered regardless of any on-page and off-page relevancy. It's a sort of pre-selection.
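
To show how pre-selection alone could shrink a result count without the index being any smaller, here is a toy Python sketch. Everything in it is hypothetical: the pages, the links and the single "expert" page are invented, and nothing here is claimed to be how Google actually works.

```python
def matching_pages(pages, word):
    """Pages whose text contains the word (a plain full-index match)."""
    return {url for url, text in pages.items() if word in text.lower().split()}

def preselected(pages, links, experts, word):
    """Only count matches that at least one 'expert' page links to."""
    endorsed = set()
    for expert in experts:
        endorsed.update(links.get(expert, ()))
    return matching_pages(pages, word) & endorsed

# Toy data, entirely hypothetical.
pages = {
    "a.example": "the quick fox",
    "b.example": "over the hill",
    "c.example": "the end",
    "d.example": "nothing here",
}
links = {"hub.example": ["a.example", "d.example"]}
experts = ["hub.example"]

print(len(matching_pages(pages, "the")))               # plain match count
print(len(preselected(pages, links, experts, "the")))  # count after pre-selection
```

The same four-page index gives three matches for 'the' without pre-selection and one with it, so a smaller reported count does not by itself prove a smaller index.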


Fri Feb 25, 2005 1:35 pm
Full member

Joined: Wed Oct 13, 2004 3:20 pm
Posts: 752
Post 
Thanks for new info.
What do you consider to be the # of a group? Where are these groups listed?

Thanks


Fri Feb 25, 2005 1:39 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
I don't understand the question


Fri Feb 25, 2005 1:41 pm
Professional / Mod
Professional / Mod

Joined: Wed Sep 03, 2003 7:18 am
Posts: 8366
Location: Malaysia
Post 
Hi Phil
I notice for some terms I played with that the groups do not appear to be clustered by index size.

I find different rankings for the same keywords on DCs that report the same number of results for a 'the' search (8,000,000,000 vs 2,820,000,000).

This would seem to me to imply that there are both different indexes and different algorithms in play among the various DCs.


Fri Feb 25, 2005 2:10 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
Hi Mel,

Group B has several sub-groups in it, which do produce different rankings. Group A's DC sets seem to be pretty consistent most of the time, although I do occasionally find variations in them. Also a few of group A's DCs switch to the other group and back, but I put those switches down to being redirected at the time.

I'm sure that there are different algos in the DCs. The apparent index sizes suggest that there are different indexes in the 2 groups, although it's possible that a different algo causes it. It would be quite a surprise if almost half of the DCs contained an index that is less than half the size of the one in the other DCs.


Fri Feb 25, 2005 2:21 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
If anyone is interested in examining all 45 DCs, the differences in the results show up better the further down the results you go. It's a bit like a pendulum - at the top there is very little movement, but at the bottom there is a lot of movement.

For instance, I'm looking at the results for a particular searchterm on page 3, and I see only 2 variations of results across all 45 DCs, although there have been more until today. If I go down to the 10th results page, I see 5 variations for the same searchterm.
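
Counting "variations" like this amounts to grouping the DCs by the exact ordered list of results each one returns for a given page. A minimal Python sketch follows; the DC labels and the result lists are placeholders rather than real data, and `variations` is an illustrative helper name.

```python
from collections import defaultdict

def variations(results_by_dc):
    """Group datacenters by the exact ordered result list they returned."""
    groups = defaultdict(list)
    for dc, urls in results_by_dc.items():
        groups[tuple(urls)].append(dc)
    return list(groups.values())

# Placeholder results for the same searchterm on two different results pages.
page_3 = {
    "dc-1": ["u21", "u22", "u23"],
    "dc-2": ["u21", "u22", "u23"],
    "dc-3": ["u21", "u23", "u22"],
}
page_10 = {
    "dc-1": ["u91", "u92", "u93"],
    "dc-2": ["u91", "u93", "u92"],
    "dc-3": ["u92", "u91", "u93"],
}

print(len(variations(page_3)), len(variations(page_10)))
```

Each inner list returned by `variations` is one set of DCs that agree with each other on that page.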

Here are the 45 datacenters that I know about (split into DC sets).
* = group A. Others are group B which contains some sub-groups.
+ = usually group A but changes often, which is probably due to redirecting.

64.233.161.99
64.233.161.104
64.233.161.105
64.233.161.147

64.233.167.99
64.233.167.104
64.233.167.147

64.233.171.99 *
64.233.171.104 *
64.233.171.105 *
64.233.171.147 *

64.233.179.99 *
64.233.179.104 *

64.233.183.99
64.233.183.104

64.233.185.99 *
64.233.185.104 *

64.233.187.99 +
64.233.187.104 +

64.233.189.104 +

66.102.7.99 *
66.102.7.104 *
66.102.7.105 *
66.102.7.147 *

66.102.9.99
66.102.9.104

66.102.11.99
66.102.11.104

216.239.37.99 *
216.239.37.104 *
216.239.37.105 *
216.239.37.147 *

216.239.39.99 *
216.239.39.104 *

216.239.53.99 +
216.239.53.104 +

216.239.57.98
216.239.57.99
216.239.57.104
216.239.57.105
216.239.57.147

216.239.59.99
216.239.59.104
216.239.59.105

216.239.63.104



Fri Feb 25, 2005 2:33 pm
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
I'd like a bit of help. If you read this, would you search www.google.com for 'search engine optimization' (no quotes) and tell me if you see High Rankings at #4. I'm seeing it at #7 at the moment. It's at #7 in group B DCs, but it's at #4 in group A DCs, and I want to know quickly if group A DCs are actually used to serve normal results.

I will keep checking myself, but it will be much quicker if people would also look for me.

Another indicator is that group B DCs show ~7,790,000 results for that searchterm, and group A DCs show ~5,590,000.
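
As a rough aid for anyone reporting back, the result count can be matched to whichever group's quoted count is nearer. This Python sketch assumes the two approximate counts above stay roughly stable, which they may not; `parse_count` and `likely_group` are illustrative names, and the input must contain only the count's own digits.

```python
# Approximate result counts quoted above for 'search engine optimization'.
GROUP_COUNTS = {"A": 5_590_000, "B": 7_790_000}

def parse_count(text):
    """Turn '7,790,000' or '7.790.000' into an int by dropping separators."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits)

def likely_group(count_text):
    """Pick the group whose quoted result count is closest to the one seen."""
    count = parse_count(count_text)
    return min(GROUP_COUNTS, key=lambda g: abs(GROUP_COUNTS[g] - count))

print(likely_group("about 7,790,000"))
print(likely_group("5,600,000"))
```

The nearest-count rule is only a hint; the ranking position itself is still the thing to check.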


The reason that I'm wondering about this is that the guy at WPW (see earlier post) has been watching 2 DC groups for 2 weeks and I've been watching them for a week, and I'm beginning to wonder if we should start thinking in terms of 2 Googles.


Fri Feb 25, 2005 8:43 pm
Full member

Joined: Wed Oct 13, 2004 3:20 pm
Posts: 752
Post 
It's 7


Sat Feb 26, 2005 12:30 am
Professional / Mod
Professional / Mod

Joined: Wed Sep 03, 2003 7:18 am
Posts: 8366
Location: Malaysia
Post 
It's #7 from here, Phil


Sat Feb 26, 2005 1:37 am
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
A couple of people on Janeth's forum have seen #4s, but I'd like to know if those DCs deliver results as 'normally' as the others do. So I'd appreciate a quick search from time to time - looking for #4s.

Since I posted, many of the #4s have gone over to #7s so there are a lot of them now. But I'm sure they are only temporary redirections - it's not uncommon.


Sat Feb 26, 2005 2:41 am
Member

Joined: Thu Feb 17, 2005 1:40 am
Posts: 23
Post 
High Rankings at #4 out of 6,020,000 results

_________________
FasTrackSeo
collects, organizes and displays 16 fields of data
for current top-10 pages in Google, Yahoo!, MSN, & AOL
Unique and Brandnew - Trial Download


Sat Feb 26, 2005 3:03 am
Professional / Mod
Professional / Mod

Joined: Wed Sep 03, 2003 7:18 am
Posts: 8366
Location: Malaysia
Post 
I am now seeing highrankings at #4


Last edited by Mel on Sat Feb 26, 2005 4:46 am, edited 1 time in total.



Sat Feb 26, 2005 3:32 am
Founder
Founder

Joined: Thu Nov 21, 2002 1:22 am
Posts: 11147
Post 
I'm still seeing it at #7, but a few people have seen it at #4 (right now it can even be #3). Most of the DCs have had it at #7 since I asked for people to search and, since a few people have seen it at #4, it should be safe to say that the #4 DCs do produce results as normally as the rest of them.

So I'm beginning to think in terms of 2 or more Googles. It sounds silly, huh? But the different groups are effectively different engines - different index size, or different algorithms, or both. The difference being that we don't get the choice as to which we search. I wish I'd been watching all the DCs from a long time ago.


Sat Feb 26, 2005 3:47 am
Contributor
Contributor

Joined: Thu Jul 15, 2004 1:34 pm
Posts: 1476
Location: New Zealand
Post 
It's #4 for me again Phil. That makes it two times I've seen it at #4 - and one at #7.

_________________
Snowblind \m/
SBD Directory Script | Add URL | Cheap Site Hosting


Sat Feb 26, 2005 8:37 am
Full member

Joined: Wed Oct 13, 2004 3:20 pm
Posts: 752
Post 
Here it's #4 - 64.233.171.99, 64.233.171.104, 64.233.171.105, 64.233.171.147...
and many, many more. You can check it in my dance tool.

I think it's 50%/50%.

Wlas


Sat Feb 26, 2005 12:43 pm