One of Google's top people (I think it was Matt Cutts) said not too long ago that Google uses several different algorithms at random. I always found that a bit far-fetched and thought it was a bit of a smoke screen - until last night.
My rankings for 'search engine optimization' are different across the datacenters: #22 in one group of datacenters, #27 in another group, and #40/41 in another group. Those rankings vary slightly from time to time, but there are several distinct groups.
So last night I checked the allinanchor:, allintitle:, and allintext: results for a sample from each group and I was surprised by what I found.
The #22 group:
  allinanchor: #39
  allintitle: not in the top 1000
  allintext: not in the top 1000
The #27 group:
  allinanchor: #37
  allintitle: >300 (didn't check down to 1000)
  allintext: #32
The #40/41 group:
  allinanchor: #61
  allintitle: #57
  allintext: #58
The #22 group is ranked higher than the #27 group even though its allinanchor: is ranked lower, and it isn't even in the top 1000 for allintitle: and allintext:. Also, the allinanchor:, allintitle: and allintext: rankings are very different across the groups - so different that it doesn't make sense unless different algorithms are being used.
I'm now inclined to think that Google really does use different algorithms randomly - the random part being which datacenter provides the surfer with the results at any given time, and they change all the time.
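For anyone who wants to check the mismatch mechanically, here is a rough Python sketch that tabulates the figures above and lists the pairs of groups where the normal ranking order and an operator's ranking order disagree. The stand-in values 1001 (for "not in the top 1000") and 301 (for ">300") are my own assumptions so the numbers can be compared; they are not real rankings. The #40/41 group is entered as 40.

```python
from itertools import combinations

NOT_IN_TOP_1000 = 1001  # assumption: stand-in for "not in the top 1000"
BEYOND_300 = 301        # assumption: lower bound for ">300 (didn't check down to 1000)"

# Rankings as reported in the post; "main" is the normal search ranking.
observations = {
    "#22 group": {"main": 22, "allinanchor": 39,
                  "allintitle": NOT_IN_TOP_1000, "allintext": NOT_IN_TOP_1000},
    "#27 group": {"main": 27, "allinanchor": 37,
                  "allintitle": BEYOND_300, "allintext": 32},
    "#40/41 group": {"main": 40, "allinanchor": 61,
                     "allintitle": 57, "allintext": 58},
}

def inversions(obs, operator):
    """Pairs (a, b) where a ranks better than b normally but worse for `operator`."""
    pairs = []
    for a, b in combinations(obs, 2):
        # Make `a` the group with the better (lower) normal ranking.
        if obs[a]["main"] > obs[b]["main"]:
            a, b = b, a
        if obs[a][operator] > obs[b][operator]:
            pairs.append((a, b))
    return pairs

for op in ("allinanchor", "allintitle", "allintext"):
    print(op, inversions(observations, op))
```

If the operator rankings tracked the normal rankings, every list would be empty. With the figures above, none of them is.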
Joined: Wed Sep 03, 2003 7:18 am Posts: 8366 Location: Malaysia
That would be a strange way to run a search engine in that it would be throwing relevancy out the window most of the time.
It could also be a temporary condition, given that the recent update has dragged on for some time.
I am more inclined to believe that the allinanchor: search has been "fixed", just like the Google link: search, and I posted to that effect some time before the recent update. The same thing may be true of the allintitle: and allintext: searches; after all, like the link: search, they are of no real use to anyone besides webmasters.
The same results you have reported would also be true if a different index was being used at each group of data centers.
Mel wrote:
The same results you have reported would also be true if a different index was being used at each group of data centers.
True, but the top sites are the same through all the datacenters and I think they are pretty much the same basic index, but I can examine them all.
My inclination (imagination) is towards the idea that Google uses the various datacenter groups to try out different algorithms. To my imagination, the #40 group looks like normal Google, and the #22 group looks a bit like a trial - that's assuming that the allinanchor: etc. are working as normal.
Mel wrote:
That would be a strange way to run a search engine in that it would be throwing relevancy out the window most of the time.
Not if the different algos produced decent relevancy. As it is, we do get results from the various datacenters, so, whatever the reason for the differences (different algos, mid-update, whatever), we are receiving them all whatever their relevancy is like.
Incidentally, I saw these big differences (#23 to #30+ to #40+) before the recent big changes (update). I didn't check all the datacenters at the time, but the differences were there.
Last edited by PhilC on Thu Feb 17, 2005 3:23 pm, edited 1 time in total.
Thu Feb 17, 2005 2:44 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
I'm just thinking back a bit.
Google have burned themselves in the past by rolling out overall algo changes, e.g. Florida, and that wasn't the only occasion. Since then they've added a load more datacenters, and it could be that they are avoiding burning themselves again by testing significant algo changes 'live' in datacenter groups, rather than installing them on all datacenters simultaneously. Just a thought.
Wouldn't these differences exist even when using the same algo but a different timeslice? Example: datacenter A does a full spider/update, then datacenter B begins its update. Even a 12 hour lag could result in the subtle differences you have observed. Just a thought, no supporting evidence.
I can't imagine that each datacenter, or group of datacenters, has its own spiders. The indexes would get way off if they did.
Thu Feb 17, 2005 9:45 pm
WebTone
Full member
Joined: Fri Jan 28, 2005 12:25 am Posts: 185
Phil, I'm not sure I fully understand what you mean above, but I certainly get Google bots from different IPs visiting my main sites in the same cycle, and these are in different geo locations according to sitestats.
Thu Feb 17, 2005 11:51 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
Yes, it uses quite a few different IPs. What I meant was that each datacenter, or group of datacenters, can't really have its own independent crawling, parsing and storing system, or the various indexes would tend to become too dissimilar. I can't see Google doing that.
Thu Feb 17, 2005 11:58 pm
yonnermark
Contributor
Joined: Mon Jul 14, 2003 8:29 pm Posts: 1405
Your findings are plausible, mainly because of what the Google chap said. He wouldn't just make stuff up for the sake of it... So I guess there's a lot of mileage in this idea.
The #22 and #27 groups merged yesterday, and became one larger #27 group. Also 3 datacenters crossed over to a different group. It's not compatible with my first thoughts - that different datacenters simply employ different algorithms.
If different algorithms are used, then more than one of them appears to reside in each datacenter. I do think that different algos are used though, because of the very distinct ranking groups. Ranking them like that must mean different algos or different indexes, and I don't care for the idea of different, autonomous indexes - especially when 2 groups merged and 3 datacenters crossed over.
It could be thought that the #22 group's indexes were simply updated to be the same as the (newer?) #27 group's indexes, but that doesn't appear to be the case because 1 datacenter moved the other way - from the #27 group to the ~#40 group.
Fri Feb 18, 2005 3:57 pm
razvan
Full member
Joined: Sun Mar 14, 2004 4:44 pm Posts: 245
Hello,
A little off-topic. PhilC, what do you use to monitor so many datacenters? I have a tool that monitors at most ten.
I've been watching 45 datacenters in the last few days and I've found that the results from some of them change. But they are not updating in the sense that they are having a new index uploaded to them, because they can return one set of results one minute, another set a minute later, and then back to the original set a minute after that.
I can think of 2 reasons for those changes. One is multiple algorithms. The other is that we don't always receive the results from the datacenter where we request them from - we are sometimes switched to a different datacenter.
To my mind, the second one has a ring of truth to it - and it makes some sense. Google normally shares the load of searches between datacenters, and we often see that in action between one page of results and the next. It also makes sense to share the load when a specific datacenter is being searched. Bearing in mind that Google is directing searches to the various DCs, at the time that we search a particular DC, it could well be fully loaded, and it might pass us on to another DC.
If that's happening, it means that, when we use tools to search specific DCs, we can't just assume that the results they show are from the DCs that we think they are from. And when a few DCs appear to be dancing they may not be dancing at all.
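The "one set, another set, then back to the original set" pattern is easy to test for if you keep the result lists from repeated checks of one DC. Below is a minimal Python sketch under my own assumptions: the snapshots are plain lists of URLs collected however your tool collects them, and the example.com URLs are made up. A DC that simply had a new index uploaded would change and stay changed; a result set that reappears after a different one is the pattern that points to redirection or something similar.

```python
def flips_back(snapshots):
    """True if a result set reappears after a different set was seen in between."""
    seen = set()
    previous = None
    for results in snapshots:
        fingerprint = tuple(results)  # order matters: a rank swap is a different set
        if fingerprint != previous:
            if fingerprint in seen:
                return True
            seen.add(fingerprint)
            previous = fingerprint
    return False

# Hypothetical snapshots from one DC, taken a minute apart.
set_a = ["example.com/1", "example.com/2", "example.com/3"]
set_b = ["example.com/2", "example.com/1", "example.com/3"]

print(flips_back([set_a, set_b, set_a]))  # changed, then went back
print(flips_back([set_a, set_a, set_b]))  # changed once and stayed changed
```

This only says that the DC went back to an earlier set; it can't tell you why.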
Mon Feb 21, 2005 1:32 am
Mel
Professional / Mod
Joined: Wed Sep 03, 2003 7:18 am Posts: 8366 Location: Malaysia
Yep, the load balancing idea seems to make sense, Phil, and of course that has to be one of the reasons for having so many data centers.
But I think that Google is also still tweaking the results of the last update, which further complicates things, and they just might be doing one thing on one data center and another somewhere else.
Mon Feb 21, 2005 4:10 am
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
Switching us to another DC when we search a particular DC could also be because the one we search is stood down very briefly while some sort of update is being uploaded.
There is quite a bit of activity in the DCs, some of which I am putting down to being switched to other DCs, and some of it appears to be changes/updates.
Mon Feb 21, 2005 1:03 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
The DCs continue to change but there are still 2 distinct groups. Since the last post, the #27 group has gone to #28 - #29 - #30 - #28 - #31 and now to #28 again. The #40 group has made similar moves but all its DCs are now suddenly at #31.
Somebody at WPW made an interesting observation - the 2 groups return greatly different numbers of results. For instance, a search on the word 'the' returns ~3 billion results in the A group (#27), and 8 billion results in the B group (which contains some sub-groups).
It could be indicative of them having different indexes, or it could mean that a different algo is being used, although it's hard to imagine what the difference in algo could be that would produce such a huge difference in the number of results for the single word 'the'. I suppose an algo that sort of pre-selects pages could account for it. For instance, an expert system sort of pre-selects pages, in that only those pages that are linked to by certain other pages are considered. All other pages are ignored regardless of any on-page and off-page relevancy.
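To show how that kind of pre-selection could shrink the reported number of results without the index itself being any smaller, here is a toy Python sketch. Everything in it is hypothetical: the page names, the texts, the "expert" pages and their links are invented for illustration, and nothing here is known to be how Google works.

```python
# Hypothetical toy index: page -> text.
pages = {
    "p1": "the quick brown fox",
    "p2": "the lazy dog",
    "p3": "the search engine",
    "p4": "no match here",
}

# Hypothetical "expert" pages and the pages they link to.
expert_links = {
    "expert1": {"p1", "p4"},
    "expert2": {"p1"},
}

def matches(term, candidates):
    """Pages among `candidates` whose text contains `term` as a word."""
    return {p for p in candidates if term in pages[p].split()}

def search_all(term):
    """Consider every page in the index."""
    return matches(term, pages)

def search_preselected(term):
    """Consider only pages that at least one expert page links to."""
    endorsed = set().union(*expert_links.values())
    return matches(term, endorsed)

print(len(search_all("the")), "results without pre-selection")
print(len(search_preselected("the")), "results with pre-selection")
```

Same index, same query, but the pre-selecting version reports far fewer results because most pages never get considered.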
Fri Feb 25, 2005 1:35 pm
CEREBRU
Full member
Joined: Wed Oct 13, 2004 3:20 pm Posts: 752
Thanks for new info.
What do you consider as the # of the group? Where are these groups listed?
Thanks
Fri Feb 25, 2005 1:39 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
I don't understand the question.
Fri Feb 25, 2005 1:41 pm
Mel
Professional / Mod
Joined: Wed Sep 03, 2003 7:18 am Posts: 8366 Location: Malaysia
Hi Phil
I notice for some terms I played with that the groups do not appear to be clustered by index size.
I find different rankings for the same keywords on DCs that report the same numbers for a 'the' search (8,000,000,000 vs 2,820,000,000).
This would seem to me to imply that there are both different indexes and different algorithms in play among the various DCs.
Fri Feb 25, 2005 2:10 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
Hi Mel,
Group B has several sub-groups in it, which do produce different rankings. Group A's DC sets seem to be pretty consistent most of the time, although I do occasionally find variations in them. Also a few of group A's DCs switch to the other group and back, but I put those switches down to being redirected at the time.
I'm sure that there are different algos in the DCs. The apparent index sizes suggest that there are different indexes in the 2 groups, although it's possible that a different algo causes it. It would be quite a surprise if almost half of the DCs contained an index that is less than half the size of the one in the other DCs.
Fri Feb 25, 2005 2:21 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
If anyone is interested in examining all 45 DCs, the different results show better the further down the results you go. It's a bit like a pendulum - at the top there is very little movement, but at the bottom there is a lot of movement.
For instance, I'm looking at the results for a particular searchterm on page 3, and I see only 2 variations of results across all 45 DCs, although there have been more until today. If I go down to the 10th results page, I see 5 variations for the same searchterm.
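Counting the variations by eye across 45 DCs is tedious, so here is a small Python sketch of one way to do it: group the DCs by the exact result list each one returned for the same page of results. The DC names and URLs below are made up, and how you fetch the results is left out entirely; this only does the grouping.

```python
from collections import defaultdict

def group_by_results(results_by_dc):
    """Group datacenters that returned exactly the same ordered result list."""
    groups = defaultdict(list)
    for dc, results in results_by_dc.items():
        groups[tuple(results)].append(dc)
    return list(groups.values())

# Hypothetical results for one searchterm on one results page.
results_by_dc = {
    "dc-01": ["example.com/a", "example.com/b", "example.com/c"],
    "dc-02": ["example.com/a", "example.com/b", "example.com/c"],
    "dc-03": ["example.com/a", "example.com/c", "example.com/b"],
    "dc-04": ["example.com/a", "example.com/b", "example.com/c"],
}

groups = group_by_results(results_by_dc)
print(len(groups), "variations")
for members in groups:
    print(members)
```

Run it once for page 3 and once for page 10 and you can compare the number of variations directly.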
Here are the 45 datacenters that I know about (split into DC sets).
* = group A. Others are group B which contains some sub-groups.
+ = usually group A but changes often, which is probably due to redirecting.
I'd like a bit of help. If you read this, would you search www.google.com for 'search engine optimization' (no quotes) and tell me if you see High Rankings at #4. I'm seeing it at #7 at the moment. It's at #7 in group B DCs, but it's at #4 in group A DCs, and I want to know quickly if group A DCs are actually used to serve normal results.
I will keep checking myself, but it will be much quicker if people would also look for me.
Another indicator is that group B DCs show ~7,790,000 results for that searchterm, and group A DCs show ~5,590,000.
The reason that I'm wondering about this is that the guy at WPW (see earlier post) has been watching 2 DC groups for 2 weeks and I've been watching them for a week, and I'm beginning to wonder if we should start thinking in terms of 2 Googles.
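As a quick bit of arithmetic on the result counts quoted in this thread, the Python below works out how big group A's count is relative to group B's for each query. It assumes that the 2,820,000,000 figure for 'the' belongs to group A and the 8,000,000,000 figure to group B, which matches the "~3 billion vs 8 billion" observation but is still my assumption.

```python
# Result counts quoted in the thread (group assignment for 'the' is assumed).
the_counts = {"A": 2_820_000_000, "B": 8_000_000_000}
seo_counts = {"A": 5_590_000, "B": 7_790_000}

def a_to_b_ratio(counts):
    """Group A's reported result count as a fraction of group B's."""
    return counts["A"] / counts["B"]

print(f"'the': {a_to_b_ratio(the_counts):.2f}")
print(f"'search engine optimization': {a_to_b_ratio(seo_counts):.2f}")
```

That works out to roughly 0.35 for 'the' and roughly 0.72 for 'search engine optimization', so the gap is not the same proportion for both queries, for what it's worth.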
Fri Feb 25, 2005 8:43 pm
CEREBRU
Full member
Joined: Wed Oct 13, 2004 3:20 pm Posts: 752
It's #7.
Sat Feb 26, 2005 12:30 am
Mel
Professional / Mod
Joined: Wed Sep 03, 2003 7:18 am Posts: 8366 Location: Malaysia
It's #7 from here, Phil.
Sat Feb 26, 2005 1:37 am
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
A couple of people on Janeth's forum have seen #4s, but I'd like to know if those DCs deliver results as 'normally' as the others do. So I'd appreciate a quick search from time to time - looking for #4s.
Since I posted, many of the #4s have gone over to #7s so there are a lot of them now. But I'm sure they are only temporary redirections - it's not uncommon.
Sat Feb 26, 2005 2:41 am
sleepy
Member
Joined: Thu Feb 17, 2005 1:40 am Posts: 23
highrankings on #4 out of 6.020.000 results
Sat Feb 26, 2005 3:03 am
Mel
Professional / Mod
Joined: Wed Sep 03, 2003 7:18 am Posts: 8366 Location: Malaysia
I am now seeing highrankings at #4
Last edited by Mel on Sat Feb 26, 2005 4:46 am, edited 1 time in total.
Sat Feb 26, 2005 3:32 am
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
I'm still seeing it at #7, but a few people have seen it at #4 (right now it can even be #3). Most of the DCs have had it at #7 since I asked for people to search and, since a few people have seen it at #4, it should be safe to say that the #4 DCs do produce results as normally as the rest of them.
So I'm beginning to think in terms of 2 or more Googles. It sounds silly, huh? But the different groups are effectively different engines - different index size, or different algorithms, or both. The difference being that we don't get the choice as to which we search. I wish I'd been watching all the DCs from a long time ago.