It is a value of a page's importance for a particular topic based on the linkages between on-topic pages. Normal PageRank is a value of a page's overall importance based on the linkages between all pages, regardless of topic or anything else. For Topic Sensitive PageRank, it is necessary to pre-determine a range of specific topics and, for each page in the index, to pre-calculate a Topic Sensitive PageRank for each topic.
The nature of PageRank means that it cannot be calculated on-the-fly at the time of the search because when a search is made, every on-topic page in the index needs to be considered for the rankings. If the TSPRs were not pre-calculated, Google wouldn't know which pages were on-topic and it would need to find out by using a normal algorithm, and then the TSPRs would need to be calculated for all the returned pages, or perhaps a sub-set of them. That's much too time consuming.
So TSPRs need to be pre-calculated and each page in the index would have a very large number of PageRanks - the normal PageRank, and a PageRank for each supported topic. That's a lot of additional storage space, but I guess that's not a problem.
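As a rough illustration of the pre-calculation described above, here is a minimal sketch of topic-sensitive PageRank by power iteration. The toy graph, the page names, the topic lists, the 0.85 damping factor and the fixed 50 iterations are all assumptions made up for the example. It is not Google's implementation, only the general idea of biasing the random jump towards a topic's pages.

```python
def pagerank(links, teleport, damping=0.85, iterations=50):
    """Power-iteration PageRank with a custom teleport distribution.

    links:    dict mapping each page to the list of pages it links to.
    teleport: dict mapping pages to jump probabilities (must sum to 1).
              A uniform teleport gives normal PageRank; a teleport that
              only lands on a topic's pages gives a topic-sensitive one.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page starts with its share of the random jump.
        new_rank = {p: (1.0 - damping) * teleport.get(p, 0.0) for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back via the teleport distribution.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport.get(p, 0.0)
        rank = new_rank
    return rank


def even_spread(pages):
    """Teleport distribution that lands on each listed page equally often."""
    return {p: 1.0 / len(pages) for p in pages}


# Hypothetical toy web graph and topic membership.
links = {
    "arts_hub": ["gallery", "museum"],
    "gallery": ["museum"],
    "museum": ["arts_hub"],
    "biz_hub": ["bank", "museum"],
    "bank": ["biz_hub"],
}
topics = {
    "Arts": ["arts_hub", "gallery", "museum"],
    "Business": ["biz_hub", "bank"],
}

# One normal PageRank plus one TSPR per topic, all computed offline.
normal_pr = pagerank(links, even_spread(list(links)))
tspr = {name: pagerank(links, even_spread(members))
        for name, members in topics.items()}
```

Each page ends up with one score per supported topic in addition to its normal PageRank, which is the storage cost mentioned above.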
How Topic Sensitive PageRank fits in with Florida
The biggest way that it fits in is that, like the expert system theory, TSPR cannot produce a set of results for every search query. It can only produce a set of results for its pre-defined topics. We know that the results for different searchterms are produced by different algorithms - the Florida algo and the old algo. Topic Sensitive PageRank appears to match that particular Florida effect.
Another way that Dan Thies fits TSPR with the Florida effects is that Florida results are returned for more general queries but not for more specific queries, but I'm not so sure that that's likely to happen with TSPR. When Topic Sensitive PageRank was first conceived, programmed and tested, it used the DMOZ top level categories as topics (Arts, Business, Computers, etc). Those one-word topics are as 'general' as it is possible to be, and it isn't difficult to produce a TSPR for every page in the index for each topic, or to produce a highly relevant set of results for each one-word topic.
But when an additional search word is added to a one-word topic, e.g. "uk holidays", then although it is still a very general searchterm, it would require a great many pre-defined topics, and TSPRs for each page in the index (including TSPR0s), to cover all the possible 2-word topics based on the word "holidays". That's just for a single one-word topic, but there are hundreds or even thousands of those one-worders.
I used the "uk holidays" example because I know that it produces Florida results. The 'holidays' based 2-worders may be limited so that every town and city in the world isn't included in the topics database but, even so, when you add up all the 2-worders for every one-word topic, there are still thousands upon thousands of 2-word search terms that would need to be pre-defined topics for TSPR to be a useful search algorithm.
Dan suggested that each word in the searchterm might be in the topics database and that, for a given search query, a "distance" between them can be calculated and acted upon. I don't like this at all because it would mean that topics are created on-the-fly and, if that is so, where are the Topic Sensitive PageRanks for them? As was mentioned earlier, they can't be calculated on-the-fly. Either a TSPR for each page exists in advance for a specific topic, or it doesn't. If it doesn't exist then it can't be calculated at the point of searching, so there is no need to measure distances between words in a topics database.
He also suggested that Google might have combined their CIRCA technology (acquired when they purchased Applied Semantics) with TSPR. The idea is that the CIRCA technology is capable of deciding what a person is searching for from the words typed into the search box. The technology then selects a suitable topic and produces a set of results for it. Again, I don't like this because it would require topics to be created on-the-fly, or a topic to be often selected that is merely a close match for the searchterm.
and finally...
I'm not saying that Dan's theory is wrong. I'm saying that I'm inclined to think that it is wrong because there are parts of it that don't seem to add up. If I am correct in my assertion that PageRanks (topic sensitive or not) cannot be calculated on-the-fly, then it would require a very large database of pre-defined topics and, for each of those topics, a Topic Sensitive PageRank for every page in Google's index.
It takes Google around a week to calculate the normal PageRank for every page in the index. There just isn't the time for that to be done for the many thousands of necessary topics. Yes, for each topic, the calculations would be done with a comparatively small number of pages, albeit several million in some cases, and yes, the number of iterations could be reduced for the TSPR calculations, but why do that when it means ending up with inaccurate PageRanks? Besides, it requires a certain number of iterations to begin to get close to the final figures, so the number of them can't be reduced too much. I really don't believe that the computing time could be reduced sufficiently to make it possible to calculate Topic Sensitive PageRank on-the-fly at the point of search.
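The trade-off between iterations and accuracy can be put into rough numbers. With a damping factor d, each pass of the standard PageRank iteration shrinks the total (L1) error by at least a factor of d, so after k passes the error is at most d^k times the starting error. The sketch below turns that bound into an iteration count. The 0.85 damping factor and the worst-case starting error of 2.0 are assumptions for the example, and the function name is made up.

```python
import math


def iterations_needed(tolerance, damping=0.85, initial_error=2.0):
    """Smallest k such that initial_error * damping**k <= tolerance.

    This is a worst-case bound; real link graphs may converge faster.
    """
    if tolerance >= initial_error:
        return 0
    return max(0, math.ceil(math.log(tolerance / initial_error) / math.log(damping)))


# Compare a loose target with a tight one (both tolerances are arbitrary).
loose = iterations_needed(1e-3)
tight = iterations_needed(1e-6)
```

Loosening the tolerance cuts the number of passes, but each pass still has to touch every link for the topic, and there is a floor below which the figures are not yet meaningful.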
Last edited by PhilC on Tue Jan 13, 2004 12:22 pm, edited 1 time in total.
Sun Jan 11, 2004 4:19 pm
rustybrick
Member
Joined: Sun Jan 11, 2004 6:05 pm Posts: 6 Location: New York, USA
Well, I think the assumption is that the CIRCA technology provides a mechanism for applying topic sensitivity to PageRank.
No one will argue it requires a very smart algorithm to accomplish this, but maybe it was finally achieved.
I do not know enough about the details of the algorithm but Teoma has done it, so why can't Google?
Sun Jan 11, 2004 6:09 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
I don't know how Teoma works, but TSPR is ordinary PageRank, which requires a reasonable number of iterations to get any meaningful figures. The "topic" part is just that it is calculated using far fewer pages than the normal PageRank.
The original idea of TSPR is to pre-calculate the values for each topic. Dan's idea is to do it on-the-fly and, in terms of time, I just don't think it can be done on-the-fly.
For it not to be done on-the-fly requires pre-set topics. CIRCA could decide on a suitable topic from the words provided in the searchterm, but Dan seemed to be suggesting something different - that topics are created from the searchterm words on-the-fly, based on the "distance" between them in the topic words database.
If topics are created on-the-fly, then Topic Sensitive PageRanks must also be calculated on-the-fly, and I don't believe that can happen - it takes too long.
Sun Jan 11, 2004 6:18 pm
rustybrick
Member
Joined: Sun Jan 11, 2004 6:05 pm Posts: 6 Location: New York, USA
Interesting...
Sun Jan 11, 2004 11:13 pm
Mel
Professional / Mod
Joined: Wed Sep 03, 2003 7:18 am Posts: 8366 Location: Malaysia
I was under the impression that CIRCA was used to understand the topic of the page not the search, and that would seem to fit in with the way it is used for Adsense:
Quote:
Applied Semantics' products are based on its patented CIRCA technology, which understands, organizes, and extracts knowledge from websites and information repositories in a way that mimics human thought and enables more effective information retrieval
If I've understood Dan's article correctly, he suggests that it is being used to understand the searchterm's topic, but the 'new algo' explanation is very limited in the article. It's more of a small overview.
Dan Thies wrote:
What CIRCA allows Applied Semantics (and Google) to do is to identify concepts related to specific words and phrases.
Mon Jan 12, 2004 12:02 pm
DanThies
Member
Joined: Tue Jan 13, 2004 2:33 am Posts: 34
Phil:
It's entirely possible that I am completely, 100% wrong. I am certain that I am at least partially wrong. I would have to be, since I am speculating. I am also happy to have this conversation with someone who really understands PageRank in the first place.
Let me try to clarify things a bit. This report was created as a mini-update for my book's readers, so a lot of detail is left out.
For the sake of argument, let's say that Google is capable of calculating a set of topic-sensitive PageRank (TSPR) scores. Maybe it's 2 topics, maybe it's 16, 100, whatever. For each topic, they'd need to have a TSPR for each page, as Phil has pointed out. Maybe they'd only calculate TSPR for pages above a certain threshold in PageRank.
Topics are not search terms.
The original paper on TSPR (the link is in my report) describes the use of 16 topics, representing the top-level categories of DMOZ. You could do any search query, and bias the results by one of the 16 topics. You didn't have to search for the word "Arts" to use TSPR, you could use the TSPR for "Arts" to slant the results of any search toward "Arts."
But users don't say "I'm searching for these words using this topic." This was noted in my report. If they can't come up with a topic (or topics) to use, they can't use TSPR. So without some other mechanism, you would only be able to use TSPR when someone indicates the topic - for example, by "searching the web" from a directory page.
That's where Applied Semantics / CIRCA could come into play.
There's at least one example in my report, and I don't want to type it in again, but they *could* use CIRCA to determine how your search phrase is related to some topic or set of topics. They can also tell you how closely related, represented with a numeric value - a distance between your search phrase and a topic for which they have calculated TSPR.
The greater the 'semantic distance' (why not coin an extra term) between your search phrase and a topic, the less impact TSPR would have on the results, and the more influence the generic PageRank would have.
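To make the distance idea concrete, here is a minimal sketch of one way such a blend could work. The linear weighting, the 0-to-1 distance scale and the function name are all assumptions for illustration. Nothing here is known about how CIRCA or Google actually scores anything.

```python
def blended_importance(generic_pr, topic_pr, distance, max_distance=1.0):
    """Blend generic PageRank with a topic-sensitive PageRank.

    distance 0 means the search phrase sits squarely on the topic, so the
    topic score is used as-is. At max_distance or beyond, the topic has no
    influence and the generic PageRank is returned unchanged.
    """
    closeness = 1.0 - distance / max_distance
    closeness = min(1.0, max(0.0, closeness))  # clamp to the range [0, 1]
    return closeness * topic_pr + (1.0 - closeness) * generic_pr


# Hypothetical page: weak overall, strong within the topic.
on_topic = blended_importance(generic_pr=0.2, topic_pr=0.8, distance=0.0)
halfway = blended_importance(generic_pr=0.2, topic_pr=0.8, distance=0.5)
off_topic = blended_importance(generic_pr=0.2, topic_pr=0.8, distance=1.0)
```

The point of the sketch is only that the topic scores are fixed in advance. The distance changes how much weight they get at search time, not the scores themselves.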
That's the theory, in an even smaller nutshell, with more detail, I hope.
A few more quick hits to address specific questions:
- CIRCA could also be at play in determining the topics of pages for calculating TSPR. As someone pointed out, they're already doing this with Adsense.
- Taher Haveliwala is also one of the founders of Kaltix, the company formed by the people who had figured out how to calculate PageRank really fast, which was acquired by Google approximately 18 seconds after it was founded.
- To make TSPR work, you wouldn't need an exact PageRank for the topics, a fast approximation would do, since it's not the only value used in returning results, but is instead used to bias the results.
- For fun, look at nearly identical content on different web sites, that display Adsense. Depending on the type of site it's published on, you can get very different types of ads. An easy source of 'nearly identical content' is articles (like those I publish), which are frequently published on numerous web sites.
I'm glad you stopped by because I would have liked to discuss it with you, but I'm banned from the Highrankings forum where I think you are an administrator ( http://www.webworkshop.net/seoforum/viewtopic.php?t=129 ). I even tried to find an email address on your site, but I couldn't find one with your name in it.
I still think that your TSPR idea is one of the two theories that has a chance of being right - the other being the "expert system" that I put forward here. In fact, I'd added a bit at the bottom of my article about your idea ( http://www.webworkshop.net/florida-update.html#latest ). Like you, I also pointed out flaws in the other common theories. So we are looking at Florida with very similar minds - that Google really does use two different algorithms depending on the searchterms, and that the early explanations, "seo filter", etc. were flawed. So on with the current discussion...
I've re-read Taher H. Haveliwala's paper and found that I'd misunderstood it. I'd assumed that each page in Google's index had to have a pre-computed TSPR value for each supported topic. But Taher H. Haveliwala wrote in his paper:-
Quote:
An approach for enhancing rankings by generating a PageRank vector for each possible query term was recently proposed ... with favorable results. However, the approach requires considerable processing time and storage, and is not easily extended to make use of user and query context.
It sounds like I was mistaken, because he's writing off the idea, but later in the paper he wrote:-
Quote:
In our approach to topic-sensitive PageRank, we precompute the importance scores offline, as with ordinary PageRank. However, we compute multiple importance scores for each page; we compute a set of scores of the importance of a page with respect to various topics.
Now it appears that I was correct. Confused, aren't I? I'm pretty sure that I'm suffering from the fact that I'm not a mathematician - specifically, I don't understand the mathematical use of the word "vector", as in "topic-sensitive PageRank vector". According to definitions found at Google, it has a number of meanings, including "A quantity having both magnitude and direction, e.g. displacement, velocity, acceleration and force". I rather fancy that that is the meaning in Taher's paper.
He continues...
Quote:
At query time, these importance scores are combined based on the topics of the query to form a composite PageRank score for those pages matching the query. This score can be used in conjunction with other IR-based scoring schemes to produce a final rank for the result pages with respect to the query.
and elsewhere...
Quote:
....we assume a user with a specific information need issues a query to our search engine in the conventional way, by entering a query into a search box. In this scenario, we determine the topics most closely associated with the query, and use the appropriate topic-sensitive PageRank vectors for ranking the documents satisfying the query. This ensures that the ``importance'' scores reflect a preference for the link structure of pages that have some bearing on the query.
I was correct after all. However, I was mistaken that a TSPR is required for each page for every possible topic that is to be supported. In fact, the pre-defined topics only need to be of a more general nature because they are only used to 'bias' the results. "Bias" is a word that both Taher and you used.
The overall effect is to bias ("reflect a preference") the importance of each page in the results set towards the relevant topic(s) for the searchterm and its context. Now I think I'm getting somewhere.
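The query-time step in the quoted passages can be sketched as a weighted sum: each matching page's pre-computed topic scores are combined using weights that say how strongly the query is associated with each topic. The scores, page names and weights below are made-up numbers, and how the weights are estimated is left outside the sketch.

```python
def composite_score(page, topic_weights, tspr):
    """Weighted sum of a page's pre-computed topic-sensitive scores."""
    return sum(weight * tspr[topic].get(page, 0.0)
               for topic, weight in topic_weights.items())


def rank_pages(matching_pages, topic_weights, tspr):
    """Order the pages matching a query by their composite score, best first."""
    return sorted(matching_pages,
                  key=lambda page: composite_score(page, topic_weights, tspr),
                  reverse=True)


# Hypothetical pre-computed TSPR values, one score per page per topic.
tspr = {
    "Arts": {"museum": 0.40, "gallery": 0.21, "bank": 0.00},
    "Business": {"museum": 0.28, "gallery": 0.10, "bank": 0.17},
}
pages = ["bank", "gallery", "museum"]

# The same three pages, ranked for an arts-leaning and a business-leaning query.
arts_leaning = rank_pages(pages, {"Arts": 0.8, "Business": 0.2}, tspr)
business_leaning = rank_pages(pages, {"Arts": 0.1, "Business": 0.9}, tspr)
```

Nothing is recalculated at search time. Only the weights change from query to query, which is why a small set of general topics is enough to bias the results.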
So onto your paper:-
Well, now that I have a better (but imperfect) understanding of the way that TSPR works, I don't see any immediately obvious flaws in your theory. It accounts for what I see as the single most important change since Florida - that results sets are compiled in different ways depending on the searchterm, and that the Florida serps are not a 'standard' results set to which one or more of the various suggested filters have been applied. I think that, like the 'expert system' theory ;), the TSPR theory stands a good chance of being correct.
Tue Jan 13, 2004 6:41 pm
I, Brian
Full member
Joined: Sat Dec 06, 2003 9:20 am Posts: 328 Location: Yorkshire, UK
One of the fascinating things here is that TSPR would have remarkably similar results to the Hilltop theory that I have certainly favoured so far.
However, what I would like to see Dan address is why .edu, .gov and directory sites have such elevated rankings in the affected Florida results - as these are implicitly symptoms of a Hilltop system, and are even singled out in the Hilltop paper. In what way would you see TSPR cause these sorts of sites to suddenly rank abnormally high? And, in what way would TSPR differ most significantly from a Hilltop-dominated algo?
As for themes and context - I do not at all doubt that this should be a part of SEO now.
How'd you get run off from the HR forums, Phil? I didn't know about that. I thought we only had one individual who was actually banished.
You, Brian: (did I get that right?)
Hilltop is similar to Topic-Sensitive PageRank, but I rejected the idea of Hilltop a few days into this thing, for several reasons. I don't think you'd see all these resource & directory type pages ranking as well with Hilltop, for example.
Hilltop is five years old. If it were anything more than an interesting idea, someone would have implemented it by now. For those who remember how crazy Altavista's results looked a couple years ago, is it possible that was an attempt to implement Hilltop?
The main reason, though, is that they'd be throwing PageRank out the window with Hilltop. If they were going to do that, there wouldn't have been much reason for them to create and acquire Kaltix last year, just a few short months before the big change.
PageRank is good, TSPR fixes its flaws. The problem is getting the topics right, getting the semantics right, and getting the balance right.
I'm sure the "onmousedown" code in Google's SERPs has already been discussed in these forums. Google is tracking clicks now, so they are getting live user feedback constantly.
They've said there are more factors to be added to the algorithm, and this must surely be one of them.... one more reason not to click on your competitor's organic listing, if you're super paranoid.
As far as why .edu, .gov, and directory pages are showing up more often (can anyone validate that this is true?), show me a specific SERP and let's walk through it.
There are two kinds of linking relationships on the web - natural and artificial. EDU and GOV sites, as well as directories, are going to have a huge advantage in natural linking from relevant pages. EDU and GOV web sites also tend to be extremely well structured, with related content clustered together, which is also an advantage.
It takes a long time to walk through web graphs, but it's very enlightening, especially after you've been at it for 18-20 hours. Try it - around the 19th hour of squinting at web graphs, a strange calm will overcome you, and you can actually SEE THE WEB.
Random additional note: With very generic search terms, Google seems to have a much better mix of the possible meanings than before. The "real estate" SERPs used to be dominated by residential real estate agents, now you see commercial real estate, real estate investing, real estate law, training, etc. etc. mixed into a lot of them.
Wed Jan 14, 2004 12:03 am
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
DanThies wrote:
Hilltop is similar to Topic-Sensitive PageRank, but I rejected the idea of Hilltop a few days into this thing, for several reasons. I don't think you'd see all these resource & directory type pages ranking as well with Hilltop, for example.
Oddly enough, one of my reasons for liking the 'expert system' idea is because expert pages would tend to link more to resource pages than to commercial pages, which is what people have been seeing in the serps.
That wouldn't be true of directory pages, though, but I haven't seen a preponderance of directory pages around the top of the serps. I see many website pages listed that are in Google's directory, but that's different. How did Google include those pages before Florida? Did they work out the serps and then add the Directory description for each page that, coincidentally, was in their directory, or did they arbitrarily add some of them according to algorithms? I don't think we know the answer to that. But it's perfectly feasible that Google simply adds the directory description to each page that is selected for the serps and, coincidentally, is in the directory. I see no reason why that doesn't happen with the Florida results.
Personally, I've never suggested that it's Hilltop. I've said all along that it looks like an 'expert-based system', or words to that effect, that may have been developed from Hilltop. Why wouldn't an expert system have been implemented back when Hilltop was devised? Because that was very soon after Google was launched and, at that time, they were doing very well. It's only in more recent times that the relevancy of the serps provided by one or two other engines has caught up, or almost caught up, and Google needed to do something drastic to move ahead again.
So I still think that an expert system could account for all the reported Florida effects, and stands a good chance of being correct.
Wed Jan 14, 2004 12:30 am
DanThies
Member
Joined: Tue Jan 13, 2004 2:33 am Posts: 34
PhilC wrote:
So I still think that an expert system could account for all the reported Florida effects, and stands a good chance of being correct.
I agree. Whatever they're doing, it has to fit in seamlessly with what they were doing before. It doesn't have to be Topic-Sensitive PageRank. It could be something simpler, or something more complicated.
Wed Jan 14, 2004 12:51 am
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
I thought a little more...
Directory pages (that's pages in a directory, and not pages that are linked to from a directory) are very likely to be selected as 'expert pages'. In fact, they are ideal. I see very many pages that are linked to from Google's directory around the top of the serps. And, if they are in Google's directory, they are also in DMOZ and many smaller sites' directories. That could amount to quite a few 'expert' pages linking to them, and could account for why there are so many up at the top.
Just more food for thought.
If we ever find out what Florida really is, we'll probably all have been way off the mark. But I still think that an expert system and TSPR are the two most likely candidates that have surfaced so far.
Wed Jan 14, 2004 12:58 am
rustybrick
Member
Joined: Sun Jan 11, 2004 6:05 pm Posts: 6 Location: New York, USA
Question,
What are the core differences in bullet format between both your theories (TSPR and Expert Theory)?
I know the expert theory is based on hilltop but in essence they seem the same. I am going to re-read the Hilltop report but I hope to get a quick answer here.
Also, if I may, and I know I've said this a hundred times, how does Teoma's Subject Specific Popularity differ from these theories as well? (I wrote a summary on it after being upset with Google's results in mid-December, only to realize now that Google is moving in that direction.)
This is an excellent thread.
Wed Jan 14, 2004 1:45 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
An expert system compiles the results set from the pages that are linked to from on-topic 'expert' pages. The expert pages are contained in a database. TSPR compiles the results set in the (or a) 'normal' way, but includes a bias towards one or more relevant, pre-defined topics.
I think that's the only core difference.
Last edited by PhilC on Thu Jan 15, 2004 2:36 am, edited 4 times in total.
Wed Jan 14, 2004 2:57 pm
rustybrick
Member
Joined: Sun Jan 11, 2004 6:05 pm Posts: 6 Location: New York, USA
Thanks Phil.
Wed Jan 14, 2004 3:03 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
I'm chewing over why many pages that were dumped into the void would come back into similar ranking to what they had before. My first thought is that turning down the degree of TSPR bias might cause it, but I'm not so sure.
Because pages came back to similar rankings, it appears that once the initial results set has been selected (thousands of pages - not just the displayable 1000), then the old algo kicks in. Would that happen with TSPR? Maybe yes, maybe no. I need to think some more.
Any thoughts?
Wed Jan 14, 2004 3:14 pm
K_D
Member
Joined: Wed Jan 14, 2004 4:29 pm Posts: 5
Hilltop doesn't make sense for google, I'll explain why at the bottom, but I looked for empirical evidence.
Google "birthday balloons"
The top results are not those pages listed in the google/dmoz/directory.net/etc directories (http://directory.google.com/Top/Shopping/Gifts/Balloons) which would likely even be the initial expert pages.
I believe google is detecting affiliated links, which is why some industries (massively crosslinked) are being slammed. Strong internal linking structures (i.e. bizrate) would not suffer this and gain well compared to the others.
I like the science of TSPR, and the taking offline of datacenters could be caused by calculating TSPR but the algorithm is in direct conflict with the concept I stated just above and would (in a small way) help those who are manipulating links and greatly help the smaller cross-linked sites that litter certain categories - and those are the terms and sites most affected.
If google were to depart from PR, there are one or two possibilities that are interesting, but Hilltop isn't the algo of the future and I don't see why they'd make a drastic change in that direction.
KD
Wed Jan 14, 2004 4:53 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
K_D wrote:
Hilltop doesn't make sense for google, I'll explain why at the bottom, but I looked for empirical evidence.
Google "birthday balloons" The top results are not those pages listed in the google/dmoz/directory.net/etc directories (http://directory.google.com/Top/Shopping/Gifts/Balloons) which would likely even be the initial expert pages.
That's a good comment K_D. But there are a lot more expert pages out there than the ones in the Google directory. Also, is "birthday balloons" a category in the Google (DMOZ) directory? I haven't checked but it doesn't seem likely. Whether it's a category or not, there will probably be a directory page (maybe more than one) that contains both words, but it may not contain enough instances of both words to qualify it as an expert page for that phrase. Even if it has enough instances it may not have enough (any) links that are associated with both words.
Directory pages look like they should be good expert pages, but many are not. It all depends on what words are on the page, and where they are positioned on the page.
For instance, suppose there is a directory category for "balloons". The page's Title would contain the word "balloons" as would one or two other important parts of the page. So, if it has enough outbound links that fulfill the required criteria, and that go to pages about balloons, it will make a good expert page for "balloons". But, if it only contains one or two instances of "birthday", and those instances are not in important parts of the page, it won't make a good expert page for "birthday balloons".
For other searchterms, I do see many top listings for pages that are linked to from the Google directory.
K_D wrote:
I believe google is detecting affiliated links.....
If that's the big Florida change, how would it account for a BBC page going from #800+ to #1 for "uk holidays" (it's now #2)? That's just an example of a vast number of pages leapfrogging their way up the serps because of Florida, many of which weren't even in the top 1000. If Google were spotting and removing pages that are associated with affiliate links, it would cause the pages that are left to bunch up and fill the empty spaces - not to leapfrog (e.g. not to shoot from #800+ to #1).
Thu Jan 15, 2004 2:01 am
DanThies
Member
Joined: Tue Jan 13, 2004 2:33 am Posts: 34
Directories may have some value, but most directory pages make lousy experts IMHO. Take a look at any randomly selected page from DMOZ. You will find broken links, sites that have been sold for their PageRank, etc.
Thu Jan 15, 2004 6:07 am
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
IMHO, directory pages do make good expert pages, simply because they are about specific topics and have sufficient outbound links that lead to on-topic pages. In that respect, I do think they generally make good experts for their specific topics.
In the Hilltop paper, Krishna Bharat wrote:
This allows us to distinguish between random collections of links and resource directories.
He thought that resource directories are good 'experts', and the directory pages we are talking about are surely resource directories. That extract is taken from a part of the paper where he was discussing the idea of having broad topic classifications - of the kind that CIRCA could provide:-
In the Hilltop paper, Krishna Bharat wrote:
If a broad classification (such as Arts, Science, Sports etc.) is known for every page in the search engine database then we can additionally require that most of the k non-affiliated URLs discovered in the previous step point to pages that share the same broad classification. This allows us to distinguish between random collections of links and resource directories.
But I don't think that directory pages make good experts for variations of their specific topics, such as "birthday balloons".
Dan, did you give any consideration as to why pages that return from the void, often return to the same or similar rankings to what they had before Florida? It could be coincidence, but it could also be a good clue as to what Florida is about.
Stemming means that more pages get into the results set, which would account for getting back to 'similar' rankings rather than to the same rankings. But I'm not thinking of stemming. I'm thinking that the old algo is in there, ranking the returning pages somewhere near to where they were before. It's almost as though the results set is compiled in a new way, and then the old algo is applied to them. Red herring?
Thu Jan 15, 2004 2:05 pm
K_D
Member
Joined: Wed Jan 14, 2004 4:29 pm Posts: 5
Affiliated links is definitely not the only change but I do think that it might be a significant part of all this.
The BBC page is gaining its PR and relevant links from internal links, which seem to be immune to the affiliated links negative and obviously there are many other factors, but it would be great if we can figure out if it's a major theme.
Does anyone have the pre-florida results for UK Holidays?
The top sites that I checked had little or no reciprocal links, I would love to determine if that was a major commonality or if affiliated links aren't an important factor in Florida.
Thu Jan 15, 2004 3:15 pm
K_D
Member
Joined: Wed Jan 14, 2004 4:29 pm Posts: 5
BTW, if affiliated links are a factor, google not finding them and/or google adjusting the threshold can account for jumping back to similar positioning.
Around 80% of the pre-Florida listings disappeared in the post-Florida results. That's an awful lot of affiliate link related pages ;)
The idea of reciprocal links not being effective any more has been put forward before. The idea being that they are specifically detected and downgraded at least.
Thu Jan 15, 2004 3:59 pm
I, Brian
Full member
Joined: Sat Dec 06, 2003 9:20 am Posts: 328 Location: Yorkshire, UK
No affiliate links here, yet my non-commercial sites were variably hit.
As for .gov, .edu and directories - it's not just the expert pages, remember, but "authority sites". Directories wouldn't simply be experts - they would be authorities.
Hilltop may be old - and maybe we're not seeing the exact same feature - but there's a lot of what is happening now reflecting something of the Hilltop system.
However, it must be said that even though I favour a view of Hilltop being involved, I see it as one of a handful of complicating new factors. I remember writing in John Scott's IMR a somewhat naive commentary about Google trying to understand the semantics of a page - that was before I even knew what Applied Semantics was.
Whatever Florida was, I'll guess at multiple factors: Hilltop, stemming, and even a grammar-based semantics system all seem suggested to me - in some shape or form - but the effects seem diluted in current searches.
Last edited by I, Brian on Thu Jan 15, 2004 9:36 pm, edited 1 time in total.
Thu Jan 15, 2004 6:58 pm
SEO-Guy
Member
Joined: Sun Dec 07, 2003 3:07 pm Posts: 5 Location: Canada
Hi All,
Dan ... love the report. Well done! TSPR is the leading contender as far as I'm concerned.
Phil ... your insights are always great.
The one thing I'm surprised nobody has touched on is the fact that it now seems that links to the sites themselves, rather than to the specific optimized page within the site, are more important. I'm seeing literally hundreds of examples of sites with no particular expertise in the subject ranking extremely high with one page that makes only a passing reference to the term. If this is true .. this is HUGE!
To me this suggests that Google is now considering the site itself the 'authority' (based probably on links from appropriately clustered sites, i.e. TSPR), and not the page. The implication is that sites with large numbers of inbound links to their main pages confer this newfound authority on subpages of their sites ... thus I'm seeing lots of companies with pages hosted with their ISP, and using a folder within their ISP's domain (rather than their own), rank higher than much more appropriate sites (perform searches for 'toronto personal injury lawyers', and 'electronic muscle stimulator' and you'll see a few examples).
This would appear to be supported by our own client base. Those of our clients who are industry leaders (i.e. lead their industries in terms of the number of inbound links they have ... and of course from relevant sources) stayed the course with respect to placement within the serps. Most others appear to have lost some of their placement to sites with no particular authority on the subject, but which had a page that made a passing reference to the subject, and a root page with many, many inbound links.
Thoughts?
Jeff
_________________ Jeff Q
SearchEnginePeople.com
Thu Jan 15, 2004 8:39 pm
K_D
Member
Joined: Wed Jan 14, 2004 4:29 pm Posts: 5
I meant affiliated as in cross-linked, not as in member of an affiliate program.
I was wrong in my alt-theory, though.
I dug deeper, specifically in UK holidays, so thanks, Phil.
Thu Jan 15, 2004 9:34 pm
K_D
Member
Joined: Wed Jan 14, 2004 4:29 pm Posts: 5
Jeff,
Could be that or could be that those are the people who aren't overoptimizing (in whatever the now-killer method is).
Thu Jan 15, 2004 9:46 pm
PhilC
Founder
Joined: Thu Nov 21, 2002 1:22 am Posts: 11147
Here's a nice little bit of info, courtesy of Dan Thies:-
Some people were saying that they changed page Titles from something like "word keyword keyword" to "keyword keyword word" and, after the pages had been crawled, they were dropped from the Florida results. They put it down to an optimization filter. So they changed the Titles back and, after being crawled again, the pages returned.
So Dan ran a couple of tests and found the same thing but, instead of changing the Titles back, he waited and, on each test, the pages returned to the Florida serps a week or two later all by themselves. The other people simply hadn't waited to see what would happen.
The 'possibility' is that when a page is changed and then crawled, Google recognizes that it has been changed and removes its TSPR set, and it drops out of the Florida serps. Sometime after that, a new TSPR set is calculated for the page, and it returns to the serps.
It's not the only explanation but it's a very tidy one. The testing has been extremely limited so it isn't anything like conclusive that changed pages drop from the serps.
If changed pages are dropped from the Florida serps until something is recalculated for them, I would find it difficult to fit that into an Expert System. Changed pages could certainly be flagged as 'new' when any change is detected (by checksum or whatever), but I see no reason to do it in an expert system. With TSPR, it would almost be essential.
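To make the checksum idea concrete, here is a minimal Python sketch of the 'possibility' described above: a changed page loses its TSPR set when it is crawled and stays out of the topic results until a new set is calculated. It is purely illustrative. The class PageIndex, its methods, the URL and the score values are all made up for the example, and nothing here is known to be how Google actually does it.

```python
import hashlib


def checksum(html: str) -> str:
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


class PageIndex:
    """Toy index (hypothetical): one checksum and one optional TSPR set per URL."""

    def __init__(self):
        self.checksums = {}     # url -> checksum seen at the last crawl
        self.topic_scores = {}  # url -> {topic: score}; absent until calculated

    def crawl(self, url: str, html: str) -> bool:
        """Record a crawl and return True if the page is new or has changed."""
        new_sum = checksum(html)
        changed = self.checksums.get(url) != new_sum
        if changed:
            self.checksums[url] = new_sum
            # Assumption: any change throws away the stale TSPR set.
            self.topic_scores.pop(url, None)
        return changed

    def recompute(self, url: str, scores: dict) -> None:
        """Store a freshly calculated TSPR set (the calculation itself is not modelled)."""
        self.topic_scores[url] = dict(scores)

    def eligible(self, url: str, topic: str) -> bool:
        """A page can appear in topic results only while it has a score for that topic."""
        return topic in self.topic_scores.get(url, {})


index = PageIndex()
url = "example.com/page"

index.crawl(url, "<title>word keyword keyword</title>")
index.recompute(url, {"travel": 0.4})
print(index.eligible(url, "travel"))  # in the topic results

index.crawl(url, "<title>keyword keyword word</title>")
print(index.eligible(url, "travel"))  # dropped after the Title change

index.recompute(url, {"travel": 0.4})
print(index.eligible(url, "travel"))  # back once a new set exists
```

In this toy model a change anywhere in the page triggers the drop, because the checksum covers the whole page. Whether that matches reality, or whether only changes to specific parts of a page matter, is not known.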
I'm going to do some tests. If anyone else does any, please let us know what you find.
We need to know if all changed pages are dropped from the Florida results, regardless of whether or not they come back later on. We also need to know if changes made to specific parts of a page cause the drop, or if a change anywhere in the page will cause it. Any tests along those lines will be very useful.
I have to say that, if altered pages are dropped from the Florida results, I would have to lean towards TSPR as being the most likely theory to be correct.
Fri Jan 16, 2004 2:18 am
DanThies
Member
Joined: Tue Jan 13, 2004 2:33 am Posts: 34
I see this thread is getting a lot of views. I would caution everyone following this discussion that we don't know anything.
The sensible thing to do about Google is stick to what we know works. For me, that means battling for relevance instead of rankings.
If you just have to "take action," take the top 100 results for that keyword you covet so much, pick the quality sites out of that bunch, make sure your site can stand among them, then get linked up with them.
In many ways, it really doesn't matter "what" they're doing. I am not really interested in trying to reverse engineer Google. Fortunately, I suspect Google has finally put reverse engineering out of reach anyway.