#1 2007-02-06 08:06:17
- TheGypsy
- Member

- From: Where U least expect me
- Registered: 2006-07-12
- Posts: 2383
- I've been thanked 73 times.

It's a Phrase Based Search world
OK Gang.. Got yer thinking Caps on??? Good, cause yer gonna need them…
I have been a fan of relevance for some time as far as SEO activities go (on page and off page). A year ago it was Latent Semantic Analysis (LSA) and Google’s LSI (Latent Semantic Indexing). Then a strange thing happened. I started seeing more and more Google Patents on ‘Phrase Based Indexing and Retrieval’ (which I lovingly call PaIR). The more they put out, the more I paid attention.
Good bye LSI and Hello PaIR!
You see, the LSI model is quite limited, so it makes complete sense to move to the PaIR model, which is far more comprehensive and flexible than anything LSA can accomplish.
ENTER PaIR.
The idea is to identify valid (actual/real) phrases in a given document collection (web pages, in our case). The goal is to classify each potential phrase as either “a good phrase or a bad phrase” depending on its usage and frequency, and then to use those ‘good’ phrases to predict the usage of other ‘good’ phrases in the collection of web pages.
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.
Or, even better, it is used to identify phrases that have sufficiently frequent and/or distinguished usage in the document collection to indicate that they are "valid" or "good" phrases.
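A minimal Python sketch of the good/bad phrase split described above. It assumes candidate phrases are plain word n-grams and uses made-up values for the two minimums, because the patents don't publish the real ones. `MIN_DOCS`, `MIN_INSTANCES`, `candidate_phrases` and `classify_phrases` are all hypothetical names.

```python
from collections import Counter

# Hypothetical cut-offs: the patents only say "a minimum number of documents"
# and "a minimum number of instances"; the real values are not public.
MIN_DOCS = 2
MIN_INSTANCES = 3


def candidate_phrases(text, max_len=3):
    """Yield every word n-gram of length 1..max_len.

    This is a crude stand-in for real candidate-phrase extraction."""
    words = text.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def classify_phrases(documents, max_len=3):
    """Split candidate phrases into 'good' and 'bad' sets.

    A phrase is 'good' if it appears in at least MIN_DOCS documents and
    at least MIN_INSTANCES times across the whole collection."""
    doc_freq = Counter()   # number of documents containing the phrase
    instances = Counter()  # total occurrences across the collection
    for doc in documents:
        phrases = list(candidate_phrases(doc, max_len))
        instances.update(phrases)
        doc_freq.update(set(phrases))
    good, bad = set(), set()
    for phrase, count in instances.items():
        if doc_freq[phrase] >= MIN_DOCS and count >= MIN_INSTANCES:
            good.add(phrase)
        else:
            bad.add(phrase)
    return good, bad


if __name__ == "__main__":
    docs = [
        "the white house and the president of the united states",
        "george bush lived in the white house",
        "bill clinton lived in the white house",
    ]
    good, bad = classify_phrases(docs)
    print("white house" in good, "george bush" in bad)  # prints: True True
```

Note that "a bad phrase" here just means "too rare to count", which matches the thread's description. The sketch has no stop-word handling, so common words like "the" also pass. A real system would presumably filter those out.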
Not only should it lead to better results, but the added layer of Spam detection should make life much harder for Spammers (Markov needs to start working out).
Ok, who’s ready for some heavy lifting?
Originally I wrote ‘Phrase Based Optimization’ – this is the more technical analysis and may not be for everyone.
Next was ‘Phrase Based Indexing and Retrieval’ – which breaks down what it means to us in simpler terms.
Recently I covered ‘Spam Detection in a Phrase Based Indexing and Retrieval system’ – which highlights some ways of using PaIR for Spam detection purposes….
Strangely, I stumbled across some evidence of PaIR in the recent demolition of GoogleBombs. Our pal here Jaan (IncredibleHelp) posted some info as well, which led me to Stoney (DeGetyr), and we’re following up on my little find….
So there U have it.. some of the latest goodies on the radar…
Last edited by TheGypsy (2007-02-06 08:09:38)
David - PRR10 -
Affordable SEO services | Internet Business Development | Custom Web site design | Learn SEO | SEO Blog
#2 2007-02-06 08:12:33
- Ryan_steyn
- Member

- From: South Africa, Port Elizabeth
- Registered: 2006-08-23
- Posts: 1665
- I've been thanked 25 times.
TheGypsy wrote:
Originally I wrote ‘Phrase Based Optimization’ – this is the more technical analysis and may not be for everyone.
Next was ‘Phrase Based Indexing and Retrieval’ – which breaks down what it means to us in simpler terms.
OK, so I'm gonna study these two, print 'em, and remember them word for word.
TheGypsy wrote:
Strangely, I stumbled across some evidence of PaIR in the recent demolition of GoogleBombs. Our pal here Jaan (IncredibleHelp) posted some info as well, which led me to Stoney (DeGetyr), and we’re following up on my little find…
Would this be useful in the context of the previous two? And if so, is it worth memorising along with the above, or is it still up for debate?
Lalibela Game Reserve in malaria free South Africa
"Humans are by far the most fascinating creatures, in a universe with no boundaries and a world with so much unfound wonder we are the only entities capable of creating boredom"
#3 2007-02-06 08:20:27
- TheGypsy
Ryan_steyn wrote:
Strangely, I stumbled across some evidence of PaIR in the recent demolition of GoogleBombs. Our pal here Jaan (IncredibleHelp) posted some info as well, which led me to Stoney (DeGetyr), and we’re following up on my little find…
Would this be useful in the context of the previous two? And if so, is it worth memorising along with the above, or is it still up for debate?
The last one is simply some 'potential' evidence of a PaIR system being involved in the recent GoogleBomb defusing, that's all.. a little conspiracy theory....
#4 2007-02-06 08:26:14
- Ryan_steyn
Conspiracy? I like conspiracies....
E.g.... why is it that a man must go through the effort of lifting the toilet seat and putting it down? Where does meeting halfway come in? Surely the effort taken to lift it is equal to (if not less than) the effort to drop it, so why do women get to complain about it being up while men are frowned upon when complaining about it being down? Now that's a conspiracy
#5 2007-02-06 08:26:32
- jmac
- Member
- Registered: 2006-08-19
- Posts: 82
- I've been thanked 1 times.
It makes a lot of sense when combined with the Scottish Web Design experiment. However, it does muddy the waters for beginners like me. Whilst writing 'less than stuffed' content is what we've learned, what I'm now learning is that we need to broaden the scope to related items whilst maintaining the main topic. I think.... or did I get it wrong again?
lol
#6 2007-02-06 08:27:37
- jmac
Ryan_steyn wrote:
Conspiracy? I like conspiracies....
E.g.... why is it that a man must go through the effort of lifting the toilet seat and putting it down? Where does meeting halfway come in? Surely the effort taken to lift it is equal to (if not less than) the effort to drop it, so why do women get to complain about it being up while men are frowned upon when complaining about it being down? Now that's a conspiracy
Just learn to sit - it'll save you having to clean up when you get much older!!! 
#7 2007-02-06 08:41:18
- Ryan_steyn
LOL
Oh my word, what is the world coming to when the hardest I laugh is on this site...
#8 2007-02-06 08:41:27
- TheGypsy
jmac wrote:
It makes a lot of sense when combined with the Scottish Web Design experiment. However, it does muddy the waters for beginners like me. Whilst writing 'less than stuffed' content is what we've learned, what I'm now learning is that we need to broaden the scope to related items whilst maintaining the main topic. I think.... or did I get it wrong again?
lol
Think of it this way.....
Across a given document or set of documents (web pages and web sites), there will be a statistical average for the usage of phrases related to a given term.
Now, when it indexes your page, it compares the page against this statistical average of related-phrase usage.
For example, on a page about 'President of the United States', it would also expect to find things such as:
George Bush
Bill Clinton
the White House
Democratic Party
and so on..
From this AVERAGE (of topical documents in the index), a threshold/baseline is set.
YOUR document (web page) is weighted and scored against the baseline. It is also compared to the thresholds (too high or too low can be flagged as Spam).
APPARENTLY - the average site has between 8-20 'good phrases' whereas a Spam document(s) can have between 100-1000 'good phrases'
Is that getting U somewhere?
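Here is a rough Python sketch to make the baseline idea concrete. It assumes the related-phrase list for a topic is already known, and it borrows the 8-20 vs 100-1000 figures above as cut-offs. The function names and the exact `low`/`high` values are illustrative assumptions, not anything taken from Google.

```python
from statistics import mean


def related_phrase_count(page_phrases, related_phrases):
    """Count the distinct related phrases (for the page's topic) found on the page."""
    return len(set(page_phrases) & set(related_phrases))


def assess_page(page_count, topical_counts, low=8, high=100):
    """Compare a page's related-phrase count with the topical average.

    `topical_counts` are the related-phrase counts of other documents on the
    same topic; their mean is the baseline. `low` and `high` are illustrative
    cut-offs borrowed from the 8-20 / 100-1000 figures, not real values."""
    baseline = mean(topical_counts)
    ratio = page_count / baseline if baseline else float("inf")
    if page_count >= high:
        verdict = "flag: possible spam"
    elif page_count < low:
        verdict = "below normal range"
    else:
        verdict = "normal"
    return baseline, ratio, verdict


if __name__ == "__main__":
    related = {"george bush", "bill clinton", "the white house", "democratic party"}
    page = ["george bush", "the white house", "pizza"]
    print(related_phrase_count(page, related))    # prints 2
    print(assess_page(15, [10, 12, 14, 20])[2])   # prints normal
    print(assess_page(250, [10, 12, 14, 20])[2])  # prints flag: possible spam
```

The ratio to the baseline is returned alongside the verdict, since the thread describes both a score "against the baseline" and separate high/low thresholds.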
Last edited by TheGypsy (2007-02-06 08:42:29)
#9 2007-02-06 08:51:25
- Ryan_steyn
TheGypsy wrote:
APPARENTLY - the average site has between 8-20 'good phrases' whereas a Spam document(s) can have between 100-1000 'good phrases'
Now, is that per page or per website? I can see that confusion getting messy
#10 2007-02-06 08:55:39
- TheGypsy
Ryan_steyn wrote:
APPARENTLY - the average site has between 8-20 'good phrases' whereas a Spam document(s) can have between 100-1000 'good phrases'
Now, is that per page or per website? I can see that confusion getting messy
BOTH.
IF YOU READ the articles I posted (kinda why I did, eh)... it looks at documents as well as sets of documents (remember there are PDFs and other documents on the web besides web pages).
But YES, it looks at the aggregate as well as at the individual documents.
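A small Python sketch of the "BOTH" answer. It assumes a site is just a mapping from page path to that page's phrases. `good_phrase_counts` and the sample paths are hypothetical. The same set of good phrases is counted once per page and once across the whole site.

```python
def good_phrase_counts(site, good_phrases):
    """Count distinct good phrases per page and for the whole site.

    `site` maps a page path to the list of phrases found on that page."""
    good = set(good_phrases)
    per_page = {path: len(set(phrases) & good) for path, phrases in site.items()}
    # Aggregate view: union of every page's phrases, then intersect.
    site_wide = len(set().union(*site.values()) & good)
    return per_page, site_wide


if __name__ == "__main__":
    good = {"teddy bear", "plush toy", "bear collector"}
    site = {
        "/index.html": ["teddy bear", "plush toy", "contact"],
        "/about.html": ["teddy bear", "bear collector"],
    }
    print(good_phrase_counts(site, good))
    # prints ({'/index.html': 2, '/about.html': 2}, 3)
```

A page and its site can land on different numbers: each page here has 2, while the site as a whole has 3, which is why "per page or website?" gets the answer "both".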
#11 2007-02-08 02:01:34
- Ryan_steyn
Yeah yeah... I'm still getting to the reading
... no time yet... but when I do, I will know
me thinks
#12 2007-02-08 02:45:41
- nybc
- Member
- Registered: 2006-12-15
- Posts: 162
- I've been thanked 6 times.
So I understand that LSI influenced the SERPs for my site, while PaIR can identify me as a spammer, right?
This brings up one question for me: how much filtering could a visitor take?
Last edited by nybc (2007-02-08 02:50:41)
#13 2007-02-08 07:40:13
- TheGypsy
Not really.
LSI is a technology Google picked up in 2003 for use with the AdWords/AdSense programs. It is what helps place RELEVANT ads on sites and so forth. There were grumblings that G was using it in the Organic SERPs as well.
It is, though, a limited approach to IR (indexing and retrieval).
ENTER – PaIR.
In all likelihood, IMO, LSI never made it into the system. At this point it seems FAR more likely (there are NO semantic-based patents, for example) that some type of PaIR-based model has been layered onto the infrastructure for indexing and ranking ‘relevance’ factors.
So, PaIR is likely at work NOT LSI.
The related patents deal with everything from ranking to duplicate and Spam document identification and filtering. The PaIR model can do FAR more than an LSI model can…. Thus my growing interest (not to mention 5+ Google Patents on it in the last year.. that kinda piqued my interest as well).
nybc wrote:
This brings up one question for me: how much filtering could a visitor take?
I don't follow, my friend?
Last edited by TheGypsy (2007-02-08 07:41:12)
#14 2007-02-08 12:03:20
- atwhatcost
- Member

- From: Philadelphia, PA
- Registered: 2004-10-20
- Posts: 677
- I've been thanked 11 times.
OK, so my brain is back well enough to take on investigating your links, Dave. Now that I have, I've got a few questions --
1.) SEO was done (you might want to see my last question, since your answer will indicate if this is even a relevant question anymore) by setting up our sites to fit what the SEs consider important – good domain names, page names, KW phrase optimizing, meta tags, internal links, BLs versus RL, headers and alt images – as well as setting them up to be user friendly. In the grand scheme of things, is there a relative hierarchy for all these elements, or are all these things too important to decide which is/are the most important to SE? If there is a relative and necessary hierarchy, what is it?
2.) What’s an anchor phrase and/or anchor text?
3.) Gypsy wrote (in the second article he mentioned):
“What’s a ‘Good Phrase’? The classification for possible phrases as either a good phrase or a bad phrase is when the possible phrase; ‘appears in a minimum number of documents, and appear a minimum number of instances in the document collection’. What that number is, we don’t know. Those are the ‘dials’ the Search Gods themselves only have access to. It is almost looking at a Phrase Density over the aggregate of documents (the web site). Also, a BAD phrase is not one with dirty words; it is simply a phrase with too low a frequency count to make the ‘good’ list.”
(I might be panicking prematurely, but, well, can’t help it!) Ah, man, this stinks! Websites are generally about some central point. By the very nature of content, or at least relevant content, some phrases are going to come up repeatedly! If they’re terms that produce no synonyms from a thesaurus, are we apparently not supposed to mention the main subject matter?! All this time, I’ve been trying to optimize to filter my sites away from the commonplace merchants. I apparently can’t compete with them, and won’t even show up without them. Do we now disregard all the stuff we learned about optimizing, and kill as many of the KW phrases as possible, particularly in alt tags, headers, domain names, and even in the content? My site (which is about to be relocated to a new domain name at another web host) has 10 pages and a link, therefore between the title on each page and the pages’ menu, I’ve already topped out to the 8-20 “good phrase.” Add my main KW phrase, in both content and image names, and it now easily goes well past 100! Am I missing something, or did Google just create an entirely different IR (Indexing and Retrieval) formula?!
Lynn -- Teddy Bear sites --
http://spauldingtbear.tripod.com/spauld … index.html
http://spauldingtbear.bravejournal.com
and the related Web Ring, http://g.webring.com/hub?ring=teddybeardens
#15 2007-02-08 13:41:23
- TheGypsy
New formula? Not at all.
To begin with…. we must always preface any discussion of Patents with the ol’ caveat that folks don’t necessarily USE what they apply for.
That said, there sure are a lot of PaIR patents.. and a major lack of LSI-related Organic Search work.. enough to make one wonder if this is related to their methods of judging relevance in the weighting/ranking process.
Ok… so……
1. is there a relative hierarchy for all these elements, or are all these things too important to decide which is/are the most important to SE?
It’s not that technical concepts such as these DON’T fit into some hierarchy; it’s more that they are all-encompassing. That is to say, a PaIR system is used at indexing as well as at retrieval. It is used to weight and rank as well as to detect Spam and duplicate content. It can be tailored to Organic Search or to Personalized Search (the latest G wahoo).
So where we would have methodologies (KW research, on-page optimization, link building) to satisfy the ‘love’ the SEs are looking for, a concept such as PaIR is the sum of those parts.
One would adapt ALL areas of their optimization to make the most of the technology/methodology.
2. That is the actual TEXT in the link (inbound or outbound), i.e. like our forum sig.
3. I believe part of your problem is “I’ve already topped out to the 8-20 “good phrase.”” – YOU have decided what is and isn’t a ‘good phrase’ – that would be calculated on common phrases of other sites already indexed. It is creating an expectation of occurrences, based on the AVERAGE (basically) of the inferences on other known and trusted sites/pages.
or did Google just create an entirely different IR (Indexing and Retrieval) formula?!
Yes and no. Ha ha ha ha. They are developing a new technology. In all likelihood it would merely be layered over the existing infrastructure (remember Big Daddy?), and then they can slowly ‘play with the dials’ so as to bring it online in concert with other methods already in place. Not ‘throwing out the baby with the bath water’, if ya know what I mean….
How can it be used in everyday SEO? Many, many ways… I am starting some writing on KW/phrase research and working my way up… (Charles has the whip).
..but does that clear anything up??? Please? … or are more 10-page emails in the offing? I can take it, ya know….
#16 2007-02-10 08:53:31
- TheGypsy
Well hey folks... seems they FINALLY started thinking about PaIR over at WebMaster World.... the first thread on it just peeked out.. yesterday..
So, how's it feel to be 'ahead of the curve'??
He he
#17 2007-02-10 09:28:47
- DMX
- Member

- From: Southwest, U.S.
- Registered: 2006-10-23
- Posts: 378
- I've been thanked 1 times.
Well written and interesting. Thanks for posting them.
Odd coincidence that Google and others would be refining the way documents are sorted or indexed for retrieval using roughly the same information they would use to determine whether content was duplicated elsewhere, making a number of these threads different facets of the same mechanism..
How clever. We're pretty well being "forced" to write original, interesting and unique content. Hate that.
#18 2007-02-10 10:47:18
- TheGypsy
Well, the main problem folks are having is the ‘how-where’. It is NOT replacing the existing infrastructure; it works in concert with existing algorithmic IR methods. Certainly, over time, as the dials are turned in PaIR’s favor, the other IR methods could be devalued.
Almost like the ‘onion’ analogy – in reverse – adding layers.
DMX wrote:
Odd coincidence that Google and others would be refining the way documents are sorted or indexed for retrieval using roughly the same information they would use to determine whether content was duplicated elsewhere, making a number of these threads different facets of the same mechanism..
How clever. We're pretty well being "forced" to write original, interesting and unique content. Hate that.
PaIR is a very comprehensive methodology which can deal with a great many areas of the IR process. The documents (the Patents) deal with:
Indexing
Weighting/Ranking
Duplicate content
Spam detection and weighting (yes - not all spam is bad?)
Back Links/Link Profiles
Personalized Search
And much more.
As far as being ‘forced to write original content’ – I fishing hope so. One of my FAVORITE parts of all this is that I can see it being a more effective method than what is currently in place, for searchers, webmasters and SEOs alike. IMO tho….
#19 2007-02-10 14:38:10
- atwhatcost
First off, I swear it was no longer than 8 pages – not 10!
Second, hate to say this, but I think I’m more lost, not less. Sure, the caveat is applicable – "just because they have all those patents…" – BUT then again, since they have all those patents, it at the very least implies a hunkering down on the whole concept. They bought a freaking company for those LSI concepts, and yet there are more patents for PaIRing. Seems to me that has to mean Google is serious about developing this PaIR concept. Also, seems to me, you think the same thing, or you wouldn’t be trying so hard to get this concept across to us.
Before getting into the crux of my dilemma, I need to ask about this:
TheGypsy wrote:
I believe part of your problem is “I’ve already topped out to the 8-20 “good phrase.”” – YOU have decided what is and isn’t a ‘good phrase’ – that would be calculated on common phrases of other sites already indexed. It is creating an expectation of occurrences, based on the AVERAGE (basically) of the inferences on other known and trusted sites/pages.
What is “It” (the first word in the last sentence): my assumption, or this new thing, PaIR?
Yes, I’m guessing that my phrase is considered a “good phrase,” but, hey now, got any better phrase for websites about anything to do with teddy bears that doesn’t include the obvious “teddy bear” phrase? Then again, that could be my downfall – I don’t yet understand the concept of “good phrase” as well as I think I do. How lost am I?
That being said, here's my dilemma: the “good phrase” patent, ‘appears in a minimum number of documents, and appear a minimum number of instances in the document collection’, is the opposite of KW phrase optimizing. Unless I’ve totally missed the implications of that patent (which I have no problem believing I have), putting it alongside KW phrase optimizing is mathematically the same as multiplying the entire answer to an equation by zero – it cancels out the entire algorithm that it had before, bringing back no results! Seems to me something is vastly wrong with that patent for good phrases, because, when someone is searching for a particular type of site online, we want maximum, not minimum, results. Realizing that Google definitely is NOT dumb, what am I missing?!
#20 2007-02-10 14:43:07
- atwhatcost
DMX wrote:
How clever. We're pretty well being "forced" to write original, interesting and unique content. Hate that.
DMX, actually, it might not be as hard as you think. I graduated college almost 30 years ago with a degree in communications. With that (and needing marketing for my own business), I tend to observe marketing trends. One trend that’s been going downhill quickly over the last 30 years is advertising. Back in the olden days, when I was young, commercials and print ads actually told you the benefits of the product. Now, if they even mention what product they’re selling, it’s just shown with a lot of glitz and glamour, or tough-looking jocks performing with the “wonderful” catch-all phrase that almost forces me to leap out of my chair: “Just Do It”.
It’s not just sneaker companies either. When was the last time you were told some wonderful benefit of a mid-size car (“everybody is buying it” does not qualify as a benefit), or, yeesh, even what the heck that perfume smells like? (OK, I’ll buy a small bottle of White Diamonds, if it comes with that pair of earrings Liz Taylor is giving away for luck!) Want to give “original” content to sell your products online? Just find the “original” content used 40-50 years ago! Good, solid copy that actually tells consumers the benefits! All you need to do is update it, maybe.