Tuesday, February 10, 2009

How Search Engines Rank Web Pages

Search for anything using your favorite crawler-based search engine. Nearly instantly, the search engine will sort through the millions of pages it knows about and present you with ones that match your topic. The matches will even be ranked, so that the most relevant ones come first.

Of course, search engines don't always get it right. Non-relevant pages make it through, and sometimes it may take a little more digging to find what you are looking for. But, by and large, search engines do an amazing job.

As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a librarian and saying, 'travel.' They’re going to look at you with a blank face."

OK -- a librarian's not really going to stare at you with a vacant expression. Instead, they're going to ask you questions to better understand what you are looking for.

Unfortunately, search engines don't have the ability to ask a few questions to focus your search, as a librarian can. They also can't rely on judgment and past experience to rank web pages the way humans can.

So, how do crawler-based search engines go about determining relevancy when confronted with hundreds of millions of web pages to sort through? They follow a set of rules, known as an algorithm. Exactly how a particular search engine's algorithm works is a closely kept trade secret. However, all major search engines follow the general rules below.

Location, Location, Location...and Frequency
One of the main rules in a ranking algorithm involves the location and frequency of keywords on a web page. Call it the location/frequency method, for short.

Remember the librarian mentioned above? They need to find books to match your request of "travel," so it makes sense that they first look at books with travel in the title. Search engines operate the same way. Pages with the search terms appearing in the HTML title tag are often assumed to be more relevant than others to the topic.

Search engines will also check to see if the search keywords appear near the top of a web page, such as in the headline or in the first few paragraphs of text. They assume that any page relevant to the topic will mention those words right from the beginning.
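To make the "location" half of the rule concrete, here is a minimal sketch of how a scorer might reward a page for having the search term in its HTML title tag and near the top of its text. The weights (3.0 for a title match, 1.0 for an early mention) and the "first 50 words" cutoff are hypothetical values chosen for illustration, not figures any search engine has published, and the sketch only handles single-word terms.

```python
import re
from html.parser import HTMLParser

WORD = re.compile(r"[a-z0-9']+")


class PageText(HTMLParser):
    """Splits a page into <title> words and body words, in document order."""

    def __init__(self):
        super().__init__()
        self.title_words = []
        self.body_words = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        words = WORD.findall(data.lower())
        if self._in_title:
            self.title_words.extend(words)
        else:
            self.body_words.extend(words)


def location_score(html, term, title_weight=3.0, top_n=50, top_weight=1.0):
    """Hypothetical location score: a bonus for a title match and for an early mention."""
    page = PageText()
    page.feed(html)
    page.close()
    term = term.lower()
    score = 0.0
    if term in page.title_words:
        score += title_weight      # term appears in the <title> tag
    if term in page.body_words[:top_n]:
        score += top_weight        # term appears within the first top_n body words
    return score
```

A page titled "Cheap Travel Deals" whose headline also says "Travel" would score 4.0 for "travel", while a page that only mentions travel after a hundred words of filler would score 0.0.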

Frequency is the other major factor in how search engines determine relevancy. A search engine will analyze how often keywords appear in relation to other words on a web page. Pages with a higher frequency are often deemed more relevant than other web pages.
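The "frequency" half can be sketched as a relative term frequency: the share of a page's words that are the search term, so a long page doesn't win just by being long. This is a simplified illustration of the idea described above, assuming plain text input and single-word terms; real engines combine it with many other signals.

```python
import re
from collections import Counter


def term_frequency(text, term):
    """Share of the page's words that are `term` (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[term.lower()] / len(words)


def rank_by_frequency(pages, term):
    """Return page names ordered from highest to lowest term frequency."""
    return sorted(pages, key=lambda name: term_frequency(pages[name], term), reverse=True)


pages = {
    "a": "travel guide for travel lovers",   # 2 of 5 words
    "b": "a guide to cooking and travel",    # 1 of 6 words
}
print(rank_by_frequency(pages, "travel"))
```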

Spice In The Recipe
Now it's time to qualify the location/frequency method described above. All the major search engines follow it to some degree, in the same way cooks may follow a standard chili recipe. But cooks like to add their own secret ingredients. In the same way, search engines add spice to the location/frequency method. Nobody does it exactly the same, which is one reason why the same search on different search engines produces different results.

To begin with, some search engines index more web pages than others. Some search engines also index web pages more often than others. The result is that no search engine has the exact same collection of web pages to search through. That naturally produces differences, when comparing their results.

Search engines may also penalize pages or exclude them from the index if they detect search engine "spamming." An example is when a word is repeated hundreds of times on a page to increase its frequency and propel the page higher in the listings. Search engines watch for common spamming methods in a variety of ways, including following up on complaints from their users.
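As a rough illustration of the repeated-word example, a crude filter could flag a page when its most common word appears at least a hundred times and also makes up an unusually large share of the text. Both thresholds below are hypothetical, and a real filter would at least ignore stop words like "the"; this only shows the shape of the check.

```python
import re
from collections import Counter


def looks_like_keyword_stuffing(text, min_repeats=100, max_share=0.2):
    """Flag text whose most common word is repeated a lot AND dominates the page."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return False
    word, count = Counter(words).most_common(1)[0]
    return count >= min_repeats and count / len(words) > max_share


print(looks_like_keyword_stuffing("travel " * 300 + "normal words follow here"))
```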

Off The Page Factors
Crawler-based search engines have plenty of experience now with webmasters who constantly rewrite their web pages in an attempt to gain better rankings. Some sophisticated webmasters may even go to great lengths to "reverse engineer" the location/frequency systems used by a particular search engine. Because of this, all major search engines now also make use of "off the page" ranking criteria.

Off the page factors are those that webmasters cannot easily influence. Chief among these is link analysis. By analyzing how pages link to each other, a search engine can determine both what a page is about and whether that page is deemed to be "important" and thus deserving of a ranking boost. In addition, sophisticated techniques are used to screen out attempts by webmasters to build "artificial" links designed to boost their rankings.
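One well-known form of link analysis is PageRank, where each link passes a share of the linking page's importance to the page it points to. The sketch below is a basic power-iteration version of that idea, not any particular engine's current algorithm; the damping factor of 0.85 and the 50 iterations are conventional illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link is a "vote" worth an equal slice of this page's rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


site = {"home": ["about", "blog"], "about": ["home"], "blog": ["home"]}
print(pagerank(site))
```

Here "home" ends up with the highest score because both other pages link to it.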

Another off the page factor is click through measurement. In short, this means that a search engine may watch what results someone selects for a particular search, then eventually drop high-ranking pages that aren't attracting clicks, while promoting lower-ranking pages that do pull in visitors. As with link analysis, systems are used to compensate for artificial clicks generated by eager webmasters.
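A hypothetical way to blend click data into an existing ranking is to mix each result's current position with its click-through rate (clicks divided by times shown), and to leave results alone until they have been shown enough times to trust the numbers. The 50/50 weighting and the 100-impression minimum are assumptions made for this sketch; how search engines actually use click data is not public.

```python
def rerank_by_clicks(results, impressions, clicks, weight=0.5, min_impressions=100):
    """results: URLs in current rank order. impressions/clicks: counts per URL."""

    def score(item):
        position, url = item
        base = 1.0 / (position + 1)          # higher for results already near the top
        shown = impressions.get(url, 0)
        if shown < min_impressions:
            return base                       # too little data: keep the original signal
        ctr = clicks.get(url, 0) / shown
        return (1 - weight) * base + weight * ctr

    ranked = sorted(enumerate(results), key=score, reverse=True)
    return [url for _, url in ranked]


# "c" is ranked third but attracts most of the clicks, so it moves up.
print(rerank_by_clicks(["a", "b", "c"],
                       impressions={"a": 1000, "b": 1000, "c": 1000},
                       clicks={"a": 10, "b": 50, "c": 900}))
```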

Friday, January 16, 2009

Six Basic SEO Tips for the 2009 New Year

I look at a lot of Web sites to determine what SEO opportunities exist for potential clients. There is such a huge disparity between the issues I see that I thought it might benefit some people if we covered some basic SEO tips. That way, if I review your site next month at OMS, you will be able to explore more advanced issues and get more out of the experience.

I typically divide search engine optimization issues into three categories: “on page”, “off page” and “site wide” elements. On page optimization refers to optimizing the physical elements of the page, including textual content, heading tags, page titles, meta descriptions and meta keyword tags. Off page optimization refers to links, both internal and external. Site wide optimization refers to the technical issues that can affect the engines’ ability to index and rank your site, which include but are not limited to duplicate content issues, Flash and JavaScript issues, URL and file structure, redirect issues, etc.

So, here are my best basic SEO tips, grouped by category:

On Page

1) Unique, keyword-focused titles and meta descriptions. It’s important to remember that any page of your site could be the first page a user sees. Give them enough keyword-focused information to understand the content of the page. Also consider the search engine’s perspective: if you don’t have time to make these fields unique for a given page of your site, how important could that page be?
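If you already have your pages’ titles and meta descriptions in a list (for example, exported from a crawl of your site), a short script can find the ones that aren’t unique. The input format below, a dict of URL to {"title": ..., "description": ...}, is an assumption for this sketch; comparison ignores case and surrounding whitespace.

```python
from collections import defaultdict


def find_duplicates(pages, field):
    """Return {value: [urls]} for every title/description shared by two or more pages."""
    seen = defaultdict(list)
    for url, meta in pages.items():
        value = meta.get(field, "").strip().lower()
        seen[value].append(url)
    return {value: urls for value, urls in seen.items() if len(urls) > 1}


pages = {
    "/": {"title": "Acme Widgets", "description": "Buy widgets"},
    "/about": {"title": "Acme Widgets", "description": "About Acme"},
    "/blog": {"title": "Widget Blog", "description": "Buy widgets"},
}
print(find_duplicates(pages, "title"))
print(find_duplicates(pages, "description"))
```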

2) Don’t worry about keyword density. Here’s a hint: there is no magic number. If there were, the community of SEO geeks like me would have discovered it quickly, because it’s an easy metric to calculate. It’s more important to simply make sure your keywords appear in your content. A ratio of 1% to 8% is acceptable, although 1% may be a little low for competitive keywords. Anything over 8% usually begins to reek of spam and hurt the user experience. There are exceptions, but in most cases, if every 10th word of your document is the same word or phrase, you could be spamming.
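Since density is easy to calculate, here is one way to do it, with the 1% to 8% rule of thumb from the tip above applied as a rough check. Counting every word covered by the phrase (so two hits of a two-word phrase in 100 words is 4%) is an assumption; SEO tools define density for multi-word phrases in different ways.

```python
import re


def keyword_density(text, phrase):
    """Percent of the document's words covered by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not target:
        return 0.0
    k = len(target)
    hits = sum(1 for i in range(len(words) - k + 1) if words[i:i + k] == target)
    return 100.0 * hits * k / len(words)


def density_verdict(percent, low=1.0, high=8.0):
    """Apply the 1%-8% rule of thumb."""
    if percent < low:
        return "low"
    if percent > high:
        return "spammy"
    return "ok"


doc = "san diego " + "x " * 96 + "san diego"   # 100 words, phrase appears twice
print(keyword_density(doc, "san diego"), density_verdict(keyword_density(doc, "san diego")))
```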

Off Page

3) Make sure your global navigation template is keyword focused. Obviously you don’t wanna get carried away and have links that span the whole page…lol. But remember that every link is a vote, even internal links. The links in your global navigation serve as votes from every page of your site, so they tend to carry a fair amount of weight. Make sure the anchor text of those links contains the primary keywords for the pages they link to.
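To audit a navigation template, you can pull out each link’s anchor text and compare it with the keyword you want the target page to rank for. The keywords_by_url mapping is something you would supply yourself; it is a hypothetical input for this sketch, which uses Python’s standard html.parser module.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.links.append((self._href, text))
            self._href = None


def weak_nav_links(nav_html, keywords_by_url):
    """Return hrefs whose anchor text omits the keyword chosen for the target page."""
    parser = AnchorCollector()
    parser.feed(nav_html)
    weak = []
    for href, text in parser.links:
        keyword = keywords_by_url.get(href)
        if keyword and keyword.lower() not in text.lower():
            weak.append(href)
    return weak


nav = '<ul><li><a href="/widgets">Blue Widgets</a></li><li><a href="/services">Click here</a></li></ul>'
print(weak_nav_links(nav, {"/widgets": "widgets", "/services": "seo services"}))
```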

4) Homepage logo link. If your site is one of the 99% of sites that have a logo in the top left corner linking back to the homepage, make sure that link is working for you by giving the logo image a keyword-focused alt tag of 4 to 7 words.
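A quick way to check this tip is to find images inside links that point at the homepage and count the words in their alt text. The set of hrefs treated as "the homepage" ("/" and "index.html" by default) is an assumption; pass your own, such as your full domain URL, if your template links differently.

```python
from html.parser import HTMLParser


class LogoLinkChecker(HTMLParser):
    """Records the alt text of <img> elements inside links that point to the homepage."""

    def __init__(self, home_hrefs=("/", "index.html")):
        super().__init__()
        self.home_hrefs = set(home_hrefs)
        self._in_home_link = False
        self.alts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._in_home_link = attrs.get("href") in self.home_hrefs
        elif tag == "img" and self._in_home_link:
            self.alts.append(attrs.get("alt") or "")   # missing alt counts as empty

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_home_link = False


def alt_text_ok(alt, min_words=4, max_words=7):
    """True if the alt text is within the suggested 4 to 7 word range."""
    return min_words <= len(alt.split()) <= max_words


checker = LogoLinkChecker()
checker.feed('<a href="/"><img src="logo.png" alt="Acme Blue Widgets San Diego"></a>')
print([(alt, alt_text_ok(alt)) for alt in checker.alts])
```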

Site Wide

5) Make sure your navigation links are being indexed. Go to Google and type cache:www.YourDomain.com into the search box. Then, in the resulting page’s header, click “Text-only version” in the top right corner. Can you see your navigation links on the resulting text page? If not, you may be coding them in Flash or JavaScript, which completely invalidates those hugely important links.
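You can approximate the same check offline: read the page’s raw HTML without running any scripts, as a text-only view does, and see which of your navigation URLs actually appear as ordinary links. This is a simplified stand-in for the cached text-only view, not a reproduction of what Google does; the expected URL list is your own hypothetical input.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values of <a> tags present in the raw HTML (no JavaScript is run)."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def missing_nav_links(raw_html, expected_hrefs):
    """Return the expected navigation URLs that are not plain <a href> links in the HTML."""
    collector = HrefCollector()
    collector.feed(raw_html)
    return [href for href in expected_hrefs if href not in collector.hrefs]


static_nav = '<ul><li><a href="/widgets">Widgets</a></li><li><a href="/contact">Contact</a></li></ul>'
script_nav = '<div id="nav"></div><script src="/js/nav.js"></script>'  # links built by a script
print(missing_nav_links(static_nav, ["/widgets", "/contact"]))
print(missing_nav_links(script_nav, ["/widgets", "/contact"]))
```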

6) Don’t use 302 redirects. A lot of systems use these redirects by default. They cause major problems for search engines like Yahoo and MSN, and Google handles them inconsistently at best. You shouldn’t have any problems if you use a 301 redirect instead, especially for vanity domains that could otherwise cause duplicate content problems.
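If a vanity domain is served by a small Python app, a permanent redirect is just a matter of sending status 301 with a Location header. The sketch below is a minimal WSGI application (the standard Python web-server interface) that sends every request to the same path on a canonical domain; https://www.example.com is a placeholder for your real domain. Most web servers and hosting control panels have their own 301 settings, which are usually the simpler route.

```python
def make_redirect_app(canonical_base):
    """Build a WSGI app that 301-redirects every request to canonical_base + same path."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/") or "/"
        query = environ.get("QUERY_STRING", "")
        location = canonical_base + path + ("?" + query if query else "")
        # 301 = Moved Permanently; a 302 would signal only a temporary move.
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]

    return app


# Placeholder canonical domain; replace with your own.
app = make_redirect_app("https://www.example.com")
```

To try it locally, the standard library’s wsgiref.simple_server.make_server can serve this app.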

These are some basic tips that will get your search engine optimization campaign headed in the right direction for 2009. I hope you find some value in them and I look forward to seeing everyone next month here in San Diego!