PageRank Explained

Not long ago, there was just one well-known PageRank Explained paper, to which most interested people referred when trying to understand the way that PageRank works. In fact, I used it myself. But when I was writing the PageRank Calculator, I realized that the original paper was misleading in the way that the calculations were done. It uses its own form of PageRank, which the author calls "mini-rank". Mini-rank changes Google's PageRank equation for no apparent reason, making the results of the calculations very misleading. Even though the author abandoned mini-rank as a result of this and another paper, the original, unchanged paper is still available on the web. So if you come across a PageRank Explained paper that uses "mini-rank", it has been superseded and is best ignored.

What is PageRank?

PageRank is Google's way of deciding a page's importance. It matters because it is one of the factors that determine a page's ranking in the search results. It isn't the only factor that Google uses to rank pages, but it is an important one. From here on, we'll occasionally refer to PageRank as "PR".

How is PageRank calculated?

PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))

That's the equation that calculates a page's PageRank. It's the original one that was published when PageRank was being developed, and it is probable that Google uses a variation of it, but they aren't telling us what it is. It doesn't matter though, as this equation is good enough.

In the equation, 't1 - tn' are the pages linking to page A, 'C' is the number of outbound links that a page has and 'd' is a damping factor, usually set to 0.85.

We can think of it in a simpler way:

A page's PR = 0.15 + 0.85 * (a "share" of the PR of every page that links to it)

"share" = the linking page's PageRank divided by the number of outbound links on the page.

A page "votes" an amount of PageRank onto each page that it links to.
The amount of PageRank that it has to vote with is a little less than its own PageRank value (its own value * 0.85). This value is shared equally between all the pages that it links to.

From this, we could conclude that a link from a page with PR4 and 5 outbound links is worth more than a link from a page with PR8 and 100 outbound links. The PageRank of a page that links to yours is important, but the number of links on that page is also important. The more links there are on a page, the less PageRank value your page will receive from it.

If the differences between PR1, PR2, ... PR10 were equal then that conclusion would hold up, but many people believe that the values between PR1 and PR10 (the maximum) are set on a logarithmic scale, and there is very good reason for believing it. Nobody outside Google knows for sure one way or the other, but the chances are high that the scale is logarithmic, or similar. If so, it means that it takes a lot more additional PageRank for a page to move up to the next PageRank level than it did to move up from the previous PageRank level. The result is that it reverses the previous conclusion, so that a link from a PR8 page that has lots of outbound links is worth more than a link from a PR4 page that has only a few outbound links.

Whichever scale Google uses, we can be sure of one thing: a link from another site increases our site's PageRank. Just remember to avoid links from link farms.

Note that when a page votes its PageRank value to other pages, its own PageRank is not reduced by the value that it is voting. The page doing the voting doesn't give away its PageRank and end up with nothing. It isn't a transfer of PageRank. It is simply a vote according to the page's PageRank value. It's like a shareholders' meeting where each shareholder votes according to the number of shares held, but the shares themselves aren't given away. Even so, pages do lose some PageRank indirectly, as we'll see later.

Ok so far? Good.
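As a rough illustration of the "share" idea, here is a minimal Python sketch of the published equation, followed by the PR4-versus-PR8 comparison on both readings of the scale. The function names are made up for this example, and BASE = 5 is purely a hypothetical stand-in for the unknown logarithmic base; nobody outside Google knows the real figure.

```python
D = 0.85  # damping factor from the published equation


def vote_share(pagerank, outbound_links):
    """The "share" a page passes along each of its links (before damping)."""
    return pagerank / outbound_links


def page_rank(inbound, d=D):
    """PR(A) = (1-d) + d * sum(PR(t)/C(t)).

    `inbound` is a list of (PR of linking page, number of links on it).
    """
    return (1 - d) + d * sum(vote_share(pr, c) for pr, c in inbound)


# Linear reading: the toolbar numbers are treated as the actual values.
linear_pr4 = vote_share(4, 5)      # PR4 page with 5 outbound links
linear_pr8 = vote_share(8, 100)    # PR8 page with 100 outbound links

# Logarithmic reading (assumption): actual value = BASE ** toolbar number.
BASE = 5  # hypothetical base, chosen only for illustration
log_pr4 = vote_share(BASE ** 4, 5)
log_pr8 = vote_share(BASE ** 8, 100)

print("linear:", linear_pr4, linear_pr8)
print("log   :", log_pr4, log_pr8)
```

On the linear reading the PR4 link carries the bigger share; on the assumed logarithmic reading the PR8 link wins comfortably, which is the reversal described above.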
Now we'll look at how the calculations are actually done.

For a page's calculation, its existing PageRank (if it has any) is abandoned completely and a fresh calculation is done where the page relies solely on the PageRank "voted" for it by its current inbound links, which may have changed since the last time the page's PageRank was calculated.

The equation shows clearly how a page's PageRank is arrived at. But what isn't immediately obvious is that it can't work if the calculation is done just once. Suppose we have 2 pages, A and B, which link to each other, and neither has any other links of any kind. This is what happens:

Step 1: Calculate page A's PageRank from the value of its inbound links
Step 2: Calculate page B's PageRank from the value of its inbound links

Step 1 needs page B's PageRank, which isn't calculated until Step 2, and Step 2 needs page A's, so both steps have to start from a value that isn't yet accurate.

Now that both pages have newly calculated PageRank values, can't we just run the calculations again to arrive at accurate values? No. We can run the calculations again using the new values and the results will be more accurate, but we will always be using inaccurate values for the calculations, so the results will always be inaccurate.

The problem is overcome by repeating the calculations many times. Each time produces slightly more accurate values. In fact, total accuracy can never be achieved because the calculations are always based on inaccurate values. 40 to 50 iterations are sufficient to reach a point where any further iteration wouldn't produce enough of a change to the values to matter. This is precisely what Google does at each update, and it's the reason why the updates take so long.

One thing to bear in mind is that the results we get from the calculations are proportions. The figures must then be set against a scale (known only to Google) to arrive at each page's actual PageRank. Even so, we can use the calculations to channel the PageRank within a site around its pages so that certain pages receive a higher proportion of it than others.
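The two-page case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the starting value of 0 and the function name are arbitrary choices for the example, and each step uses the most recent value available for the other page.

```python
D = 0.85  # damping factor


def iterate_pair(start=0.0, iterations=50, d=D):
    """Pages A and B link only to each other; repeat Step 1 and Step 2."""
    a = b = start
    history = []
    for _ in range(iterations):
        a = (1 - d) + d * b  # Step 1: A's PageRank from its one inbound link
        b = (1 - d) + d * a  # Step 2: B's PageRank from its one inbound link
        history.append((a, b))
    return history


history = iterate_pair()
first_a, first_b = history[0]
last_a, last_b = history[-1]

print("after 1 iteration  :", first_a, first_b)
print("after 50 iterations:", last_a, last_b)
```

The first pass gives values that are clearly too low, and each further pass moves both pages closer to 1 without ever landing on it exactly, which is why a fixed number of iterations is used rather than waiting for perfect figures.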
NOTE

· They quote the same, published equation - but then change it from
  PR(A) = (1-d) + d(......)
  to
  PR(A) = PR(A) + (1-d) + d(......)
  It isn't correct, and it isn't necessary.

· We will be looking at how to organize links so that certain pages end up with a larger proportion of the PageRank than others. Adding to the page's existing PageRank through the iterations produces different proportions than when the equation is used as published. Since the addition is not a part of the published equation, the results are wrong and the proportioning isn't accurate.

According to the published equation, the page being calculated starts from scratch at each iteration. It relies solely on its inbound links. The 'add to the existing PageRank' idea doesn't do that, so its results are necessarily wrong.

Internal linking

Fact: The maximum amount of PageRank in a site increases as the number of pages in the site increases.

The more pages that a site has, the more PageRank it has. Again, by using a pencil and paper and the equation, you can come to the same conclusion. Bear in mind that the only pages that count are the ones that Google knows about.

Fact: By linking poorly, it is possible to fail to reach the site's maximum PageRank, but it is not possible to exceed it.

Poor internal linkages can cause a site to fall short of its maximum, but no kind of internal link structure can cause a site to exceed it. The only way to increase the maximum is to add more inbound links and/or increase the number of pages in the site.

Cautions: Whilst I thoroughly recommend creating and adding new pages to increase a site's total PageRank so that it can be channeled to specific pages, there are certain types of pages that should not be added. These are pages that are all identical or very nearly identical and are known as cookie-cutters. Google considers them to be spam and they can trigger an alarm that causes the pages, and possibly the entire site, to be penalized.
Pages full of good content are a must.

What can we do with this 'overall' PageRank?

For the examples, we are going to ignore that fact, mainly because other 'PageRank Explained' type documents ignore it in the calculations, and it might be confusing when comparing documents. The calculator operates in two modes: Simple and Real. In Simple mode, the calculations assume that all pages are in the Google index, whether or not any other pages link to them. In Real mode the calculations disregard unlinked-to pages. These examples show the results as calculated in Simple mode.

Let's consider a 3-page site (pages A, B and C) with no links coming in from the outside. We will allocate each page an initial PageRank of 1, although it makes no difference whether we start each page with 1, 0 or 99. Apart from a few millionths of a PageRank point, after many iterations the end result is always the same. Starting with 1 requires fewer iterations for the PageRanks to converge to a suitable result than starting with 0 or any other number.

The site's maximum PageRank is equal to the number of pages in the site. In this case, we have 3 pages so the site's maximum is 3.

At the moment, none of the pages link to any other pages and none link to them. If you make the calculation once for each page, you'll find that each of them ends up with a PageRank of 0.15. No matter how many iterations you run, each page's PageRank remains at 0.15. The total PageRank in the site = 0.45, whereas it could be 3. The site is seriously wasting most of its potential PageRank.

Example 1

Page A = 0.15

Page A has "voted" for page B and, as a result, page B's PageRank has increased. This is looking good for page B, but it's only 1 iteration - we haven't taken account of the Catch 22 situation. Look at what happens to the figures after more iterations.

After 100 iterations the figures are:

Page A = 0.15

It still looks good for page B but nowhere near as good as it did. These figures are more realistic.
The total PageRank in the site is now 0.5775 - slightly better but still only a fraction of what it could be.

Example 2

Page A = 1

Now we've achieved the maximum. No matter how many iterations are run, each page always ends up with PR1. The same results occur by linking in a loop, e.g. A to B, B to C and C to A.

This has demonstrated that, by poor linking, it is quite easy to waste PageRank and, by good linking, we can achieve a site's full potential. But we don't particularly want all the site's pages to have an equal share. We want one or more pages to have a larger share at the expense of others. The kinds of pages that we might want to have the larger shares are the index page, hub pages and pages that are optimized for certain search terms. We have only 3 pages, so we'll channel the PageRank to the index page - page A. It will serve to show the idea of channeling.

Example 3

Page A = 1.85

And after 100 iterations, the results are:

Page A = 1.459459

In both cases the total PageRank in the site is 3 (the maximum) so none is being wasted. Also in both cases you can see that page A has a much larger proportion of the PageRank than the other 2 pages. This is because pages B and C are passing PageRank to A and not to any other pages. We have channeled a large proportion of the site's PageRank to where we wanted it.

Example 4

Page A = 1.425

By comparison to the 1-iteration figures in the previous example, page A has lost some PageRank, page B has gained some and page C stayed the same. Page C now shares its "vote" between A and B. Previously A received all of it. That's why page A has lost out and why page B has gained.

And after 100 iterations:

Page A = 1.298245

When the dust has settled, page C has lost a little PageRank because, having now shared its vote between A and B instead of giving it all to A, A has less to give to C in the A-->C link.
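The example figures can be checked with a small Simple-mode calculation in Python. This is a sketch, not the calculator itself: the link maps below are my reading of the text and the quoted Page A figures (the diagrams are not reproduced here), and the function and variable names are invented for the example. Every page starts at 1, and each pass uses only the values from the previous pass.

```python
D = 0.85  # damping factor used throughout the article


def calculate(links, iterations, d=D, start=1.0):
    """Simple-mode PageRank: every page counts, linked-to or not.

    `links` maps each page to the list of pages it links to.
    """
    pr = {page: start for page in links}
    for _ in range(iterations):
        # The right-hand side reads the previous pass's values only.
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            for page in links
        }
    return pr


# Assumed link structures, inferred from the text.
examples = {
    "Example 1": {"A": ["B"], "B": [], "C": []},
    "Example 2 (loop)": {"A": ["B"], "B": ["C"], "C": ["A"]},
    "Example 3": {"A": ["B", "C"], "B": ["A"], "C": ["A"]},
    "Example 4": {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]},
}

for name, site in examples.items():
    for n in (1, 100):
        result = calculate(site, n)
        rounded = {page: round(value, 6) for page, value in result.items()}
        print(name, "after", n, "iteration(s):", rounded,
              "total", round(sum(result.values()), 6))
```

If the assumed link maps match the original diagrams, the Page A values and the site totals printed here should agree with the figures quoted in the examples.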
So adding an extra link from a page causes the page to lose PageRank indirectly if any of the pages that it links to return the link. If the pages that it links to don't return the link, then no PageRank loss would have occurred. To make it more complicated, if the link is returned even indirectly (via a page that links to a page that links to a page etc), the page will lose a little PageRank. This isn't really important with internal links, but it does matter when linking to pages outside the site.

Example 5: new pages

Let's add 3 new pages to Example 3.

Three new pages, but they don't do anything for us yet. The small increase in the Total, and the new pages' 0.15, are unrealistic as we shall see. So let's link them into the site.

Link each of the new pages to the important page: page A.

Notice that the Total PageRank has doubled, from 3 (without the new pages) to 6. Notice also that page A's PageRank has almost doubled.

There is one thing wrong with this model. The new pages are orphans. They wouldn't get into Google's index, so they wouldn't add any PageRank to the site and they wouldn't pass any PageRank to page A. They each need to be linked to from at least one other page. If page A is the important page, the best page to put the links on is, surprisingly, page A. You can play around with the links but, from page A's point of view, there isn't a better place for them.

It is not a good idea for one page to link to a large number of pages so, if you are adding many new pages, spread the links around. The chances are that there is more than one important page in a site, so it is usually suitable to spread the links to and from the new pages. You can use the calculator to experiment with mini-models of a site to find the best links that produce the best results for its important pages.
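A mini-model of Example 5 can be tried out in Python along the following lines. It is a sketch under assumptions: the link maps are my reading of the text, the names are invented, and the Real-mode filter is a simplified single pass that just drops pages no other page links to, which may not be exactly how the calculator does it.

```python
D = 0.85  # damping factor


def calculate(links, real_mode=False, iterations=100, d=D):
    """Iterate the published equation over a small site model.

    `links` maps each page to the pages it links to. With real_mode=True,
    pages that no other page links to are disregarded (simplified, one pass).
    """
    pages = set(links)
    if real_mode:
        linked_to = {t for src, out in links.items() for t in out if t != src}
        pages = pages & linked_to
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(out)
                for src, out in links.items()
                if src in pages and page in out
            )
            for page in pages
        }
    return pr


# Example 3 plus three new pages D, E, F that link to A but are orphans.
orphans = {"A": ["B", "C"], "B": ["A"], "C": ["A"],
           "D": ["A"], "E": ["A"], "F": ["A"]}

# The same site with page A also linking to the three new pages.
linked_in = dict(orphans, A=["B", "C", "D", "E", "F"])

for label, site, mode in [("orphans, Simple", orphans, False),
                          ("orphans, Real", orphans, True),
                          ("linked in, Real", linked_in, True)]:
    result = calculate(site, real_mode=mode)
    print(label, "- page A:", round(result["A"], 6),
          "total:", round(sum(result.values()), 6))
```

In this model the orphaned pages only help page A in Simple mode. Once page A links to them, they count in Real mode too, and page A ends up with the same figure it had in the unrealistic Simple-mode run.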
Examples summary

Inbound and Outbound links

Since we are unlikely to get a definitive answer from Google, it is reasonable to assume that a page can cast only one vote for another page, and that additional votes for the same page are not counted.

When a page links to itself, is the link counted?

Dangling links

A dangling link is a link to a page that has no links going from it, or a link to a page that Google hasn't indexed. In both cases Google removes the links shortly after the start of the calculations and reinstates them shortly before the calculations are finished. In this way, their effect on the PageRank of other pages is minimal.

The results shown in Example 1 (right diag.) are wrong because page B has no links going from it, and so the link from page A to page B is dangling and would be removed from the calculations. The results of the calculations would show all three pages as having 0.15.

It may suit site functionality to link to pages that have no links going from them without losing any PageRank from the other pages, but it would be a waste of potential PageRank. Take a look at this example.

The site's potential is 5 because it has 5 pages, but without page E linked in, the site only has 4.15.

Link page A to page E and notice that the site's total has gone down very significantly. But, because the new link is dangling and would be removed from the calculations, we can ignore the new total and assume the previous 4.15 to be true. That's the effect of functionally useful, dangling links in the site. There's no overall PageRank loss.

However, some of the site's potential total is still being wasted, so link Page E back to Page A and now we have the maximum PageRank that is possible with 5 pages. Nothing is being wasted.

Although it may be functionally good to link to pages within the site without those pages linking out again, it is bad for PR.
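Spotting dangling links in a site model is straightforward. The Python sketch below uses an invented function name and an optional `indexed` argument that stands in for "pages Google has indexed"; it only finds the links, it does not reproduce whatever Google does when removing and reinstating them.

```python
def find_dangling_links(links, indexed=None):
    """Return (source, target) pairs where the target has no outbound
    links, or is not among the indexed pages.

    `links` maps each page to the list of pages it links to.
    `indexed` defaults to every page in `links` (an assumption).
    """
    indexed = set(links) if indexed is None else set(indexed)
    dangling = []
    for source, targets in links.items():
        for target in targets:
            if target not in indexed or not links.get(target):
                dangling.append((source, target))
    return dangling


# Example 1: page A links to page B, and page B links nowhere.
example_1 = {"A": ["B"], "B": [], "C": []}
print(find_dangling_links(example_1))  # [('A', 'B')]

# Once page B links back to page A, nothing is dangling.
repaired = {"A": ["B"], "B": ["A"], "C": []}
print(find_dangling_links(repaired))  # []
```

Running a check like this over a site's link map before doing the PageRank calculations is one way to find the pages that still need a link going out.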
It is pointless wasting PageRank unnecessarily, so always make sure that every page in the site links out to at least one other page in the site.

Inbound links

The linking page's PageRank is important, but so is the number of links going from that page. For instance, if you are the only link from a page that has a lowly PR2, you will receive an injection of 0.15 + 0.85(2/1) = 1.85 into your site, whereas a link from a PR8 page that has another 99 links from it will increase your site's PageRank by 0.15 + 0.85(8/100) = 0.218. Clearly, the PR2 link is much better - or is it?

Once the PR is injected into your site, the calculations are done again and each page's PageRank is changed. Depending on the internal link structure, some pages' PageRank is increased and some are unchanged, but no pages lose any PR.

It is beneficial to have the inbound links coming to the pages to which you are channeling your PageRank. A PageRank injection to any other page will be spread around the site through the internal links. The important pages will receive an increase, but not as much of an increase as when they are linked to directly. The page that receives the inbound link makes the biggest gain.

It is easy to think of our site as being a small, self-contained network of pages. When we do the PageRank calculations we are dealing with our small network. If we make a link to another site, we lose some of our network's PageRank, and if we receive a link, our network's PageRank is added to. But it isn't like that. For the PageRank calculations, there is only one network - every page that Google has in its index. Iteration of the calculation is done on the entire network and not on individual websites. Because the entire network is interlinked, and every link and every page plays its part in iteration of the calculations, it is impossible for us to calculate the effect of inbound links to our site with any realistic accuracy.
Outbound links

When PR leaks from a site via a link to another site, all the pages in the internal link structure are affected. (This doesn't always show after just 1 iteration.) The page that you link out from makes a difference to which pages suffer the most loss. Without a program to perform the calculations on specific link structures, it is difficult to decide on the right page to link out from, but the generalization is to link from the one with the lowest PageRank.

Many websites need to contain some outbound links that are nothing to do with PageRank. Unfortunately, all 'normal' outbound links leak PageRank. But there are 'abnormal' ways of linking to other sites that don't result in leaks. PageRank is leaked when Google recognizes a link to another site. The answer is to use links that Google doesn't recognize or count. These include form actions and links contained in JavaScript code.

Form actions

Example:

To be really sneaky, the action attribute could be in some JavaScript code rather than in the form tag, and the JavaScript code could be loaded from a 'js' file stored in a directory that is barred to Google's spider by the robots.txt file.

JavaScript

Like the form action, it is sneaky to load the JavaScript code, which contains the urls, from a separate 'js' file, and sneakier still if the file is stored in a directory that is barred to googlebot by the robots.txt file.

So how much additional PageRank do we need to move up the toolbar?

The Google toolbar range is from 1 to 10. (They sometimes show 0, but that figure isn't believed to be a PageRank calculation result.) What Google does is divide the full range of actual PR on the web into 10 parts - each part is represented by a value as shown in the toolbar. So the toolbar values only show what part of the overall range a page's PageRank is in, and not the actual PageRank itself. The numbers in the toolbar are just labels.
Whether or not the overall range is divided into 10 equal parts is a matter for debate - Google aren't saying. But because it is much harder to move up a toolbar point at the higher end than it is at the lower end, many people (including me) believe that the divisions are based on a logarithmic scale, or something very similar, rather than the equal divisions of a linear scale.

Let's assume that it is a logarithmic, base 10 scale, and that it takes 10 properly linked new pages to move a site's important page up 1 toolbar point. It will take 100 new pages to move it up another point, 1000 new pages to move it up one more, 10,000 to the next, and so on. That's why moving up at the lower end is much easier than at the higher end.

In reality, the base is unlikely to be 10. Some people think it is around the 5 or 6 mark, and maybe even less. Even so, it still gets progressively harder to move up a toolbar point at the higher end of the scale.

Note that as the number of pages on the web increases so does the total PageRank on the web, and as the total PageRank increases, the positions of the divisions in the overall scale must change. As a result, some pages drop a toolbar point for no 'apparent' reason. If the page's actual PageRank was only just above a division in the scale, the addition of new pages to the web would cause the division to move up slightly and the page would end up just below the division. Google's index is always increasing and they re-evaluate each of the pages on more or less a monthly basis. It's known as the "Google dance". When the dance is over, some pages will have dropped a toolbar point. A number of new pages might be all that is needed to get the point back after the next dance.

The toolbar value is a good indicator of a page's PageRank but it only indicates that a page is in a certain range of the overall scale.
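The arithmetic of the assumed logarithmic scale can be put into a few lines of Python. Everything here is hypothetical: the function name, the figure of 10 pages for the first step, and both bases are only the assumptions made in the text, not anything known about Google's real scale.

```python
def pages_needed(step, first_step_pages=10, base=10):
    """New pages needed for the given toolbar step (1 = first point gained),
    assuming each step costs `base` times as much as the one before."""
    return first_step_pages * base ** (step - 1)


base_10 = [pages_needed(step) for step in range(1, 5)]
base_5 = [pages_needed(step, base=5) for step in range(1, 5)]

print("base 10:", base_10)  # [10, 100, 1000, 10000]
print("base 5 :", base_5)   # [10, 50, 250, 1250]
```

Even with the smaller base, the cost of each extra toolbar point keeps multiplying, so the higher steps remain much harder than the lower ones.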
One PR5 page could be just above the PR5 division and another PR5 page could be just below the PR6 division - almost a whole division (toolbar point) between them.

Tips

On Domain names and Filenames

If you think about it, how can a spider know the filename of the page that it gets back when requesting www.domain.com? It can't. The filename could be index.html, index.htm, index.php, default.html, etc. The spider doesn't know. If you link to index.html within the site, the spider could compare the 2 pages but that seems unlikely. So they are 2 urls and each receives PageRank from inbound links. Standardizing the home page's URL ensures that the PR it is due isn't shared with ghost urls.

When this article was first written, the non-www URL had PR4 due to using different versions of the link URLs within the site. It had the effect of sharing the page's PageRank between the 2 pages (the 2 versions) and, therefore, between the 2 sites. That's not the best way to do it. Since then, I've tidied up the internal linkages and got the non-www version down to PR1 so that the PageRank within the site mostly stays in the "www." version, but there must be a site somewhere that links to it without the "www." that's causing the PR1.

Imagine the page, www.tarktech.com/index.htm. The index page contains links to several relative urls, e.g. solutions.htm and specials.htm. The spider sees those urls as www.tarktech.com/solutions.htm and www.tarktech.com/specials.htm. Now let's add an absolute url for another page, only this time we'll leave out the "www" part: tarktech.com/google.htm. Now this page links back to the index page, so the spider sees the index page as tarktech.com/index.htm. Although it's the same index page as the first one, to a spider it is a different page because it's on a different domain. Now look what happens. Each of the relative urls on the index page is also different because it belongs to the tarktech.com domain.
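The way relative urls multiply into ghost pages can be seen with Python's standard `urllib.parse.urljoin`, which resolves a link against the address of the page it appears on, much as a spider has to. The `http://` scheme and the helper name are assumptions added for the example; the article's urls are written without a scheme.

```python
from urllib.parse import urljoin

# Relative links found on the index page.
relative_links = ["solutions.htm", "specials.htm"]


def resolve_all(page_url, hrefs):
    """Resolve each link against the url the page was fetched from."""
    return [urljoin(page_url, href) for href in hrefs]


# The same index page, reached with and without the "www." part.
with_www = resolve_all("http://www.tarktech.com/index.htm", relative_links)
without_www = resolve_all("http://tarktech.com/index.htm", relative_links)

print(with_www)
print(without_www)
print("urls in common:", set(with_www) & set(without_www))
```

The two lists have no url in common, so every relative link on the page exists twice as far as a spider is concerned, once under each host name.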
Consequently, the link structure is wasting a site's potential PageRank by spreading it between ghost pages.

Adding new pages

So, although adding new pages does increase the total PageRank within the site, some of the site's pages will lose PageRank as a result. The answer is to link new pages in such a way within the site that the important pages don't suffer, or add sufficient new pages to make up for the effect (that can sometimes mean adding a large number of new pages), or better still, get some more inbound links.

Miscellaneous

The Google toolbar

It's important to know this so that you can avoid exchanging links with pages that really don't have any PageRank of their own. Before making exchanges, search for the page on Google to make sure that it is indexed.

How do you know if your site has been indexed by Google?

Sub-directories

ODP and Yahoo!

Google spiders the directories just like any other site and their pages have decent PageRank, so they are good inbound links to have. In the case of the ODP, Google's directory is a copy of the ODP directory. Each time that sites are added to or dropped from the ODP, they are added to or dropped from Google's directory when they next update it. The entry in Google's directory is yet another good, PageRank boosting, inbound link. Also, the ODP data is used for searches on a myriad of websites - more inbound links!

Listings in the ODP are free but, because sites are reviewed by hand, it can take quite a long time to get in. The sooner a working site is submitted, the better.
© Copyright 1996 - 2010. TarkTech Solutions. All Rights Reserved. |