tags:Google
Understanding Google’s Bigdaddy Rollout
February 20th, 2006, by Rich.
Several news and technical commentary sites have been announcing the arrival of Google’s Bigdaddy datacentre over the last few weeks, and looking at the stats for this site over the last few days, I might be seeing its effect: hits from Google are up. After a little digging, I reckon the rollout is about 50% complete.
Bigdaddy’s Calling Card
What these two pie charts show is that during a nine-week period covering December and January, 22% of the readers who arrived at this site did so because they’d searched in Google and seen a result that looked worth clicking on. On the right-hand side, we see that during this last week a statistically significant change can be observed, with 62% of people arriving here from a Google search. That’s a big change, and the overall site traffic has increased as a result.
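The charts only show shares, so how big the jump really is depends on what the rest of the traffic did. As a rough back-of-the-envelope check, and assuming (purely for illustration) that visits from non-Google sources stayed flat between the two periods, the two percentages alone imply the multipliers below; `implied_growth` is just a hypothetical helper name.

```python
def implied_growth(share_before: float, share_after: float) -> tuple[float, float]:
    """Return (total traffic multiplier, Google-referred traffic multiplier).

    Assumption: the number of non-Google visits per week stayed the same,
    so only Google-referred visits changed between the two periods.
    """
    if not (0 < share_before < 1 and 0 <= share_after < 1):
        raise ValueError("shares must be fractions between 0 and 1")
    # With non-Google visits fixed at N, total visits = N / (1 - share).
    total_multiplier = (1 - share_before) / (1 - share_after)
    # Google visits = share * total = N * share / (1 - share).
    google_before = share_before / (1 - share_before)
    google_after = share_after / (1 - share_after)
    return total_multiplier, google_after / google_before


total, google = implied_growth(0.22, 0.62)
print(f"total traffic x{total:.2f}, Google-referred traffic x{google:.2f}")
# prints: total traffic x2.05, Google-referred traffic x5.78
```

If non-Google traffic actually grew or shrank over the same period, these multipliers shift accordingly, so treat them as an illustration rather than a measurement.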
So how can I be sure this change is related to the Bigdaddy infrastructure? Well, obviously I can’t be 100% positive, but here are the clues:
- The increase was first noticeable during the first weekend in February, approximately when Bigdaddy was scheduled to bubble through to the public.
- I’m seeing the IP addresses of those datacentres as “hostnames” in my Google Analytics stats, which suggests that my content is cached in those datacentres.
What’s very interesting (for me) is Matt Cutts’ comment about the switchover:
I’d expect a new data center to be converted to Bigdaddy roughly every 10 days or so. The more data centers there are using Bigdaddy, the odds of you hitting a Bigdaddy data center in the normal rotation go up.
Supposedly the switchover started in January, so this is an ongoing process, but just how near to completion is it?
Investigating Bigdaddy
The IP addresses from the logs gave me a starting point, and there’s a fair bit of information about Google’s datacentres on the web, so for each datacentre that I discovered, I looked at one machine to see if it was running the new infrastructure. The test was the “sf giants” query, which returns sfgiants.com as the top result on Bigdaddy infrastructure but mlb.com on the old system. From that I deduced whether the datacentre in question was running the old or the Bigdaddy infrastructure.
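The test boils down to a simple fingerprint on the top result. Here’s a minimal sketch of that classification step, assuming you’ve already captured the full URL (including the `http://` scheme) of the first result from each machine; fetching the results page itself is left out, and `classify_datacentre` is a hypothetical name.

```python
from urllib.parse import urlparse

# Top result for the "sf giants" query on each infrastructure, as described above.
BIGDADDY_TOP = "sfgiants.com"
OLD_TOP = "mlb.com"


def _matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_datacentre(top_result_url: str) -> str:
    """Label a machine from the top result it returned for "sf giants".

    Returns "BD" for Bigdaddy, "old" for the old infrastructure, and
    "unknown" if the result matches neither fingerprint (or has no scheme).
    """
    host = (urlparse(top_result_url).hostname or "").lower()
    if _matches(host, BIGDADDY_TOP):
        return "BD"
    if _matches(host, OLD_TOP):
        return "old"
    return "unknown"


print(classify_datacentre("http://www.sfgiants.com/"))  # BD
print(classify_datacentre("http://mlb.com/"))  # old
```

Matching on the host name rather than on the raw string means “www.sfgiants.com” and “sfgiants.com” are treated alike, while an unrelated site that merely contains “sfgiants” in its name is not.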
| ID | IP checked | Feb 24th | Mar 1st | July 5th | Location |
|---|---|---|---|---|---|
| OD | 64.233.161.104 | old | old | BD | Mt. View, California |
| RN | 64.233.171.104 | old | old | BD | Mt. View, California |
| HS | 64.233.179.104 | old | old | BD | Mt. View, California |
| __ | 64.233.189.104 | old | old | BD | Hong Kong, China * |
| __ | 66.249.93.104 | old (!) | BD | BD | Paris, Illinois * |
| VA | 216.239.37.104 | old | old | BD | Herndon, Virginia |
| DC | 216.239.39.104 | old | old | BD | Washington, DC |
| CW | 216.239.57.104 | old | BD | BD | Palo Alto, California |
| PY | 64.233.167.104 | BD | BD | BD | Mt. View, California |
| MC | 66.102.7.104 | BD | BD | BD | Santa Clara, California |
| __ | 66.249.87.104 | BD | BD | – | Munich, Germany * |
| LM | 66.102.9.104 | BD | BD | BD | Dublin, Ireland |
| KR | 66.102.11.104 | BD | BD | BD | Dublin, Ireland |
| __ | 72.14.207.104 | BD | BD | BD | Mt. View, California |
| IN | 216.239.53.104 | BD | BD | BD | Santa Clara, California |
| GV | 216.239.59.104 | BD | BD | BD | Dublin, Ireland |
| __ | 216.239.63.104 | BD | BD | BD | Mt. View, California |
* Physical locations (where they were not stated by their sources) were guessed using HostIP.info.
Understanding Bigdaddy Progress
So, nine machines appear to be using the Bigdaddy infrastructure and eight are using the old system (the Feb 24th column), which suggests they’re about 50% of the way through. Theoretically, then, if the surge in readers is down to the Bigdaddy changeover, I’m going to see an even larger increase in traffic during the next two months as the rest of Google switches over. Of course, there are mitigating factors. Some potential users don’t speak English, so they won’t find anything I write remotely comprehensible. The stuff that gets on this site is also culturally rather “western/European”, so it may already have had its boost. So far, though, Bigdaddy seems to have had a positive effect on traffic.
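To double-check the head count, and to see how the later surveys changed it, here’s a small sketch that tallies the table. It also makes a naive projection from Matt Cutts’ “roughly every 10 days” figure, assuming (as a simplification) that each row is a separate data centre and that they’re converted one at a time; the constant and function names are hypothetical.

```python
from collections import Counter

# One row per machine: IP, then its status on Feb 24th, Mar 1st and July 5th,
# copied from the table above ("-" = no result recorded).
TABLE = """
64.233.161.104 old old BD
64.233.171.104 old old BD
64.233.179.104 old old BD
64.233.189.104 old old BD
66.249.93.104  old BD  BD
216.239.37.104 old old BD
216.239.39.104 old old BD
216.239.57.104 old BD  BD
64.233.167.104 BD  BD  BD
66.102.7.104   BD  BD  BD
66.249.87.104  BD  BD  -
66.102.9.104   BD  BD  BD
66.102.11.104  BD  BD  BD
72.14.207.104  BD  BD  BD
216.239.53.104 BD  BD  BD
216.239.59.104 BD  BD  BD
216.239.63.104 BD  BD  BD
"""
DATES = ("Feb 24th", "Mar 1st", "July 5th")
DAYS_PER_DATACENTRE = 10  # Matt Cutts' rough figure, quoted above


def tally(table: str) -> dict[str, Counter]:
    """Count the BD / old / missing statuses for each survey date."""
    counts = {date: Counter() for date in DATES}
    for line in table.strip().splitlines():
        _ip, *statuses = line.split()
        for date, status in zip(DATES, statuses):
            counts[date][status] += 1
    return counts


counts = tally(TABLE)
for date, c in counts.items():
    checked = c["BD"] + c["old"]  # machines with a recorded result
    pct = 100 * c["BD"] / checked
    # Naive projection: machines still on the old system x ~10 days each.
    remaining_days = c["old"] * DAYS_PER_DATACENTRE
    print(f"{date}: {c['BD']}/{checked} on Bigdaddy ({pct:.0f}%), "
          f"~{remaining_days} days to go at one conversion every 10 days")
```

For the Feb 24th column this gives 9 of 17 on Bigdaddy (53%) and roughly 80 days to go under those assumptions; the later columns show the changeover catching up.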
Bigdaddy’s Legacy
Hopefully these new visitors will find the content here useful, ’cause if they don’t, it’s bandwidth and time on both sides that’s being wasted. Only time (and Google Analytics) will tell if Bigdaddy is as good on quality as it is on quantity.


Is “72.14.207.104” BigDaddy?
I believe so (based on its response to the “sf giants” search); it’s in the table above and marked as such.
It must be more than a new datacenter, because the algorithm should produce the same results regardless. Or do you think Google’s indexed more of your site now?
I use Google Sitemaps, so I know that every last ASCII character that I want indexed already gets the full treatment. I’m also quite careful to ensure that I have 301 redirects to the boakes.org domain in cases where there could be confusion; this ensures that the PageRank shared between boakes.org and http://www.boakes.org is all heaped on the former, which I prefer, for brevity.
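The rule itself is simple: any request that arrives on a host name other than the preferred one gets a 301 pointing at the same path on boakes.org. Here’s a minimal, server-agnostic sketch of that decision. The site’s real server configuration isn’t shown here, and `canonical_redirect` and `CANONICAL_HOST` are illustrative names only.

```python
from typing import Optional, Tuple
from urllib.parse import urlunsplit

CANONICAL_HOST = "boakes.org"  # preferred domain, without "www."


def canonical_redirect(host: str, path: str, query: str = "") -> Optional[Tuple[int, str]]:
    """Decide whether a request should be 301-redirected to the canonical host.

    Returns (301, location) for requests that arrive on any other host name
    (e.g. "www.boakes.org"), or None when the request is already canonical.
    """
    # Host headers may carry a port ("www.boakes.org:80"), vary in case,
    # or end with a trailing dot; normalise before comparing.
    bare_host = host.split(":", 1)[0].lower().rstrip(".")
    if bare_host == CANONICAL_HOST:
        return None
    # Keep the same path and query string so deep links survive the redirect.
    location = urlunsplit(("http", CANONICAL_HOST, path or "/", query, ""))
    return 301, location


print(canonical_redirect("www.boakes.org", "/some/page", "p=1"))
print(canonical_redirect("boakes.org", "/"))
```

Because the redirect is permanent (301) and preserves the path, search engines can fold both host names into a single URL for each page instead of treating them as duplicates.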
Part of Bigdaddy appears to be a change in the way that some redirects are handled, so there are certainly algorithm changes as well as infrastructure changes in the new datacentre setup.
There are other things to consider. I installed Google Analytics last year, so Google now has a far better understanding of how visitors to this site behave, what they click on, etc., so that might have helped. If insight from the first Analytics customers is part of the Bigdaddy rollout, then it would help to explain the increase in readership. Also, when the Jagger update occurred, Google-derived traffic did drop slightly, so this might be redressing the balance from that.
Of course, all I can do is observe and speculate.
Snap. I also use Sitemaps and 301 redirects (but to http://www.stephennewton.com). I tried Google Analytics, but I didn’t like it and so I dumped it. I’d be surprised if Analytics helps as Google’s always keen to deny favouring those who use its products.
True, this is a noble stance, but they’d be nuts not to use the data to help calibrate their ranking algorithm, ’cause it’s the most accurate reflection of user behaviour after the search result link has been clicked. My previously published musings on Analytics.
Google Big Daddy Update…
The rise in referral traffic from Google seems to have stabilized, so I’ve re-run the analysis from last week and updated the big daddy status list to reflect the current status of the various datacentres; two of which appear to have switched ove…
Rich,
A question about your redirects. But first, thanks for the info on BD.
My site, chrisheath.us, shows the same page with or without the ‘www’.
I tried adding a .htaccess file with the line: redirect 301 http://www.chrisheath.us http://chrisheath.us
That didn’t seem to work.
I’m on a virtual server (VPS) and run Apache. Any ideas? Or is there a redirect already going on, since you get the same page for both?
TYIA,
Chris Heath
Hi Chris, the easy way to check for redirects is to connect directly to your server, speak HTTP at it, and see what comes out.
For some reason when doing this, I always get the mental image of leaning over the bar and suckling directly from the beertap rather than having the barman pour me a glass.
Fortunately the necessary HTTP is very brief, and you can prepare it in an editor beforehand. This example assumes you want to check boakes.org for redirects:
That’s two lines with text, then one blank line. If you then fire up your telnet software:
telnet boakes.org 80
…and then paste the prepared HTTP request and hit Enter, you’ll either see an HTTP response that contains a redirect header, or you’ll see the requested page.
Now change the HTTP request:
and reconnect:
telnet www.boakes.org 80
…and compare with your previous output: one response contains a redirect and one doesn’t.
The important lines in the redirect response are these two:
The first one contains a 301 result code that instructs the browser that it needs to ask somewhere else. The Location line tells the browser where to look.
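If you’d rather script the telnet session, here’s a minimal Python sketch. It assumes a plain HTTP server on port 80 and sends an HTTP/1.0 request with a Host header (two lines of text, then a blank line), so the server closes the connection once it has replied. `build_request`, `parse_redirect` and `check_host` are hypothetical helper names, and the parsing only handles the CRLF line endings that HTTP specifies.

```python
import socket
from typing import Optional, Tuple


def build_request(host: str, path: str = "/") -> bytes:
    """The same two-line request you'd paste into telnet, plus the blank line."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_redirect(response: bytes) -> Tuple[int, Optional[str]]:
    """Return the status code and the Location header (None if absent)."""
    head = response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split()  # e.g. ["HTTP/1.1", "301", "Moved", "Permanently"]
    if len(parts) < 2:
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    location = None
    for line in header_lines:
        # Split on the first colon only, so "http://..." values stay intact.
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return int(parts[1]), location


def check_host(host: str, port: int = 80, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Connect, send the request and read the whole reply, like the telnet session."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # HTTP/1.0: the server closes the connection when done
                break
            chunks.append(chunk)
    return parse_redirect(b"".join(chunks))
```

Calling `check_host("boakes.org")` and then `check_host("www.boakes.org")` mirrors the two telnet sessions: a redirecting host comes back with a 301 and a Location value, while the canonical one returns its page status with no Location.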
Now, on your particular server I see that you’re running Plesk, and I noticed the following in the Plesk Server Admin (2.5) manual:
So it sounds like it’s “a feature”, and it may be why your redirect is failing.
From a Google perspective, the upshot of the same content being available from two URIs has historically been a split PageRank, but it sounds like Bigdaddy may solve this.
Is Big Daddy over yet? We have been completely dropped from the search results. Can you suggest any way to put in a complaint with Google? Where do we resubmit the website for crawling?