2007-06-05

Scraping

I lifted this code from Google Hacks about six months ago, and it had never failed. Until today. As has been pounded into my head over and over, scraping web pages is unreliable.

$ cat /home/galoot/bin/calc 
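#!/usr/bin/php5
<?php
// Send the command-line arguments to Google as a search query, then capture
// whatever follows "= " inside a <b> element on the result page.
preg_match_all('{<b>.+= (.+?)</b>}',
file_get_contents('http://www.google.com/search?q=' .
urlencode(join(' ', array_slice($argv, 1)))), $matches);
// Print the first capture, turning the small-font digit separator into a comma.
print str_replace('<font size=-2> </font>', ',',
"\n{$matches[1][0]}\n\n");
?>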
Okay.

$ calc 2000/364

5.49450549
So far so good.

$ calc 2000/366

5.46448087
That's right.

$ calc 2000/365

{Carlo <b>...
Heh. That's text from the 8th hit for those search terms, not a calculator result.
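
One way to at least notice this kind of failure is to check that the captured text looks like a number before printing it. The sketch below is not from Google Hacks: extract_calc_result is a hypothetical helper, and the rule that a real answer starts with a digit is an assumption about the result format, not something Google documents. It reuses the same pattern and the same separator cleanup as the script above.

```php
<?php
/**
 * Hypothetical helper: return the calculator answer found in $html,
 * or null when the capture does not look like a number.
 */
function extract_calc_result($html)
{
    // Same pattern as the original script: the text after "= " inside <b>.
    if (!preg_match('{<b>.+= (.+?)</b>}', $html, $m)) {
        return null;
    }
    // Same cleanup as the original script: separator markup becomes a comma.
    $value = str_replace('<font size=-2> </font>', ',', $m[1]);
    // Assumption: a genuine answer begins with an optionally signed digit.
    // Anything else is treated as a false match from an ordinary search hit.
    if (!preg_match('/^-?\d/', $value)) {
        return null;
    }
    return $value;
}
```

Under that assumption, a capture such as the one from the 2000/365 run, which begins with "{Carlo", should come back as null rather than being printed as if it were an answer. It would not catch a search hit whose matched text happens to start with a digit, so it narrows the problem without fixing it.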



Back to the drawing board.