AI classification of websites
I see that Erich is also trying to do something with AI classification of web pages. It would be interesting to find out which algorithms he is using and what his validation results look like - I just did my Master's in a similar direction. But alas, his blog has no comments :)
What I would suggest is trying out the algorithms with an existing toolkit, such as WEKA from the University of Waikato in New Zealand. I used it in my Master's thesis and got very nice results. I found that for non-topical categories, tree-based classifiers and support vector machine classifiers (in the SMO variant) produced acceptable results.
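To make the "try an algorithm and look at the validation results" idea a bit more concrete, here is a minimal sketch in plain Python. It is not WEKA and not what I ran in my thesis: it is a one-level decision tree (a "stump"), the simplest possible tree-based classifier, scored with k-fold cross-validation. The two features (outbound link count, share of first-person pronouns), the "blog"/"shop" labels and all the numbers are invented toy data, just to have something non-topical to classify.

```python
from collections import Counter

def majority(labels):
    """Most common label in a list (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, labels):
    """Fit a one-level decision tree: choose the (feature, threshold)
    split that makes the fewest errors on the training data."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No feature varies, so always predict the majority class.
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]

def predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def cross_validate(rows, labels, k=5):
    """k-fold cross-validation; returns accuracy pooled over all
    held-out examples."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(rows), k))
        train_rows = [r for i, r in enumerate(rows) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        stump = train_stump(train_rows, train_labels)
        correct += sum(predict(stump, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)

# Toy data (hypothetical): (outbound links, share of first-person pronouns)
rows = [(3, 0.09), (5, 0.12), (2, 0.08), (4, 0.10), (6, 0.11),
        (40, 0.01), (55, 0.00), (38, 0.02), (60, 0.01), (47, 0.00)]
labels = ["blog"] * 5 + ["shop"] * 5

accuracy = cross_validate(rows, labels, k=5)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

With a real toolkit you would swap the stump for a proper tree learner or an SVM and feed it real page features, but the loop stays the same: train on some folds, score on the held-out one, and report the held-out number rather than the training accuracy.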