Tag Archives: Web-Crawling

How to get Data #2

Remember when I said you can either crawl vast amounts of data and then analyze it, or crawl something that has already been analyzed? Well, I’ll make a further distinction:
1-Asking people to send data (social networks)
2-Crawling huge amounts of data that may or may not have been analyzed (search engines)
3-Crawling small amounts of on-demand data that have already been analyzed (meta-search engines, flight price search engines)

This differs from my last post: types 2 and 3 now differ not only in whether their data has been tagged/classified, but also in that type 3 data comes on demand. A user on a type 3 website searches for a term, and only THEN does that website look for it on type 1 and type 2 websites.
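
To make the type 3 idea concrete, here is a rough sketch in Python of how such an on-demand lookup could work. Everything in it is made up for illustration: the two source functions are hypothetical stand-ins for a type 1 and a type 2 website (they just filter a hard-coded list instead of touching the network), and `meta_search` is only one possible way to merge their answers.

```python
from typing import Callable, Dict, List

# A "source" is any function that takes a search term and returns results.
Source = Callable[[str], List[str]]


def search_social_network(term: str) -> List[str]:
    # Hypothetical stand-in for a type 1 website (data sent in by people).
    posts = ["cheap flights to Lisbon", "my cat photos", "flights delayed again"]
    return [p for p in posts if term.lower() in p.lower()]


def search_engine(term: str) -> List[str]:
    # Hypothetical stand-in for a type 2 website (a big pre-crawled index).
    pages = ["Flights comparison guide", "Web crawling basics", "cheap flights to Lisbon"]
    return [p for p in pages if term.lower() in p.lower()]


def meta_search(term: str, sources: Dict[str, Source]) -> List[str]:
    """Query every source only when a term arrives, then merge the results."""
    seen = set()
    merged = []
    for name, source in sources.items():
        for result in source(term):
            key = result.lower()
            if key not in seen:  # drop results another source already returned
                seen.add(key)
                merged.append(f"{name}: {result}")
    return merged


SOURCES = {"social": search_social_network, "engine": search_engine}

if __name__ == "__main__":
    for line in meta_search("flights", SOURCES):
        print(line)
```

The point is that nothing is fetched until the user types a term, which is what separates a type 3 website from a type 2 one that crawls everything up front.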

I just crawled 1000 websites and still don’t see the tags I want… so this narrower definition will help me… Also, WordPress is no longer a candidate: their tag-search results are rendered in Flash, and extracting text from Flash is well above my pay grade 😉

