Hard to believe

When I tried getting posts urls from wordpress reader I first tried finding post’s names on mozilla’s source code viewer, being unable to do that I conclude the website was made with flash/javascript and I would need a way around.

After months thinking about how to crawl wordpresses jumping from blog to blog or simply asking for bloggers to register their blogs or even asking random users to index a post I was quite upset. No alternative seemed good enough. Relevant content needs to be updated continually, needs to be fresh. I tried finding a feed for specific tags on wordpress.com, failing. I even thought today about filtering bing’s results for a given tag a selecting only *.wordpress.com websites, then showing to my users. But then I tried getting a wordpress.com/some_tag page using python script(not mozilla). And all posts urls showed up, as well as titles and even descriptions; a crawl-able website. Not only that, but I can also get older pages by adding “/page/x” where x is [2..Inf]

Unbelievable. That’s exactly what I wanted. Now let’s hope PHP can also get those crawler-friendly pages so I don’t need to pay a python-enabled host.

Advertisements

Leave a comment

Filed under Uncategorized

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s