On WordPress first:
I get followers and likes even on days when I have no views, and even though I block visitors from reading the full text in the “Reader”.
On Visual Studio:
If you create a raw socket, it doesn’t receive data from TCP/UDP requests (neither from Visual Studio nor from Firefox).
If you create a non-blocking UDP server and try to modify a control without Invoke, the program simply closes without raising any error. (It didn’t happen with the asynchronous TCP server I built yesterday.)
About raw sockets again: the tooltip says you need to build your own IP header, but in practice the header is sent automatically.
“301 Moved Permanently” is what I get when trying to fetch wordpress.com/tag/X/ from PHP. However, Python’s urllib still works.
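One possible explanation, and it is only a guess since I haven't inspected the PHP side, is that urllib follows redirects by default while the PHP call did not. The sketch below reproduces both behaviours against a toy local server, so it needs no network access. The paths /old and /new and the names RedirectingHandler, NoRedirect, fetch and demo are all made up for the illustration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Toy server: /old answers 301 pointing at /new, anything else answers 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects, so the 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch(url, follow=True):
    # An empty ProxyHandler keeps the local request away from any system proxy.
    handlers = [urllib.request.ProxyHandler({})]
    if not follow:
        handlers.append(NoRedirect())
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=5) as response:
        return response.status, response.geturl(), response.read()


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/old"
        followed = fetch(url, follow=True)
        try:
            fetch(url, follow=False)
            raw_status = None
        except urllib.error.HTTPError as err:
            raw_status = err.code  # what a non-following client sees
            err.close()
        return followed, raw_status
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the followed response (status, final URL, body) together with the status code seen when redirects are not followed. If the PHP side really is just not following the redirect, turning that on (or requesting the redirect target directly) may be enough, but that remains to be checked.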
The tip I gave about getting /page/2 of WordPress tags no longer works…
I could pay for some serious hosting, but I’m under the impression they would make crawling impossible sooner rather than later. Also, without crawling page 2 and beyond, there is no way I would be able to do any real search.
After months of thinking about how to crawl WordPress blogs, whether by jumping from blog to blog, asking bloggers to register their blogs, or even asking random users to index a post, I was quite upset. No alternative seemed good enough. Relevant content needs to be updated continually; it needs to be fresh. I tried to find a feed for specific tags on wordpress.com and failed. Today I even thought about filtering Bing’s results for a given tag, selecting only the *.wordpress.com websites, and showing those to my users. But then I tried getting a wordpress.com/some_tag page using a Python script (not Mozilla). And all the post URLs showed up, as well as titles and even descriptions: a crawl-able website. Not only that, but I can also get older pages by adding “/page/x”, where x is [2..Inf].
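Here is a minimal sketch of that idea in Python, not the exact script I ran. It assumes the tag listing lives at wordpress.com/tag/<tag>/ and that older pages follow the “/page/x” pattern with a trailing slash. The names tag_page_url, LinkCollector, extract_links and fetch_tag_page, and the User-Agent string, are made up for the example. It only collects href values, since I am not assuming anything about the page's actual markup.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumption: tag listings are served under this prefix.
BASE = "https://wordpress.com/tag"


def tag_page_url(tag, page=1):
    """Build the listing URL for a tag; page 1 has no /page/ suffix."""
    if page < 1:
        raise ValueError("page must be >= 1")
    url = f"{BASE}/{urllib.parse.quote(tag)}/"
    if page > 1:
        url += f"page/{page}/"
    return url


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, skipping duplicates."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href not in self.links:
                self.links.append(href)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def fetch_tag_page(tag, page=1):
    """Download one listing page as text (requires network access)."""
    request = urllib.request.Request(
        tag_page_url(tag, page),
        headers={"User-Agent": "example-crawler/0.1"},  # hypothetical value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Something like extract_links(fetch_tag_page("some_tag", 2)) would then list every link on the second page. Picking out only the post URLs, titles and descriptions would still need filtering based on whatever the page actually returns.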
Unbelievable. That’s exactly what I wanted. Now let’s hope PHP can also get those crawler-friendly pages so I don’t need to pay for a Python-enabled host.