Explode function in VB:
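Private Function phpishExplode(ByVal del As String, ByVal s1 As String) As ArrayList
    Dim returnValue As New ArrayList
    Dim lastMatchIndex As Integer = 0
    ' Try every start position where del still fits inside s1.
    For i As Integer = 0 To s1.Length - del.Length
        ' Skip positions inside the previous match so matches never overlap.
        If i < lastMatchIndex Then Continue For
        Dim match As Boolean = True
        For i2 As Integer = 0 To del.Length - 1
            If s1(i + i2) <> del(i2) Then
                match = False
                Exit For
            End If
        Next
        If match Then
            ' Store the piece between the previous delimiter and this one.
            returnValue.Add(s1.Substring(lastMatchIndex, i - lastMatchIndex))
            lastMatchIndex = i + del.Length
        End If
    Next
    ' Whatever follows the last delimiter is the final piece.
    returnValue.Add(s1.Substring(lastMatchIndex))
    Return returnValue
End Function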
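The same left-to-right scan is easy to check in Python. php_explode below is a hypothetical sketch, not part of any library; for a non-empty delimiter it should produce the same pieces as Python's built-in str.split, which also matches PHP's explode() called without a limit argument.

```python
from __future__ import annotations


def php_explode(delimiter: str, s: str) -> list[str]:
    """Split s on every non-overlapping occurrence of delimiter, left to right."""
    if delimiter == "":
        # PHP's explode() refuses an empty delimiter; refuse it here too.
        raise ValueError("empty delimiter")
    pieces = []
    last_match = 0
    i = 0
    while i <= len(s) - len(delimiter):
        if s[i:i + len(delimiter)] == delimiter:
            # Keep the text between the previous delimiter and this one.
            pieces.append(s[last_match:i])
            last_match = i + len(delimiter)
            i = last_match  # jump past the match so matches never overlap
        else:
            i += 1
    # Whatever follows the last delimiter is the final piece.
    pieces.append(s[last_match:])
    return pieces


print(php_explode(",", "a,,b"))  # ['a', '', 'b']
print(php_explode("aa", "aaa"))  # ['', 'a']
```

Note that consecutive delimiters yield an empty piece, exactly as explode() does. In everyday code, s.split(delimiter) (or String.Split with a string separator in VB.NET) does the job; the hand-written loop mainly shows how the pieces are cut.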
I get “301 Moved Permanently” when I try to fetch wordpress.com/tag/X/ from PHP. Python’s urllib, however, still works.
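One plausible reason Python keeps working: urllib.request follows HTTP redirects (301, 302, 303, 307) on its own, so a 301 never reaches the calling code, whereas a PHP client that does not follow redirects hands the raw 301 back. The sketch below is a local demo under that assumption, not a test against wordpress.com: it starts a throwaway server on 127.0.0.1 whose /tag/X/ path answers 301 (the handler and paths are made up) and shows urlopen() landing on the redirect target.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Toy server: /tag/X/ answers 301 and points at /tag/X/final/."""

    def do_GET(self):
        if self.path == "/tag/X/":
            self.send_response(301)
            self.send_header("Location", "/tag/X/final/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"tag page content"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# urlopen() follows the 301 itself via the default HTTPRedirectHandler.
with urllib.request.urlopen(base + "/tag/X/") as resp:
    status = resp.status
    final_url = resp.geturl()
    text = resp.read().decode()

server.shutdown()
server.server_close()
print(status, final_url, text)
```

The response seen by the script is the 200 from the final URL; the intermediate 301 only shows up if you inspect the redirect chain.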
The tip I gave about fetching /page/2 of WordPress tag pages no longer works…
I could pay for serious hosting, but I’m under the impression they would make crawling impossible sooner rather than later. Also, without crawling page 2 onwards there is no way I could offer any real search.
After months of thinking about how to crawl WordPress blogs (jumping from blog to blog, simply asking bloggers to register their blogs, or even asking random users to index a post), I was quite upset: no alternative seemed good enough. Relevant content needs to be updated continually; it needs to be fresh. I tried to find a feed for specific tags on wordpress.com and failed. Today I even thought about filtering Bing’s results for a given tag, selecting only *.wordpress.com websites, and showing those to my users. But then I tried fetching a wordpress.com/some_tag page with a Python script (not Mozilla), and all the post URLs showed up, along with titles and even descriptions: a crawlable website. Not only that, but I can also get older pages by adding “/page/x”, where x is 2, 3, 4 and so on.
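The crawl described above boils down to two steps: generate the tag page URLs, then pull the post links out of each page. Here is a minimal Python sketch of both, run on a made-up HTML snippet rather than a real wordpress.com response. The URL pattern https://wordpress.com/tag/<tag>/page/<x>/ is assumed from the description, tag_page_urls and PostLinkParser are hypothetical names, and the real markup will differ; fetching each URL would be a call to urllib.request.urlopen.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def tag_page_urls(tag, last_page):
    """Yield the listing pages for a tag: page 1, then /page/2/ .. /page/last_page/."""
    base = f"https://wordpress.com/tag/{tag}/"  # assumed URL pattern
    yield base
    for x in range(2, last_page + 1):
        yield f"{base}page/{x}/"


class PostLinkParser(HTMLParser):
    """Collect links to individual *.wordpress.com blogs, in order, without duplicates."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = urlparse(href).hostname or ""
        # Hypothetical filter: posts live on <blog>.wordpress.com, while the
        # tag listing pages live on wordpress.com itself.
        if host.endswith(".wordpress.com") and href not in self.links:
            self.links.append(href)


# Made-up markup standing in for a tag page; the real HTML will differ.
sample = """
<h2><a href="https://alice.wordpress.com/2010/01/01/hello/">Hello</a></h2>
<p>A short description of the post.</p>
<a href="https://wordpress.com/tag/python/page/2/">Older posts</a>
<a href="https://bob.wordpress.com/2010/01/02/graphs/">Graphs</a>
<a href="https://alice.wordpress.com/2010/01/01/hello/">Hello (again)</a>
"""

parser = PostLinkParser()
parser.feed(sample)
print(list(tag_page_urls("python", 3)))
print(parser.links)
```

Keeping URL generation separate from parsing means that if the /page/x pattern stops working, only tag_page_urls has to change.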
Unbelievable. That’s exactly what I wanted. Now let’s hope PHP can also fetch those crawler-friendly pages, so I don’t need to pay for a Python-enabled host.
I just discovered that my free web host won’t let me use PHP’s file_get_contents() on other websites.
That, coupled with this statistic, makes going back to Visual Basic more appealing.
The crawler library idea is probably not going to work: crawling and analysing are far too intertwined. I have, however, separated the graph-creation part into its own file. The function Create_Edge(node,node,typeOfconnection) creates two nodes (or edits them) and one edge. Below are excerpts from a file that uses the library (graphTest.php) and from the library itself (libGraph.php).
if (file_exists($nodeAfile))
if (!file_exists($nodeBfile))
Create_Edge($nodeB, $nodeA, $edgeType, TRUE); // Says TRUE, but "directed" is actually false in both cases; one could call it round_trip.
fwrite($fLog1, "Created edge: " . $nodeA . "\x19" . $edgeType . "\x19" . $nodeB . "\n");
Also: create a file named nodeA+edgeType+nodeB.
TODO: separate $Direction and $RoundTrip:
    nodeA[Edge][nodeB] = $Direction   (one of POINTER, ARROW, NONE)
    if $Direction == NONE and RT == false:
        call Create_Edge again, with RT set to true
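The fragments and the TODO above can be pieced together into a small sketch. The Python below is a hedged reconstruction, not libGraph.php itself: each node is assumed to be a plain file named after the node, every edge is appended to the source node's file, the log line reuses the “Created edge: nodeA\x19edgeType\x19nodeB” format, and the TODO's round-trip rule is applied, so an undirected (NONE) edge calls create_edge again with the endpoints swapped and round_trip=True. Direction, create_edge and edges_of are hypothetical names.

```python
import io
import os
import tempfile
from enum import Enum

SEP = "\x19"  # same separator as the PHP log line


class Direction(Enum):
    # The note lists these three values; how POINTER and ARROW differ is not
    # specified, so this sketch only stores them.
    POINTER = "pointer"
    ARROW = "arrow"
    NONE = "none"


def create_edge(graph_dir, node_a, edge_type, node_b, direction, log, round_trip=False):
    """Record node_a[edge_type][node_b] = direction, creating node files as needed.

    Node names are assumed to be safe file names. Each line of a node file
    holds edge_type, neighbour and direction joined by SEP.
    """
    # Append mode creates the node file the first time the node is seen.
    with open(os.path.join(graph_dir, node_a), "a", encoding="utf-8") as f:
        f.write(SEP.join((edge_type, node_b, direction.value)) + "\n")
    # Touch node_b as well, so both endpoints exist even for one-way edges.
    open(os.path.join(graph_dir, node_b), "a", encoding="utf-8").close()
    log.write(f"Created edge: {node_a}{SEP}{edge_type}{SEP}{node_b}\n")
    # Undirected edge on its first pass: add the reverse entry as the round trip.
    if direction is Direction.NONE and not round_trip:
        create_edge(graph_dir, node_b, edge_type, node_a, direction, log, round_trip=True)


def edges_of(graph_dir, node):
    """Return (edge_type, neighbour, Direction) tuples stored for a node."""
    with open(os.path.join(graph_dir, node), encoding="utf-8") as f:
        rows = [line.rstrip("\n").split(SEP) for line in f]
    return [(e, n, Direction(d)) for e, n, d in rows]


log = io.StringIO()
with tempfile.TemporaryDirectory() as d:
    create_edge(d, "post1", "shares_tag", "post2", Direction.NONE, log)
    create_edge(d, "post1", "links_to", "post3", Direction.ARROW, log)
    post1_edges = edges_of(d, "post1")
    post2_edges = edges_of(d, "post2")
    post3_edges = edges_of(d, "post3")

print(post1_edges)
print(post2_edges)
print(post3_edges)
print(log.getvalue().count("Created edge"))
```

Passing round_trip=True on the recursive call is what stops the undirected case from bouncing back and forth forever, which is the point of separating $Direction from $RoundTrip.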
So, my tagtree website is up and running, but now that I have crawled a few hundred posts I would like to connect them, which means I need graphs and probably some visual information to make solid design decisions.
What surprises me, however, is the total lack of graph libraries (at least in default PHP, without extensions). I have been thinking about developing my own graph library, and even a crawler library, that would help me focus on what matters without recoding everything every time.
The same can be said about composite hedges: something so basic to financial analysis, yet missing from every website I have visited so far. The closest one can get is finance.yahoo, where you can make a composite chart, but changing the percentages of stocks in your portfolio, or actually measuring the difference between two quotes/indexes, is impossible (you can see the gap with the naked eye, but no numeric value is given).
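Both missing features are simple arithmetic once the quotes are in hand. Below is a minimal Python sketch of one way to do it, assuming closing prices are already available as plain lists: each series is rescaled to start at 100, the composite is a weighted sum of the rescaled series (so changing percentages just means changing the weights, assumed to sum to 1), and the gap between two series is reported as a percentage. The function names and the prices are made up for illustration; they are not real quotes.

```python
def normalise(prices):
    """Rescale a price series so that it starts at 100."""
    first = prices[0]
    return [100.0 * p / first for p in prices]


def composite(series_by_symbol, weights):
    """Weighted sum of normalised series.

    weights maps symbol -> fraction of the portfolio (assumed to sum to 1).
    """
    norm = {s: normalise(series_by_symbol[s]) for s in weights}
    length = len(next(iter(norm.values())))
    return [sum(w * norm[s][t] for s, w in weights.items()) for t in range(length)]


def gap_pct(a, b):
    """Percentage difference of series a over series b, after normalising both."""
    return [100.0 * (x - y) / y for x, y in zip(normalise(a), normalise(b))]


# Toy closing prices, made up for illustration only (not real quotes).
prices = {"AAA": [10, 11, 12], "BBB": [20, 20, 18]}
index = [1000, 1050, 1100]

portfolio = composite(prices, {"AAA": 0.5, "BBB": 0.5})
print(portfolio)                                         # [100.0, 105.0, 105.0]
print([round(g, 2) for g in gap_pct(portfolio, index)])  # [0.0, 0.0, -4.55]
```

Normalising first is what makes series with very different price levels (a $10 stock, a 1000-point index) comparable on one chart and in one number.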
I could also make a website that does that.
Both would help set up a more productive environment for me and others, but neither is likely to earn me money in and of itself.