So, I've been thinking about going back to my idea of organizing tags; I could put my recent experience with crawling to good use.
Monthly Archives: December 2013
I think I’ll give up on doujinpulse. The images are not showing properly on eadulthost, plus the site is too bulky (I had foreseen that and I can correct it, but what’s the point if the images are going to get corrupted anyway?)
Also, I haven’t managed to make a single script that executes my crawler AND uploads its data. Plus, ftp-upload doesn’t seem to care about pre-existing files sharing the same name as the ones I’m trying to upload. Re-uploading everything needlessly is a pain.
It’s not like nothing good has come out of this though: I have created a PHP script that crawls and produces an HTML page showing the crawled data. Kudos to me.
I’m tired but couldn’t sleep so I decided to write this.
What is next? I’m too tired to decide but probably continue to make crawlers. Particularly ones that store small, fresh data from manually selected sources.
Separation of crawler and webpage complete.
Now I just need to upload the website part AND make a script that automatically executes the crawler from my notepad and then uploads the updated website.
About 340 lines of code… about one third is old commented-out code that I can probably remove safely… the rest will need my full attention… perhaps I should just rewrite everything while consulting the old code…
I’ll have a python script looping…
calling os.system (or whatever function should be used to execute bash commands) with:
ftp-upload -h host.com -u user --password password ./dir1/*
That uploads all files from dir1 at once 🙂 Neat
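A rough sketch of that loop might look like this (using subprocess.call rather than os.system, since it avoids quoting headaches; the host, user, password, and the crawler.php entry point are all placeholders, not my real setup):

```python
import glob
import subprocess
import time

def build_cmd(host, user, password, files):
    # Mirror the ftp-upload invocation above; every credential here is a placeholder
    return ["ftp-upload", "-h", host, "-u", user, "--password", password] + files

def upload_loop(interval_seconds=3600):
    while True:
        subprocess.call(["php", "crawler.php"])  # hypothetical crawler entry point
        # subprocess doesn't expand shell wildcards, so expand ./dir1/* ourselves
        files = glob.glob("./dir1/*")
        if files:
            subprocess.call(build_cmd("host.com", "user", "secret", files))
        time.sleep(interval_seconds)
```

Passing a list of arguments means filenames with spaces survive intact, which `os.system` with a raw string would mangle.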
Now comes the much harder part: separating crawling from content-viewing and running the former locally.
And when I woke up, I couldn’t use my pen-drive internet anymore…
I seriously need to separate the crawler part from the viewing part.
It’s on again now (still hosted on my machine): http://doujinpulse.eadulthost.com/
I will need some way to automatically upload local files by FTP though…
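One way to do this without the ftp-upload tool would be Python’s standard ftplib, which also makes it easy to skip files the server already has. A minimal sketch (host, user, and directory names are placeholders):

```python
from ftplib import FTP
import os

def files_to_upload(local_names, remote_names):
    # Only push names the server doesn't already have,
    # instead of re-uploading everything every time
    remote = set(remote_names)
    return [n for n in local_names if n not in remote]

def upload_new_files(host, user, password, local_dir, remote_dir):
    # All connection details here are placeholder values
    ftp = FTP(host)
    ftp.login(user, password)
    ftp.cwd(remote_dir)
    local = [n for n in sorted(os.listdir(local_dir))
             if os.path.isfile(os.path.join(local_dir, n))]
    for name in files_to_upload(local, ftp.nlst()):
        with open(os.path.join(local_dir, name), "rb") as f:
            ftp.storbinary("STOR " + name, f)
    ftp.quit()
```

Comparing names is crude (it misses changed files with the same name), but it already beats blindly re-uploading the whole directory.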
One thing I could do would be to keep the crawler/PHP part locally, have my website send me a POST request, and then crawl and send the data back, also via POST requests. I will need to separate the crawling scripts from the main website if I want to expand anyway.
Edit: I can’t receive and parse $_POST without PHP.
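Since the local machine can’t easily run PHP to parse $_POST, the local end of this idea could instead be a small Python HTTP server that accepts the trigger POST and answers with the crawl result. A minimal sketch, where run_crawl is a stand-in for actually invoking the crawler:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

def run_crawl(request_body):
    # Stand-in: the real version would run the PHP crawler and collect its output
    return {"status": "ok", "received": len(request_body)}

class CrawlTrigger(BaseHTTPRequestHandler):
    # Hypothetical endpoint: the public website POSTs here to request a crawl
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)           # payload sent by the website
        data = run_crawl(body)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CrawlTrigger).serve_forever()
```

The website would then POST to this machine and read the JSON reply, no PHP needed on the local side; the port and response format are just illustrative.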