Build Thy Search: Planning a crawler
Thursday, December 23, 2010
For some time now, I have been planning to build a search engine dedicated to delivering torrent links, Megavideo links, and direct HTTP and FTP downloads for movie titles. I am a movie addict and am doing this mostly for my own viewing pleasure, though I might also make a few dollars by selling it to someone.
So, I have begun planning how the crawler will work. FYI, the crawler takes a list of URLs, downloads the pages to my disk, scans them for more links, and keeps any download links it finds.
I've chosen C++ as the language here; I honestly couldn't tell you why.
The bare functionality (a rough sketch follows the list):
Fetch Webpage.
Scan for Links.
Keep download links in DB.
Add other links to the queue of URLs to be crawled.
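Here's a rough C++ sketch of that loop, just to get the shape of it down. The fetch_page and extract_links functions are placeholders of my own (fetch_page will eventually wrap whatever HTTP library I settle on), and the "DB" is stubbed out as a print statement for now.

```cpp
#include <iostream>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Placeholder fetch -- to be replaced by whatever HTTP library wins out.
std::string fetch_page(const std::string& url)
{
    return "";  // pretend we downloaded the page
}

// Placeholder link scan -- a real version would parse href attributes.
std::vector<std::string> extract_links(const std::string& html)
{
    return std::vector<std::string>();
}

// Crude check for the kind of links worth keeping.
bool is_download_link(const std::string& url)
{
    return url.find(".torrent") != std::string::npos
        || url.find("megavideo.com") != std::string::npos;
}

void crawl(const std::vector<std::string>& seeds)
{
    std::queue<std::string> frontier;   // URLs still to be crawled
    std::set<std::string> seen;         // avoid re-crawling the same URL

    for (size_t i = 0; i < seeds.size(); ++i) {
        frontier.push(seeds[i]);
        seen.insert(seeds[i]);
    }

    while (!frontier.empty()) {
        std::string url = frontier.front();
        frontier.pop();

        std::string page = fetch_page(url);                     // 1. fetch webpage
        std::vector<std::string> links = extract_links(page);   // 2. scan for links

        for (size_t i = 0; i < links.size(); ++i) {
            if (is_download_link(links[i])) {
                std::cout << "keep: " << links[i] << "\n";      // 3. would go into the DB
            } else if (seen.insert(links[i]).second) {
                frontier.push(links[i]);                        // 4. queue for crawling
            }
        }
    }
}

int main()
{
    std::vector<std::string> seeds;
    seeds.push_back("http://example.com/");
    crawl(seeds);
    return 0;
}
```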
I was thinking of using libcurl, but apparently the libcpp C++ wrapper for libcurl isn't around anymore. Frankly, I would hate to code against WinINet. I'm still searching for the perfect HTTP library.
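For what it's worth, libcurl's plain C API is perfectly usable from C++ even without a wrapper. Below is a minimal sketch of a fetch_page function built on it; the function name matches the placeholder above and is my own choice, not anything libcurl provides. It would need to be linked with -lcurl, and curl_global_init(CURL_GLOBAL_ALL) should be called once at program startup.

```cpp
#include <curl/curl.h>
#include <string>

// libcurl write callback: append each chunk of the response body to a std::string.
static size_t write_to_string(char* data, size_t size, size_t nmemb, void* userdata)
{
    std::string* out = static_cast<std::string*>(userdata);
    out->append(data, size * nmemb);
    return size * nmemb;
}

// Fetch a URL into a string using libcurl's easy interface; returns an
// empty string on failure. Error handling is deliberately minimal.
std::string fetch_page(const std::string& url)
{
    std::string body;
    CURL* curl = curl_easy_init();
    if (!curl)
        return body;

    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_to_string);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);   // follow redirects

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    if (res != CURLE_OK)
        body.clear();
    return body;
}
```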