A while ago I discovered the website crawling framework Scrapy, as mentioned in my earlier post "Nice Python website crawler framework". Now I have written a little website crawler using this genius framework. Here are the steps I took to get my blog https://www.ask-sheldon.com crawled:
Installation of Scrapy
Install the Python package management system (pip):
$> sudo apt-get install […]
Today I stumbled over http://scrapy.org/ while searching for an open-source website crawler. It's an interesting crawling and scraping framework for Python, and it looks very convenient and easy to use. The most interesting feature seems to be the ability to select page elements (e.g. hyperlinks) via CSS selectors. In any case, I'll give it a try.