I wonder what my blog's stats would look like with all the malicious activity (spam and exploit bots) removed. My suspicion is that these amount to almost half of the visits. Anybody have promising filtering approaches?

Following a suggestion by @leip4Ier, I created a small Python script to filter the log – only RSS requests and requests from IP addresses that also loaded CSS are passed through. Almost all bot activity is filtered out, and I am left with around 50% of the visits on most days.
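For illustration, here is a minimal sketch of that kind of filter. It is not the actual script. It assumes the log is in the common/combined access log format, that stylesheets are recognizable by a `.css` path suffix, and that the feed lives under one of the hypothetical paths in `FEED_SUFFIXES`. All function and constant names are made up for this example.

```python
import re

# Assumption: common/combined log format, e.g.
# 1.2.3.4 - - [01/Jan/2021:00:00:00 +0000] "GET /path HTTP/1.1" 200 123 ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*"')

# Hypothetical feed locations; adjust to the actual site.
FEED_SUFFIXES = ("/rss.xml", "/atom.xml", "/feed")


def parse(line):
    """Return (ip, path without query string), or None for unparseable lines."""
    match = LINE_RE.match(line)
    if not match:
        return None
    ip, path = match.groups()
    return ip, path.split("?", 1)[0]


def filter_log(lines, feed_suffixes=FEED_SUFFIXES):
    """Keep feed requests plus all requests from IPs that loaded a stylesheet."""
    lines = list(lines)
    parsed = [parse(line) for line in lines]

    # First pass: collect the IP addresses that requested any CSS file.
    css_ips = {p[0] for p in parsed if p and p[1].endswith(".css")}

    # Second pass: pass through feed requests and requests from those IPs.
    return [
        line
        for line, p in zip(lines, parsed)
        if p and (p[1].endswith(feed_suffixes) or p[0] in css_ips)
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python filter_log.py < access.log > filtered.log
    sys.stdout.writelines(filter_log(sys.stdin))
```

The idea behind the heuristic is that most bots fetch pages without loading the stylesheets a real browser would request. Feed readers don't load CSS either, which is why feed requests are let through separately. This sketch matches IP addresses across the whole file, so it works best when run on one day's log at a time.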

But there is far less fluctuation. Apparently, the spike earlier this month was caused by bot activity. Some articles went way down in terms of traffic, particularly (as I already suspected) the one with "login" in the title and an ancient article about Flash.

Unsurprisingly, all planet.mozilla.org referrers went away – these were image requests only (same with some forums that embedded my images directly). However, 90% of the Twitter and 70% of the GitHub referrers vanished as well, so these weren't actual clicks either.

One of the remaining kinds of requests scanning for vulnerabilities is apparently produced by the DotGit browser extension (thanks @leip4Ier for the hint). It queries /.git/HEAD on every website visited in order to detect websites that expose their repository metadata.

