I wonder what my blog's stats would look like with all the malicious activity (spam and exploit bots) removed. My suspicion is that these amount to almost half of the visits. Anybody have promising filtering approaches?

Following a suggestion by @leip4Ier, I created a small Python script to filter the log – only RSS requests and requests by IP addresses that loaded CSS are passed through. Almost all bot activity is filtered away and I am left with around 50% of the visits on most days.
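The script itself isn't shown, but the described approach can be sketched roughly like this. It assumes a combined-format access log and a feed path containing "/rss" – both are assumptions, not details from the original post:

```python
import re

# Minimal parse of a combined-format access log line: client IP, HTTP
# method and request path. Adjust the regex for your server's log format.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*? "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*"')

def filter_log(lines):
    """Keep RSS requests plus all requests from IPs that also loaded CSS."""
    parsed = []
    css_ips = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        parsed.append((m.group('ip'), m.group('path'), line))
        # A client that fetched a stylesheet almost certainly rendered a page.
        if m.group('path').endswith('.css'):
            css_ips.add(m.group('ip'))
    return [line for ip, path, line in parsed
            if '/rss' in path or ip in css_ips]
```

Two passes over the data are needed because a bot-or-browser decision for an IP address can only be made after the whole log has been seen.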

There is also far less fluctuation – apparently, the spike earlier this month was caused by bot activity. Some articles went way down in terms of traffic, particularly (as I had already suspected) the one with "login" in the title and an ancient article about Flash.

Unsurprisingly, all planet.mozilla.org referrers went away – these were image requests only (same with some forums that embedded my images directly). 90% of the Twitter and 70% of the GitHub referrers vanished as well; apparently, these weren't actual clicks either.


@WPalant wait, github and twitter embed images from 3rd-party domains?..


@leip4Ier No, these are mostly HEAD requests – they seem to be checks for whether a page exists, plus metadata retrievals, yet they have the referrer set. I didn't check whether it's Twitter and GitHub themselves performing these requests or some third parties, but they definitely aren't browsers.
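Spotting this pattern in a log comes down to tallying HTTP methods per referrer domain. A small sketch of that, again assuming a combined-format access log (the sample domains are illustrative):

```python
import re
from collections import Counter

# Pull the HTTP method and referrer out of a combined-format log line.
REQ_RE = re.compile(r'"(?P<method>[A-Z]+) \S+[^"]*" \d+ \S+ "(?P<referrer>[^"]*)"')

def methods_by_referrer(lines, domain):
    """Count HTTP methods among requests whose referrer contains `domain`."""
    counts = Counter()
    for line in lines:
        m = REQ_RE.search(line)
        if m and domain in m.group('referrer'):
            counts[m.group('method')] += 1
    return counts
```

A referrer whose traffic is dominated by HEAD requests is link-checking or metadata scraping, not humans clicking through.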

Infosec Exchange

A Mastodon instance for info/cyber security-minded people.