
PostPosted: 01 Jan 2017, 11:11
by Gergel
Yeah, I saw that too. Will do a deep-dive into the forum logs soonish.

PostPosted: 01 Jan 2017, 11:57
by Shevron
We did get past our 40 GB quota. Good thing it was the 31st, so it reset after midnight.

Curious how we blew through it though. Never happened before.


Double post merged on 01 Jan 2017 11:53

Gerg, right now at 11:52 am the counter stands at:

Bandwidth
654.52 MB / 39.06 GB ( 2% )

Something is not right. That's way too much data, considering how not busy the forum is.
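
A rough back-of-the-envelope check on that counter, as a sketch only. It assumes the counter reset at midnight, that traffic stays steady for the whole month, and that 1 GB = 1024 MB; none of that is confirmed by the hosting panel.

```python
# Figures quoted from the bandwidth counter above.
used_mb = 654.52
quota_gb = 39.06

# Assumption: the counter has been running since 00:00 on 1 January.
hours_elapsed = 11 + 52 / 60
hours_in_january = 31 * 24

# Scale the usage so far up to a full month and convert MB to GB.
projected_gb = used_mb / hours_elapsed * hours_in_january / 1024

print(f"Projected for January: {projected_gb:.1f} GB (quota {quota_gb} GB)")
```

At that pace the month would land at roughly 40 GB, which is over the quota again.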


Double post merged on 01 Jan 2017 11:56

Graphs show that we averaged about 10-17 GB monthly over the past year.

Suddenly this December it's 40 GB, in a month that hasn't been particularly busy either.
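
One way to find out who is eating the bandwidth is to add up the bytes served per user agent from the web server's access log. This is a minimal sketch, assuming the log uses the common "combined" format; the sample lines, IP addresses and user agent strings below are made up for illustration and are not taken from the real forum logs.

```python
import re
from collections import Counter

# Tail of a combined-format log line:
#   "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(r'"[^"]*" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')


def bytes_by_agent(lines):
    """Return a Counter mapping user agent -> total bytes served."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        size = 0 if match.group(2) == "-" else int(match.group(2))
        totals[match.group(3)] += size
    return totals


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [31/Dec/2016:10:00:00 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 50000 "-" "Mozilla/5.0 (compatible; SemrushBot)"',
    '203.0.113.5 - - [31/Dec/2016:10:00:02 +0000] "GET /viewtopic.php?t=2 HTTP/1.1" 200 52000 "-" "Mozilla/5.0 (compatible; SemrushBot)"',
    '198.51.100.7 - - [31/Dec/2016:10:01:00 +0000] "GET /index.php HTTP/1.1" 200 48000 "-" "Mozilla/5.0 Firefox/50.0"',
]

totals = bytes_by_agent(SAMPLE)
for agent, total in totals.most_common():
    print(f"{total:>10} bytes  {agent}")
```

On a real log you would pass the open file object to `bytes_by_agent` instead of `SAMPLE`; the heaviest user agents come out on top.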

PostPosted: 01 Jan 2017, 13:36
by Gergel
Looks like a web crawler bot (SEMrush) has taken an unhealthy interest in the forum. I'm going to politely ask all web crawlers to cease and desist, using the "robots.txt" method, and block SEMrush with a bit more aggressive methods.

Will keep an eye on the logs.
[Attachment: LinBeifong_o.gif]
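
For anyone curious what the "robots.txt" method looks like from the crawler's side, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content below is an assumption (a blanket "everyone stay out" rule), not necessarily what was actually put on the server, and the URL path is just an example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: ask every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, agent, url):
    """Return True if a well-behaved crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Example URL on the forum host; the path is hypothetical.
URL = "http://forums.rnp-moonglade.net/viewtopic.php"

for agent in ("SemrushBot", "Googlebot"):
    print(agent, "allowed:", allowed(ROBOTS_TXT, agent, URL))
```

Keep in mind that robots.txt is only a request. A crawler has to choose to honour it, which is why a misbehaving bot also needs a block enforced on the server side.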

PostPosted: 01 Jan 2017, 13:53
by Toot
Excuse my ignorance, but what does a webcrawler do?

PostPosted: 01 Jan 2017, 14:17
by Gergel
A webcrawler is how Google and other search engines get all their data.

It's a bot that downloads all the webpages it can find (such as every thread and post in the forum), and then does something with the data it has gathered. Google, for example, uses the gathered data to provide search results. So if you google "rhyme and punishment toot", it uses this data to provide a link to Toot's profile on the forums.rnp-moonglade.net forum.
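
The "downloads every page it can find" part can be sketched in a few lines of Python. This is a toy illustration only: the `SITE` dictionary is a made-up three-thread "forum" standing in for real HTTP downloads, and real crawlers are far more elaborate.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical mini-forum: page path -> HTML. Stands in for real downloads.
SITE = {
    "/index": '<a href="/thread1">Thread 1</a> <a href="/thread2">Thread 2</a>',
    "/thread1": '<a href="/index">Back</a> <a href="/profile/toot">Toot</a>',
    "/thread2": '<a href="/index">Back</a>',
    "/profile/toot": "<p>Profile page</p>",
}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start):
    """Visit every page reachable from `start`, each page exactly once."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        html = SITE.get(url)  # a real crawler would download the page here
        if html is None:
            continue
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            if link not in seen:  # remember pages so they are not re-fetched
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("/index")
print(pages)
```

The `seen` set is what keeps a crawler from fetching the same page again and again within one pass.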

Normally the webcrawlers should be reasonably polite and not hammer servers too much. But it looks like this damn SEMrush bot just keeps downloading the entire forum over and over, day after day, causing tens of bloody gigabytes of traffic. So now I've ordered the server to just deny access to it, so that it just gets a tiny error message everywhere, instead of big forum pages.
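
To show the idea of "tiny error message instead of big forum pages", here is a sketch written as a Python WSGI wrapper. This is not how the forum server is actually configured (that kind of block normally lives in the web server's own settings); `forum_app`, `block_bots` and the blocked token list are all hypothetical names made up for the example.

```python
# Hypothetical list of user agent substrings to turn away (lowercase).
BLOCKED_AGENT_TOKENS = ("semrushbot",)


def block_bots(app, blocked=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app so that blocked user agents get a short 403 reply."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everyone else gets the normal page.
        return app(environ, start_response)

    return middleware


def forum_app(environ, start_response):
    """Stand-in for the forum: serves one large HTML page."""
    body = b"<html>" + b"x" * 50000 + b"</html>"
    start_response(
        "200 OK",
        [("Content-Type", "text/html"),
         ("Content-Length", str(len(body)))],
    )
    return [body]


protected_app = block_bots(forum_app)
```

A blocked bot now costs about ten bytes of body per request instead of a full forum page, so even if it keeps hammering the server the traffic stays small.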

PostPosted: 01 Jan 2017, 14:38
by Toot
Thanks, that's what I'd suspected, but didn't know how Google and suchlike actually gathered their info. :)

PostPosted: 01 Jan 2017, 15:46
by Dunnykin
Image

PostPosted: 01 Jan 2017, 16:55
by Gergel
Now where'd I put my goshdanged EMP...

PostPosted: 01 Jan 2017, 18:03
by Shevron
Here ...

Image

PostPosted: 04 Jan 2017, 09:58
by Tormeron
Sounds like a bugged crawler. It should only be crawling a website every now and then, not daily, and even if it does crawl daily it doesn't need to download the entire contents of the forum. Normally crawlers are only interested in the HTML text, because they can link to pictures and videos without actually downloading them.

When a Google bot visits my forums or websites, it barely registers any traffic, since it only downloads the HTML and such.

Also, what the hell does it have to download on the forum in gigabytes? 90% of the pictures are links to outside resources, and the vids are all outside resources.