Denise ([staff profile] denise) wrote in [site community profile] dw_maintenance 2023-09-28 11:16 pm

Continuing dispatches on the war against spam

A few days ago we let you know about spam prevention measures that we were taking to help stem some of the flood of garbage. One of those temporary measures included geoblocking all IPs from several of the countries that are our largest source of spam. This did (as we knew it inevitably would) have some collateral damage for real users, and we're very sorry!

We're continuing to experiment: this time we've slightly expanded the range of countries we're geoblocking to include the ones we had held off on because blocking them would affect too much legitimate use, but we've limited the geoblock to the account creation page only. If you were having trouble accessing the site because of geoblocks, you should now be able to access 99% of the site without a problem; the only page you won't be able to reach is the account creation page. With luck, this will cut back heavily on spam account creation without disrupting legitimate use of the site. The countries currently geoblocked from account creation are Bangladesh, Cambodia, Egypt, India, Indonesia, Morocco, Pakistan, Singapore, Turkey, and Vietnam. (If you're an existing user from one of those countries and you'd like to make an additional account, email support@dreamwidth.org with the username you'd like to register and we can register it for you. If the number of requests grows enough that it's taking up too much of our time, we may have to pause this until we can build automated exceptions, but we'll start there.)
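For readers curious what "geoblocked only from account creation" means mechanically, here is a minimal sketch, assuming the site already resolves each request's IP to an ISO 3166-1 country code. The `/create` path, the function name, and the choice to let unknown locations through are hypothetical illustrations, not Dreamwidth's actual code:

```python
from typing import Optional

# ISO 3166-1 alpha-2 codes for the ten countries listed in the post.
ACCOUNT_CREATION_BLOCKLIST = frozenset({
    "BD",  # Bangladesh
    "KH",  # Cambodia
    "EG",  # Egypt
    "IN",  # India
    "ID",  # Indonesia
    "MA",  # Morocco
    "PK",  # Pakistan
    "SG",  # Singapore
    "TR",  # Turkey
    "VN",  # Vietnam
})

# Hypothetical path of the account creation page (an assumption for this sketch).
ACCOUNT_CREATION_PATHS = frozenset({"/create"})


def is_request_allowed(path: str, country_code: Optional[str]) -> bool:
    """Return False only when a blocked country requests the account creation page."""
    if path not in ACCOUNT_CREATION_PATHS:
        return True  # every other page stays reachable from every country
    if country_code is None:
        return True  # location unknown: let it through (a policy assumption)
    return country_code.upper() not in ACCOUNT_CREATION_BLOCKLIST
```

Scoping the check to a single path is what keeps the rest of the site reachable: the country lookup only matters on the one page spammers need.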

We will continue to monitor the results of these experiments and adjust as necessary: when we do one of these experiments, we always make sure to define in advance what "too much interference with legitimate use" will look like, and we try very hard to stick to it. I apologize to everyone who's been collateral damage in our efforts to filter out more of the goddamn spammers.

[personal profile] dennisgorelik 2023-09-29 07:35 pm (UTC)
> IP addresses from multiple netblocks, from multiple providers, that are completely clean in every reputational database

I suggest using your own reputational database of IP addresses and keeping that database private.
That way it will be hard for spammers to find out whether their IP address is already blacklisted.

A bad IP should not prevent Dreamwidth account creation. Instead, it should let the spammer create the account, so Dreamwidth can collect other spam indicators, such as:
- Email address and email domain.
- Connections to other Dreamwidth accounts.
- Content keywords.
- Other involved IP addresses.
- ...
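One way to read this suggestion, sketched in Python: a private reputation store that never refuses a signup but quietly flags accounts coming from IPs with a bad internal history, while keeping the email address around for later correlation. The class names, the 0.5 spam-ratio threshold, and the three-observation minimum are illustrative assumptions, not anything Dreamwidth has described:

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass
class IpReputation:
    spam: int = 0  # accounts from this IP later confirmed as spam
    ham: int = 0   # accounts from this IP later confirmed as legitimate

    def spam_ratio(self) -> float:
        total = self.spam + self.ham
        return self.spam / total if total else 0.0


@dataclass
class SignupRecord:
    username: str
    ip: str
    email: str
    flagged_for_review: bool


class PrivateReputationDb:
    """Hypothetical internal-only store; nothing here is shown to the person signing up."""

    def __init__(self, review_threshold: float = 0.5, min_observations: int = 3) -> None:
        self.review_threshold = review_threshold
        self.min_observations = min_observations
        self.by_ip: dict[str, IpReputation] = {}
        self.signups: list[SignupRecord] = []

    def record_outcome(self, ip: str, is_spam: bool) -> None:
        # Called after moderation decides whether an account was spam.
        rep = self.by_ip.setdefault(str(ip_address(ip)), IpReputation())
        if is_spam:
            rep.spam += 1
        else:
            rep.ham += 1

    def register(self, username: str, ip: str, email: str) -> SignupRecord:
        # Signup always succeeds; a bad IP history only sets a private flag.
        normalized_ip = str(ip_address(ip))
        rep = self.by_ip.get(normalized_ip, IpReputation())
        seen = rep.spam + rep.ham
        flagged = seen >= self.min_observations and rep.spam_ratio() >= self.review_threshold
        record = SignupRecord(username, normalized_ip, email.lower(), flagged)
        self.signups.append(record)  # kept so other indicators can be correlated later
        return record
```

Because the response to the spammer is identical either way, the flag leaks nothing about which IPs are already known to be bad.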

> move on to the next group of clean ones

Do you mean that it is easy for a spammer to get access to clean IP addresses?
The spammer's dilemma is that if an IP address is easy to access, that IP address quickly gets blacklisted.

> if I pulled the IPs of our last 100 spam accounts, every single one of them will have a cleaner reputation than the IP address you are currently using

Does Dreamwidth maintain an internal database of spam/ham scores for IP addresses (based on Dreamwidth users' activity)?

> Because they stop using them when they start accumulating negative reputation.

If Dreamwidth does not immediately delete spam accounts, then it may be quite tricky for spammers to detect that their IP address has accumulated negative reputation in Dreamwidth's internal database.

> there were about 130 ISPs licensed to operate in Bangladesh. We saw spam from over 100 of them.

Then penalize IP addresses from those 100 Bangladeshi ISPs, and do not delete accounts created from the remaining 30.
This will put pressure from users on the bad ISPs to deal with spammers on their own networks.

> if your only understanding of toll fraud comes from a Google search and reading a surface-level article

I run a job board and deal with spam and scam every day.
Spam is a relatively minor issue for us compared to scams (which are run manually, not at bot scale).

For spam indicators we use:
1) IP addresses (and networks).
2) Email addresses.
3) Content keywords.
4) Browser User Agents.
5) User's feedback.
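A rough illustration of how indicators like these could be combined into a single score. The per-signal scores are assumed to be computed elsewhere on a 0.0 to 1.0 scale, and the weights, the report cap, and the 0.6 threshold are made up for the example; real values would need tuning against labelled data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    ip_score: float          # 0.0 (clean) to 1.0 (known bad), from IP and network history
    email_score: float       # from email address and email domain history
    keyword_score: float     # from content keyword matches
    user_agent_score: float  # from browser user agent history
    user_reports: int        # number of spam reports from other users


# Hypothetical weights for the first four signals (they sum to 0.9).
WEIGHTS = {"ip": 0.3, "email": 0.25, "keywords": 0.25, "user_agent": 0.1}
REPORT_WEIGHT = 0.1  # contribution per user report
REPORT_CAP = 0.3     # so a pile-on of reports cannot dominate on its own


def spam_score(s: Signals) -> float:
    """Weighted sum of the indicators, clamped to at most 1.0."""
    score = (
        WEIGHTS["ip"] * s.ip_score
        + WEIGHTS["email"] * s.email_score
        + WEIGHTS["keywords"] * s.keyword_score
        + WEIGHTS["user_agent"] * s.user_agent_score
        + min(s.user_reports * REPORT_WEIGHT, REPORT_CAP)
    )
    return min(score, 1.0)


def classify(s: Signals, threshold: float = 0.6) -> str:
    return "spam" if spam_score(s) >= threshold else "ham"
```

The point of combining signals is that no single one (a bad IP alone scores only 0.3 here) decides the outcome.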

> Site behavior is also not an accurate spam detection system.
> It detects less than 5% of spam account creation, and some days less than 1%.

What do undetected spam accounts do?

If they do something harmful, why can't you detect that harmful behavior?

[personal profile] dissectionist 2023-09-29 08:47 pm (UTC)
Denise, by this point I feel like this guy is just sealioning you. I’m sorry you’re having to deal with it.

[personal profile] dennisgorelik 2023-09-29 10:17 pm (UTC)
If you think that the spam detection system I describe is "one simple trick", you misunderstand what I describe.

If you are not interested in discussing spam detection strategies, that is OK.

I thought that you posted about spam problems in order to get more ideas that might help you with improving your spam detection algorithms.

[personal profile] marinarusalka 2023-09-29 10:53 pm (UTC)
Denise, you have the patience of a saint. I would’ve banned that dude three comments in, and not nearly as politely as you did.

Thank you for your hard work.

[personal profile] kore 2023-09-30 05:11 am (UTC)
Oh my God, seriously. That was some deep-level explaining he didn't deserve, but it was intriguing to hear about, like an irritant ending up as a pearl!

[personal profile] lovingboth 2023-09-30 09:57 pm (UTC)
A rare upside of spamsplaining :)

[personal profile] andrewducker 2023-10-02 12:35 pm (UTC)
Seconding all of this.

[personal profile] havocthecat 2023-09-30 03:25 pm (UTC)
I'm sorry you had to deal with that, but I learned a lot from reading it, if it helps to know someone else got something out of...that.