Due to the recent spam waves affecting the Fediverse, we’d like to open requests for comment on the use of automated moderation tools across Pawb.Social services.
We have a few ideas on what we’d like to do, but want to make sure users would feel comfortable with this before we go ahead with anything.
For each of these, please let us know whether you find the use-case acceptable or not, and if you feel like sharing additional info, we’d appreciate it.
1. Monitoring of Public Streaming Feed
We would like to set up a bot that monitors the public feed (all posts with Public visibility that appear in the Federated timeline) and flags any posts that meet our internally defined heuristic rules.
Flagged posts would be reported as normal from a special system-user account, but the reports would not be forwarded to remote instances, so that any false positives are not passed on to them.
These rules would be fixed and based on metadata from the posts (account indicators, mentions, links, etc.), not necessarily on the content of the posts themselves.
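To give a feel for what a metadata-only rule check could look like, here is a minimal sketch in Python. Everything in it is hypothetical: the field names, the individual rules, the weights, and the threshold are placeholders for illustration and are not the rules we would actually use.

```python
from dataclasses import dataclass


@dataclass
class PostMetadata:
    # Hypothetical metadata fields; none of these look at the post text.
    account_age_days: int
    follower_count: int
    mention_count: int
    link_count: int
    has_avatar: bool


# Each rule is (name, predicate, weight). All values are illustrative only.
RULES = [
    ("new_account", lambda p: p.account_age_days < 1, 2),
    ("no_followers", lambda p: p.follower_count == 0, 1),
    ("mass_mentions", lambda p: p.mention_count >= 5, 3),
    ("has_links", lambda p: p.link_count > 0, 1),
    ("no_avatar", lambda p: not p.has_avatar, 1),
]

# Assumed score at which a post is reported to the moderation team.
FLAG_THRESHOLD = 5


def evaluate(post):
    """Return the total score and the names of the rules that matched."""
    matched = [(name, weight) for name, predicate, weight in RULES if predicate(post)]
    score = sum(weight for _, weight in matched)
    return score, [name for name, _ in matched]


def should_flag(post):
    """True if the post should be reported for moderator review."""
    score, _ = evaluate(post)
    return score >= FLAG_THRESHOLD
```

Returning the matched rule names alongside the score means a report could say why a post was flagged, which should make moderator review quicker.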
2. Building of a local AI spam-detection model
Taking this a step further, we would like to experiment with using TensorFlow Lite and Google Coral Edge TPUs to make a fully local model, trained on the existing decisions made by our moderation team. To stress, the model would be local only and would not share data with any third party or service.
This model would analyze the contents of the post for known spam-style content and identifiers, and raise a report to the moderation team where it exceeds a given threshold.
However, we do recognize that this would result in us processing posts from remote instances and users, so we would commit to not using any remote posts for training unless they are identified as spam by our moderators.
3. Use of local posts for non-spam training
If we see support for #2, we’d also like to request permission from users, on a voluntary basis, to provide their posts as “ham” (non-spam / known good posts) to the spam-detection model.
While new posts would be run through the model, they would not be used for training unless you give us explicit permission to use them in that manner.
I’m hoping this method will allow users who feel comfortable with this to assist in development of the model, while not compelling anyone to provide permission where they dislike or are uncomfortable with the use of their data for AI training.
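The training commitments in #2 and #3 can be summed up as a small eligibility check. This is only a sketch of the policy, not real code from our systems: the `Post` fields and the `training_label` function are hypothetical names, and it assumes that moderator-confirmed spam can be used regardless of where it came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    # Hypothetical fields used only to illustrate the policy.
    is_local: bool               # posted from furry.engineer or pawb.fun
    moderator_marked_spam: bool  # a moderator confirmed the post as spam
    author_opted_in: bool        # author gave explicit permission for "ham" use


def training_label(post: Post) -> Optional[str]:
    """Return "spam", "ham", or None if the post must not be used for training."""
    # Assumption: moderator-confirmed spam is usable, local or remote.
    if post.moderator_marked_spam:
        return "spam"
    # Non-spam posts are only used when local and explicitly opted in.
    if post.is_local and post.author_opted_in:
        return "ham"
    # Everything else is scored by the model but never trained on.
    return None
```

The default is exclusion: a post only enters the training set when one of the two explicit conditions is met.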
4. Temporarily limiting suspected spam accounts
If our heuristics and / or AI detection identify a significant risk or pattern of spammy behavior, we would like to be able to temporarily hide / suppress content from the offending account until a moderator is able to review it. We’ve also suggested an alternative idea to Glitch-SOC, the fork we run for furry.engineer and pawb.fun, to allow hiding a post until it can be reviewed.
Limiting the account would prevent anyone not following them from seeing posts or mentions by them, until their account restriction is lifted by a moderator.
In a false-positive scenario, an innocent user may not have their posts or replies seen by users on furry.engineer / pawb.fun until their account restriction is lifted, which may break existing conversations or prevent new ones.
We’ll be leaving this Request for Comment open-ended to allow for evolving opinions over time, but are looking for initial feedback within the next few days for Idea #1, and before the end of the week for ideas #2 through #4.
Appreciate the feedback so far, let me try to see if I can answer most / many of the questions:
What are the risks of #4?
Many users are worried about the risk of automated actions going wrong, and are unsure what we mean by “pattern of spammy behavior.”
To identify a pattern of behavior that would allow for automated actions, we would review any major spam wave, such as the one we’ve been experiencing over the past few days.
We would then identify any indicators of the known spam and create a heuristic ruleset that would limit or suspend those accounts, targeting only accounts actively engaging in the spam and not those merely referring to it. There are additional safeguards we can add, such as preventing rules from being applied to an account that is followed by someone on our instances.
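As a rough illustration of how that safeguard could sit in front of an automated action, here is a short Python sketch. The function name, the score input, and both thresholds are assumptions made up for this example.

```python
def choose_action(score, followed_by_local_user,
                  report_threshold=5, limit_threshold=8):
    """Pick "none", "report", or "limit" for an account.

    Thresholds are hypothetical. Accounts followed by someone on our
    instances are never limited automatically, only reported for review.
    """
    if score < report_threshold:
        return "none"
    if score >= limit_threshold and not followed_by_local_user:
        return "limit"
    return "report"
```

With these example values, a high-scoring account that one of our users follows would still reach the moderation queue, but nothing would be hidden until a human has looked at it.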
For the risk of automated actions going wrong: if we were using a limit (not a suspend), then the account would be hidden from public view but could still be viewed if specifically searched for by name. It would also suppress all notifications from that user unless you follow them (e.g. if they messaged you out of the blue, you wouldn’t see it if you weren’t following them).
If a suspend were used, the account would be marked for deletion from our instances and all follower relationships would immediately break (e.g. if you were following them, the system would automatically unfollow when they are suspended). We can typically restore data within 30 days, but follower relationships are usually unrecoverable. So long as rules are appropriately limited in scope to only target accounts with a lot of spam indicators, no false positives should occur.
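The effect of a limit on what you would see can be sketched as a small visibility check. This is a simplified model of the behavior described above, not Mastodon’s actual implementation; the `Context` names and the `is_visible` function are hypothetical.

```python
from enum import Enum


class Context(Enum):
    PUBLIC_TIMELINE = "public_timeline"
    NOTIFICATION = "notification"
    PROFILE_LOOKUP = "profile_lookup"  # viewer searched for the account by name


def is_visible(author_limited, viewer_follows_author, context):
    """Simplified model of whether a viewer sees content from an account."""
    # Unrestricted accounts, and accounts the viewer follows, are unaffected.
    if not author_limited or viewer_follows_author:
        return True
    # A limited account can still be found when searched for by name.
    return context is Context.PROFILE_LOOKUP
```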
What about appeals?
For local users (anyone registered on furry.engineer and pawb.fun), all actions against your account (except reports) can be appealed directly to the admin team, including post removals and suspensions.
For remote users, we can remove restrictions on remote accounts if we receive an appeal from any of our users, or from the affected account directly. This can be done via email, or just through a DM to one of the admins, who can pass it to the team.
Would the AI model have oversight?
Yes. Where the team believes a filter has flagged enough content appropriately with no false positives, we may promote that model or ruleset to allow automated actions (limit / suspend).
We’ll keep an eye on the actions of each ruleset by reviewing the daily / weekly actions taken to ensure they meet the criteria and have not misidentified any users or content, and we’ll also start publicly tracking the statistics of the models / rulesets we create and use, including a count of false-positives or reversed decisions.
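The kind of per-ruleset statistics we have in mind could be tracked with something as simple as the sketch below. The class and method names are hypothetical, and a reversed decision is counted here as a stand-in for a false positive.

```python
from collections import Counter


class RulesetStats:
    """Counts actions taken and decisions reversed, per ruleset or model."""

    def __init__(self):
        self.actions = Counter()
        self.reversals = Counter()

    def record_action(self, ruleset):
        self.actions[ruleset] += 1

    def record_reversal(self, ruleset):
        self.reversals[ruleset] += 1

    def reversal_rate(self, ruleset):
        """Share of this ruleset's actions that moderators later reversed."""
        total = self.actions[ruleset]
        return self.reversals[ruleset] / total if total else 0.0
```

A ruleset whose reversal rate rises above zero would be a natural candidate for being demoted back to report-only.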
Will you notify users?
Due to limitations in Mastodon, we can only notify local users (users on furry.engineer or pawb.fun) when actions are taken against their account. This happens automatically when your post is removed, or your account is warned, limited, or suspended.
There’s no easy way to notify remote users other than sending them a DM, but doing so could be seen as spammy, or could incite further abusive behavior by informing them of our activity. While we can be transparent with our own users because our platform is invite-only, other instances are frequently open-registration, which can allow an abusive user to re-create an account and continue harassing our users. That said, I’m open to suggestions on this.
hmm, on the last point: If it’s just a single user harassing then it shouldn’t be too much trouble if they re-create an account. The anti-spam system should flag them again if they keep harassing. If it’s a lot of bots then I would assume they already have methods to determine whether an account is suspended (like DM-ing each other maybe). Hence there wouldn’t be an advantage of not informing them of being suspended.
I might be completely wrong here and missing a key point as I don’t really know anything about Mastodon or spam prevention really but it just feels wrong to censor someone without them knowing.
If time is crucial you could inform people an hour/a day/etc. after their suspension.
So, the issue lies in that there’s no technical way to notify a remote user (someone not on furry.engineer or pawb.fun) that they’ve been suspended on our end without sending a message to them directly. If we suspend them on our end, that doesn’t per se suspend them on their end, and they wouldn’t know that their messages were no longer reaching our users. They would still be able to message other users on their instance, and users on other instances, but not our users.
We’re apprehensive about notifying remote accounts specifically because we often don’t know the moderation practices of the remote instance (whether they’ll deal with it, or whether they have open registration allowing anyone to join without approval). Making the user aware that we’re no longer receiving their messages may also encourage further abusive behavior through ban evasion (creating new accounts on that instance or elsewhere to continue messaging).