Instagram's new filter makes internet bullies 'disappear'
CEO Kevin Systrom says the new filter could be the end of offensive comments. Image: REUTERS/Lucas Jackson
Once upon a time, trolls were the scary creatures lurking under bridges in children’s fairytales.
Today they have left the page for the screen, lurking in the comments sections of online news sites and social media posts.
“Trolling” can include everything from mild put-downs and snide reviews of a celebrity chef’s meals to threats of rape and murder.
In the latest bid by social media networks to defeat the trolls, Instagram has announced it will now use artificial intelligence (AI) to effectively delete comments from trolls as soon as they appear on a user’s post.
Instagram is using DeepText, a deep learning program from parent company Facebook – originally developed to tailor users’ Facebook feeds based on the words and phrases they used in posts and comments – to filter out comments considered offensive.
Beating the trolls
Trolling has been the scourge of social media networks for as long as they have existed.
A Pew Research Center survey found that one in four internet users in the US had experienced harassment. Among 18- to 24-year-olds, this figure rose to 70%.
However, given that social networks can be used as platforms for political debate, among other topics, those running social media companies have been reluctant to take on an active “policing” role.
“We’re not here to curb free speech,” insisted Instagram CEO Kevin Systrom, speaking to Wired about the new AI comment filter.
While companies like Instagram may not wish to impose limits, neither can they be seen as doing nothing if they wish to keep and grow their audiences. The Pew study found that one in 10 of those who suffered online harassment chose to leave the social network on which they experienced the abuse.
Systrom introduced his company’s new comment filter as “the next step in our commitment to foster kind, inclusive communities on Instagram”.
It follows the introduction last year of a keyword filter, which allowed Instagram users to list the words they considered offensive or inappropriate, and block comments that contained them.
Instagram’s latest filter essentially switches this from a manual process to an automatic one.
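Conceptually, that earlier manual filter amounts to checking each comment against a personal list of blocked words. The short sketch below is purely illustrative, using hypothetical names rather than anything Instagram has published:

```python
# Minimal sketch of a user-defined keyword filter (illustrative only;
# not Instagram's actual code). Each user keeps a personal block list,
# and any comment containing one of those words is hidden.

def is_blocked(comment: str, blocked_words: set[str]) -> bool:
    """Return True if the comment contains any word on the user's block list."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    return not words.isdisjoint(blocked_words)

# Example: a user who has listed two words they find offensive.
user_block_list = {"idiot", "loser"}

print(is_blocked("Great photo!", user_block_list))      # False -> comment shown
print(is_blocked("What an idiot...", user_block_list))  # True  -> comment hidden
```

The new AI filter keeps the same outcome, hiding the comment, but replaces the fixed word list with a learned classifier.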
How it works
Before battling trolls, Instagram first employed Facebook’s DeepText algorithms to tackle the other bane of social networks: spam.
The spam filter was built by having Instagram employees sort through comments and feed those they regarded as spam into DeepText, teaching the system to recognize similar comments on its own.
According to Wired, DeepText was taught to recognize spam not only by the content of the message, but also by its source: a comment by someone you follow is less likely to be spam than a message from someone you don’t know.
Having been satisfied by the results of the spam filter following its launch last year, Instagram turned its attention to the trolls.
The same process was used, with humans categorizing offensive phrases and then feeding them into DeepText.
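DeepText itself is proprietary, but the workflow described above, in which human-labeled examples train a text classifier, can be sketched with an off-the-shelf library. The toy comments, labels, and scikit-learn pipeline below are assumptions for illustration, not Instagram's system:

```python
# Toy illustration of the labeling-and-training loop described above
# (an assumption-laden sketch, not DeepText). Humans tag a handful of
# comments as offensive (1) or acceptable (0); a simple bag-of-words
# classifier then learns to score new comments automatically.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labeled training comments (hypothetical examples).
comments = [
    "love this photo",
    "great shot, congrats",
    "you are pathetic and ugly",
    "nobody wants you here, loser",
]
labels = [0, 0, 1, 1]  # 0 = acceptable, 1 = offensive

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(comments, labels)

# New, unseen comments are scored by the trained model.
for text in ["great shot everyone", "you are pathetic"]:
    print(text, "->", "hide" if model.predict([text])[0] == 1 else "show")
```

The production system reportedly also weighs signals beyond the text itself, such as whether the commenter is someone the account follows, which this sketch leaves out.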
The filter is automatically added to an Instagram account, but can be switched off in the comment settings (see below).
With the filter on, hostile or harassing comments disappear, although the person who typed a comment will still see it – an attempt by Instagram to stop trolls from working out how the filter operates and finding a way to beat it.
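That behaviour amounts to a per-viewer visibility check: a flagged comment is hidden from everyone except the person who wrote it. The names and structure below are guesses for illustration; Instagram has only described the behaviour, not published the code:

```python
# Sketch of per-viewer visibility for filtered comments (hypothetical
# names and structure). A flagged comment stays visible to its author,
# so the troll gets no feedback that the filter caught it.

from dataclasses import dataclass

@dataclass
class Comment:
    author_id: int
    text: str
    flagged_offensive: bool  # set by the automatic filter

def visible_to(comment: Comment, viewer_id: int, filter_enabled: bool) -> bool:
    """Return True if this viewer should see the comment."""
    if not filter_enabled or not comment.flagged_offensive:
        return True
    # Filter is on and the comment was flagged: only its author still sees it.
    return viewer_id == comment.author_id

troll_comment = Comment(author_id=42, text="...", flagged_offensive=True)
print(visible_to(troll_comment, viewer_id=42, filter_enabled=True))  # True: the author
print(visible_to(troll_comment, viewer_id=7, filter_enabled=True))   # False: everyone else
```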
The problem of false positives
Instagram’s offensive comment filter is currently only available in English, although with the spam filter now rolled out to Spanish, Portuguese, Arabic, French, German, Russian, Japanese and Chinese, it is likely only a matter of time before trolls are blocked in multiple languages.
However, the complexity of language means that AI-based solutions such as Instagram’s are not without their challenges.
Research by Cornell University looking at automated hate speech detection found that simply listing potentially offensive words was insufficient for identifying hate speech on Twitter, as context was key.
Equally, there was potential for false positives, with inoffensive tweets being blocked because of some of the words they contained.
Questioning the capability of the Instagram filter, Wired put several of these false positive sentences to Instagram, including the following:
“I didn’t buy any alcohol this weekend, and only bought 20 fags. Proud that I still have 40 quid tbh.”
The sentence is clearly using the British slang for cigarettes, but was flagged by the Cornell researchers’ system as hate speech, due to “fags” also being a pejorative term for homosexuals in the US.
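A naive word-list check, along the lines of the hypothetical sketch earlier, reproduces exactly this kind of false positive:

```python
# Why context matters: a bare word-list check (illustrative only) flags the
# British-slang sentence even though it contains no abuse aimed at anyone.

HATE_WORD_LIST = {"fags"}  # illustrative single-entry list

tweet = ("I didn't buy any alcohol this weekend, and only bought 20 fags. "
         "Proud that I still have 40 quid tbh.")

tokens = {w.strip(".,!?").lower() for w in tweet.split()}
print(tokens & HATE_WORD_LIST)  # {'fags'} -> flagged as hate speech, a false positive
```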
While Instagram declined to comment on these specific false positive sentences, Systrom admits “our work is far from finished and perfect”. He told Wired that it would take time to tell whether it is a success or not.
If the filter were to “cause trouble” and block too many non-abusive comments, “we’ll scrap it and start over with something new”, he said.