How ‘hashing’ could stop violent videos from spreading

Police speak outside Christchurch Hospital following attacks at two mosques in Christchurch, New Zealand on March 15, 2019.

Nearly 18 hours after a terror attack that killed 49 people at a mosque in New Zealand on Friday, footage from the shooting remained live on YouTube and Facebook.

Some experts say tech companies should more broadly adopt a technology they’re already using to combat child pornography and copyright violations to more quickly stop the spread of these types of videos.

The 17-minute livestreamed video, which has not been verified by CNN, appears to be filmed by one of the shooters as they walked into a mosque and opened fire.

Facebook says it took down the livestream “quickly,” but hours later, re-uploads of it were still circulating on the site. Twitter suspended the original account in question and is working to remove other versions on the platform. YouTube said it is utilizing “technology and human resources” to remove content that violates its policies.

Technologists say digital hashing, which has existed for more than a decade, could be better used to prevent the re-upload of videos. Hashing wouldn’t have been able to catch the original live video of the attacks, but it could stop re-uploaded copies from spreading.

“The video is still circulating online,” said David Ibsen, the executive director of the Counter-Extremism Project, an organization that maintains a hashing database for terrorist videos. “The technology to prevent this happening is available. Social media firms have made a decision not to invest in adopting it.”

YouTube told CNN Business it is using hash technology to prevent uploads of the already removed New Zealand massacre videos, but not necessarily for ones that show a portion of the original. It is instead relying on “automated flagging systems and user flags” to stop the spread of those clips.

Twitter declined to elaborate on its approach to hashing.

In a statement, Facebook said: “We are adding each video we find to an internal database [hashing], which enables us to detect and automatically remove copies of the videos when uploaded again.”

The company said it removed the video from Facebook Live and hashed it so that other videos that are visually similar are automatically removed from Facebook and Instagram. However, Facebook did not comment on why portions of the original video remained up hours later.

According to Hany Farid, a professor of computer science at Dartmouth College who has used hashing to combat child pornography, if Facebook were using “robust” hashing — a method that should be able to detect variations of re-uploads — it “should be finding the majority of reposts.” Additionally, any variations that fall through the cracks can then be hashed and added to the same database to prevent further re-uploads.
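The idea behind “robust” hashing can be sketched in a few lines. This is an illustrative toy, not any platform’s actual system: a perceptual hash of a re-encoded or slightly altered copy of a video differs from the original’s hash by only a few bits, so matching uses a bit-distance threshold rather than exact equality, and hashes of variants that slip through can be added to the database. The hash values and threshold below are arbitrary placeholders.

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits between two 64-bit perceptual hashes."""
    return bin(h1 ^ h2).count("1")

# Hypothetical database of hashes computed from the removed video.
known_hashes = {0xF0F0F0F0F0F0F0F0}

def is_known_reupload(upload_hash: int, threshold: int = 10) -> bool:
    """Flag an upload whose hash is within `threshold` bits of a known one."""
    return any(hamming_distance(upload_hash, h) <= threshold
               for h in known_hashes)

# A re-encode flips a couple of bits but still matches (2 bits differ)...
print(is_known_reupload(0xF0F0F0F0F0F0F0F3))  # True
# ...and a variant that fell through the cracks can be hashed and added,
# so the next copy of it is caught.
known_hashes.add(0x123456789ABCDEF0)
```

The threshold trades false negatives against false positives; production systems tune it on real re-upload data.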

Social platforms like Facebook are increasingly relying on artificial intelligence to flag violent content. But the process can be unreliable due to many factors, such as how much content is being uploaded on a daily basis, and AI’s inability to understand nuances like the context in which an event is taking place.

Facebook, Google and Twitter currently use hashing to combat illegal material, such as child pornography, copyright violations and videos that go against their terms of service (like extremist content). But according to a YouTube spokesperson, using hash technology on videos that could potentially show up in a legitimate context, such as in a news clip, might place too much of a burden on human content moderators.

“Hashing is extremely effective in outright preventing the upload of content which is illegal, regardless of context, like child sex abuse imagery,” the spokesperson said in a statement. “For major news events, context is key, and uploads that are documentary may be allowed on YouTube.”

The company added that hashing could mistakenly flag videos that use the original footage in a legitimate context, such as a news report.

How hashing works

Video hashing works by breaking a video down into key frames and giving each a unique alphanumeric signature, or hash. Those hashes are collected into a central database, and every video or photo uploaded to a platform is then compared against that dataset.

The system requires a database of images and doesn’t use artificial intelligence to identify what is in an image — it only identifies a match between images and videos.
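A minimal sketch of what a frame signature looks like: the toy difference hash (“dHash”) below turns one downscaled grayscale frame into a 64-bit fingerprint by recording whether each pixel is brighter than its right-hand neighbor. Real systems such as PhotoDNA use far more elaborate, proprietary algorithms; the 9×8 frame here is made-up data for illustration.

```python
def dhash(frame: list) -> int:
    """64-bit signature from a 9-column x 8-row grayscale frame.

    Each row of 9 pixels yields 8 comparison bits; 8 rows give 64 bits.
    """
    bits = 0
    for row in frame:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

# A made-up 9x8 grayscale frame (values 0-255).
frame = [[(x * y * 37) % 256 for x in range(9)] for y in range(8)]
signature = dhash(frame)

# Uploads are checked by looking each frame's signature up in a
# central database of signatures from known videos.
database = {signature}
print(signature in database)  # True: an identical re-upload matches
```

Note there is no machine learning here: the system never identifies *what* is in the frame, it only reports whether the signature matches one already in the database.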

“Hashing has the advantage that it works at scale,” said Farid.

In 2008, Farid worked with Microsoft to create PhotoDNA, a system that can quickly identify child pornography at massive scale. PhotoDNA is currently being used by most major tech platforms, he said.

“It’s exceedingly fast,” he said. “It allows you to deal with billions of uploads a day.”

According to Farid, each image users have uploaded to Facebook in the last 10 years has been “scanned against a known database of child sexual abuse material.”

Platforms also use hashing to monitor videos for copyright infringement. If you try to upload a copy of the Avengers film onto YouTube, you won’t get very far, thanks to hashing.

Tech platforms have shown more interest in hashing over the years as a way to stop the spread of terrorist videos. After years of struggling with these issues, Facebook, YouTube, Twitter and Microsoft joined to create the Global Internet Forum to Counter Terrorism, an organization that maintains a database of hashes of known extremist content.

Tech companies hesitant to implement

Although hashing has been used by tech companies for years, Facebook and Google said the future of content moderation mostly lies in artificial intelligence.

“If we fast forward 5 or 10 years, I think we’re going to have more AI technology that can do that in more areas,” Mark Zuckerberg said in his testimony to the Senate Commerce and Judiciary committees in April 2018.

But to Farid, that answer is insufficient: “I don’t know what you do in the interim five or 10 years. I guess we live with it?”

“It’s been proven to work in the child abuse space, the copyright infringement space and now in the extremism space, so there are no more excuses,” Farid said. “You can’t pretend that you don’t have technology. The decision not to do this is a question of will and policy — not a question of technology.”