Detecting CSAM Through Hashing and Intermediaries


Governments across the globe frequently cite child sexual abuse material (CSAM) and potential terrorist activity as justifications for demanding backdoors in encryption-based services. Big tech companies continue to face this pressure while offering end-to-end encrypted services. Is the roll-out of on-device hash matching by private companies an attempt to fend off such requests from the government without compromising user privacy?


Historically, the ‘protection of children’ justification served as one of the cornerstones of the Cyber Paternalism Movement. This movement opposed the Cyber Libertarianism Movement and argued in favour of a regulated cyberspace to prevent cyber anarchy. Governments and companies use the same justification to meet regulatory ends, resulting in actions that range from long overdue to unreasonable.

A decade ago, the United Nations estimated that 750,000 child sex offenders are browsing the internet at any given time. According to the Internet Watch Foundation (IWF), this number has now increased to 1 million. The numbers have worsened further since the COVID-19 pandemic: self-generated CSAM on the internet has increased by 374% compared to pre-pandemic levels. Another serious concern is the level of human intervention required by industry regulators and law enforcement agencies to vet the content. One research study also indicates the increasing mental trauma associated with vetting CSAM. Automating the detection of CSAM is, therefore, no longer an option but a necessity. This necessity comes with its own challenges and policy framework concerns.

CSAM: What does the Indian law say?

Section 67B of the Information Technology Act, 2000 criminalises a wide range of activities pertaining to child sexual abuse material. It prescribes imprisonment for up to five years and a fine of up to ₹10 lakhs. This provision was introduced in 2008 through the Information Technology (Amendment) Act, 2008. We have discussed this provision in detail here.

The Protection of Children from Sexual Offences Act, 2012 (POCSO) is a comprehensive legislation with multiple provisions related to CSAM. It criminalises using minors to produce CSAM, engaging in sexual activity with them, and storing CSAM content. Section 15 of POCSO prescribes punishment for storing pornographic material involving children:

  1. Storing or possessing CSAM but failing to delete or destroy it: Fine of at least ₹5,000.
  2. Storing or possessing CSAM for the purpose of distribution, display, or transmission: Imprisonment of up to three years, or fine, or both.
  3. Storing CSAM for commercial purposes: Imprisonment of at least three years, up to a maximum of five years, or fine, or both.

Privacy v. Detecting CSAM: Understanding Apple’s case

In August 2021, Apple announced its hashing-based identification tool called NeuralHash. This tool seeks to address CSAM proliferation through Apple’s cloud services with the help of hashing and machine learning. The announcement meant that Apple was diverging from its “what happens on your phone stays on your phone” approach. Via this tool, the company sought to scan content uploaded to a user’s iCloud for known instances of CSAM and report identified users to NCMEC. NCMEC (National Centre for Missing & Exploited Children) is the largest child protection organisation in the United States.

This announcement received criticism from various privacy and free speech activists across industries. Even though Apple released a clarificatory document to answer commonly asked questions about the tool, the backlash pushed the company to delay its implementation.

Hash-based detection of CSAM

While there are many accepted definitions of CSAM, any visual representation of sexually explicit behaviour involving minors would fall under the scope of this term. The general process of hashing used by big tech companies involves assigning a hash value to an identifiable image and comparing it against existing databases, such as the one offered by NCMEC. Using server-side and on-device protocols, a company flags an image if a match is found, without requiring human intervention. The NCMEC database contains known instances of CSAM content that have been triple-vetted.
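The matching step described above can be sketched as follows. This is a deliberately simplified illustration using a cryptographic hash (SHA-256); deployed systems such as PhotoDNA or NeuralHash use perceptual hashes instead, and the database entry below is entirely hypothetical.

```python
import hashlib

# Hypothetical database of hashes of known flagged images (a stand-in for
# the NCMEC hash list; the entry below is made up for illustration).
KNOWN_HASHES = {
    hashlib.sha256(b"known-image-bytes").hexdigest(),
}

def compute_hash(image_bytes: bytes) -> str:
    """Assign a hash value to an image (SHA-256 here purely for illustration)."""
    return hashlib.sha256(image_bytes).hexdigest()

def is_known_match(image_bytes: bytes) -> bool:
    """Flag the image automatically if its hash appears in the database."""
    return compute_hash(image_bytes) in KNOWN_HASHES
```

Note that an exact cryptographic match like this breaks if an image is re-encoded or resized even slightly, which is one reason deployed systems rely on perceptual hashing instead.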

This process of scanning users’ personal data has received much criticism. While the general understanding is that two different pieces of content cannot have the same hash value, researchers have reported cases of hash collision. Without additional safeguards, there is a good chance that an innocent picture could be labelled as a known instance of CSAM.
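To see how a false match can slip through, consider a minimal sketch of perceptual-hash comparison. Unlike cryptographic hashes, perceptual hashes of visually similar images are deliberately similar, so systems compare them by Hamming distance against a tolerance threshold. The bit strings and threshold below are entirely hypothetical.

```python
def hamming_distance(h1: str, h2: str) -> int:
    """Count the differing bits between two equal-length binary hash strings."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

def matches(candidate: str, known: str, threshold: int = 4) -> bool:
    """Treat near-identical hashes as a match. A generous threshold improves
    recall against re-encoded copies, but also raises the risk of flagging
    unrelated content (a collision)."""
    return hamming_distance(candidate, known) <= threshold

known_hash   = "1011001110001101"  # hypothetical database entry
benign_image = "1011001010001111"  # unrelated image that differs in only 2 bits

print(matches(benign_image, known_hash))  # True: flagged despite being benign
```

This is why practical deployments add safeguards such as match-count thresholds and human review before any report is filed.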

Role of intermediaries: Reporting and takedown

Private organisations play a crucial role in aiding the government in CSAM detection and takedown. In India and elsewhere, internet-based services are largely provided by private parties. In India, the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021 (“IT Rules”) highlight the role of intermediaries. The honourable Supreme Court noted the increasing presence of CSAM on the internet in In Re: Prajwala (2018). The Rajya Sabha’s Ad-Hoc Committee Report (2020) also argued in favour of using hashing methods to detect and curb CSAM on the internet.

Rule 4(2) of the IT Rules obligates significant social media intermediaries to trace the first originator of content in specified circumstances. Under Rule 4(4), they should also endeavour to proactively monitor CSAM on their platforms. With this wording, the IT Rules have made proactive detection of CSAM an endeavour-based initiative rather than a mandatory requirement. Despite these rules, a large volume of CSAM still goes unreported. In the absence of a legal requirement, it is left to private companies to decide the approach they will follow for detecting CSAM on their platforms without user reports.

Concluding remarks

In 2019, tech companies reported a total of 16,836,694 instances of CSAM content on the internet. India emerged as the most significant source among 240 countries, with nearly 1,987,430 reports. Quick maths shows that this amounts to nearly four reports every minute. However, between 2014 and 2019, police across the country filed chargesheets in only 120 out of 260 cases. Eight trials were concluded in this duration, with only six resulting in convictions. In this context, the controversy surrounding the hashing of CSAM is fuelled by the opposing claims of deterring its transmission and of not handing arbitrary takedown power to social media platforms. Currently, Indian law does not require over-and-above detection of CSAM by intermediaries. However, it certainly encourages them to look out for known instances of CSAM content.
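The per-minute rate is easy to verify from the figures above:

```python
# Back-of-the-envelope check of the reporting rate, using counts from the text.
reports_from_india_2019 = 1_987_430
minutes_in_a_year = 365 * 24 * 60      # 525,600

rate_per_minute = reports_from_india_2019 / minutes_in_a_year
print(round(rate_per_minute, 1))       # roughly 3.8 reports every minute
```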

Tannvi and Sebin Sebastian PM, undergraduate students at the School of Law, Christ University, have jointly authored this article.

Featured Image Credits: Freepik