
Study carried out at DCC/UFMG automates the identification of child pornography

Published on June 20, 2022 – Conference

According to Agência Brasil, the number of images of child sexual abuse and exploitation found on the internet grew by 70% in the first four months of 2023 compared with the same period of 2022, the biggest increase since 2020. In this period alone, the non-governmental organization SaferNet, which maintains an agreement with the Federal Public Ministry, forwarded 14,005 reports received at its hotline to the institution.

At the same time, the number of unique shared links giving access to abuse images has also grown year over year in the first four months of each year since 2019, with 2022 as the only exception to the trend. According to SaferNet data, two years after its creation in 2006, the organization recorded 289,707 reports, a record high. Another important figure from the organization, one that indicates the vulnerability of children and adolescents, is the 102.24% increase in these practices since 2020, the first year of the Covid-19 pandemic.

Concerned about this situation, researchers from the Department of Computer Science (DCC) at UFMG and from UNICAMP have been carrying out research on the topic and, in June 2022, published the article "Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets" at the FAccT Conference, a computer science conference with an interdisciplinary focus that brings together researchers and professionals interested in fairness, accountability and transparency in sociotechnical systems.

According to the research, online sharing and viewing of child sexual abuse material (CSAM) is growing rapidly, to such an extent that human experts can no longer handle manual inspection. However, automatic CSAM classification is a challenging field of investigation, largely due to the inaccessibility of the target data, which is, and should forever be, private and in the exclusive possession of law enforcement agencies.

Thus, to help researchers extract insights from unseen data and reliably provide additional understanding of CSAM images, the authors proposed an analysis pipeline that goes beyond dataset statistics and their respective labels. "Our study focuses on extracting automatic signals, provided both by pre-trained machine learning models, for example, object categories and pornography detection, and by image metrics such as luminance and sharpness. Only aggregated statistics of sparse signals are provided to ensure the anonymity of victimized children and adolescents. The pipeline allows filtering the data by applying thresholds to each specified signal and provides the distribution of such signals within the subset, correlations between signals, as well as a bias assessment," they explained.
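The core idea described above, representing each image only by extracted signals, filtering by per-signal thresholds, and reporting only aggregated statistics, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' actual pipeline; the signal names and threshold values are invented for the example.

```python
# Hypothetical sketch: each image is reduced to automatic signals
# (no pixel data is ever exposed), filtered by thresholds, and
# summarized only as aggregated statistics.
import statistics

# Invented example signals for three images.
images = [
    {"luminance": 0.42, "sharpness": 0.71, "porn_score": 0.90},
    {"luminance": 0.15, "sharpness": 0.33, "porn_score": 0.20},
    {"luminance": 0.58, "sharpness": 0.64, "porn_score": 0.85},
]

def filter_by_thresholds(records, thresholds):
    """Keep records whose signals meet every minimum threshold."""
    return [r for r in records
            if all(r[name] >= t for name, t in thresholds.items())]

def aggregate(records, signal):
    """Report only aggregated statistics, never individual samples."""
    values = [r[signal] for r in records]
    return {"mean": statistics.mean(values),
            "stdev": statistics.pstdev(values),
            "count": len(values)}

# Select the subset with porn_score >= 0.5, then summarize luminance.
subset = filter_by_thresholds(images, {"porn_score": 0.5})
print(aggregate(subset, "luminance"))
```

The privacy-preserving design choice is that only the dictionary of summary statistics ever leaves the filtering step, which mirrors the paper's constraint that individual samples remain in the exclusive possession of law enforcement.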

The researchers demonstrated the proposal on the Region-based Annotated Child Pornography Dataset (RCPD), one of the few CSAM benchmarks in the literature, consisting of more than 2,000 samples of regular and CSAM images, produced in partnership with the Federal Police of Brazil. "Although noisy and limited in many ways, we argue that automatic signals can highlight important aspects of the global data distribution, which is valuable for databases that cannot be made public," they clarified.

The article was authored by researchers Camila Laranjeira da Silva, João Macedo and Jefersson dos Santos, from DCC/UFMG, and by researcher Sandra Avila, from UNICAMP.