Understanding Anonymisation, Pseudonymisation and De-identification in the context of GDPR and PDPB

In recent times, the European Union's General Data Protection Regulation (hereinafter referred to as GDPR), which has applied since 2018, has been a document of interest to Indians. The main reason is that India has introduced a bill of its own in Parliament, the Personal Data Protection Bill, 2019 (hereinafter referred to as PDPB). Ever since the Supreme Court's judgment in Justice K. S. Puttaswamy v. Union of India declared the right to privacy to be a fundamental right under Article 21 of the Constitution, privacy has been an important question in India. PDPB is the first step towards recognising the various rights that individuals have over their data. The bill draws substantial inspiration from GDPR, and hence GDPR cannot be left out of any discussion of PDPB.

In this article, I will be discussing how anonymisation, pseudonymisation, and de-identification can be understood in the context of GDPR and PDPB.

Importance of GDPR and PDPB

GDPR became applicable in May 2018. It was enacted to provide a single, uniform law for data protection across all the member states of the European Union, which had 27 member states as of 2020. Having one unified law govern issues as important as data protection and privacy was therefore a crucial decision. GDPR is a stringent piece of legislation with deterrent penalties for infringements. For example, Article 83(4) allows administrative fines of up to €10 million or 2% of an undertaking's total worldwide annual turnover, whichever is higher, and Article 83(5) raises this to €20 million or 4% for more serious infringements.
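The "whichever is higher" rule in Article 83 is simple arithmetic. The Python sketch below computes only the upper limit of a fine for an undertaking under the two tiers of Article 83(4) and 83(5); actual fines are set case by case using the factors listed in Article 83(2). The function name `gdpr_fine_cap` and the example turnovers are hypothetical.

```python
def gdpr_fine_cap(annual_turnover_eur: float, serious: bool = False) -> float:
    """Upper limit of a GDPR administrative fine for an undertaking.

    Article 83(4): EUR 10 million or 2% of total worldwide annual turnover,
    whichever is higher. Article 83(5): EUR 20 million or 4%, whichever is
    higher. Returns only the ceiling, not the fine actually imposed.
    """
    fixed_cap, rate = (20_000_000.0, 0.04) if serious else (10_000_000.0, 0.02)
    return max(fixed_cap, rate * annual_turnover_eur)


# EUR 1 billion turnover: 2% (EUR 20 million) exceeds the EUR 10 million floor.
print(gdpr_fine_cap(1_000_000_000))                # 20000000.0
# EUR 100 million turnover: 2% is only EUR 2 million, so EUR 10 million applies.
print(gdpr_fine_cap(100_000_000))                  # 10000000.0
# Article 83(5) tier for the same EUR 1 billion turnover.
print(gdpr_fine_cap(1_000_000_000, serious=True))  # 40000000.0
```

The percentage-based cap only overtakes the fixed amount once turnover exceeds €500 million, so the fixed cap is what matters for most smaller undertakings.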

Along similar lines, PDPB aims to bring about a paradigm shift in privacy and data protection in India, and the bill is expected to be enacted as an Act of Parliament soon.

Data Anonymisation

Recital 26 of GDPR addresses anonymous information, and Section 3(2) of PDPB defines anonymisation. Anonymisation refers to the removal of identifiers, direct or indirect, through an irreversible process; under PDPB, the process must meet the standards of irreversibility specified by the Data Protection Authority. The data still exists, but the link between the data and the data principal is transformed in such a way that the data principal can no longer be identified from it. Under Recital 26, whether a person is identifiable is judged by taking into account all the means reasonably likely to be used, such as singling out.

In simpler words, anonymised data cannot be attributed back to the person from whom it was collected. It must be noted that GDPR does not apply to anonymised data, and PDPB similarly keeps anonymised data outside its general scope. The reason for excluding anonymised data from the scope of these laws is simple:

As anonymised data cannot be attributed to the individuals from whom it was collected, no question of those individuals' privacy arises.

Over the years, data anonymisation has become a common practice across industries. For example, Google states in its privacy policy that it uses anonymisation in its data processing activities. Such anonymised data is used to build products, such as protection against phishing and malware and the auto-completion of search queries, while protecting each user's identity.

Generalisation (replacing precise values with broader ranges or categories) and noise addition (perturbing values with random noise) are considered two primary methods of anonymising a data set.
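As a rough illustration of these two methods, the sketch below generalises an exact age into a ten-year band and a six-digit PIN code into its first three digits, and perturbs an income figure with Laplace noise. The field names, band width, number of digits kept and noise scale are all assumptions chosen for the example, and applying these steps alone does not guarantee that a data set meets the legal standard of anonymisation.

```python
import random


def generalise_age(age: int, band: int = 10) -> str:
    """Replace an exact age with a band of width `band`, e.g. 37 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def generalise_pincode(pincode: str, keep: int = 3) -> str:
    """Keep only the leading `keep` digits of a PIN code and mask the rest."""
    return pincode[:keep] + "X" * (len(pincode) - keep)


def add_laplace_noise(value: float, scale: float, rng: random.Random) -> float:
    """Add Laplace(0, scale) noise to a numeric value.

    The difference of two independent exponential variables, each with
    mean `scale`, follows a Laplace distribution centred on 0 with that scale.
    """
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


rng = random.Random(42)  # fixed seed so the example is repeatable
record = {"age": 37, "pincode": "560034", "income": 850_000}  # hypothetical record
released = {
    "age": generalise_age(record["age"]),              # '30-39'
    "pincode": generalise_pincode(record["pincode"]),  # '560XXX'
    "income": round(add_laplace_noise(record["income"], 10_000, rng)),
}
print(released)
```

Wider bands and larger noise scales lower the risk of identification but also make the data less useful, so choosing these parameters is a trade-off.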

Pseudonymisation and De-identification

Pseudonymisation is defined in Article 4(5) of GDPR, while de-identification is defined in Section 3(16) of PDPB. Broadly, both refer to processing personal data in such a way that it can no longer be attributed to a particular person without the use of additional information. GDPR further requires that this additional information be kept separately and be subject to technical and organisational measures so that the data is not attributed to an identifiable individual. Such measures can include encrypting personal identifiers so that only a limited number of people, as determined by the organisation's access control policy, hold the decryption keys. However, pseudonymisation does not always involve encryption: an organisation can also replace specific personal identifiers with artificial data, such as a code or a fictitious name.
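The sketch below illustrates replacing an identifier with artificial data. The hypothetical `TokenVault` class swaps an identifier for a random token and keeps the token-to-identifier mapping, which plays the role of the "additional information": whoever holds it can reverse the pseudonymisation, so in practice it would be stored separately and under access control. The class, the `T-` token format and the sample record are illustrative assumptions, not a standard API.

```python
import secrets


class TokenVault:
    """Hypothetical helper that swaps identifiers for random tokens.

    The token-to-identifier mapping is the 'additional information' that
    allows re-identification, so it must be stored separately and protected.
    """

    def __init__(self) -> None:
        self._token_of: dict[str, str] = {}
        self._identifier_of: dict[str, str] = {}

    def pseudonymise(self, identifier: str) -> str:
        # Reuse the existing token so records of the same person stay linkable.
        if identifier in self._token_of:
            return self._token_of[identifier]
        token = "T-" + secrets.token_hex(8)
        while token in self._identifier_of:  # regenerate on the rare collision
            token = "T-" + secrets.token_hex(8)
        self._token_of[identifier] = token
        self._identifier_of[token] = identifier
        return token

    def reidentify(self, token: str) -> str:
        # Only someone with access to this vault can reverse a token.
        return self._identifier_of[token]


vault = TokenVault()
record = {"email": "asha@example.com", "city": "Pune"}
shared = {**record, "email": vault.pseudonymise(record["email"])}
print(shared)                             # the email is now a random token
print(vault.reidentify(shared["email"]))  # asha@example.com
```

Random tokens carry no information about the original identifier, so the shared record is only linkable back to a person through the vault.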

There is, however, a subtle difference between pseudonymisation and de-identification. Pseudonymisation describes data that no longer allows an individual to be identified without some additional information, and it is applied deliberately as a safeguard: Recital 28 of GDPR notes that pseudonymisation can reduce the risks to the data subjects concerned. GDPR itself, however, does not make it an offence to combine pseudonymised data with additional information to identify the individual. De-identification is the intentional act of stripping or masking identity-related information in personal data, and under PDPB, knowingly re-identifying personal data that has been de-identified, without the consent of the data fiduciary or data processor that de-identified it, is punishable.

Essentials to check if data is pseudonymised

The data cannot be attributed to a particular individual without the use of additional information.
The additional information is kept separately and protected by technical and organisational measures, so that the data is not attributed to a particular individual.

Let’s understand these concepts using an example.

Consider an airline company, X, that stores customer information such as name, address, mobile number and destinations travelled to. X needs to retain the travel destinations so that its analysts can identify the most frequently visited cities and show relevant advertisements to customers.

However, the analysts do not need to know customers' names, addresses or mobile numbers for this analysis. So X assigns each passenger a passenger ID and stores only the passenger ID and the destinations travelled to in a separate data set. This data set is pseudonymised: the analysts can see which destinations were travelled to, but because they do not have access to the original data set, they cannot tell which customer travelled where. To find that out, one would need access to the original database so that each passenger ID can be matched to a customer.
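The airline example can be sketched in Python. The booking records, the `split_dataset` helper and the `PX00001`-style passenger IDs are all hypothetical, and treating a name and mobile number together as identifying a passenger is a simplification. The analysts' table holds only passenger IDs and destinations, while the identity table is the additional information that X would keep separately.

```python
from collections import Counter
from itertools import count

# Hypothetical raw bookings held by airline X.
bookings = [
    {"name": "Asha Rao", "address": "12 MG Road, Bengaluru", "mobile": "9800000001", "destination": "Delhi"},
    {"name": "Ravi Iyer", "address": "4 Park Street, Kolkata", "mobile": "9800000002", "destination": "Mumbai"},
    {"name": "Asha Rao", "address": "12 MG Road, Bengaluru", "mobile": "9800000001", "destination": "Mumbai"},
]


def split_dataset(rows):
    """Split bookings into an identity table and a pseudonymised travel table."""
    next_id = count(1)
    passenger_ids = {}   # (name, mobile) -> passenger ID
    identity_table = {}  # passenger ID -> identifying details, kept separately
    travel_table = []    # the only data the analysts receive
    for row in rows:
        key = (row["name"], row["mobile"])
        if key not in passenger_ids:
            pid = f"PX{next(next_id):05d}"
            passenger_ids[key] = pid
            identity_table[pid] = {k: row[k] for k in ("name", "address", "mobile")}
        travel_table.append({"passenger_id": passenger_ids[key], "destination": row["destination"]})
    return identity_table, travel_table


identity_table, travel_table = split_dataset(bookings)

# Analysts find the most visited city using only the travel table.
top = Counter(r["destination"] for r in travel_table).most_common(1)
print(top)  # [('Mumbai', 2)]

# Linking a trip back to a customer requires the separately held identity table.
print(identity_table[travel_table[0]["passenger_id"]]["name"])  # Asha Rao
```

Sequential IDs are used here for readability; random tokens would avoid revealing the order in which passengers were first recorded.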

Anonymisation v. Pseudonymisation/De-identification

Pseudonymised data still falls within the ambit of personal data as defined in Article 4(1) of GDPR and Section 3(28) of PDPB. Anonymised data, on the other hand, is no longer personal data and is therefore not governed by either GDPR or PDPB.

Section 3(34) of PDPB defines re-identification. Section 82(2) provides that re-identification is not punishable where, among other things, the data principal has explicitly consented to it. Similarly, Recital 31 of GDPR contemplates that public authorities carrying out an inquiry in the general interest under Union or Member State law may obtain the personal data they need, which in effect permits re-identification (although the term is not used explicitly), provided that their requests are in writing, reasoned and occasional. Further, the processing of such data must comply with GDPR.

Lastly, the methods of de-identifying and anonymising data differ. De-identification can involve removing or replacing a few personal data fields so that a particular person cannot be identified from the remaining data set unless additional information is provided. Anonymisation involves removing or transforming personal information in the data set irreversibly, so that a particular person can no longer be identified by any means reasonably likely to be used.
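One common way to probe this difference is to check whether the fields left after de-identification can still single out a person. The sketch below counts how many records share each combination of quasi-identifiers (attributes such as age and PIN code that are not names but can identify someone in combination), which is the idea behind k-anonymity. The records, field names and helper `smallest_group_size` are hypothetical, and the check is a heuristic rather than a legal test: a data set can pass it and still reveal information, for example when everyone in a group shares the same diagnosis.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    A result of 1 means at least one row is unique on these attributes,
    so that person may be singled out even though their name was removed.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Names removed (de-identified), but exact ages and PIN codes remain.
de_identified = [
    {"age": 34, "pincode": "560034", "diagnosis": "asthma"},
    {"age": 36, "pincode": "560038", "diagnosis": "diabetes"},
    {"age": 35, "pincode": "560034", "diagnosis": "asthma"},
]

# The same rows after generalising age to a ten-year band and masking the PIN code.
generalised = [
    {
        "age": f"{r['age'] // 10 * 10}-{r['age'] // 10 * 10 + 9}",
        "pincode": r["pincode"][:3] + "XXX",
        "diagnosis": r["diagnosis"],
    }
    for r in de_identified
]

print(smallest_group_size(de_identified, ["age", "pincode"]))  # 1
print(smallest_group_size(generalised, ["age", "pincode"]))    # 3
```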

Prima facie, these terms may appear complex, but they are not difficult to comprehend. I hope this article clarifies the distinction between anonymisation and pseudonymisation/de-identification. If you have any queries, please write to us at contact@cyberblogindia.in, and we will get back to you within a day or two.

