Data Mapping is the process of identifying every point where personal information enters a system, tracing how it flows through various processes, and recording where it is eventually stored or deleted. The result is a visual and logical inventory that underpins auditing and compliance.
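A data map can be as simple as a structured inventory that records entry points, processes, storage, and disposal for each flow. The sketch below is one minimal way to model it; the field names and sample entries are illustrative assumptions, not a prescribed schema.

```python
# Minimal sketch of a data-map inventory; field names and values are
# illustrative assumptions, not a mandated format.
data_map = [
    {
        "entry_point": "signup form",
        "data_elements": ["name", "email"],
        "processes": ["account creation", "marketing"],
        "storage": "users_db",
        "disposal": "deleted 2 years after account closure",
    },
    {
        "entry_point": "support chat",
        "data_elements": ["email", "chat transcript"],
        "processes": ["ticket handling"],
        "storage": "helpdesk_db",
        "disposal": "deleted after 90 days",
    },
]

def elements_stored_in(data_map, location):
    """Answer a typical audit question: which personal data elements
    are held in a given storage system?"""
    return sorted({element
                   for entry in data_map if entry["storage"] == location
                   for element in entry["data_elements"]})

print(elements_stored_in(data_map, "users_db"))  # ['email', 'name']
```

Keeping the map queryable like this makes it easy to answer auditor questions ("what do we hold, and where?") without re-reading every system description.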
Risk Assessment (DPIA): A Data Protection Impact Assessment is a systematic process used to identify and minimize the privacy risks of a data processing project. It involves estimating both the likelihood that the processing will cause harm (for example, through a data breach) and the severity of that harm for the individuals involved.
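The likelihood-times-severity logic of a DPIA can be sketched as a simple scoring matrix. The scales and thresholds below are assumptions for illustration; real DPIAs use whatever methodology the organization or regulator prescribes.

```python
# Illustrative DPIA scoring: risk = likelihood x severity.
# The 1-3 scales and rating thresholds are assumptions, not a standard.

LIKELIHOOD = {"rare": 1, "possible": 2, "likely": 3}      # chance harm occurs
SEVERITY = {"minimal": 1, "significant": 2, "severe": 3}  # impact on individuals

def dpia_risk(likelihood: str, severity: str):
    """Return a numeric risk score and a coarse rating."""
    score = LIKELIHOOD[likelihood] * SEVERITY[severity]
    if score <= 2:
        rating = "low"
    elif score <= 4:
        rating = "medium"
    else:
        rating = "high"
    return score, rating

print(dpia_risk("likely", "severe"))   # (9, 'high')
print(dpia_risk("rare", "minimal"))    # (1, 'low')
```

The point of the matrix is prioritization: a "high" result signals that mitigations (or a redesign) are needed before processing begins.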
De-identification Techniques: These include Anonymization, which irreversibly strips identifiers so the individual can no longer be identified, and Pseudonymization, which replaces identifiers with artificial codes (pseudonyms) that require a separate key to re-identify the person.
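Pseudonymization's defining feature, replacing an identifier with an artificial code while keeping a separate re-identification key, can be shown in a few lines. This is a minimal sketch; a production system would protect the key store far more carefully.

```python
import secrets

def pseudonymize(records, field):
    """Replace a direct identifier with a random code.
    The key (code -> original value) must be stored separately
    from the pseudonymized data."""
    key = {}
    out = []
    for rec in records:
        code = secrets.token_hex(4)  # random 8-char pseudonym
        key[code] = rec[field]
        out.append({**rec, field: code})
    return out, key

def reidentify(record, field, key):
    """Only someone holding the key can reverse the pseudonym."""
    return key[record[field]]

records = [{"name": "Jane Doe", "dept": "HR"}]
pseudo, key = pseudonymize(records, "name")
```

Because the key makes the process reversible, the pseudonymized records remain personal data; true anonymization would discard the key and any other route back to the individual.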
It is critical to distinguish between Direct Identifiers and Indirect Identifiers. A direct identifier, like a passport number, points uniquely to one person, while indirect identifiers, like a birth date or zip code, only identify a person when combined with other data points.
| Feature | Anonymization | Pseudonymization |
|---|---|---|
| Reversibility | Irreversible | Reversible with a key |
| Legal Status | No longer 'Personal Data' | Still considered 'Personal Data' |
| Risk Level | Very Low | Moderate |
| Utility | Lower (data is less granular) | Higher (retains relational links) |
The Mosaic Effect occurs when multiple pieces of non-identifying information are combined to uniquely identify an individual. Analysts must look beyond single data points and evaluate the dataset as a whole to prevent re-identification through cross-referencing.
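The mosaic effect can be demonstrated on a toy dataset in which no single attribute is identifying, but their combination is. The names and values below are invented for illustration.

```python
# Toy dataset: each attribute alone matches several people,
# but the combination narrows to exactly one.
people = [
    {"name": "A. Lee",  "birth_year": 1984, "zip": "30301", "gender": "F"},
    {"name": "B. Kim",  "birth_year": 1984, "zip": "30302", "gender": "F"},
    {"name": "C. Ruiz", "birth_year": 1990, "zip": "30301", "gender": "F"},
    {"name": "D. Park", "birth_year": 1984, "zip": "30301", "gender": "M"},
]

def matches(dataset, **attrs):
    """Records consistent with every supplied attribute."""
    return [p for p in dataset
            if all(p[k] == v for k, v in attrs.items())]

print(len(matches(people, birth_year=1984)))                          # 3
print(len(matches(people, birth_year=1984, zip="30301")))             # 2
print(len(matches(people, birth_year=1984, zip="30301", gender="F"))) # 1
```

Each added "harmless" attribute shrinks the candidate pool; once the pool reaches one, the supposedly non-identifying data has identified a person.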
When evaluating whether a data point is personal information, apply the 'Motivated Intruder' Test. Ask if a person with reasonable effort and access to external resources (like social media or public records) could identify the individual from the data provided.
Always check for Metadata. Students often forget that file properties, timestamps, and hidden tags in digital documents can contain personal information even if the visible content does not.
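Even at the filesystem level, properties invisible in the document body (sizes, timestamps) are easy to surface programmatically. This sketch uses Python's standard library; document formats such as DOCX or JPEG carry further embedded tags (author, GPS) that need format-specific tools.

```python
import datetime
import os
import tempfile

def file_metadata(path):
    """Surface file properties that exist outside the visible content."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "modified": datetime.datetime.fromtimestamp(st.st_mtime).isoformat(),
    }

# Demo on a throwaway file; a real audit would walk the document store.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"draft contents")
    path = f.name
meta = file_metadata(path)
os.unlink(path)
print(meta)
```

A modification timestamp alone can reveal, for instance, that a "routine" report was edited at 3 a.m. by a particular account, which is exactly the kind of hidden personal information the tip warns about.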
Verify the Legal Basis for analysis. In exam scenarios, simply having the data is not enough; you must identify if the analysis is justified by consent, contract necessity, legal obligation, or legitimate interest.
Look for Secondary Identifiers. In datasets where names are removed, check if unique combinations of attributes (e.g., a rare job title combined with a specific office location) act as a 'proxy' for a direct identifier.
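Checking for proxy identifiers amounts to asking which attribute combinations occur exactly once in the dataset (k = 1, in k-anonymity terms). The sketch below does this with a counter; the staff records are invented for illustration.

```python
from collections import Counter

def identifying_combinations(records, attrs):
    """Attribute combinations shared by exactly one record (k = 1),
    i.e. combinations that act as a proxy for a direct identifier."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return [dict(zip(attrs, combo)) for combo, n in counts.items() if n == 1]

staff = [
    {"job_title": "Analyst",       "office": "Berlin"},
    {"job_title": "Analyst",       "office": "Berlin"},
    {"job_title": "Chief Actuary", "office": "Berlin"},
]

print(identifying_combinations(staff, ["job_title", "office"]))
# [{'job_title': 'Chief Actuary', 'office': 'Berlin'}]
```

Two analysts in Berlin are indistinguishable, but the sole Chief Actuary is uniquely identified even with the name column removed.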
A common misconception is that Encryption is the same as Anonymization. Encryption is a security measure that hides data, but because it is reversible with a key, the data remains personal information and must be analyzed as such.
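The reversibility that keeps encrypted data "personal" can be shown with a deliberately trivial cipher. The XOR scheme below is a teaching toy only, not a secure algorithm; the point is that anyone holding the key recovers the original exactly.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy XOR 'encryption' -- illustrative only, NOT secure.
    Applying it twice with the same key restores the plaintext."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

name = b"Jane Doe"
key = b"s3cret"

ciphertext = xor_cipher(name, key)
recovered = xor_cipher(ciphertext, key)
```

The ciphertext hides the name from a casual observer, but because decryption restores it perfectly, the underlying data never stops being personal information; anonymization, by contrast, destroys any such route back.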
Another pitfall is assuming that Publicly Available Information is not subject to privacy analysis. Even if a person's information is on a public website, analyzing and aggregating that data for new purposes often requires a legal basis and privacy considerations.
Analysts often fail to account for Inferred Data. Personal information can be generated through analysis (e.g., predicting a person's health status based on their shopping habits), and this new 'inferred' data is also protected personal information.
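Inference can be as simple as a lookup from observed behavior to a sensitive attribute. The mapping below is hypothetical, invented for illustration, but it shows how analysis creates new personal data that was never collected directly.

```python
# Hypothetical mapping from purchases to inferred sensitive attributes;
# the items and labels are invented for illustration.
HEALTH_SIGNALS = {
    "glucose test strips": "possible diabetes",
    "prenatal vitamins": "possible pregnancy",
}

def infer_attributes(basket):
    """Derive new (and protected) personal data from shopping behavior."""
    return {HEALTH_SIGNALS[item] for item in basket if item in HEALTH_SIGNALS}

print(infer_attributes(["bread", "glucose test strips"]))
# {'possible diabetes'}
```

The inferred label never appeared in the collected data, yet it is personal information about health, so it carries at least the same protection obligations as the raw purchase records.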