Data Protection Impact Assessment for Artificial Intelligence and other digital services


A data protection impact assessment (DPIA, in German: DSFA) is required by the GDPR for certain types of data processing and is intended to help identify and minimize risks. Does it also apply to systems that use artificial intelligence? An overview with recommendations.

Podcast on the topic

Risk assessment for machine learning and artificial intelligence, following episode 29 of the Datenschutz Deluxe podcast:

Introduction

Article 35 GDPR introduces the concept of the data protection impact assessment and describes when such an assessment is to be carried out. Paragraph 1 mentions that the provision applies in particular when new technologies are used. Artificial intelligence is one such technology. ([1])

A risk assessment as part of a DPIA should always be quick to produce. It is both a prerequisite for determining whether a DPIA is required at all and an integral part of any DPIA that is carried out.

Risk assessment = multiplication of three values; see the worked example below.

Must a DPIA be carried out for every machine learning system? Machine learning can also be regarded as a new technology. Revolutionary approaches such as Transformers or powerful pre-trained AI models, but also the resurgence of LSTMs (Long Short-Term Memory, an invention from Germany), are certainly novel in combination or partly also individually.

The legal provision referred to is based on the nature, scope, context and purposes of the processing of personal data. For digital services, the criterion regarding the scope of data processing should regularly be considered fulfilled.

Since a data protection impact assessment is not required for all processing operations, the effort for work that has to be performed anyway, outside of a DPIA, is not attributable to the DPIA.

Examples of such work: information obligations, system security, training.

A complete DPIA is required pursuant to Article 35(1) GDPR if data processing is likely to result in a high risk to the rights and freedoms of natural persons. Which data processing takes place must be known anyway as part of the obligation to provide information pursuant to Article 13 or Article 14 GDPR.

According to Art. 35(2) GDPR, the controller shall seek the advice of the data protection officer when carrying out a data protection impact assessment. However, this point is irrelevant for the question of whether a DPIA is needed, as can already be seen from the fact that the aforementioned paragraph 2 states that the data protection officer is only to be involved if one has been appointed (cf. § 38 BDSG). ([1])

In accordance with Article 35(4) GDPR, the supervisory authorities compile a list of processing operations for which a DPIA is required. The list of the DSK (the conference of the German data protection authorities) gives examples and mentions, for instance, customer support through artificial intelligence.

Data Protection Impact Assessment

First of all, the GDPR only applies to personal data. Access to end devices, which is regulated by a lex specialis (§ 25 TDDDG; until 14.05.2024 it was called the TTDSG), is usually not the core subject of AI applications and can be left aside here.

All data other than potentially personal data are therefore irrelevant for a DPIA. In this context, it should be noted that a non-personal data point also becomes personal if it appears together with a personal data point and the same controller has knowledge of both at the same time. Consider cookies, which are to be regarded as personal data because of their link with the IP address.

As already mentioned, AI systems are new technologies. They must therefore be examined more closely under the legal provisions. This also makes sense: when something new is introduced, there has been no previous consideration of whether a DPIA should be carried out or not.

Art. 35(3) GDPR mentions cases in which a DPIA is required in any event. Briefly, these cases are:

  1. Comprehensive and systematic evaluation of personal aspects of natural persons including profiling.
  2. Comprehensive processing of special categories of personal data (political opinions, health data etc.), see Art. 9(1) GDPR.
  3. Comprehensive systematic surveillance of publicly accessible areas.

For any type of system, a DPIA must be performed if one of these cases applies and the other conditions are met, which include the risk to the persons affected. Let's take the video conferencing software Zoom as an example. Zoom writes in its terms of use (valid from 07.08.2023, as of 10.08.2023):

You agree that Zoom may access, use, collect, create, modify, distribute, process, transmit, maintain, and store any data generated by the service for any purpose permitted by law, including for product and service development, marketing, analysis, quality assurance, machine learning or artificial intelligence (including for training and tuning algorithms and models) …

Excerpt from Zoom's terms of use (emphasis added by me).

As stated, all data from video conferences held on Zoom can be used by Zoom for virtually any purpose in virtually any way. The video images of conference participants are covered just as much as the spoken words or transcripts of those words. According to these terms, Zoom also permits itself to pass on or further use the transcripts and other data. After public pressure, Zoom added a clause promising that customer data will only be used for AI training with consent. Nevertheless, Zoom reserves the right to use customer data without consent for numerous other purposes, including marketing and machine learning! See also the comments at the end of this post.

Zoom mentions applications of artificial intelligence in its terms of service. Whether or not these are actually used does not seem to matter for the question of a DPIA.

Of the three cases of Art. 35(3) GDPR, cases 1 and 2 are relevant here: clearly, personal aspects can also be disclosed or discussed in video conferences. Just think of planning schedules and possibly upcoming holidays, childcare or health problems, which also opens up case 2.

Apparently Zoom processes data very extensively and also systematically. Systematic processing can probably be assumed initially for all digital processing, until the opposite is made plausible. To make it plausible, a DPIA would likely be required.

Since Zoom is a provider from outside Europe, all data recipients and their countries must be determined. This has to happen anyway and is not a special topic of a DPIA. For each country, it must then be checked whether the rights and freedoms of individuals are guaranteed there in accordance with the GDPR. This, too, is not a special topic of the DPIA. But if these assessments are already available and the recipient countries are not limited to Germany or the rest of Europe, those countries should also be taken into account. Drawing up a DPIA for the individual countries seems to require either very little effort or a lot of effort. Little effort spares the discussion about whether a DPIA is needed, as the discussion takes longer than writing it down. A lot of effort justifies a DPIA outright, because where many questions are open, a data protection impact assessment must be considered appropriate.

Artificial intelligence systems can affect the rights and freedoms of individuals to a particular degree. This is relevant because of Article 35(1) GDPR. As ChatGPT shows, AI outputs in response to user queries can appear highly credible. The user sees the result and is often enthusiastic about the linguistic quality and the conclusions of the AI. This also leads to false or incorrect results being taken at face value.

Generative AI systems that process personal data without at least pseudonymizing it are particularly sensitive to handle. Here I think a DPIA is always appropriate. Even for research purposes that are considered worthwhile, a DPIA should be carried out beforehand. What if the results reveal that certain individuals had or have a particular disease? Even if the circle of people who receive the results is small and very trustworthy, this would need to be recorded in writing, and a DPIA is the right place for that.

Systems intended to support decisions about a person's suitability should also be given special consideration. This profiling, which can affect people's life paths, must not take place without further safeguards. Part of these safeguards is a DPIA. Whether such a system uses AI or not plays no role, or only a subordinate one.

Recommendations

The best DPIA is one that is not necessary. To avoid a DPIA, only data protection-friendly systems should be used. For these, a DPIA can also be drawn up quickly if needed. Instead of discussing at length whether something is necessary, it can often simply be written down quickly instead.

In particular, AI systems from untrustworthy third parties should not be used if personal data or other sensitive data such as one's own patents, trade secrets or other confidential information are involved. OpenAI with ChatGPT, for example, seems untrustworthy to me. Nobody knows exactly what happens to the data there. In my opinion, Microsoft and Google are not trustworthy third parties either. They use all kinds of data for all sorts of their own purposes.

A risk assessment also helps in evaluating whether a system is data protection-friendly or not.

Data friendliness encompasses all types of data that can be processed automatically.

I have described elsewhere how company-owned AI systems can be set up, for example here:

When drawing up a DPIA for a provider of which several services are in use, a document hierarchy could be used (a rough sketch follows the list below). Art. 35(1), last sentence, GDPR also suggests this, since a single assessment may address a set of similar processing operations.

  1. Master document: general assessments of the provider and its subcontractors with regard to data processing.
  2. Service detail document: references the master document and assesses service-specific details.
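As a rough illustration (not a prescribed format), such a hierarchy could be modeled as follows; the field names and example values are my own assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class MasterDocument:
    """General DPIA assessments for one provider and its sub-processors."""
    provider: str
    subprocessors: list[str]
    general_findings: list[str] = field(default_factory=list)

@dataclass
class ServiceDetailDocument:
    """Service-specific DPIA details; refers back to the master document."""
    service: str
    master: MasterDocument
    specific_findings: list[str] = field(default_factory=list)

# Hypothetical example: one master document per provider,
# plus one detail document per service actually in use.
master = MasterDocument(
    provider="Example Provider Inc.",
    subprocessors=["Cloud hosting partner", "Support contractor"],
    general_findings=["Data recipients and their countries determined"],
)
video_detail = ServiceDetailDocument(
    service="Video conferencing",
    master=master,
    specific_findings=["Transcripts may contain personal data"],
)
```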

A DPIA can initially be drawn up in a very rough form, which can mean little effort. An example is given below. If this short DPIA gives occasion for further investigation, more effort will have to be spent.

For an AI system like ChatGPT, a DPIA could look as follows. In short, this is the tried-and-tested numerical rating scheme. The numbers are not justified further here and are only to be seen as examples.

Example: Risk assessment for ChatGPT in document search

The assumption for this example is that employee data from a company are fed into ChatGPT in the form of a document, against which questions are then asked that ChatGPT should answer. This is also referred to as the question answering task or, more specifically, as the "Ask Your Documents" task. The document is first pseudonymized automatically. Of course, errors can occur here, which this example takes into account.
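As a minimal sketch of what such an automated pseudonymization step might look like before a document is handed to an external AI service, the following Python snippet replaces e-mail addresses and a hypothetical list of known employee names with placeholders. The patterns and names are illustrative assumptions only; real tooling is more elaborate and, as noted above, can still miss identifiers.

```python
import re

# Illustrative assumption: names that must not leave the company.
EMPLOYEE_NAMES = ["Erika Mustermann", "Max Mustermann"]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace known names and e-mail addresses with placeholders.

    Returns the pseudonymized text and a mapping for re-identification,
    which must remain inside the company.
    """
    mapping: dict[str, str] = {}
    for i, name in enumerate(EMPLOYEE_NAMES, start=1):
        if name in text:
            placeholder = f"[PERSON_{i}]"
            mapping[placeholder] = name
            text = text.replace(name, placeholder)
    for j, mail in enumerate(sorted(set(EMAIL_PATTERN.findall(text))), start=1):
        placeholder = f"[EMAIL_{j}]"
        mapping[placeholder] = mail
        text = text.replace(mail, placeholder)
    return text, mapping

doc = "Erika Mustermann (erika@example.com) was on sick leave in March."
safe_text, mapping = pseudonymize(doc)
print(safe_text)  # "[PERSON_1] ([EMAIL_1]) was on sick leave in March."
# Only safe_text would be sent to the external service; the mapping stays internal.
```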

The following risk assessment is part of a complete DPIA. A full DPIA becomes necessary when the risk for the persons affected is not low enough, i.e. is too high. According to Wikipedia, the required content of a DPIA is as follows (slightly shortened here, with my notes at the end of each point):

  • Systematic description of the planned processing operations and the purposes of processing. → Should already be known via Art. 12 GDPR.
  • Assessment of the necessity and proportionality of the processing operations in relation to the purpose. → Partly covered by the risk assessment used here as an example, partly also (compulsorily) known from Art. 12 GDPR.
  • Assessment of the risks to the rights and freedoms of the persons affected. → See the risk assessment mentioned as part of a complete DPIA.
  • Measures envisaged to address higher-risk situations. → If the calculated risk is not low, further action would be required here.

Let's start with the probability of discovery. That is the probability that someone notices that a data protection incident has occurred. I choose the numbers 1 to 10 as the scale, where 1 is the most favorable case, i.e. the highest probability of discovery. If employees are properly trained, they will quickly recognize and often report an incident (unless someone finds ChatGPT so great that they don't want to tell on it). So I choose the value 4. After all, not everyone can tell whether a chatbot's output contains personal data. Personally identifiable information in particular is not always easy to spot. Also, with larger text outputs, not everything may be read; instead, the text may be blindly copied and pasted into a public report.

The probability of occurrence is the probability with which a data protection incident occurs. The value 1 represents the most favorable case, i.e. data protection incidents occur very rarely or perhaps never. In the example scenario, it appears very likely that an incident will occur. After all, hundreds of documents could be searched daily, and the automated pseudonymization cannot work perfectly. I therefore choose the value 8.

The severity of the event indicates how severely a data protection incident interferes with the rights and freedoms of individuals. It depends on the type of employee information involved in the example. If it concerns absence times, knowledge of which is often equated with health data, then a value of 8 would not be set too high. Even the performance evaluation of employees would justify such a value. From the employees' point of view, it can hardly get much worse.

Multiplying the probability of discovery, the probability of occurrence and the severity of the event results in a value between 1 and 1,000. The value 1 would result if all three criteria were rated with the most favorable value of 1. The value 1,000 results from 10 × 10 × 10, the worst imaginable scenario.

Conducting a risk assessment as part of a data protection impact assessment is always a good idea. Either it is done quickly, or it raises further questions. In either case, it becomes clear afterwards whether the use of a digital system appears sensible from a data protection point of view or not.

In this example, the risk assessment yields a value of 4 × 8 × 8 = 256. Each controller must now decide for themselves above which threshold special measures should be described in order to handle an incident appropriately and quickly. Such a measure could be an at least temporary ban on using ChatGPT, or on using it without restrictions.

I see a value of 200, or perhaps 250, as the threshold above which thought should be given to drafting emergency plans or mitigation measures.
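To make the scheme tangible, here is a minimal sketch of the three-factor calculation; the scale (1 to 10 per factor), the example values 4, 8 and 8, and the threshold of 200 come from the text above, while the function and variable names are my own illustrative choices:

```python
RISK_THRESHOLD = 200  # threshold suggested above (200, perhaps 250)

def risk_score(discovery: int, occurrence: int, severity: int) -> int:
    """Multiply the three factors; the result lies between 1 and 1,000."""
    for value in (discovery, occurrence, severity):
        if not 1 <= value <= 10:
            raise ValueError("each factor must be rated on a scale from 1 to 10")
    return discovery * occurrence * severity

# Example from the article: ChatGPT used to search documents containing employee data.
score = risk_score(discovery=4, occurrence=8, severity=8)
print(score)  # 256
if score > RISK_THRESHOLD:
    print("Above threshold: draft mitigation measures and emergency plans, "
          "or refrain from using the system.")
```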

Here, the risk assessment leads to ChatGPT being considered unsuitable for the use case "searching corporate documents that may contain employee data", since the data recipients are not only persons within the company (or possibly the public), but also OpenAI, Microsoft and all their sub-processors.

A DPIA should therefore be performed at least in the form of a risk assessment at the level of multiplying three numbers. If a threshold is exceeded, further investigation should be considered. A poor value often speaks against using the system at all, which then makes the DPIA obsolete; it has nevertheless done good service for decision-making.

A risk assessment can also be carried out schematically. For many services, large parts of the assessment can proceed in the same or a similar way, possibly with different values for the risk criteria. The effort required for this seems to me mostly low, or even very low.

Conclusion

The scope of a DPIA is determined by the level of risk a system poses to the persons affected. For third-party AI systems, I consider a risk assessment as part of a DPIA to be always necessary. For in-house AI systems, it can be determined quickly, by multiplying three numbers, whether an extended assessment is needed or not. In general, most in-house AI systems are non-critical unless they serve to evaluate employees, process health data, etc. Further legal considerations outside of a DPIA are of course still necessary: the origin and nature of the input data must be clarified, as must the legal basis for the data processing.

Whether a DPIA is carried out by the data protection officer or by the controller is secondary, although in practice it is primarily a matter for the data protection officer.

A full DPIA requires as additional work "only" the devising of mitigation measures and emergency plans. All other required contents are already given by Art. 12 GDPR and by a risk assessment that is always sensible and often quickly completed anyway.

The question of a data protection impact assessment is less pressing when in-house systems or data protection-friendly third-party systems are used and the risk can quickly be assessed as low. With in-house systems, especially in-house AI systems, the question of data flows and recipients hardly arises: they are known and can be restricted at will.

For third-party services from the usual suspects such as Microsoft, Google or Zoom, a single DPIA could be used per provider, supplemented only with details for each specific service or plugin in use.

A DPIA reduces the risk, but it does not prevent complaints, warnings or lawsuits resulting from the unlawful use of services or plugins. With data protection-friendly systems, a lot of work can be saved and legal certainty significantly increased. Often, one only has to want it. Alternatives are available in abundance for many use cases.

In conclusion, a few key aspects summarized:

  • A risk assessment is always sensible and often necessary anyway.
  • Information obligations under Art. 12 GDPR must be fulfilled in any case. Whether they are drawn up for the DPIA or because of data subject rights is secondary. The effort for the information obligations can therefore not be attributed to the DPIA.
  • General safeguards that controllers must establish under Art. 5 GDPR are independent of a DPIA. They have to be provided anyway for all kinds of data processing. Example: a secure password for ChatGPT access. A general password policy should be in place in the company anyway.
  • Specific safeguards per service must also generally be provided and have nothing to do with a DPIA. Reasoning: less risky services do not require a DPIA; nevertheless, safeguards such as data processing security should be in place for them as well.

Key messages

Data protection impact assessments (DPIAs) are required for new technologies like artificial intelligence that process personal data and could pose a high risk to people's rights.

AI systems should be carefully reviewed under data protection laws because they often process personal data and can have significant impacts on individuals.

Data protection impact assessments (DPIAs) should be conducted for AI systems, especially those processing personal data, to ensure compliance with GDPR and protect individual rights.

The text explains how to assess the risks of using AI systems like ChatGPT for data processing, emphasizing transparency and user control over personal information.

Using ChatGPT to search company documents containing employee data poses a high risk of violating privacy due to the potential for data leaks and the involvement of third-party processors like OpenAI and Microsoft.

A risk assessment is always a good idea, and often necessary anyway. Focus on using data-friendly systems and tools to minimize the need for complex Data Protection Impact Assessments (DPIAs).

Services with low risk don't need special safety checks, but they still need to be secure.

About

About the author on dr-dsgvo.de
My name is Klaus Meffert. I have a doctorate in computer science and have been working professionally and practically with information technology for over 30 years. I also work as an expert in IT & data protection. I achieve my results by looking at technology and law. This seems absolutely essential to me when it comes to digital data protection. My company, IT Logic GmbH, also offers consulting and development of optimized and secure AI solutions.

Artificial Intelligence: Technical and Legal Foundations