Artificial intelligence: Personal data in AI models


Many are calling for the regulation of AI applications. Ideally, mass data for training AI models should no longer contain personal data, even if it comes from public sources. The Federal Data Protection Commissioner, for example, is calling for this. What does this mean in practice?

Introduction

An AI model is an electronic brain, represented by a neural network. The connections between its neurons encode knowledge, quite analogous to the human brain. This knowledge is fed in by reading millions or billions of freely available online documents, in particular web pages.

Many of the texts that feed AI models contain personal data. This data thus ends up in the training data of an artificial intelligence. Moreover, outputs generated by a chatbot on the basis of this training data can also contain personal data.
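
How a name ends up "in" a model can be illustrated with a minimal, hypothetical sketch in plain Python (not a real training pipeline): a Hebbian-style update strengthens the connection between two tokens whenever they co-occur, so any personal name in the training text is encoded in the resulting weights.

```python
from collections import defaultdict
from itertools import combinations

def train(documents):
    """Hebbian-style sketch: strengthen the 'connection' between two
    tokens every time they co-occur in a document."""
    weights = defaultdict(float)
    for doc in documents:
        tokens = sorted(set(doc.lower().split()))
        for a, b in combinations(tokens, 2):
            weights[(a, b)] += 1.0
    return weights

# A training document containing a person's name (hypothetical example)
docs = ["Anna Maier writes about data protection"]
weights = train(docs)

# The name is now encoded in the model's connection weights
print(weights[("anna", "maier")])  # 1.0
```

Deleting the name afterwards would mean finding and undoing its traces in all affected weights, which is exactly what current models do not support.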

Some people, such as Germany's Federal Data Protection Commissioner, consider it problematic that this personal data ends up in AI models. Its presence there raises several fundamental questions:

  1. Does the data owner (the data subject) consent to their personal data ending up in a particular AI model? Or, more precisely, as long as consent is not required:
  2. How can a data owner block their data from being used in AI models (opt-out)?
  3. How can data be deleted retrospectively from an existing AI model?

These questions give rise to a number of problems in practice, which are discussed below.

When does personal data exist?

Whether a data value is personal often cannot be determined at all, or at least not reliably. A human may often recognize the proper names of people as such, but certainly not always. A machine (AI) does this even less well.

Directly personal data, such as names or postal addresses, cannot be reliably identified by machines.

Nobody knows whether a vehicle registration plate, a phone number or a company name is personal (except someone intimately familiar with the vehicle, the number or the company). A machine therefore cannot know whether "Maier GmbH" is a personal data value. The name of a GmbH is personal if it can be traced directly or indirectly to a natural person (Art. 4 No. 1 GDPR). The name of a one-person GmbH is evidently personal; the name of a GmbH with 50 employees evidently is not. But if the name of a GmbH with 50 employees is mentioned together with an employee who is 1.98 meters tall ("our company's tallest employee"), then this combined statement of company name and body height must be regarded as personal.

Data can never be reliably classified as personal or non-personal by automated means alone.

Algorithms therefore always involve considerable uncertainties when recognizing personal data.

This example in particular makes clear that nobody and nothing can reliably tell whether data is personal or not. Even a telephone number does not reveal by itself whether it belongs to a person or a company, or whether that company consists of one person or several.
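
The ambiguity can be shown with a small, hypothetical sketch: a naive pattern-based detector finds candidates such as a phone number or "Maier GmbH", but whether a candidate actually relates to an identifiable natural person cannot be decided from the text alone. (Real NER systems are more sophisticated, yet face the same fundamental limit.)

```python
import re

# Naive pattern-based detector (a sketch, not a real NER system)
PATTERNS = {
    "phone": re.compile(r"\+?\d[\d /-]{6,}\d"),
    "company": re.compile(r"\b[A-Z][a-z]+ GmbH\b"),
}

def flag_candidates(text):
    """Return spans that *might* be personal data; the detector cannot
    know whether 'Maier GmbH' is a one-person company (then personal)
    or has 50 employees (then usually not)."""
    return [(label, m.group())
            for label, pattern in PATTERNS.items()
            for m in pattern.finditer(text)]

hits = flag_candidates("Call Maier GmbH at +49 170 1234567.")
print(hits)  # [('phone', '+49 170 1234567'), ('company', 'Maier GmbH')]
```

The detector can only flag candidates; the decisive question of identifiability lies outside the text and thus outside the algorithm.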

How can data be blocked from use in AI models?

The short answer is: not at all, at least as things stand today. There is simply no standard for protecting data on websites from unwanted access. Reading a public website is by definition always possible; that is exactly what a website is meant to be: accessible to as broad a public as possible. Robot programs (crawlers, scanners) can hardly be distinguished from human readers. Many websites do not even attempt such a distinction by technical means. That is where technology stands today.

The only currently practical approach is the robots.txt file. In it, website operators declare which crawlers, such as search engines, may access their content and which may not. By now, some AI applications that scrape content also respect this file.
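
A sketch of what such an exclusion looks like, checked with Python's standard-library robots.txt parser. GPTBot (OpenAI) and CCBot (Common Crawl) are the user-agent tokens these crawlers publish; whether a crawler honors the file remains entirely voluntary.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt asking known AI crawlers to stay away
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The rules express a wish; nothing enforces them technically
print(parser.can_fetch("GPTBot", "https://example.com/article"))       # False
print(parser.can_fetch("SomeBrowser", "https://example.com/article"))  # True
```

A crawler that simply ignores robots.txt, or identifies itself with a different user-agent string, is not hindered in any way.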

It is technically not possible to block your own data from being used in AI models.

As of today and until further notice.

Many AI applications take no interest in this robots.txt file or in any other exclusion requests from website owners. Moreover, these are wishes rather than technically enforced rules. Even if ChatGPT states that it respects a website's wish to block its content from AI use, this is purely a matter of trust. Anyone who still trusts OpenAI and ChatGPT should recall the facts:

  1. Italy's data protection authority has banned ChatGPT because OpenAI appears to have stored data illegally, such as user input.
  2. OpenAI did not request consent from the user, but merely offered an opt-out option.
  3. OpenAI now advertises ChatGPT Enterprise and the benefit "Get enterprise-grade security & privacy". This means: "We only adhere to data protection rules if you buy the Enterprise version".

Anyone who trusts companies like OpenAI, Google or Microsoft as soon as a reassuring report appears, although these companies have repeatedly shown questionable behavior before, is acting not rationally but out of wishful thinking.

Crawling databases such as The Pile, Common Crawl or C4 initially exist independently of ChatGPT, but their data is then used to train ChatGPT and other large language models. In this way, one problem becomes many: one for each consumer of the data.

How is data deleted from an existing AI model?

The short answer is: not at all. In any case, to date there is no mathematical procedure that can be used to delete data from an AI model with surgical precision (or at all).

Currently, the only way to delete data from an existing AI model is to throw the model away and train it again from scratch. During retraining, the data to be deleted is simply no longer included.
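
As a toy illustration (hypothetical, plain Python): if the "model" is statistics derived from a corpus, there is no surgical removal from the trained artifact; the only reliable deletion is to filter the corpus and rebuild the model from scratch.

```python
from collections import Counter

def train(corpus):
    """Toy stand-in for training: the 'model' is token statistics
    derived from the entire corpus."""
    model = Counter()
    for doc in corpus:
        model.update(doc.lower().split())
    return model

def delete_and_retrain(corpus, name):
    """No surgical removal from the trained model; instead, drop the
    affected documents and retrain on what remains."""
    cleaned = [doc for doc in corpus if name.lower() not in doc.lower()]
    return train(cleaned)

corpus = ["Maier GmbH sells software", "privacy by design matters"]
model = train(corpus)
assert "maier" in model               # the name is part of the model

model = delete_and_retrain(corpus, "Maier")
assert "maier" not in model           # gone only after full retraining
```

For a large language model, this "retrain from scratch" step costs millions, which is precisely why deletion requests are so hard to honor in practice.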

Data cannot be deleted from an existing AI model.

As of today and until further notice.

Computer-generated images
All images in this post were created by a computer program: the self-developed AI system of Dr. DSGVO, similar to Midjourney. The images may be used freely on websites, with the request to include a link to this blog.
About the author on dr-dsgvo.de
My name is Klaus Meffert. I hold a doctorate in computer science and have worked professionally and hands-on with information technology for over 30 years. I also act as an expert in IT and data protection. I achieve my results by considering both technology and law, which seems essential to me when it comes to digital data protection. My company, IT Logic GmbH, also offers consulting and development of optimized and secure AI solutions.
