
EU AI Regulation: Big Bang or False Start?


Artificial intelligence (AI) is based on large data sets. The EU protects personal or author-related data very well, which is good in itself, but it hinders the development of competitive AI systems. Further reasons speak against high-performance language models made in Germany. Can this dilemma be resolved?

Introduction

The most common use cases for AI are probably language models (LLMs) and image models. Video generators and object recognizers may soon join them. For simplicity's sake, this article therefore focuses on LLMs. The findings are largely or entirely transferable to many other model types, such as classifiers or medical diagnosis systems.

Currently, all competitive language models come from countries outside the EU. Mistral may be a small exception, although its language models are not quite at the forefront.

Aleph Alpha is no exception, as its new model Pharia-1 performs only moderately well in benchmarks, to put it politely.

Some believe the EU could still catch up. That won't happen, because for powerful language models only one thing is truly scarce: data. Nothing else. Not personnel, not technology, not money, not time. Nothing is missing except a very large amount of ideally representative data. And of course the data must be legally usable, which shrinks the pool of available data sets even further.

For very good language models, there is exactly one important ingredient missing in Europe:

Data.

Everything else is always available: one person, one or a few servers, and the best program code for AI training.

The reasons for the EU lagging behind in AI are literally enshrined in law.

Data protection laws

Data protection is very important. Numerous scandals prove this, scandals that primarily originate outside of Europe. Here are a few examples:

In the US, a very important presidential election was influenced by the illegal use of user analysis data from Google and Facebook (Meta) ("Cambridge Analytica").

Microsoft is being referred to as a security risk in the US by prominent entities, due to its lack of data security. ([1])

Meta is no better than Microsoft, but rather worse: Microsoft at least earns money with products as well as data, while Meta has nothing but user data, which it monetizes to the maximum. Data protection laws like the GDPR are more of a hindrance to that business model. ([1]) ([2])

Similar negative reports can be made about Google. That criminals are occasionally caught because US security authorities evaluate the use of Google products is not really reassuring. Anyone who, as an innocent citizen, happens to be in the wrong place at the wrong time can quickly be labeled a criminal and rot in prison despite their innocence, or even face the death penalty.

The General Data Protection Regulation (GDPR) has a very good basic idea. It was issued when AI was not yet an issue, and it is sensible in itself. But why is it effectively not applied? German data protection authorities sanction only in homeopathic doses.

The General Data Protection Regulation (GDPR) essentially only allows the use of personal data for AI training on the basis of a legitimate interest (cf. Art. 6(1) GDPR). Consent is ruled out for mass data, and a contract would be legally difficult to construct for mass data.

Worse still: for authorities, legitimate interest is NOT available as a legal basis (legitimate interest is point (f) of the aforementioned Art. 6(1) GDPR, which does not apply to public authorities in the performance of their tasks). Authorities therefore practically cannot train AI systems. This is particularly unfortunate, because authorities hold a lot of valuable data whose use could in turn benefit citizens.

The General Data Protection Regulation (GDPR) applies "only" to personal data, which also includes pseudonymised data (Article 4 No. 1 GDPR). The GDPR does not apply to anonymous data.

However, to put it somewhat hyperbolically, there is practically no truly anonymous data:

  1. Anonymous data is data for which the original data is no longer accessible (a very rare case).
  2. Anonymised data is less representative than the original data and therefore less valuable for AI training.
  3. Anonymisation itself is a data processing operation. In practice, authorities are virtually prohibited from carrying it out; others can practically only do so on the basis of a legitimate interest, which is difficult to assess.
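To make the pseudonymous-versus-anonymous distinction concrete, here is a minimal Python sketch (the e-mail address and candidate list are hypothetical): hashing an identifier is pseudonymisation, not anonymisation, because anyone holding candidate identifiers can recompute the hash and re-identify the record.

```python
import hashlib

def pseudonymise(email: str) -> str:
    """Replace an identifier with its SHA-256 hash. This is pseudonymisation,
    not anonymisation: whoever holds candidate identifiers can recompute the
    hash and re-identify the record, so under Art. 4 No. 1 GDPR such data
    remains personal data."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()

record = {"email": "alice@example.com", "visits": 42}
pseudonymous = {"user": pseudonymise(record["email"]), "visits": record["visits"]}

# Re-identification is trivial for anyone with a list of candidates:
candidates = ["bob@example.com", "alice@example.com"]
matches = [c for c in candidates if pseudonymise(c) == pseudonymous["user"]]
print(matches)  # the "masked" record points back to alice@example.com
```

The hash hides the address only from someone who has no candidate list at all; as soon as the original data (or a guessable population) exists somewhere, case 1 of the list above does not apply.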

We are talking about practice here. What holds in theory does not interest any company in the world that wants to solve concrete problems. Theoretical discussions leave out one thing: practical relevance.

In fact, for data protection reasons, mass data cannot simply flow into an AI system, for example for training the AI. ([1])

This also applies to public data on the internet. The following cases are problematic:

  1. Someone writes something about another person. This could be a statement of fact, or it could be defamation. The other person does not want this information to be public knowledge, and certainly not stored in an AI language model.
  2. A person publishes information about themselves. An AI stores this information because a crawler reads the person's website. Later, the person decides to withdraw the information and demands its deletion from the AI operator as well. However, data cannot be deleted from AI models. Try erasing a piece of information from your head: you can't. Your brain and the AI "brain" are both neural networks; there is no difference here. Believe it or not, what matters is that information cannot be removed from AI models.
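The claim that information cannot simply be "deleted" from a trained model can be illustrated with a toy sketch (a character-level bigram model standing in for a neural network; the name and text are hypothetical). The training text is not stored anywhere as a retrievable record, only as counts smeared across many transitions, yet the model can regenerate the name, so there is no single entry one could erase.

```python
from collections import defaultdict

# Toy "language model": bigram counts over characters.
# No field of the model contains the name; it lives distributed in the counts.
training_text = "the report was written by alice example. alice example lives in berlin."

model = defaultdict(lambda: defaultdict(int))
for a, b in zip(training_text, training_text[1:]):
    model[a][b] += 1

def generate(seed: str, length: int) -> str:
    """Greedy generation: always follow the most frequent successor."""
    out = seed
    for _ in range(length):
        successors = model.get(out[-1])
        if not successors:
            break
        out += max(successors, key=successors.get)
    return out

# Every key in the model is a single character, yet generation
# reconstructs the memorized name from the distributed counts:
print(generate("al", 10))  # contains "alice"
```

A real neural network stores its training signal in millions of weights instead of a count table, but the point is the same: "delete person X" has no corresponding operation on the parameters.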

To repeat: for data protection reasons, mass data cannot be used for AI training in the EU. This is at least an unwanted side effect of the otherwise very sensible GDPR.

Copyright

Under § 44b UrhG (text and data mining), German copyright law allows training AI with works protected by copyright. Such works may even be stored temporarily for AI training.
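In practice, § 44b(3) UrhG lets rights holders reserve this use in machine-readable form, and crawlers commonly check robots.txt for such reservations (whether robots.txt alone satisfies the machine-readable requirement is debated). Here is a minimal sketch using Python's standard urllib.robotparser, with a hypothetical robots.txt and example crawler names:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a publisher reserving its content
# against an AI crawler while allowing everyone else:
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The AI crawler is blocked, a generic crawler is not:
print(parser.can_fetch("GPTBot", "https://example.com/artikel/1"))         # False
print(parser.can_fetch("SomeSearchBot", "https://example.com/artikel/1"))  # True
```

A crawler that respects such a reservation must perform this check before fetching each page; whether ignoring it voids the § 44b privilege is ultimately a legal, not a technical, question.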

About the author on dr-dsgvo.de
My name is Klaus Meffert. I have a doctorate in computer science and have been working professionally and practically with information technology for over 30 years. I also work as an expert in IT & data protection. I achieve my results by looking at technology and law. This seems absolutely essential to me when it comes to digital data protection. My company, IT Logic GmbH, also offers consulting and development of optimized and secure AI solutions.
