Understanding AI hallucinations: Causes & examples of artificial fiction

AI hallucinations are false statements made by an AI on the basis of correct information. The result is a plausible statement that could be true, but is not. Hallucinations can be explained by looking, for example, at how AI language models encode information and its meaning.

Introduction

Everyone has probably heard of hallucinations in the context of artificial intelligence.

Hallucinations are so fascinating because they involve false statements that have the potential to sound plausible. What sounds plausible is often taken as true or correct. That's exactly where the danger of hallucinations lies.

Hallucinations are false statements that are based on correct background knowledge. False statements that arise from incorrect knowledge or misunderstandings due to a poorly formulated question are not hallucinations.

Semantic vectors are the basis for hallucinations. Let's take a closer look at what they are.

How hallucinations develop

A semantic vector is the carrier of a meaning. What is a meaning? A meaning is a subjectively defined statement. You could also say that everyone defines their own truth.

I make the world for myself,

widewide as I like it.

Pippi Longstocking

Truth is (always?) something subjective. Even physical theories such as the theory of relativity and quantum mechanics, which are probably among our best theories, are interpreted purely subjectively.

Humans find the meaning of objects and circumstances through cultural conditioning, education, and personal observation. The computer (“the AI”) derives meaning from training data, which is essentially nothing other than exactly that.

Meaning is found by optimizing a mathematical function. Humans probably do the same, except that in humans the optimization can also deliberately contain counterproductive elements (!?).

The following image shows a simplified representation of how the meaning of the sentence “The dog is running in the park” is encoded in an AI model.

AI models (here language models) store the meaning of information in the form of (semantic) vectors.

The sentence just mentioned has numerous facets of meaning. For example, it says something about a living being (“dog”), addresses an activity (“running”), carries a tonality (here: neutral emotion) and describes a place (“park”). Incidentally, the term “park” can have several meanings. Here it refers to a green area in a city. Another meaning, not intended here but possible in German, is the imperative of the verb “parken” (to park).

The illustration shows only two dimensions (2D) for simplicity's sake. In reality, AI models work with many hundreds of dimensions, for example 512. This high number of dimensions makes it possible to capture the many different facets of meaning in a statement and thus to represent them.
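
To make this concrete, here is a minimal sketch in Python, assuming the sentence-transformers library and the freely available all-MiniLM-L6-v2 model (which uses 384 dimensions rather than the 512 mentioned above); neither tool is named in the original article:

```python
# Minimal sketch: turning sentences into semantic vectors.
# Assumption: the sentence-transformers library and the all-MiniLM-L6-v2
# model (384 dimensions); the article does not name a specific tool.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The dog is running in the park",    # the example sentence from the text
    "A dog runs across the green area",  # similar meaning
    "Please park the car",               # the other reading of "park"
]
vectors = model.encode(sentences)
print(vectors.shape)  # (3, 384): one high-dimensional vector per sentence

def cos(a, b):
    """Cosine similarity: values near 1.0 mean very similar meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sentences with related meaning end up close together in vector space,
# while the "parking" reading lands noticeably farther away.
print(cos(vectors[0], vectors[1]))  # relatively high
print(cos(vectors[0], vectors[2]))  # lower
```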

The statement “The dog is running in the park” is true. How can this result in a false statement?

Many AI systems generate results by representing the meaning of the input as a vector (or several vectors) and doing the same for stored background knowledge. Now you can do arithmetic with vectors; anyone who is comfortable with math or remembers their school days will know this.

The following figure schematically shows the addition of two semantic vectors.

Creation of hallucinations by adding two vectors as carriers of meaning.

You can see two statements that are both true:

  • Albert Einstein received the Nobel Prize for Physics -> True statement
  • Albert Einstein developed the theory of relativity -> True statement

If you now add the vectors of these two statements, you get the vector shown in red in the image, which represents a false statement. The addition of the two arrows, green and blue, is illustrated in the top right of the figure in miniature form. The dashed lines also indicate how the red arrow is created from the green and blue arrows.

In the AI model, a similarity search with the red result vector then generates the statement that best matches it. The result is the false statement: “Einstein received the Nobel Prize for the theory of relativity”.
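
The mechanism can be illustrated with a toy calculation in two dimensions, mirroring the figure above. All numbers are hand-picked for illustration and do not come from a real model:

```python
# Toy 2D illustration of the figure: adding the vectors of two true
# statements yields a vector whose nearest neighbour is a false statement.
# All numbers are hand-picked for illustration, not taken from a real model.
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

true_1 = np.array([1.0, 0.2])  # "Einstein received the Nobel Prize for physics"
true_2 = np.array([0.2, 1.0])  # "Einstein developed the theory of relativity"
result = true_1 + true_2       # the red vector in the figure

candidates = {
    "Einstein received the Nobel Prize for physics": true_1,
    "Einstein developed the theory of relativity": true_2,
    "Einstein received the Nobel Prize for the theory of relativity":
        np.array([0.9, 0.9]),  # lies between the two true statements
}

# Similarity search: the statement closest to the red result vector wins.
best = max(candidates, key=lambda s: cos(result, candidates[s]))
print(best)  # the plausible-sounding but false combined statement
```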

It is hard to believe that one of the most outstanding geniuses in human history did NOT receive a Nobel Prize for one of the most outstanding theories in human history. This could be seen as impertinence or as a sign of the global stupidity of mankind. After all, the following achievements are based on Einstein's theory of relativity:

  • GPS and navigation systems: Without relativistic corrections, GPS satellites would be off by several kilometers after just a few hours. The clocks on the satellites run faster due to the weaker gravity (see the rough calculation after this list).
  • Particle accelerators such as the Large Hadron Collider at CERN only work because relativistic effects are taken into account when particles are accelerated to nearly the speed of light. These facilities have led to the discovery of the Higgs boson and many other elementary particles.
  • Medical imaging uses the theory of relativity in positron emission tomography (PET), which works with antimatter particles whose behavior Einstein had predicted beforehand.
  • Astronomy and cosmology were revolutionized. The theory enabled the prediction and later observation of black holes and gravitational waves and helped in understanding the expansion of the universe.
  • Quantum computing benefits from relativistic concepts, particularly in the development of highly precise atomic clocks and quantum sensors.
  • Synchronization of computer networks and financial trading systems now takes relativistic time effects into account for precise time measurement over large distances.
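
The GPS figure in the first bullet point can be checked with a rough back-of-the-envelope calculation, using the standard textbook value (not from the article) that GPS satellite clocks run fast by roughly 38 microseconds per day:

```python
# Back-of-the-envelope check of the GPS claim. The 38 microseconds/day net
# relativistic clock drift is a standard textbook value, not from the article.
drift_per_day = 38e-6   # seconds of clock drift per day
c = 299_792_458         # speed of light in m/s

error_per_day = drift_per_day * c   # ranging error accumulated in one day
error_per_hour = error_per_day / 24

print(f"{error_per_day / 1000:.1f} km per day")    # about 11.4 km
print(f"{error_per_hour / 1000:.2f} km per hour")  # about 0.47 km
# After "just a few hours" the position error is indeed in the kilometre range.
```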

At best, the Nobel jurors were intellectually overwhelmed at the time and did not dare to give their blessing to a theory that was possibly wrong in their eyes.

Back to hallucinations:

The above-mentioned false statement therefore came about by combining two correct statements. This is another reason why it sounds so plausible and can distract from reality (as we have defined it).

This is how hallucinations arise.

In addition, the responses of AI language models are often so appealing and structured that they appear professional. This reinforces the impression that the false statement (hallucination) is correct.

Incidentally, vectors can also be converted back into text (or into other data types, also called modalities). For this purpose, so-called embeddings are used: mathematically determined vector representations of statements.
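
The reverse direction, from vector back to text, can be sketched in its simplest form as a nearest-neighbour lookup over a pool of candidate texts. This is an assumption made for illustration; real language models decode generatively rather than by lookup:

```python
# Simplest possible "vector back to text": return the stored text whose
# embedding is closest to the query vector. Real language models decode
# generatively; this retrieval variant is only an illustrative assumption.
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_text(query_vector, corpus):
    """corpus maps candidate texts to their (precomputed) embedding vectors."""
    return max(corpus, key=lambda text: cos(query_vector, corpus[text]))
```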

Causes of hallucinations

There are many possible causes of hallucinations. Here is a selection of ways in which false statements can come about.

Statistical interpolation instead of factual knowledge

Language models do not store information as discrete facts, but as statistical weightings between different concepts. When the system responds to a query, it combines these weightings to generate the most likely answer. In doing so, the model may make connections between concepts that were not explicitly present in the training data.
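
A toy example of such statistical weightings, with all numbers invented for illustration: the model does not look up a fact, it scores possible continuations and samples among the most likely ones.

```python
# Toy example of statistical weightings (all numbers invented): the model
# scores possible continuations and samples among the most likely ones.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned scores for continuations of
# "Einstein received the Nobel Prize for ..."
continuations = ["the photoelectric effect", "the theory of relativity", "physics"]
logits = np.array([2.0, 1.6, 1.2])

probs = np.exp(logits) / np.exp(logits).sum()    # softmax -> probabilities
print(dict(zip(continuations, probs.round(2))))  # ~0.47, 0.32, 0.21

# Sampling picks the plausible-but-false continuation almost a third of
# the time, even though it was never stated in the training data.
print(rng.choice(continuations, p=probs))
```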

It should be noted that human existence is based entirely on statistics. Some people are outraged by this statement. They should take a look at quantum physics (at a general level).

Statistics is the basis for intelligence, not the reason why there would be no intelligence.

Incomplete or contradictory training data

The quality of the training data significantly influences the tendency to hallucinations. If a model is trained on data that contains inaccuracies, contradictions or gaps in knowledge, these problems can occur more frequently in the subsequent application. In addition, different sources can provide contradictory information on the same topic.

It is no different with humans.

Confidence without understanding

AI systems have no awareness of their own knowledge limits. They cannot distinguish between certain knowledge and assumptions and do not explicitly express uncertainties. Instead, they generate answers with an apparent certainty that does not correspond to the actual reliability of the information.

Robots will soon close these knowledge gaps. Like humans, they will explore their environment and gather their own experience of it.

Special risk areas

Certain subject areas are particularly prone to hallucinations. These include current events that occurred after the training date, highly specialized subject areas with limited publicly available information, and areas where precision is critical, such as medical or legal topics.

As far as law is concerned, hardly any “normal” person understands what is written in legal articles, judgments or in some laws and regulations. Incidentally, this is intentional: lawyers want to keep to themselves in some areas and therefore use a “language of power”. This information does not come from the author of this article, but from a lawyer.

Overview of AI hallucinations: Causes, recognition, risk, avoidance options.

Conclusion

Hallucinations arise from a combination of true knowledge, yet they bypass the truth. They can be understood by considering semantic vectors as carriers of meaning.

Such hallucinations can also occur in an analogous way in neural networks, the foundation of many current AI models. After all, a neural network can represent a vector: it is plastic enough to carry and represent almost any kind of information.

With regard to the AI Act, reducing the risk of hallucinations is good; avoiding them altogether is even better.

More on hallucinations can be read in the forthcoming collection “Risk Analysis-AI”, edited by Prof. Wanderwitz. This article is based in part, and in clearly abridged form, on the essay intended for that collection.

Hallucinations can be avoided if you make an effort. With ChatGPT you won't manage it, but with an AI optimized for companies and other organizations you will. In a customer project, we were able to build a reliable chatbot for a complex field of knowledge. The chatbot answered correctly in many cases, and it also knew when its answers were correct. In the few doubtful cases, the person in front of the screen was told that the answer should be double-checked. In every case, the sources of the answer were indicated. For answers known to be correct, the chatbot issued a quality seal (“we stand behind this answer”).
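
The behaviour described above can be sketched roughly as follows. All names and the threshold are hypothetical; the article does not disclose how the customer project was actually built:

```python
# Rough sketch of the behaviour described above. All names and the threshold
# are hypothetical; the actual customer project's design is not disclosed.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    sources: list        # the sources are always indicated
    verified: bool       # quality seal: "we stand behind this answer"
    note: str = ""

def answer_query(query, retrieve, generate, threshold=0.8):
    # retrieve() returns (passages, confidence score); generate() drafts the answer
    passages, score = retrieve(query)
    text = generate(query, passages)
    if score >= threshold:
        return Answer(text, sources=passages, verified=True)
    return Answer(text, sources=passages, verified=False,
                  note="Please double-check this answer against the sources.")
```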

About the author on dr-dsgvo.de
My name is Klaus Meffert. I have a doctorate in computer science and have been working professionally and practically with information technology for over 30 years. I also work as an expert in IT & data protection. I achieve my results by looking at technology and law. This seems absolutely essential to me when it comes to digital data protection. My company, IT Logic GmbH, also offers consulting and development of optimized and secure AI solutions.
