While imprecision is tolerated and often insignificant in everyday language, a precise understanding of what a statement means is fundamental for lawyers. Legal texts can be analyzed with AI. Can generic AI systems such as ChatGPT do this satisfactorily? What alternatives are there?
Update
One useful application is summarizing legal texts, optionally in formal or citizen-friendly language, all the way down to the "language of the street". We implemented this specifically for Hessian laws and for the GDPR, using our own AI language models running on our own AI servers.
Result for the GDPR regulation text.
Motivation
Microsoft's Bing search engine uses a language model from OpenAI, a company with which Microsoft recently entered into a partnership. Bing responds with false statements even though it has access to the best hardware and the best software. The reason is probably that Bing is meant to be universally usable rather than tailored to your company.
Bing's highly developed language model answers a first question and a second, semantically identical and almost identically worded question with opposite answers, and it is wrong in both cases.
See the following examples. At least Bing's answer arrives very quickly, though that is no real consolation.
Here is an example of the failure of Bing's advanced, unspecialized language model. The question is the kind an expert witness might be asked in court. As it happens, I have already answered it in that role.
Can the location of a server be determined by its IP address?
Answer from Bing (as of 31.08.2023): Yes. By the way, newer versions of Bing or Copilot are also unable to answer reliably.

This answer is incorrect. An IP address is not suitable for reliably determining the location of a server. In fact, the assignment of the IP address to a server can change at any time. To clarify: This is about servers, not about Internet connections of private PCs!
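To illustrate the point, here is a minimal Python sketch using the standard-library `ipaddress` module. It lists what can be read directly off an address. The helper name `describe_ip` is hypothetical, and 8.8.8.8 (a public DNS resolver operated by Google) is chosen only as a well-known example. The sketch is not a geolocation method. It merely shows that the address itself is a number with a few scope flags, so any location has to come from an external mapping that can change.

```python
import ipaddress


def describe_ip(text: str) -> dict:
    """Return what can be read off an IP address itself (hypothetical helper)."""
    addr = ipaddress.ip_address(text)
    return {
        "version": addr.version,        # 4 or 6
        "as_integer": int(addr),        # the address is just a number
        "is_private": addr.is_private,  # scope flags derived from reserved ranges
        "is_global": addr.is_global,
        # There is no country, city, or data-center field to read here.
    }


info = describe_ip("8.8.8.8")
print(info)
```

Geolocation services work by looking such a number up in a separately maintained database, which is why their answers are estimates and not properties of the address.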
Now the same question is put to Bing again, with a single word exchanged: "by" becomes "using".
The question now is: Can the location of a server be determined using its IP address?
The answer should be the same, but it is not (quite literally, because Bing answers "not").

This answer is also wrong, because the reasoning that follows the short answer "not" is wrong as well. Even with a court order, it is often impossible to determine which server a given IP address was assigned to at time X. Take Google as an example of an operator of hundreds of thousands of servers: Google would have to log the IP address of every server at all times. It is not clear whether this happens, and it seems unlikely. Because of massive load balancing, the server networks of large operators are highly dynamic. In addition, Bing's reasoning only partly matches the question, and "not" as a short answer does not fit that reasoning.
Introduction
When using third-party systems such as those from Microsoft or OpenAI, questions of legality arise alongside questions about the quality of the results. Recently, legal action was brought against openJur because it had republished an already published judgment on its own website, and that judgment mistakenly mentioned a person's full name. Feeding such data, trade secrets, or other confidential information into a chatbot certainly does not increase legal certainty.
Data-friendly AI systems, by which I mean self-sufficient (autarkic) AI systems, not only significantly increase legal certainty but often also improve the quality of the results.
Lawyers have often discussed the extent to which artificial intelligence can help them understand judgments more quickly. The NLP task of text summarization, for example, is suitable for this purpose. NLP stands for "Natural Language Processing" and refers to techniques that attempt to capture the meaning of natural language. NLP approaches have been around for a long time.
What's new is that with powerful language models (LLM = Large Language Model) complex texts can now be processed in unprecedented quality. This makes it possible, for example, to program a question-answer assistant for this blog. The results are astonishing. However, unwanted statements must be prevented by intervening in the system. Often, so-called hallucinations are responsible for undesirable results.
Hallucinations arise because the general knowledge of a language model is overlaid with specific knowledge from the context. The context might be, for example, all articles on Dr. GDPR. A language model learns not only the grammar of a language such as German but also acquires factual knowledge in the process. In doing so, it can also absorb false facts. A good example is the widespread but fundamentally false statement that cookies are text files.
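One common way of intervening is to give the model only the relevant passages and instruct it to answer from them alone. The following Python sketch shows that idea under loud assumptions: the function names, the prompt wording, and the two sample passages are all hypothetical, simple word overlap stands in for a real retrieval step, and no language model is called. It is not the method used for this blog's assistant, and it does not by itself eliminate hallucinations.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_context(question: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that tells the model to rely on the context only."""
    context = "\n".join(f"- {p}" for p in pick_context(question, passages))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical mini knowledge base standing in for blog articles.
passages = [
    "The statement that cookies are text files is false.",
    "An IP address does not reliably determine the location of a server.",
]
prompt = build_prompt("Are cookies text files?", passages)
print(prompt)
```

The resulting prompt contains only the passage about cookies, so a model receiving it has the specific correction in front of it instead of relying on whatever it absorbed during training.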
The following section explains the difficulties involved in analyzing legal texts and having machines understand them. These difficulties apply to all types of texts, but the legal field in particular demands the highest possible accuracy.
The question of whether general AI systems such as ChatGPT can be suitable for processing legal texts properly is then discussed.




My name is Klaus Meffert. I have a doctorate in computer science and have been working professionally and practically with information technology for over 30 years. I also work as an expert in IT & data protection. I achieve my results by looking at technology and law. This seems absolutely essential to me when it comes to digital data protection. My company, IT Logic GmbH, also offers consulting and development of optimized and secure AI solutions.
