Why is it advisable to use your own AI systems?

Running your own AI systems protects sensitive data, such as trade secrets and confidential information, from access by cloud services like ChatGPT or Google. Local execution gives you more control and better data security.

What types of response systems are provided by the AI application?

The AI application can provide both extractive answers (word-for-word quotes) and abstractive answers (new formulations based on multiple documents). This enables efficient summarization and analysis of information.

What hardware is required to run AI systems?

For demanding AI systems, especially those that generate complex responses, graphics cards with a large amount of memory (up to 400 GB) are required. The cost of this hardware can be significant, but there are ways to minimize the expenses.

What is Artificial Intelligence?

Artificial intelligence (AI) is the development of computers and software that possess human-like abilities, such as learning, problem-solving, and autonomous decision-making. The development of AI requires complex architectures and specialized mathematical methods.

What is an IP address?

An IP address is a numerical sequence assigned to each device on the Internet to distinguish it from other devices. IP stands for Internet Protocol and serves as a unique address for each device on the network.

Should a website have a cookie popup?

No, a website does not need to have a cookie popup. Cookies are just a tool to identify the user and to prompt the server to pass certain information to the user.

Cookies are a means of identifying users and passing information to the server. The frequent claim that cookies are text files is incorrect.
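The point that a cookie is not a text file can be illustrated with a small sketch: a cookie is a name-value pair that travels in HTTP headers between server and browser. The sketch uses Python's standard `http.cookies` module; the cookie name `session_id` and the value `abc123` are made-up examples, not anything used by this blog.

```python
from http.cookies import SimpleCookie

# What a server sends: a response header that sets a cookie.
# "session_id" and "abc123" are hypothetical example values.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
set_cookie_header = response_cookie.output()

# What the browser sends back on later requests: a Cookie request header,
# which the server parses to recognize the user again.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123")
session_id = request_cookie["session_id"].value

print(set_cookie_header)
print(session_id)
```

How a browser stores the value internally is an implementation detail; what matters for the server is only the header it receives.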

Why does the unoptimized AI give the answer 'What is the answer to all questions?'

The unoptimized AI provided a circular response, referring back to the question itself to demonstrate the limits of its own knowledge base and to highlight the impossibility of a comprehensive answer.

Artificial Intelligence: Question-and-answer system for the Data Protection Blog Dr. GDPR

Sensitive data does not belong in foreign or American hands, such as ChatGPT or the clouds of Microsoft, Google, or AWS. Fortunately, running your own AI systems is possible and affordable. At last, business secrets no longer have to be fed into ChatGPT or any other cloud. This is an experiment with a question-answer assistant for this data protection blog, Dr. GDPR.

Introduction

If we have not cared about data protection so far, perhaps we do now that our business secrets are at risk of being scattered all over the world. There may also be legally binding confidentiality agreements for certain documents. I doubt that confidentiality is still guaranteed once a document is uploaded to ChatGPT or Google's cloud.

Data-friendly: Secure for all kinds of data, whether personal data (data protection), confidential data or business secrets.
Data-friendly is more than data-protective.

Even data protection, often looked down upon, is on many people's minds again. Search engines were, and still are, allowed to process data without any intervention, yet when AI systems process the same data, data protection authorities send inquiries. Funny. This is probably due in part to the possibilities that artificial intelligence offers, but just as much to herd mentality (if one authority looks into it, we can do so too without being seen as spoilsports, some officials think). That is the only reason I can see why the most inactive data protection federal state in the world (Hesse) also announced a timid move in the form of an inquiry to ChatGPT.

A frequent use case for Artificial Intelligence is document search. More demanding are question-answer systems, or search engines that directly provide text summaries of the documents found. My plan was to build a search system for the Dr. GDPR data protection blog, and a data-friendly one at that.

The search assistant for Dr. GDPR should provide an answer to natural language questions. Here is an example:

Does my website need a cookie popup?
The AI's answer is better than that of most people. For the answer from the Dr. GDPR AI, see below.

As the example question shows, some questions are phrased differently than would be academically correct. Many people ask whether something complies with data protection, by which they usually mean whether a specific data processing operation is lawful under the GDPR.

My AI should give the answer in its own words, based on the articles published so far on Dr. GDPR. Hallucinations should be avoided here, since this is all about facts and legally relevant knowledge. Hallucinations are invented statements that have no basis in fact. I will address how hallucinations arise in a future article. They can be explained thoroughly, without relying on speculation.

Prototype proves feasibility

With a prototype, I have proven that your own AI systems can be programmed and run locally on your own servers. The easy way would have been one of the following options:

Use the interface of ChatGPT
Throw a lot of money at the problem and make the Americans happy (cloud)
Throw even more money at the problem and buy expensive hardware

Buying expensive hardware is a viable option for larger companies, but not for many SMEs. Therefore, I chose a different setup, and costs were taken into account when choosing the hardware. To understand this, you need to know that AI calculations run on graphics cards rather than on the computer's main processor. The graphics card is not used here to output images or text. Rather, its thousands of mini-processors are repurposed to do the computationally intensive work of an AI faster than the single "Einstein" processor of even a very good personal computer could. Unfortunately, graphics cards with a lot of memory cost a lot of money. A graphics card with 48 GB of memory cost 15,000 euros just a few months ago. Good AI models, however, need more like 96 GB or even up to 400 GB of this expensive graphics memory, spread across several graphics cards (not hard drive storage, and not the cheaper RAM of a computer!).
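To make these memory figures more tangible, here is a rough back-of-the-envelope sketch. It assumes that the memory needed is dominated by the model weights, so it is simply the number of parameters times the bytes stored per parameter. The parameter counts and the 2 bytes per parameter (16-bit weights) are illustrative assumptions, not figures from my systems; only the 48 GB card size comes from the text above.

```python
import math


def weights_memory_gb(num_parameters: float, bytes_per_parameter: float) -> float:
    """Approximate memory needed just to hold the model weights, in GB (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


def cards_needed(required_gb: float, gb_per_card: float) -> int:
    """Number of graphics cards needed if the weights are split across cards."""
    return math.ceil(required_gb / gb_per_card)


# Hypothetical model sizes (assumptions for illustration only).
for name, params in [("small model", 7e9), ("very large model", 175e9)]:
    needed = weights_memory_gb(params, bytes_per_parameter=2)  # 16-bit weights
    print(name, needed, "GB,", cards_needed(needed, gb_per_card=48), "card(s) of 48 GB")
```

Real systems need additional memory on top of the weights while they compute, so the result is best read as a lower bound.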

My AI systems, on the other hand, run on minimal hardware, at least by the standards of Artificial Intelligence. An example: searching the company's own intranet documents with natural language questions works on a small rented server. Of course, a company's own server can also be used. This is made possible by optimization techniques, which come at the price of additional technical complexity. Once that complexity has been dealt with, it is no longer a problem.

Effective AI applications and language models

Question-answer assistants, however, need a bit more than intelligent document search does. Documents should not only be found; content should also be extracted from them and presented as an answer. A simple way to do this is an extractive answer, which is a verbatim quote from the original text. Abstractive answer systems are harder to build and better. They answer in their own words and can even combine knowledge from several documents into a newly worded answer that no single document could have provided. A person would have had to find, read, and mentally process many documents. The AI takes over this unpleasant, time-consuming task, which for many people is simply not achievable.
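The idea of an extractive answer can be sketched in a few lines. This is a deliberately naive toy, not the method my AI uses: it simply returns, word for word, the sentence that shares the most words with the question. The function names and the sample documents are made up for illustration. An abstractive system would need a language model and cannot be reduced to a sketch like this.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"\w+", text.lower()))


def split_sentences(document: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def extractive_answer(question: str, documents: list[str]) -> str:
    """Return the sentence, quoted verbatim, that shares the most words with the question."""
    question_words = tokenize(question)
    best_sentence, best_score = "", 0
    for document in documents:
        for sentence in split_sentences(document):
            score = len(question_words & tokenize(sentence))
            if score > best_score:
                best_sentence, best_score = sentence, score
    return best_sentence


# Hypothetical mini "knowledge base" for illustration.
sample_documents = [
    "An IP address is a numerical sequence assigned to each device. IP stands for Internet Protocol.",
    "A website does not need to have a cookie popup. Cookies are a tool to identify the user.",
]
print(extractive_answer("Does my website need a cookie popup?", sample_documents))
```

The answer is always a quote from exactly one document, which is the limitation that abstractive systems overcome.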

My AI systems aim to be data-friendly. In addition, they should run on hardware that is as inexpensive as possible. Both are possible, as practice shows.
Use cases tested in greater depth so far: document search, text understanding, image generation, image analysis, audio applications.

When we talk about searching and summarizing documents, we usually mean documents and answers in German. To put it very briefly: German is unfortunately not a world language. That is why it is much harder to process German texts with an AI application than English or Chinese texts (although the latter would be extremely difficult for me personally).

My AI system therefore needs an electronic brain ("model") that understands and "speaks" German. This significantly increases the demands on an AI architecture. The size of the AI model that German requires would normally not be usable on affordable hardware. But this problem is solvable too, as I have found out.

To use powerful AI systems on servers that are both affordable and located in Germany (data protection! business secrets! confidentiality!), a few tricks are needed. While building the AI solution, I felt like I was taking part in "Jugend forscht"!
