Data is a valuable resource, especially when it comes to business secrets. For legal reasons, however, confidential and personal data should not be handed to third parties such as ChatGPT. Besides confidentiality, in-house AI systems offer great flexibility and can be aligned precisely with concrete requirements. A practice report.
Introduction
"Because it's simply simple" was the slogan of a mobile phone provider. "Simple is the new wrong" is what is often said about data-intensive applications. Many people are not really interested in data protection. When it comes to employee data, data protected as confidential by contract, patent groundwork or other business secrets, companies are more sensitive. After all, no one wants legal trouble. The desire to carry internal company knowledge out into the world is probably not very widespread either.
Artificial Intelligence: The legal approach examines what may be permitted and clarifies risks. The technical approach provides data-friendly systems and resolves many legal issues on its own.
Acting constructively rather than arguing is a good strategy, I think. Lawyers will still have enough to do even then.
ChatGPT is easy to use, but some people make things too easy for themselves, to their own detriment. This shows that thinking is harder than doing something wrong or suboptimal. People even accept a greater total effort as long as each individual effort is small, however often it is repeated: better a small effort 100 times at a high total cost than a medium-sized effort once at a significantly lower total cost.
Recently Zoom, a provider of video conferencing software, published new terms of use. In them, Zoom grants itself the right to use all data received in Zoom video conferences almost arbitrarily. This includes passing on your data, including transcripts, and using it for machine learning ("training an AI"). This would not have happened with a data-friendly solution from Germany. Nor would it have been a problem with your own system. Now all Zoom users potentially have a problem.
All Zoom users potentially have a problem because they prefer supposedly free third-party systems to data-friendly solutions.
Thanks to Zoom for making the decision easier.
Those who do not simply take the easiest possible route at least use the ChatGPT API from their own program. Many applications can be built this way. Alongside remarkable capabilities, however, ChatGPT comes with several problems that cannot be cured:
- ChatGPT is very slow.
- Most of the data in ChatGPT is irrelevant for business applications (it is ballast that gets in the way, promotes hallucinations, slows the system down and makes it more error-prone).
- All data ends up with OpenAI and thus with Microsoft.
- Data is not secure with ChatGPT (see the belatedly added opt-out instead of consent, the data leak, American corporate policy, etc.).
- ChatGPT is based on outdated general knowledge.
- ChatGPT is not familiar with your company's documents and hopefully will never learn them.
- ChatGPT costs money, depending on the number of text pieces (tokens) processed. Uploading and analyzing even one larger PDF already costs noticeable money. A programming error (infinite loop or recursion) will quickly ruin any budget.
- ChatGPT is not infinitely scalable.
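The cost point in the list above can be made concrete with a small guard. The following is a minimal sketch, not a real billing client: the price per 1,000 tokens, the spending limit and the rule of thumb of about four characters per token are all assumptions for illustration, and the names `TokenBudget`, `BudgetExceeded` and `simulate_runaway_loop` are hypothetical. It only shows the idea of stopping a runaway loop before it spends more than a fixed amount.

```python
from dataclasses import dataclass


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumption: about 4 characters per token).

    A real tokenizer will give different counts.
    """
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


class BudgetExceeded(RuntimeError):
    """Raised when a request would push spending past the limit."""


@dataclass
class TokenBudget:
    price_per_1k_tokens: float  # hypothetical price; take the real one from your provider
    limit: float                # maximum total spend, same currency as the price
    spent: float = 0.0

    def charge(self, text: str) -> float:
        """Book the estimated cost of sending `text`, or refuse if over the limit."""
        cost = estimate_tokens(text) / 1000 * self.price_per_1k_tokens
        if self.spent + cost > self.limit:
            raise BudgetExceeded(
                f"request would cost {cost:.4f}, already spent {self.spent:.4f} of {self.limit:.4f}"
            )
        self.spent += cost
        return cost


def simulate_runaway_loop(budget: TokenBudget, chunk: str, max_calls: int = 1_000_000) -> int:
    """Stand-in for a buggy loop that keeps sending the same text.

    Returns how many calls went through before the budget stopped it.
    """
    calls = 0
    try:
        for _ in range(max_calls):
            budget.charge(chunk)  # in a real program the API call would follow here
            calls += 1
    except BudgetExceeded:
        pass
    return calls
```

Calling `budget.charge(text)` before every request means a loop that never terminates fails fast with an exception instead of running up the bill unnoticed.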
If your inputs are also used to train or fine-tune a third-party AI model, privacy and confidentiality can no longer be guaranteed. A language model learns not only grammar and structure but also absorbs knowledge. The resulting shortcomings are more an annoying, counterproductive practical problem than a legal one, which means they cannot be resolved by legal means.
Offline-AI as a solution for companies and authorities.
Further information. ([1])
Similar things can be said about image generators like Dall-E or Midjourney. Many of these generators are based on an approach called Stable Diffusion. Almost all relevant methods of this kind use the LAION dataset. LAION, in turn, used the Common Crawl data dump to find websites that embed images together with image descriptions. Common Crawl is a massive dump of almost every website. If one of your images has ended up in the image dataset, it is not stored in its original form. Rather, your company image (logo, product image, etc.) has ended up stored structurally in the artificial neurons of a third party's AI model. Getting that image out again is hardly possible; the AI model would have to be retrained. Whether the owner of the AI model will do this is questionable. After all, training is an extremely compute-intensive task with demanding data acquisition.
In-house AI systems
You are rid of all the problems mentioned above when you use your own AI system. I call systems of this type local AI systems or autonomous AI systems. They do not require an internet connection and could, in the best case, sit under your desk.
In-house artificial intelligence systems have these benefits:
- Full Data Control: You decide which training data or pre-trained AI models are used.
- Query your own data rather than internet data: Feed your company documents and media into the system.
- High Speed: In any case, your system will be faster than ChatGPT if you want it to be. You will have far fewer users than the popular AI platforms. Moreover, you can reduce the data volume significantly.
- Customizable at will: More on this below.
- A wide range of application scenarios: Semantic search, text understanding, question-and-answer assistants, image generators, audio transcription, and many more.
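To give a feel for what "query your own data" means, here is a minimal sketch of a document search that runs entirely locally, using only the Python standard library. It is deliberately simplified: it ranks documents by plain TF-IDF keyword overlap, which is a stand-in and not the embedding-based semantic search a real local AI system would use. The function names and the three sample documents are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict[str, str]):
    """Return TF-IDF vectors per document plus the IDF table."""
    term_counts = {name: Counter(tokenize(body)) for name, body in documents.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(documents)
    # Smoothed inverse document frequency: rare words weigh more.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: count * idf[term] for term, count in counts.items()}
        for name, counts in term_counts.items()
    }
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, vectors, idf, top_k: int = 3):
    """Return up to top_k (document name, score) pairs with a non-zero score."""
    query_counts = Counter(tokenize(query))
    query_vec = {t: c * idf[t] for t, c in query_counts.items() if t in idf}
    scored = [(name, cosine(query_vec, vec)) for name, vec in vectors.items()]
    scored = [(name, score) for name, score in scored if score > 0.0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Invented sample documents standing in for internal company files.
DOCUMENTS = {
    "vacation_policy.txt": "Employees request vacation through the HR portal at least two weeks in advance.",
    "patent_notes.txt": "Draft claims for the sensor housing patent are confidential until filing.",
    "vpn_setup.txt": "Install the VPN client and sign in with your company account.",
}

VECTORS, IDF = build_index(DOCUMENTS)
```

With this index, `search("how do I request vacation", VECTORS, IDF)` ranks `vacation_policy.txt` first, and nothing leaves the machine the code runs on. A production system would replace the keyword vectors with embeddings from a locally hosted model, but the flow of indexing your own documents and then querying them stays the same.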
Here is an example from practice of what is possible with a local system for your company. The example runs on a low-cost server and works. It is, however, still in development and will end up looking like much more than it does now. The outstanding work is no big deal and is only a matter of my prioritization.




My name is Klaus Meffert. I have a doctorate in computer science and have been working professionally and practically with information technology for over 30 years. I also work as an expert in IT & data protection. I achieve my results by looking at technology and law. This seems absolutely essential to me when it comes to digital data protection. My company, IT Logic GmbH, also offers consulting and development of optimized and secure AI solutions.
