Artificial Intelligence: Frequently Asked Questions and Answers

A guide for companies and administrations to better assess and plan AI projects. In several AI seminars, some questions from AI enthusiasts have emerged as particularly relevant. This article names questions from practice and answers them practically.

Introduction

Someone who wants to solve a problem with AI often doesn't know if it's possible or how much effort it will take. The following practical collection of answers to questions about AI solutions shows what is possible.

The questions come from a recently held webinar, but also from other contacts with customers and interested parties. The answers are based on experiences from projects in which customer wishes were implemented.

Many business applications can be solved excellently and economically with proprietary AI.

The projects all included a relevant share of AI programming. Open-source AI models were always used, and occasionally interfaces to ChatGPT, Claude 3, Command R+ or other commercial language models were also employed.

The answers to the questions raised are intended to provide guidance on how to better identify, assess, plan, and execute AI projects.

Questions from practice about AI and answers

In bold is the question stated. Below it, in normal text flow, is the answer.

Which application case is particularly well-suited for entry into AI?

Due to the very good results and the low hardware requirements, the following use cases are very suitable for getting started with AI solutions – without having to use ChatGPT!

Knowledge Search: Search through own documents or tickets in the ticket system.
Search Function Website: Intelligent search function for texts and PDFs on your own website.
Complaint Management: Based on previous cases give a recommendation to the employee as to how a current complaint should be best handled.
Damage Regulation: Analogous to Complaint Management.
Intelligent Internet Search: Retrieve search results from a search engine (via interface/API) and intelligently sift through them. Irrelevant hits are filtered out of 1000 matches.
FAQ Response System: Either question-answer pairs or documents containing answers are required (questions that the documents answer can be synthetically generated).
Classification of documents, texts, headings, images, signals: Assign each document to one of several predefined categories. Automated learning of the correct categories. High hit rate possible.

These applications can run on your company's or organization's own hardware without having to send data to third parties.

Which application cases are still suitable for a solution with AI?

In particular, it should be mentioned:

Chatbot / Knowledge Assistant: Conversation with memory, answer in own words, use of internet knowledge for finding answers…
Content Generation: Generating high-quality creative content, such as blog posts; Summarizing document contents
Object Recognition: Recognize object classes (Person, House, …) on images and videos, intelligent motion detection.
Image Generation: Generate images based on given text, generate images similar to input image. Automated copyright checking is possible.
Translation of Language and Text: Transcription, speech output, translation from one language into any of 100 other languages.

The effort required for this is often low. Only the hardware requirements are higher than for the application cases mentioned in the previous section.

What is Offline-AI?

Offline-AI is an optimized AI that can function without an internet connection but can communicate with the outside world when needed.

Advantages:

Full data control
Often better results than ChatGPT, Gemini, etc
Often cheaper

More Info on Offline-AI

What are the realistic time resources for an AI project?

For a prototype and a feasibility study, the effort is often very low. When it comes to processing your data, this data (as always) needs to be read in. This is a conventional task. .

Time works for you: Start your AI project, and you can be sure that technological progress in the AI field will benefit you in a few months.

How easily can a language model be swapped with another AI one?

In short: Most of the time this is child's play possible. Many language models follow the same system architecture. They can be replaced by changing fewer lines of code. New, better language models can therefore be used as a Drop-In Replacement, to use a technical term.

What are the licensing costs for AI programs and AI language models?

The open-source market offers an extremely high quality and up-to-dateness in the AI field that cannot be compared to any other open-source market.

This applies to both AI frameworks and AI language models (and other AI models).

The licensing costs are therefore, in short, zero.

It looks different when using the API of ChatGPT or similar services. Costs apply, which depend on the intensity of use.

Can AI be run on its own hardware?

Yes. A plastic example from practice: This text was written on a laptop that runs KI language models with 30 billion parameters (30B models). What is possible on a laptop works all the better on a KI server.

For AI server: Either rent (from German or purely European providers) or buy. The main costs when buying result from the costs of the graphics card(s).

For many use cases, such as knowledge search or generating recommendations for damage reports or customer complaints, a minimal hardware is sufficient.

What is the care effort for a AI application?

The care required is rather lower than with other IT systems, often even zero. If new knowledge documents are available, these can be automatically read in and processed. The effort naturally arises when new knowledge is gathered to further improve the quality of the system or add new knowledge. Without adding new knowledge, the effort is rather close to zero.

Can new knowledge be added to an AI application after deployment?

Yes, that is possible in several ways.

The simplest way is to select new knowledge that matches a user query and present it to the language model to help it formulate its answer.

More sustainable is the fine-tuning of the language model with the new knowledge. The language model is thus further trained here.

Examples are needed to train the AI.

Can training data be generated if there are too few examples available?

Yes, that is possible. Artificial examples are generated for this purpose. This is called synthetic datasets. A language model is used to generate synthetic datasets. For public data, you can use a cloud service like ChatGPT, Command R+ or similar, if you find that suitable. Often better, because it is also possible without additional costs, is the use of a local language model. This local model can also be trained to be particularly good at generating synthetic training data.

Another advantage of local models is the possibility of continuously (24/7) commissioning them to generate synthetic data. What would cost some tens of thousands of euros per month with ChatGPT, a local AI model achieves this at a fixed cost. The fixed costs consist of the operating costs of your hardware. With rented hardware, these amount to a few hundred euros per month. If you buy a system or already have one, only electricity costs arise during operation. .

Here it becomes clear that significant strategic advantages and opportunities arise when one invests a bit more effort than others who, out of convenience, resort to the supposedly better solution "ChatGPT".

How reliable are the answers of a language model / chatbot / AI system?

It's like with humans: no one knows it, unless they already knew the answer beforehand.

Concretely: Language models often provide correct answers in conventional operation, but not often enough to speak of reliability. Even ChatGPT fails with more specific questions that are not about the height of the Eiffel Tower.

The search for knowledge in documents is itself highly reliable.

The reliability of chatbots can be significantly improved by intelligent additional techniques. The effort required for this is low.

Conclusion: "There is no free lunch". One has to put in a bit of effort to achieve high reliability. The effort required is often manageable and economically feasible.

How can one prevent data from flowing to ChatGPT or other AI providers?

Data leakage to OpenAI or Microsoft can only be prevented if you do not use ChatGPT.

Using ChatGPT, data leakage cannot be prevented. Often, data protection settings are to the detriment of customers (Opt-Out instead of Opt-in). Legally, theoretically, your data outflow can only be prevented. .

What data protection problems can arise when using an application like HeyGen?

HeyGen serves here only as a placeholder for many AI applications of this kind, with which synthetic content can be generated.

HeyGen is an online tool for creating videos. It clones your own voice, places it on a different face, and synchronizes the lips accordingly. Text input is output as speech in the cloned voice.

If your marketing employee voluntarily suggests HeyGen and has their own voice cloned, they may not have a right to withdraw consent because they did not previously give consent that they could revoke. Otherwise, it should be noted that consent from the voice owner may be required for the use of a human voice.

Is ChatGPT the best language model?

Probably not anymore. ChatGPT delivers amazing results, but it wasn't specifically trained for the German language. Moreover, it's "old," it's based on an architecture that is now considered outdated. The sheer amount of data and sheer size (number of artificial neurons and connections) alone ensure competitive performance. .

The costs of ChatGPT are not competitive because they are sometimes more than double those of comparable language models from other providers. Mistral from France, for example, offers a very good model, as does Cohere from the USA. Non-European providers are more suitable for applications where critical data is not involved and there is no risk of knowledge outflow.

Open-source models are now so excellent that they are competitive. They are also constantly improving and can be self-operated.

What needs to be considered when operating a call answering system with AI?

This refers to a (static) statement created with a KI application. .

The synthetic voice should not be too similar to that of a real person (with a recognizable voice). The source material for voice synthesis should be copyright-free. .

Otherwise, there is nothing important to note. In particular, no personal data is processed here by AI.

What knowledge should a machine learning programmer have?

Deep technical knowledge and programming experience are very advantageous. Anyone who has never worked in-depth with AI programming should not start building an AI system from scratch.

AI is a very complex topic with numerous rapid developments. Above all, knowledge of Python and Linux/Ubuntu (or similar) should be available. If the technical fundamentals of Artificial Intelligence are known, that doesn't hurt. Knowledge in interface programming is also advantageous. Whether the programmer is the one who first sets up the system and installs it from scratch is another question. Often it is sensible if someone else does that. .

As for AI, it is beneficial in the medium term if the technical contact person (often the programmer/developer) closely follows the rapid developments in the technical AI market and is familiar with them. .

Can AI also be used for other tasks than text processing?

Yes. Here, one speaks of modalities. Modalities are data types, namely text, image, video, audio, temperature sensor values, web analytics data, etc.

There are open-source, freely available AI models for numerous modalities. For example, speech can be extracted from videos or podcasts and converted into text. This works better with self-developed AI systems based on open-source than, for example, with Microsoft Teams! Data control is not considered here.

Which computer would you recommend as a work station for AI work?

It depends on whether you want to program or "just" work with AI as a user.

For programming, I recommend an Ubuntu system. Windows with WSL would also be possible, which is good for getting started, but not suitable for professionals.

As a user you can take any PC (or Notebook) of your choice that either has a graphics card capable of AI with as much VRAM as possible (Nvidia) or an Apple AI chip (such as the M3). Everything else is almost irrelevant. What's still important is a hard drive from 1 TB, preferably with fast SSD technology. RAM from 32 GB.