Microsoft hails Copilot as a professional solution designed to excel at all sorts of tasks. A test with a standard task shows that this claim is, even on a generous interpretation, completely inaccurate. Aside from these functional shortcomings, the question of data security arises.
What is Microsoft Copilot?
Copilot has something to do with Artificial Intelligence. What exactly Copilot is could not be determined during the test. The test results did not encourage further testing.
Microsoft's own answer to the question of what Copilot is supposed to be arrives via email after registering for the free trial version. According to Microsoft, Copilot is a powerful AI system:
"Whether you want to learn programming, plan the perfect vacation, or simply need a little help writing a difficult email, your AI companion in everyday life helps you get everything done like a pro."
Source: Microsoft's welcome email "Welcome to Microsoft Copilot, Your AI Companion in Everyday Life."
This statement sounds as if Copilot lets you accomplish many things very well. Copilot, says Microsoft, empowers you to "get everything done like a pro."
The email even prominently includes a concrete example:

So "summarizing answers" is mentioned. What exactly is meant by that is anything but clear to the author of this article. The linked Microsoft page ("Try now") likewise offers only general platitudes: "Turn inspirations into actions" and "Get more done – anytime, anywhere."
The Copilot Test
This test is certainly not representative of everything Copilot is supposed to be able to do. However, it does test Copilot's suitability for a very common task: summarizing text.
Microsoft at least mentions summarized answers as the first use case (see above). Could this perhaps (also or especially) refer to summarizing text?
The task is therefore neither overwhelmingly difficult nor out of touch with reality. Almost anyone would probably think of it as a use case for AI systems.
Copilot's performance made two tests necessary. In the first test, Copilot was given the URL of a blog article and was supposed to summarize it. The result was so bad that a second test seemed only fair. In the second test, Microsoft's so-called Copilot was given the text manually, so that it would not be overwhelmed by retrieving an article from the internet.
Test: Summarize a blog article via URL
The question (prompt) to Copilot was simple:
Summarize the following blog article: https://dr-dsgvo.de/ki-und-intelligenz-ist-der-mensch-nicht-auch-ein-token-papagei/
The question Copilot was asked to answer.
Copilot's answer was as follows:

The sources were anonymized in the screenshot. Of the five sources mentioned, four referred to one website and the fifth to another. Neither website is mentioned or linked in the text that was to be summarized.
The provided text, which Copilot was supposed to summarize, contains no information about "ADM systems". The author of the text has no idea what an "ADM system" is supposed to be; as a computer scientist, he has never heard of it. Either 30+ years of IT experience count for nothing, or Copilot fabricated content or threw around factoids irrelevant to the task.
Copilot answers a standard task completely inappropriately.
See post for details.
Copilot's answer shines with its uselessness.
Copilot writes something about "transparency, self-control and supervision". These terms do not appear in the text. Below the text, there is only the keyword "Full data control" in a contact box, which refers to an offline AI that, according to its claim, makes Copilot unnecessary in many situations and often surpasses it. There was also no mention of "discrimination" in the original text, yet Copilot worked it into its answer.
The article that Copilot was supposed to summarize is not primarily about the GDPR but about AI. The terms "data protection" and "GDPR" hardly appear in the core text (and where they do, only incidentally, e.g. in "… in the Dr. GDPR Blog").
Conclusion: Copilot completely messed up and did not solve the task.
Nowhere was there any indication that the answer could be wrong or that it would be best to double-check it.
On July 5, 2024, Copilot gave the following answer to the same question (with a slightly different wording):

The picture speaks for itself (note that it was automatically translated by our own AI system, which will soon be released as software-as-a-service; since German cannot be translated into English with identical length and word order, the translated screenshot cannot be perfect).
Test: Summarize blog article text
Let's move on to test number two. We want to rule out that the failure was due to retrieving a URL from the internet; Copilot may simply have been overwhelmed by that.
This test should have been easier for Copilot after it completely failed the previous one. This time, the text of the blog article was entered into Copilot manually via copy & paste. This is what it looked like:

Unfortunately, it was not possible to paste the entire article into Copilot's chat box. This was of course taken into account; it is not the reason for the following test result. Incidentally, an input that exceeds a chat box or context window is no fundamental obstacle to summarization, as the sketch below shows.
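A common approach is to summarize the text in chunks and then summarize the combined partial summaries. Here is a minimal sketch in Python; the summarize callable is a stand-in for any model call and is an assumption for illustration, not Copilot's (unknown) internal mechanism:

    # Map-reduce summarization sketch for texts exceeding an input limit.
    # `summarize` is a placeholder for any model call; it is an assumption
    # for illustration, not Copilot's (unknown) internal mechanism.
    from typing import Callable

    def summarize_long(text: str, summarize: Callable[[str], str],
                       limit: int = 3000) -> str:
        # Split the text into chunks that respect the input limit.
        chunks = [text[i:i + limit] for i in range(0, len(text), limit)]
        # Map: summarize each chunk separately.
        partials = [summarize(chunk) for chunk in chunks]
        # Reduce: merge the partial summaries and condense once more
        # if the merged text still exceeds the limit.
        combined = " ".join(partials)
        return combined if len(combined) <= limit else summarize(combined)

Back to the test. Copilot's answer was: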

The answer has nothing to do with the original question. Here is some evidence for the poor quality of the answer, which is below that of a toddler; a toddler with nothing to say would have done no worse:
- GPT-3 was not mentioned in the text that Copilot was supposed to summarize (1st, 2nd, and 3rd paragraphs of the Copilot response).
- The researchers mentioned by Copilot and their study were not mentioned in the text (1st + 2nd paragraph).
- The aspects mentioned in the 3rd paragraph, "ability to form analogies" and "analogy problems", were not mentioned in the text. The text only spoke of analog signals (versus digital signals), and the word "analog" appeared in one other sentence, in a completely subordinate role ("…then we talk analog about robots with a built-in computer").
- The "large language models" mentioned by Copilot were not mentioned in the text. There was only talk of "language models". The words "large" or even "LLM" (like "Large Language Model") did not appear.
- The sources mentioned by Copilot were not mentioned in the text (1st, 2nd, and 3rd paragraphs as well as Copilot's source citations).
If you remove from Copilot's response all statements that have nothing to do with the original text, very little is left. In school, a teacher would probably give Copilot's answer a grade of 6, the worst German school grade ("unsatisfactory").
The result generated by Copilot is a disgrace. It shines with arbitrary statements and can be called completely wrong. The task was to summarize a given text; this task was not even attempted. Instead, the answer reads like a random search result summarized by a language model. On top of that, the answer is presented as if it were the (perfect) answer.
Nowhere on the user interface (web UI) is there a hint that Copilot's answers could be wrong or inaccurate, at least not anywhere visible on the screen. Perhaps Microsoft expects users to search for and read Microsoft knowledge articles until they somewhere find: "Copilot can also utter complete nonsense. But hopefully that doesn't happen too often, because we are Microsoft and pretend that Copilot can truly be great at all possible tasks, such as summarizing answers."
Result with Own AI
The test could possibly have been unfair (which sounds rather ironic considering the task). Perhaps summarizing a text is simply too difficult and exceeds even the abilities of the smartest human (that was definitely irony). To rule this out, the same task was given to our own offline AI, entirely without Microsoft, Copilot, Azure, or ChatGPT.
The task for the Offline-AI was:
Summarize the following webpage: https://dr-dsgvo.de/ki-und-intelligenz-ist-der-mensch-nicht-auch-ein-token-papagei
The identical task statement, which was too difficult for Copilot, given to the offline AI.
The unedited answer from the offline AI follows (the original was in German and was automatically translated into English for this version of the article; please refer to the German original for details):
The author claims that AI (Artificial Intelligence) and humans function similarly to token parrots by making probability-based decisions. He emphasizes that it's not about the mechanism itself but the outcome. The author criticizes that many people have a false notion of intelligence and cannot define it.
The outcome of the given task with an AI that completes the work itself, hence not a "copilot".
He argues that AI systems like neural networks are "simple" mechanisms that can solve complex problems. He also claims that humans are token parrots because they make probability-based decisions.
The author presents a beta version of an offline AI and emphasizes that this solution is better than ChatGPT and does not send data to third parties. He offers free consulting for companies and authorities that need an AI solution.
Overall, the author argues that AI systems function like token parrots and that it's not about the mechanism itself but the outcome.
The outcome can be judged by anyone. A comparison with Copilot should show everyone where the quality differences lie.
But surely it must be very complicated to build something that handles a task even Copilot can hardly solve, right?
Yes, it is terribly complicated (irony). Writing the program took barely more time than waiting for the answer (which came quickly). The basis for this offline AI already existed. Every company can have such an AI of its own. Prototypes for testing the fundamental suitability of an AI for specific use cases can be built just as quickly.
The offline AI, which delivers the above result, did the following automatically:
- Retrieval of the URL mentioned in the prompt
- Extraction of the core content (= blog article)
- Summarization of the blog article
- Output of the answer
Offline AI means that the AI runs completely autonomously and can, if needed, communicate with the internet or other IT systems. That is exactly what happened when the blog article was retrieved from the internet. A minimal sketch of such a pipeline is shown below.
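For illustration, here is a hedged sketch of the four steps in Python. It assumes the requests, beautifulsoup4, and transformers packages; the author's actual offline AI is not public, so the extraction logic and the model named here (an English-language example) are assumptions, not his implementation:

    # Hypothetical sketch of the four pipeline steps listed above.
    # Assumptions: requests, beautifulsoup4 and transformers are installed;
    # the summarization model has been downloaded once and is cached locally.
    # The model named here is an illustrative English-language example,
    # not the author's actual model.
    import requests
    from bs4 import BeautifulSoup
    from transformers import pipeline

    def summarize_url(url: str) -> str:
        # Step 1: retrieve the URL mentioned in the prompt.
        html = requests.get(url, timeout=30).text
        # Step 2: extract the core content. Naive version: all paragraph
        # text; a real extractor would strip navigation, comments, footers.
        soup = BeautifulSoup(html, "html.parser")
        article = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
        # Step 3: summarize the blog article with a locally cached model.
        summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
        result = summarizer(article[:3000], max_length=180, min_length=60)
        # Step 4: output the answer.
        return result[0]["summary_text"]

    print(summarize_url(
        "https://dr-dsgvo.de/ki-und-intelligenz-ist-der-mensch-nicht-auch-ein-token-papagei/"
    ))

Once the model is cached, no text leaves the machine at inference time. The pipeline itself is a few lines, which supports the point above about how quickly such prototypes can be built.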
Conclusion
Copilot is, after all, a marketing instrument from Microsoft and not a serious AI. At least that applies to the test described here. Incidentally, programming tasks can also be completed without Copilot, using freely available AI models that do a very good job.
Anyone who wants to upload their own data to the Microsoft cloud should think twice here, provided they haven't already been put off by Copilot's questionable capabilities.
What is annoying is Microsoft's boundless self-confidence, which does not match Copilot's shortcomings at all. Everywhere (email, website), Copilot is presented as if it were the savior.
Wouldn't you rather use a better solution? The prerequisite is that concrete use cases are considered. That would be honest.
Independently of Copilot, it is a problem when lazy or mediocre developers use AI assistants such as Copilot or ChatGPT to generate program code that is less secure than code written manually. This is suggested by a study from Stanford University. A fool with a tool is still a fool!
Key messages
Microsoft Copilot failed at the simple task of summarizing a blog article, providing inaccurate and unhelpful results in both tests.
Instead of summarizing the given text, Copilot produced irrelevant and fabricated information.
The author believes that AI systems, like their own offline solution, are more effective than tools like ChatGPT and Copilot because they focus on delivering tangible results rather than relying on complex mechanisms.




My name is Klaus Meffert. I have a doctorate in computer science and have been working professionally and practically with information technology for over 30 years. I also work as an expert in IT & data protection. I achieve my results by looking at technology and law. This seems absolutely essential to me when it comes to digital data protection. My company, IT Logic GmbH, also offers consulting and development of optimized and secure AI solutions.
