Since the rise of ChatGPT, new GPT-based products have been emerging regularly, exciting and alarming everyone.
Recently, an application called AutoGPT has become popular. As the name suggests, it can “automatically complete the tasks you give it.” You simply tell it what role it should play and describe your task requirements, and it handles the rest on its own. For example, you can instruct it to act as a market researcher tasked with conducting a market analysis for a new sports shoe and then writing a report.
During the task execution, AutoGPT will analyze the requirements itself, develop a plan for each step, and ultimately complete the task you assigned.
Many users have already tried it. For instance, some have used it to automatically build a website, or to gather information about competitors in a certain industry and compile it into a report, which has left the impression that this AI is incredibly powerful.
So how does AutoGPT achieve this? Is it really as impressive as everyone says?
Text Version
Enough talk, let’s get started. I’ll also walk through how to use it, since there is a bit of a learning curve.
First, find the AutoGPT repository on GitHub, then download it directly or use git clone to pull it to your local machine.
Make sure you have Python installed on your computer, and the first step is to install the required libraries by entering this line of code:
pip install -r requirements.txt
This will install all the libraries listed in this txt file, such as the openai library, which is used to call ChatGPT’s functions, and the beautifulsoup4 library, which is used for parsing web content, etc.
Additionally, you will need an OpenAI API key to run the program. You can create one here (https://platform.openai.com/account/api-keys), then create a file named .env in the downloaded folder and put the key in it as a line of the form OPENAI_API_KEY=key.
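To make the .env step concrete, here is a minimal sketch of how a program could read that key. It is not AutoGPT’s actual loading code: the function names parse_env_text and load_api_key are made up for this example, and the key value is a placeholder.

```python
import os
from typing import Dict, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_api_key(env_path: str = ".env") -> Optional[str]:
    """Prefer the process environment, then fall back to the .env file."""
    if "OPENAI_API_KEY" in os.environ:
        return os.environ["OPENAI_API_KEY"]
    if not os.path.exists(env_path):
        return None
    with open(env_path, encoding="utf-8") as handle:
        return parse_env_text(handle.read()).get("OPENAI_API_KEY")


# Placeholder value for illustration only, not a real key.
sample = "# OpenAI credentials\nOPENAI_API_KEY=sk-placeholder\n"
config = parse_env_text(sample)
```

If load_api_key() returns None, the key is missing from both the environment and the .env file, which is the first thing to check when the program refuses to start.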
At this point, all the preparation work is done. Just enter
python -m autogpt
in the terminal to start running.
Currently, AutoGPT does not have a graphical interface; everything is done by typing in the terminal.
First, you give your AI a name, then assign it a role. For example, you can say it is an assistant, a programmer, or an expert in a certain field. Then you give it up to five goals and leave the rest for it to complete.
During each step of the task, AutoGPT will inform you of its thoughts and the logic behind them, as well as its plans for the future. For example, if we instruct it to develop a game, it will tell you to first conduct research, then create the basic framework of the game, perform testing, gradually add features and graphics until completion.
After listing these plans, it will inform you of the next actions it intends to take, such as browsing websites, writing files, analyzing code, etc.
If you find its current analysis and next actions reasonable, you can enter y to confirm; if not, you can input your thoughts. If you want it to operate fully automatically, just enter
y -N
where N indicates how many rounds of commands can be executed without user permission, allowing the AI to perform a series of actions automatically.
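The three kinds of reply described above (a plain y, y -N, or free-form feedback) can be sketched as a small parser. This is only an illustration of the idea, not AutoGPT’s own input handling; the function name parse_reply and the returned labels are assumptions.

```python
import re
from typing import Tuple, Union


def parse_reply(reply: str) -> Tuple[str, Union[int, str]]:
    """Classify what the user typed at the confirmation prompt.

    Returns ("authorize", n) when the next n commands may run without
    asking again, or ("feedback", text) when the user typed something else.
    """
    text = reply.strip()
    if text.lower() == "y":
        # A plain "y" approves only the single proposed action.
        return ("authorize", 1)
    match = re.fullmatch(r"y\s+-(\d+)", text, flags=re.IGNORECASE)
    if match:
        # "y -N" approves N rounds in a row.
        return ("authorize", int(match.group(1)))
    # Anything else is treated as feedback for the AI to take into account.
    return ("feedback", text)


single = parse_reply("y")
batch = parse_reply("y -5")
note = parse_reply("only use official documentation")
```

Treating every unrecognized reply as feedback keeps the loop forgiving: a typo never approves an action by accident.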
This is how AutoGPT works.
You may have recently seen many articles praising AutoGPT. So is AutoGPT really as powerful as they say?
As the saying goes, fear comes from the unknown, so to understand AutoGPT, we thoroughly studied its source code to see how it is implemented.
Just as when you use ChatGPT yourself, the program first has to write a prompt to send to the GPT model. The AI name and role you enter are assembled by the ai_config.py file into a prompt along these lines:
“You are called XX, your role is XX, your goal is XX, you must make independent decisions without seeking user help. Play to your strengths as a large language model and pursue simple strategies that avoid legal complications.”
The prompt.py file contains all the commands the AI can execute, such as Google search, browsing websites, and reading and writing files. It also lists a series of constraints, for example telling the AI that every command has a cost, so it needs to be smart and efficient.
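A rough sketch of how such a prompt could be assembled is shown below. The wording, section titles, and the function name build_prompt are assumptions made for illustration; the real text lives in AutoGPT’s ai_config.py and prompt.py and has changed between versions.

```python
from typing import List


def build_prompt(name: str, role: str, goals: List[str],
                 commands: List[str], constraints: List[str]) -> str:
    """Combine identity, goals, commands and constraints into one prompt."""
    if len(goals) > 5:
        raise ValueError("at most five goals are supported")

    lines = [
        f"You are {name}, {role}.",
        "Make your decisions independently, without asking the user for help.",
        "",
        "GOALS:",
    ]
    lines += [f"{i}. {goal}" for i, goal in enumerate(goals, start=1)]
    lines += ["", "CONSTRAINTS:"]
    lines += [f"{i}. {item}" for i, item in enumerate(constraints, start=1)]
    lines += ["", "COMMANDS:"]
    lines += [f"{i}. {item}" for i, item in enumerate(commands, start=1)]
    return "\n".join(lines)


prompt = build_prompt(
    name="ResearchGPT",
    role="a market researcher",
    goals=["Analyze the market for a new sports shoe", "Write a report"],
    commands=["google_search", "browse_website", "write_file"],
    constraints=["Every command has a cost, so be smart and efficient."],
)
```

Printing prompt shows why the tool feels “automatic”: everything the model needs to pick its next step is restated in a single block of text on every call.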
When the AI decides to take the next action as a Google search, it will call the Google API to retrieve web pages.
Interestingly, when we looked into the google_search function, we found that by default it actually calls the DuckDuckGo search engine. Only when you set up a Google API key does the program call the real Google search.
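The fallback behavior amounts to a simple check on the configuration. Here is a minimal sketch of that decision; the function name choose_search_backend and the GOOGLE_API_KEY variable name are assumptions for illustration, and no real search request is made.

```python
from typing import Mapping


def choose_search_backend(env: Mapping[str, str]) -> str:
    """Pick a search backend from the configuration.

    Google is used only when an API key is configured; otherwise the
    program falls back to DuckDuckGo, which needs no key.
    """
    if env.get("GOOGLE_API_KEY", "").strip():
        return "google"
    return "duckduckgo"


default_backend = choose_search_backend({})
google_backend = choose_search_backend({"GOOGLE_API_KEY": "placeholder-key"})
```

In a real run you would pass os.environ instead of a hand-built dictionary.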
Additionally, since neither search engine is very convenient to use, we added code to support Bing search in AutoGPT. As long as you follow the documentation we added and set up a Microsoft Azure key, you can use it. By the time this video is finished, the feature may already have been merged.
After retrieving information returned by the search engine, AutoGPT will select a web page to browse based on the basic information of the web page, mainly relying on browse.py to achieve this.
Specifically, the program first retrieves all the content of the web page, and if the content is too long and exceeds the model’s limits, it splits it into chunks (split_text), then lets GPT summarize each section, and finally compiles them into a complete summary.
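The chunk-then-summarize idea can be sketched in a few lines. This is a simplified illustration rather than the code in browse.py: split_text here splits on line breaks by character count, and the summarize argument stands in for the call to GPT so the example runs without an API key.

```python
from typing import Callable, Iterator, List


def split_text(text: str, max_length: int = 8192) -> Iterator[str]:
    """Yield chunks of whole lines, each at most max_length characters.

    A single line longer than max_length is emitted as its own
    oversized chunk; this sketch does not split inside a line.
    """
    current: List[str] = []
    current_length = 0
    for line in text.split("\n"):
        if current and current_length + len(line) + 1 > max_length:
            yield "\n".join(current)
            current, current_length = [], 0
        current.append(line)
        current_length += len(line) + 1
    if current:
        yield "\n".join(current)


def summarize_page(text: str, summarize: Callable[[str], str],
                   max_length: int = 8192) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    partial = [summarize(chunk) for chunk in split_text(text, max_length)]
    return summarize("\n".join(partial))


# A stand-in "summarizer" so the flow can be followed without calling GPT.
chunks = list(split_text("aaaa\nbbbb\ncccc", max_length=10))
summary = summarize_page("aaaa\nbbbb\ncccc", str.upper, max_length=10)
```

Note the trade-off this design implies: each chunk is summarized without seeing the others, so connections that span chunks can be lost before the final pass.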
This is basically the process of adding “internet” functionality to AutoGPT based on GPT. If you’re interested, you can check out how other features are implemented.
As we mentioned earlier, one of ChatGPT’s issues is its inability to access the internet, while AutoGPT cleverly combines search engine APIs with web scraping tools to allow the GPT model to obtain information from the web as needed, enhancing its capabilities.
In summary, AutoGPT predefines a standardized prompt that clearly states the AI’s identity and goals, then lets the model repeatedly pick its next step from a predefined list of actions based on the information it has gathered so far, and executes each choice. Because the AI can acquire external information during the task, it can overcome some of ChatGPT’s limitations and has greater potential.
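Put together, the overall loop looks roughly like the sketch below. It is a toy model under stated assumptions: decide stands in for the GPT call that chooses the next action, the commands are fake stand-ins, and the names run_agent and scripted_decide are invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (command name, argument)


def run_agent(decide: Callable[[List[str]], Action],
              commands: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> List[str]:
    """Repeatedly ask for an action, execute it, and record the result."""
    history: List[str] = []
    for _ in range(max_steps):
        name, argument = decide(history)
        if name == "finish":
            break
        if name not in commands:
            # Only actions from the predefined list may be executed.
            history.append(f"error: unknown command {name}")
            continue
        result = commands[name](argument)
        # The result is fed back so the next decision can build on it.
        history.append(f"{name}({argument}) -> {result}")
    return history


def scripted_decide(history: List[str]) -> Action:
    """Stand-in for the model: follows a fixed three-step script."""
    script = [
        ("search", "sports shoes market"),
        ("write_file", "report.txt"),
        ("finish", ""),
    ]
    return script[len(history)]


fake_commands = {
    "search": lambda query: f"3 results for '{query}'",
    "write_file": lambda path: f"wrote {path}",
}
history = run_agent(scripted_decide, fake_commands)
```

The max_steps cap plays the same role as the N in y -N: it bounds how far the loop can run without a human looking at it.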
So, will AutoGPT really revolutionize the way we interact with AI as these articles and videos claim?
At least for now, there is still a long way to go.
From a theoretical standpoint, you now know that AutoGPT still calls the GPT API; you cannot expect it to have any groundbreaking advantages over ChatGPT.
Regarding the added internet functionality, while it can make GPT more powerful, the limitations are still greater than you might think. For instance, in the ideal scenario, if you want to learn about an author, you could have the AI gather all available information about that author from the web and compile it into a detailed report.
However, currently, AutoGPT’s main method of obtaining external information is still through web results returned by search engine APIs, and it is still quite difficult for it to gather information from sources like books.
Even if all the information is presented, due to the input data length limitations, it can only summarize the information in chunks. Although this is often sufficient, allowing it to independently conduct a systematic study, discover connections between pieces of information, and extract new insights is still quite challenging.
From the perspective of automation that AutoGPT emphasizes, whether the lack of human feedback is good or bad is still uncertain. If you frequently use ChatGPT, you will know that it sometimes generates nonsensical responses or provides code snippets that seem correct but have bugs in the details. However, the advantage of ChatGPT is that it can correct itself; as long as you point out what is wrong, it usually takes only a couple of rounds of conversation for it to provide the correct answer. Now that AutoGPT allows the AI to iterate on its own, the results can only be described as mixed.
Finally, there is a very practical issue: AutoGPT requires calling OpenAI’s official API, but each API call incurs costs. While light usage might not be a problem, if you want it to run automatically for an extended period, you need to be cautious of your bills.
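To get a feel for how the bill grows, you can do the arithmetic yourself. The sketch below is only a back-of-the-envelope estimate: the function name estimate_cost is made up, and every number in the example (step count, token counts, prices) is a placeholder, not OpenAI’s actual pricing, so check the official pricing page for real values.

```python
def estimate_cost(steps: int,
                  prompt_tokens_per_step: int,
                  completion_tokens_per_step: int,
                  price_per_1k_prompt: float,
                  price_per_1k_completion: float) -> float:
    """Estimate total cost when every step makes one API call."""
    prompt_cost = steps * prompt_tokens_per_step / 1000 * price_per_1k_prompt
    completion_cost = (
        steps * completion_tokens_per_step / 1000 * price_per_1k_completion
    )
    return prompt_cost + completion_cost


# Placeholder figures for illustration only.
total = estimate_cost(
    steps=50,
    prompt_tokens_per_step=3000,
    completion_tokens_per_step=500,
    price_per_1k_prompt=0.01,
    price_per_1k_completion=0.02,
)
```

Because the full prompt is resent on every step, the prompt side usually dominates, and doubling the number of automatic rounds roughly doubles the cost.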
Despite mentioning these shortcomings, we still have some expectations for the future of AutoGPT. This is not only because it brings more possibilities to the GPT model but also because it is an open-source project, and many developers, including ourselves, hope to gradually improve and add new features based on its current foundation. For instance, the code structure of AutoGPT has changed significantly from when we started writing this article to when we finished it. Some current limitations might be improved in future versions.
More importantly, you can feel that after the emergence of ChatGPT, the entire application ecosystem based on the GPT model is experiencing explosive growth. Whether it’s AutoGPT or other GPT applications, they are like apps in a mobile app store, constantly iterating and updating, with new applications emerging, and becoming increasingly powerful and user-friendly as the foundational model’s performance improves.
In the end, the only limit may be our imagination.