HuggingGPT, also known as JARVIS, was developed by Zhejiang University and Microsoft Research Asia. Given a user’s natural language request, it automatically works out which AI models are needed and calls the corresponding models hosted on Hugging Face to produce a solution.
1. Workflow of HuggingGPT
The workflow consists of four stages:
- Task Planning: ChatGPT parses the user’s request into a task list and determines the execution order and resource dependencies between tasks.
- Model Selection: ChatGPT assigns an appropriate model to each task based on the descriptions of the expert models hosted on Hugging Face.
- Task Execution: The selected expert models, running on hybrid endpoints (local inference and Hugging Face inference), execute the assigned tasks according to the task order and dependencies, and return execution information and results to ChatGPT.
- Response Generation: Finally, ChatGPT summarizes the execution logs and inference results of each model to produce the final output.
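The four stages above can be sketched as a small pipeline. This is only an illustrative outline, not the project's code: the `Task` class, the `MODELS` registry, and every function name are hypothetical, and the two stages that HuggingGPT delegates to ChatGPT (planning and model selection) are replaced by hard-coded stubs so the example runs without any network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """One planned subtask (hypothetical structure for this sketch)."""
    task: str
    id: int
    dep: List[int] = field(default_factory=list)
    input: Optional[str] = None


# Hypothetical stand-ins for expert models; real ones would run locally
# or behind a Hugging Face inference endpoint.
MODELS: Dict[str, Callable[[str], str]] = {
    "image-to-text": lambda image: f"caption of {image}",
    "text-to-speech": lambda text: f"audio({text})",
}


def plan_tasks(request: str) -> List[Task]:
    """Stage 1 (stub): the LLM would turn the request into an ordered task list."""
    return [
        Task(task="image-to-text", id=0, input="example.jpg"),
        Task(task="text-to-speech", id=1, dep=[0]),
    ]


def select_model(task: Task) -> Callable[[str], str]:
    """Stage 2 (stub): the LLM would pick a model from its description."""
    return MODELS[task.task]


def execute(tasks: List[Task]) -> Dict[int, str]:
    """Stage 3: run tasks in order, feeding each one its dependency's output."""
    results: Dict[int, str] = {}
    for task in tasks:
        # A dependent task consumes the output of its first dependency;
        # an independent task consumes its own input.
        source = results[task.dep[0]] if task.dep else (task.input or "")
        results[task.id] = select_model(task)(source)
    return results


def respond(request: str, results: Dict[int, str]) -> str:
    """Stage 4 (stub): the LLM would summarize logs and results for the user."""
    lines = [f"task {task_id}: {output}" for task_id, output in sorted(results.items())]
    return f"Request: {request}\n" + "\n".join(lines)


def run_pipeline(request: str) -> str:
    return respond(request, execute(plan_tasks(request)))
```

Calling `run_pipeline("Describe example.jpg aloud")` returns a short text report with one line per executed task. Keeping each stage as a separate function mirrors the separation in the workflow: the language model acts as the controller, while the expert models do the actual inference.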
The table below shows the specific details of HuggingGPT:
Task planning evaluations for different tasks are shown in the table below:
The format of task planning is: [{"task": task, "id": task_id, "dep": dependency_task_ids, "args": {"text": text, "image": URL, "audio": URL, "video": URL}}], with detailed explanations of the parameters shown in the table below:
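Because the plan is a JSON list, it can be validated and ordered mechanically before anything is executed. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, `-1` is assumed to mark "no dependency", and the sample plan (including the placeholder text for task 2) is invented for demonstration rather than taken from a real HuggingGPT run.

```python
import json
from typing import Dict, List, Set

REQUIRED_KEYS = {"task", "id", "dep", "args"}
ALLOWED_ARGS = {"text", "image", "audio", "video"}
NO_DEP = -1  # assumption: marks a task that depends on nothing


def parse_plan(raw: str) -> List[dict]:
    """Parse a task plan and check that it follows the format shown above."""
    tasks = json.loads(raw)
    ids = {t["id"] for t in tasks}
    for t in tasks:
        missing = REQUIRED_KEYS - t.keys()
        if missing:
            raise ValueError(f"task {t.get('id')} is missing keys: {sorted(missing)}")
        unknown = set(t["args"]) - ALLOWED_ARGS
        if unknown:
            raise ValueError(f"task {t['id']} has unknown args: {sorted(unknown)}")
        for dep in t["dep"]:
            if dep != NO_DEP and dep not in ids:
                raise ValueError(f"task {t['id']} depends on unknown task {dep}")
    return tasks


def execution_order(tasks: List[dict]) -> List[int]:
    """Return task ids so that every task comes after its dependencies."""
    pending: Dict[int, Set[int]] = {
        t["id"]: {d for d in t["dep"] if d != NO_DEP} for t in tasks
    }
    order: List[int] = []
    while pending:
        done = set(order)
        # Tasks whose dependencies have all been scheduled are ready to run.
        ready = sorted(i for i, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic dependencies in task plan")
        for task_id in ready:
            order.append(task_id)
            del pending[task_id]
    return order


# Hypothetical plan, deliberately listed out of order.
SAMPLE_PLAN = """
[
  {"task": "text-to-speech", "id": 2, "dep": [1],
   "args": {"text": "placeholder for the output of task 1"}},
  {"task": "object-detection", "id": 0, "dep": [-1],
   "args": {"image": "example.jpg"}},
  {"task": "image-to-text", "id": 1, "dep": [-1],
   "args": {"image": "example.jpg"}}
]
"""

if __name__ == "__main__":
    print(execution_order(parse_plan(SAMPLE_PLAN)))
```

Tasks that become ready in the same round (here, tasks 0 and 1) have no dependency on each other, so an executor could in principle run them in parallel.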
2. Example of HuggingGPT
Suppose we have the following request. Let’s walk through the complete HuggingGPT process:
Request: Please generate an image of a girl reading a book, her posture should be the same as the boy in example.jpg. Then please describe the new image in your voice.
You can see how HuggingGPT breaks it down into 6 subtasks and selects models to execute and obtain the final result.
3. Experimental Results of HuggingGPT on Different Tasks
References:
[1] https://github.com/microsoft/JARVIS
[2] https://huggingface.co/spaces/microsoft/HuggingGPT
[3] https://arxiv.org/abs/2303.17580
[4] https://twitter.com/DrJimFan/status/1642563455298473986