HuggingGPT: Automatically Calling Models Based on User Needs

HuggingGPT, also known as JARVIS, was developed by Zhejiang University and Microsoft Research Asia. Given a user's natural-language request, it automatically determines which AI models are needed and calls the corresponding models hosted on Hugging Face to produce a solution.

1. Workflow of HuggingGPT

The workflow consists of four stages:

  • Task Planning: ChatGPT parses the user's request into a task list and determines the execution order and resource dependencies between tasks;
  • Model Selection: ChatGPT assigns a suitable model to each task based on the descriptions of the expert models hosted on Hugging Face;
  • Task Execution: The selected expert models run on hybrid endpoints (local inference and Hugging Face inference), execute their assigned tasks in the planned order while respecting dependencies, and return execution information and results to ChatGPT;
  • Response Generation: Finally, ChatGPT summarizes the execution logs and inference results of each model to produce the final output.
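The four stages above can be sketched as a small control loop. This is only an illustrative skeleton, not the project's actual code: `run_pipeline` and the `fake_*` stubs are hypothetical names, and the stubs stand in for the ChatGPT calls and model endpoints so that the example runs offline.

```python
from typing import Any, Callable, Dict, List

Task = Dict[str, Any]


def run_pipeline(
    request: str,
    plan: Callable[[str], List[Task]],
    select: Callable[[Task], str],
    execute: Callable[[str, Task, Dict[int, Any]], Any],
    respond: Callable[[str, List[Task], Dict[int, Any]], str],
) -> str:
    """Hypothetical driver for the four stages (not the real JARVIS code)."""
    tasks = plan(request)                      # Stage 1: task planning
    results: Dict[int, Any] = {}
    for task in tasks:                         # assumes tasks arrive in executable order
        model = select(task)                   # Stage 2: model selection
        # Stage 3: task execution; earlier results are passed along for dependencies
        results[task["id"]] = execute(model, task, results)
    return respond(request, tasks, results)    # Stage 4: response generation


# Stubs standing in for the LLM and the model endpoints (assumptions for illustration).
def fake_plan(request: str) -> List[Task]:
    return [{"task": "image-to-text", "id": 0, "dep": [], "args": {"image": "example.jpg"}}]


def fake_select(task: Task) -> str:
    return f"stub-model-for-{task['task']}"


def fake_execute(model: str, task: Task, results: Dict[int, Any]) -> str:
    return f"{model} ran on {task['args']}"


def fake_respond(request: str, tasks: List[Task], results: Dict[int, Any]) -> str:
    return f"Ran {len(tasks)} task(s): " + "; ".join(str(r) for r in results.values())


answer = run_pipeline("Describe example.jpg", fake_plan, fake_select, fake_execute, fake_respond)
print(answer)
```

In a real system each stub would be replaced by a prompt to the language model or a call to an inference endpoint; the loop structure is the only part this sketch is meant to convey.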

The table below shows the specific details of HuggingGPT:

Task planning evaluations for different tasks are shown in the table below:

The format of task planning is: [{"task": task, "id": task_id, "dep": dependency_task_ids, "args": {"text": text, "image": URL, "audio": URL, "video": URL}}], with detailed explanations of the parameters shown in the table below:

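To make the task-planning format more concrete, here is a minimal sketch that parses a plan in that shape and derives an execution order from the "dep" field. The task names, the use of -1 to mean "no dependency", and the "<GENERATED>-id" placeholder for a resource produced by an earlier task are assumptions made for illustration; `execution_order` is a hypothetical helper, not part of HuggingGPT.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Illustrative plan, deliberately listed out of order.
plan_json = """
[
  {"task": "image-to-text", "id": 2, "dep": [1],
   "args": {"image": "<GENERATED>-1"}},
  {"task": "pose-detection", "id": 0, "dep": [-1],
   "args": {"image": "example.jpg"}},
  {"task": "pose-text-to-image", "id": 1, "dep": [0],
   "args": {"text": "a girl reading a book", "image": "<GENERATED>-0"}}
]
"""


def execution_order(tasks):
    """Return task ids so that every task comes after the tasks it depends on."""
    ids = {t["id"] for t in tasks}
    # Ignore dependency ids that are not real tasks (e.g. the assumed -1 sentinel).
    graph = {t["id"]: {d for d in t["dep"] if d in ids} for t in tasks}
    return list(TopologicalSorter(graph).static_order())


tasks = json.loads(plan_json)
order = execution_order(tasks)
print(order)
```

Because the three tasks form a single chain, the only valid order is 0, 1, 2. With independent tasks, several orders would be valid, and such tasks could in principle run in parallel.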

2. Example of HuggingGPT

Assuming we have the following request, let’s take a look at the complete process of HuggingGPT:

Request: Please generate an image of a girl reading a book, her posture should be the same as the boy in example.jpg. Then please describe the new image in your voice.

HuggingGPT breaks this request down into six subtasks, selects a model for each one, and executes them to obtain the final result.

3. Experimental Results of HuggingGPT on Different Tasks


References:

[1] https://github.com/microsoft/JARVIS

[2] https://huggingface.co/spaces/microsoft/HuggingGPT

[3] https://arxiv.org/abs/2303.17580

[4] https://twitter.com/DrJimFan/status/1642563455298473986
