Large language models (LLMs) are currently developing at an explosive pace, and applications built on top of them keep emerging. However, when developers build downstream applications on an LLM, the results the model generates directly carry a lot of uncertainty in both format and content, making them hard to connect with other business-logic code. Developers end up regenerating outputs multiple times and writing extensive rule-based post-processing, which raises the barrier and difficulty of building LLM-native applications.
This article therefore takes the task of generating questionnaire pages as an example and shows how to steer the ERNIE SDK to output results in JSON format and exchange them with the front end, so that questionnaire webpages can be generated end to end with the ERNIE SDK.
Through this article, you will learn how JSON data structures (single-layer and multi-layer nested) are organized, how to use the ERNIE SDK together with LLM2Json to make the model output JSON that matches a defined schema, and how to bind the generated data to front-end components to render a questionnaire webpage.
JSON Data Structure
The core of this task is generating data for interaction. The formats commonly used for exchanging data between the front end and the back end are JSON and XML. Because JSON has a simple structure and is easy to parse and generate in all mainstream programming languages, the vast majority of web applications currently use JSON for data interaction.
Based on structural complexity, JSON data can be roughly divided into two types: single-layer structures and multi-layer nested structures.
Single-layer Structure
A single-layer structure has only one layer of key-value pairs, such as {key1: value1, key2: value2, …}. It is relatively simple, highly controllable, and not prone to errors. For example:
{
    "address": "北京市朝阳区XXX路XXX号",
    "date": "2023-06-25",
    "email": "[email protected]",
    "idcode": "110101199003077777",
    "name": "张三",
    "phone": "13800000000",
    "sex": "男"
}
Multi-layer Nested Structure
A multi-layer nested structure is more complex. In the example below, the first-level address field contains nested second-level fields: city, area, road, and detail. In real business scenarios, data structures are usually multi-layer nested, with many fields and intricate nesting relationships, which makes such data harder to generate and more prone to parsing errors (see the parsing sketch after the example).
{
    "address": {
        "city": "北京市",
        "area": "朝阳区",
        "road": "XXX路",
        "detail": "XXX号"
    },
    "date": "2023-06-25",
    "email": {
        "common": "[email protected]",
        "backup": "[email protected]"
    },
    "idcode": "110101199003077777",
    "name": "张三",
    "phone": "13800000000",
    "sex": "男"
}
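To make the parsing difficulty concrete, here is a minimal Python sketch (standard library only, not part of the project code) that parses a trimmed version of the nested example above. One misnamed or missing key at any level is enough to raise a KeyError, which is exactly the kind of error that formatted LLM output needs to avoid.
import json

nested = '''
{
    "address": {"city": "北京市", "area": "朝阳区", "road": "XXX路", "detail": "XXX号"},
    "name": "张三"
}
'''

record = json.loads(nested)                    # parse the JSON text into a Python dict
print(record["address"]["city"])               # second-level access prints 北京市
print(record.get("email", {}).get("backup"))   # defensive access to an optional nested field prints None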
The task of generating questionnaire webpages in this article essentially amounts to generating a multi-layer nested data structure and passing it to the front end to render a visual webpage. Let's walk through and analyze the code.
This project mainly depends on two packages: erniebot and llm2json.
The ERNIE SDK is used to call the text generation capabilities of Wenxin Yiyan (ERNIE Bot); it currently supports direct calls to multiple model versions such as ernie-3.5, ernie-turbo, ernie-4.0, and ernie-longtext.
LLM2Json is an easy-to-use toolkit for formatting large language model outputs. Its design and parts of its implementation are inspired by LangChain. By automatically constructing prompts, it guides the large language model to output data that conforms to JSON syntax, solving the data-format problems encountered when formatting model outputs, exchanging data, and developing front ends, and making the development of downstream applications, GPTs, Agents, and so on more convenient and efficient.
pip install erniebot --upgrade
pip install llm2json
We first write a simple wrapper around the ERNIE SDK so it can be called conveniently later. Remember to replace access_token with the token for your AI Studio account and make sure the token balance is sufficient. This demonstration uses the ernie-4.0 model, which performed best in our testing; developers can switch to ernie-3.5, ernie-turbo, or other versions depending on their cost and inference-speed requirements.
import erniebot

erniebot.api_type = "aistudio"
erniebot.access_token = "xxxxxxxxxxxxxxxxxxx"

def ernieChat(content):
    response = erniebot.ChatCompletion.create(
        model="ernie-4.0",
        messages=[{
            "role": "user",
            "content": content
        }]
    )
    return response.get_result()
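As a quick sanity check (assuming your access_token is valid and has sufficient balance), you can call the wrapper directly; the prompt below is just an arbitrary example, not part of the project code.
# Simple smoke test for the wrapper defined above (example prompt of our own).
print(ernieChat("用一句话介绍文心一言"))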
The data structure for a generated questionnaire needs at least two layers.
The first layer consists of title (questionnaire title), description (questionnaire description), and the core data field (the list of questions).
The second layer defines the data nested under data: a number of questions together with their options. Since the question types include single-choice, multiple-choice, and fill-in-the-blank, we define a separate Question object. Its first key is types, an integer that determines the question type (1 for single-choice, 2 for multiple-choice, 3 for fill-in-the-blank); the second is question, the question text; and the third is choices, a list holding the option contents.
from typing import List
from llm2json.prompts.schema import BaseModel, Field

class Question(BaseModel):
    types: int = Field(description="Question type, 1 for single-choice, 2 for multiple-choice, 3 for fill-in-the-blank")
    question: str = Field(description="Question content")
    choices: List[str] = Field(description="Option content")

class WenJuan(BaseModel):
    title: str = Field(description="Questionnaire title")
    description: str = Field(description="Questionnaire description")
    data: List[Question] = Field(description="List of questions")
Because multi-layer nested data structures are complex, it is recommended that developers provide a correct example so that the model's output is more accurate and stable.
correct_example = '''
{
    "title": "问卷标题",
    "description": "问卷描述",
    "data": [{
        "types": 1,
        "question": "问题(单选)",
        "choices": ["选项1", "选项2", "选项3"]
    },
    {
        "types": 2,
        "question": "问题(多选)",
        "choices": ["选项1", "选项2", "选项3"]
    },
    {
        "types": 3,
        "question": "问题(填空)"
    }]
}
'''
Define Prompt Task Template
The prompt task template mainly tells the large language model what to generate and defines the user input variables. In this case our goal is to generate a questionnaire, so the user input variables are the questionnaire topic (topic) and the number of questions (num); the data structure and correct example defined above are also passed in.
from llm2json.prompts import Templates

t = Templates(
    prompt="""
Please design a questionnaire based on the topic <{topic}>.
The questionnaire description needs to briefly explain the purpose of the survey.
The questionnaire types must include single-choice, multiple-choice, and fill-in-the-blank questions, corresponding to types 1, 2, and 3 respectively.
If the question type is fill-in-the-blank, do not return the choices field for that question.
The order of the question types should be generated randomly.
The total number of questions should be {num}.
""",
    field=WenJuan,
    correct_example=correct_example
)
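If you are curious how LLM2Json assembles the field descriptions and the correct example into the final prompt, you can render the template with sample values and print it before sending anything to the model (an optional inspection step, not required by the workflow; the sample values here are our own).
# Render the template with sample values to inspect the auto-constructed prompt.
preview = t.invoke(topic="文心一言用户反馈", num="5")
print(preview)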
Using user feedback on Wenxin Yiyan (文心一言用户反馈) as the questionnaire topic, we now generate a questionnaire containing 10 questions.
from llm2json.output import JSONParser
from pprint import pprint
# Replace model variables with user input
template = t.invoke(topic="文心一言用户反馈", num="10")
# Submit the prompt template to ErnieBot
ernieResult = ernieChat(template)
# Parse the generated results
parser = JSONParser()
result = parser.to_dict(ernieResult)
pprint(result)
After running, the parsed result is a Python dict that follows the WenJuan structure defined above: a title, a description, and a list of question objects.
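Because the front end depends on this nested structure, it can be worth validating the parsed result before handing it over. The helper below is an illustrative sketch (the function name and checks are our own, not part of llm2json), based on the rule that fill-in-the-blank questions omit the choices field.
# Hypothetical validator for the two-layer questionnaire structure described above.
def validate_wenjuan(result: dict) -> bool:
    if not all(key in result for key in ("title", "description", "data")):
        return False
    for item in result["data"]:
        if item.get("types") not in (1, 2, 3) or not item.get("question"):
            return False
        # single-choice (1) and multiple-choice (2) questions need an option list
        if item["types"] in (1, 2) and not isinstance(item.get("choices"), list):
            return False
    return True

if not validate_wenjuan(result):
    raise ValueError("Generated questionnaire does not match the expected structure")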
Front-end Binding and Rendering
Once you have the generated JSON data, you can combine it with the front-end code: parse the data structure, bind the fields, and render. The core of the front-end code is to check each question's type and match it to a different form component based on the value of types. (Only the core part of the front-end code is shown here; for the complete front-end code, please refer to the project link at the end of the article.)
<div class="choices">
    <!-- Single-choice question -->
    <div v-if="item.types==1">
        <a-radio-group v-model:value="item.choices.keys">
            <a-radio v-for="choice in item.choices" :value="choice">
                {{ choice }}
            </a-radio>
        </a-radio-group>
    </div>
    <!-- Multiple-choice question -->
    <div v-else-if="item.types==2">
        <a-checkbox-group :options="item.choices" />
    </div>
    <!-- Fill-in-the-blank question -->
    <div v-else-if="item.types==3">
        <a-input style="max-width:300px"/>
    </div>
</div>
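To complete the picture of front-end/back-end interaction, the following sketch serves the generated questionnaire over HTTP so a page like the one above can fetch and render it. It assumes Flask purely for illustration and reuses the t, ernieChat, and JSONParser objects defined earlier; the actual project may serve the data differently (see the project link at the end of the article).
# Illustrative backend (assumes Flask): returns a freshly generated questionnaire as JSON.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/wenjuan")
def get_wenjuan():
    template = t.invoke(topic="文心一言用户反馈", num="10")
    questionnaire = JSONParser().to_dict(ernieChat(template))
    return jsonify(questionnaire)  # the front-end page fetches this and binds item.types / item.choices

if __name__ == "__main__":
    app.run(port=8000)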
Front-end rendering results:
This project used the task of generating questionnaire webpages as a case study to walk through the process of making a large language model output multi-layer nested JSON data and binding that data to front-end fields for interaction. By using JSON structures for front-end/back-end interaction, developers can easily integrate large language model capabilities into existing OA, ERP, and CRM systems, quickly empowering existing business and enabling intelligent upgrades to office workflows; they can also efficiently build LLM-native applications from scratch without worrying about issues such as data-structure parsing errors, and so provide users with a better experience.
Project link: https://aistudio.baidu.com/projectdetail/7431608
Demo Video: 【What? Wenxin Yiyan Can Generate Questionnaires!】
https://www.bilibili.com/video/BV1ne411Y7do