Optimizing Token Usage with Prompt Adjustment


In the previous article, I introduced how to deploy large models locally on a Mac; by customizing prompts, you can handle tasks such as domain extraction.

However, in reality, deploying large models locally is not very friendly for individual developers. On one hand, it requires a significant hardware investment to provide enough computational power for LLM training and inference; on the other hand, locally deployable models are usually fairly small, so the results may not be ideal.

For individual developers, the fastest way to build an AI application is to integrate the API of a large model platform. In that case, the cost comes down almost entirely to Token expenditure. Here are the Token prices from the major LLM platforms:

Tongyi Qianwen (Token pricing screenshot)

Zhipu AI (Token pricing screenshot)

Baidu Wenxin ERNIE (Token pricing screenshot)

Moonshot AI Kimi (Token pricing screenshot)

Google Gemini (Token pricing screenshot)

Claude (Token pricing screenshot)

Next, I will use the Tongyi Qianwen platform to implement a mock requirement and look at the actual Token usage.

In fact, each platform provides a certain amount of free Tokens. The reason for choosing Tongyi Qianwen is that it is relatively inexpensive.


Implementing a search for exhibition halls using Tongyi Qianwen

The mock requirement is: list three exhibition halls in a given city and introduce some of the collections in each hall.

Return in JSON format

By customizing the Prompt, we can have the large model return data in a fixed format; the most commonly used one is JSON. The Prompt is as follows:

(Screenshot: the Prompt)
The specific code is as follows:
(Screenshot: the code)
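Since the original Prompt and code only appear as screenshots, here is a minimal sketch of what such a request might look like with the dashscope Python SDK; the model name, prompt wording, and city are my own assumptions, not the article's exact code:

```python
# Minimal sketch (assumptions: model name, prompt wording, city).
# Requires: pip install dashscope, and DASHSCOPE_API_KEY set in the environment.
import time

from dashscope import Generation

prompt = (
    "List three exhibition halls in Hangzhou and introduce some collections of each. "
    "Return the result in JSON format with the fields: name, collections."
)

start = time.time()
response = Generation.call(
    model="qwen-turbo",
    messages=[{"role": "user", "content": prompt}],
    result_format="message",
)
elapsed = time.time() - start

print(response.output.choices[0].message.content)   # the JSON answer
print(f"elapsed: {elapsed:.1f}s, usage: {response.usage}")  # input/output Token counts
```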
The return result is shown in the figure below:
(Screenshot: the returned result)
It is clear that the result returned by the Tongyi Qianwen model meets our requirements. However, the request took nearly 17 seconds and consumed 543 Tokens in a single call. With 1 million Tokens, you can serve only 1,000,000 / 543 ≈ 1,841 requests.
Assume your large model application is quite popular, with 1,000 daily active users, each making 10 requests per day. The daily Token consumption is then:

1,000 users * 10 requests * 543 Tokens per request = 5,430,000 Tokens
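A quick back-of-the-envelope script makes it easy to see how this scales; the price per 1,000 Tokens below is a placeholder, not a quoted rate:

```python
# Back-of-the-envelope estimate of daily Token usage and cost.
daily_users = 1000
requests_per_user = 10
tokens_per_request = 543
price_per_1k_tokens = 0.008   # placeholder rate, not a quoted price

daily_tokens = daily_users * requests_per_user * tokens_per_request   # 5,430,000
daily_cost = daily_tokens / 1000 * price_per_1k_tokens
print(f"{daily_tokens} Tokens/day, ~{daily_cost:.2f} per day at the placeholder rate")
```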

For a budget-conscious developer, conserving Tokens is essential, so optimizing prompts to reduce Token usage while maintaining accuracy becomes particularly important.

Simplifying JSON Format

Starting from the JSON format, we simplify it further and strip unnecessary content. The following line is added to the previous Prompt:

(Screenshot: the updated Prompt)

As you can see, the Prompt now contains an extra line: "Use simplified JSON mode, removing all unnecessary spaces and line breaks".

The specific modified code is as follows:

(Screenshot: the modified code)
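In code, the only change is appending that one instruction to the same prompt; a sketch with assumed wording:

```python
# Same request as before; only the prompt changes (wording assumed from the article's description).
prompt = (
    "List three exhibition halls in Hangzhou and introduce some collections of each. "
    "Return the result in JSON format with the fields: name, collections. "
    "Use simplified JSON mode, removing all unnecessary spaces and line breaks."
)
```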

Running the above code, the final return result is as follows:

(Screenshot: the returned result)

The improvement is obvious: inference time drops to about 8 seconds, and Token consumption falls to 393.

Using a Custom Format

In fact, even simplified JSON is still a relatively complex data format. We can define a simpler custom format to further reduce the model's inference time and Token usage.

For example, the following format:

(Screenshot: the custom format)

In this custom format, the ! symbol represents a JSON Object, the ? symbol represents a string, and the # symbol represents an array.
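The exact grammar only appears in the screenshot; based on the symbol mapping above, the prompt instruction might be sketched like this (the wording and the sample reply are my assumptions):

```python
# Custom compact format instead of JSON (sketch; wording and the sample reply are assumptions).
# ! wraps an object, ? wraps a string value, # wraps an array.
prompt = (
    "List three exhibition halls in Hangzhou and introduce some collections of each. "
    "Return the result in a custom compact format: wrap each object in !, "
    "wrap each string in ?, and wrap each array in #. Output nothing else."
)
# A reply in this format might look like:
# #!?Zhejiang Museum??celadon, bronze mirrors?! ... #
```

The trade-off is that the client now needs a small parser to turn the reply back into structured data, which is usually worth it when every Token is billed.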

The specific modified code is as follows:
(Screenshot: the modified code)
The final running result is shown in the image below:
(Screenshot: the returned result)
By using a custom data format, the model inference time and Token consumption have been further optimized.
Prompt Engineering is an ongoing optimization process, but there are some basic rules. Future articles will attempt to outline the basic principles of writing a comprehensive Prompt.