Windsurf’s Image Recognition Capability Upgrade

Last December, I tried using Windsurf and Cursor for image recognition. At the time, these tools could design a web UI from an image, but they could not recognize flowcharts. Recently, I found that Windsurf can now recognize flowcharts and generate the corresponding code. This makes AI tools noticeably more useful in day-to-day work.

  • Flowchart Recognition: From Design to Implementation

To verify Windsurf’s image recognition capability, I used ProcessOn to draw two flowcharts:

1. Product Information Update Process: Includes steps such as request parameter validation, user validation, and optimistic locking.

2. Web Content Summary Generation Process: Covers fetching the webpage at a specified URL, parsing its content, prompt engineering, summary generation with DeepSeek, and friendly error handling.

The Product Information Update Process is implemented in a Spring Boot project. Windsurf accurately recognized each step in the flowchart and generated Controller and Service layer code that builds on the existing code. Notably, following the flowchart logic, it also returned specific HTTP error codes in the exception handling section. Overall, both the recognition rate and the correctness of the implementation were fairly high.
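To make the three flowchart steps concrete, here is a minimal plain-Java sketch of the same order of checks. It is not the code Windsurf generated: the class and method names are hypothetical, an in-memory map stands in for the database, and the specific status codes (400, 403, 404, 409) are my assumption, since the flowchart's actual codes are not listed here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain-Java stand-in for a Spring Boot service method.
class ProductUpdateSketch {

    // Minimal product record; "version" is the optimistic-locking counter.
    static final class Product {
        final long id;
        final String name;
        final long ownerId;
        final int version;

        Product(long id, String name, long ownerId, int version) {
            this.id = id;
            this.name = name;
            this.ownerId = ownerId;
            this.version = version;
        }
    }

    // In-memory stand-in for the product table.
    private final Map<Long, Product> store = new HashMap<>();

    synchronized void save(Product p) {
        store.put(p.id, p);
    }

    synchronized Product find(long id) {
        return store.get(id);
    }

    // Returns the HTTP status code a controller might send back (assumed codes).
    // "synchronized" makes the version check and the write atomic in this sketch;
    // a real database would typically do this with
    // UPDATE ... WHERE id = ? AND version = ?.
    synchronized int update(long productId, String newName, long userId, int expectedVersion) {
        // Step 1: request parameter validation.
        if (newName == null || newName.trim().isEmpty()) {
            return 400;
        }
        Product current = store.get(productId);
        if (current == null) {
            return 404;
        }
        // Step 2: user validation (here: only the owner may update).
        if (current.ownerId != userId) {
            return 403;
        }
        // Step 3: optimistic locking. Reject the write if someone else
        // updated the row since the caller read it.
        if (current.version != expectedVersion) {
            return 409;
        }
        store.put(productId,
                new Product(productId, newName.trim(), current.ownerId, current.version + 1));
        return 200;
    }
}
```

In this sketch, a second caller that still holds the old version number gets 409 and has to re-read the product before retrying, which is the behavior optimistic locking is meant to provide.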

The web content summary generation feature is integrated into the article proofreading project from the article “Using Windsurf to Write AI Article Proofreading Website”. That project is relatively complete, and I extended it on top of the existing code. Here are some takeaways from this project.

  • No-code implementation took 30 minutes

The entire functionality was implemented without human coding intervention. The main functionality took about 20 minutes to complete, while optimizing the UI and fixing minor issues like error display took about 10 minutes.

This functionality grew out of a practical need of mine. When I share articles in WeChat Moments, simply forwarding the link feels a bit dull. I don’t want to copy the original text, my own summaries never match the quality of the original writing, and asking an AI tool each time is cumbersome. I would therefore like WeChat to launch a “one-click summary when forwarding” feature that lets users generate personalized summaries automatically with a pre-trained Agent. Such a feature would be convenient for users and would also help articles spread and keep Moments active, a win-win.

However, WeChat currently does not support anything like this, so I decided to implement a simple version myself: enter a URL on the webpage, let the application fetch the page content, wrap it in an engineered prompt, and then have DeepSeek generate a concise, objective text summary. The summary should paraphrase rather than quote the original text directly, while retaining the original language style. I also want the system to show friendly error messages when URL fetching or summary generation fails.
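The pipeline described above can be sketched roughly as follows. This is a hedged illustration in Java, not the project's actual code: the class name, the prompt wording, and the error messages are all hypothetical, the tag stripping is deliberately crude, and the page fetcher and the DeepSeek call are passed in as plain functions so that no real HTTP or API signature is assumed.

```java
import java.util.function.Function;

// Hypothetical sketch of the fetch -> extract -> prompt -> summarize pipeline.
class SummarySketch {

    // What the page shows: either a summary or a friendly error message.
    static final class Result {
        final boolean ok;
        final String text;

        Result(boolean ok, String text) {
            this.ok = ok;
            this.text = text;
        }
    }

    // Crude text extraction: drop script/style blocks, strip tags, collapse
    // whitespace. A real project would more likely use an HTML parser.
    static String extractText(String html) {
        return html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]+>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Prompt wording is an assumption; it encodes the stated requirements
    // (concise, objective, paraphrased, original language style).
    static String buildPrompt(String pageText) {
        return "Summarize the following article concisely and objectively. "
                + "Paraphrase instead of quoting, and keep the original language and style.\n\n"
                + pageText;
    }

    // fetcher: URL -> HTML; model: prompt -> summary (for example, a DeepSeek client).
    // Both may throw; raw exception details are never shown to the user.
    static Result summarize(String url,
                            Function<String, String> fetcher,
                            Function<String, String> model) {
        String html;
        try {
            html = fetcher.apply(url);
        } catch (RuntimeException e) {
            return new Result(false, "Sorry, the page could not be fetched. Please check the URL and try again.");
        }
        String text = extractText(html == null ? "" : html);
        if (text.isEmpty()) {
            return new Result(false, "Sorry, no readable content was found on that page.");
        }
        try {
            String summary = model.apply(buildPrompt(text));
            if (summary == null || summary.trim().isEmpty()) {
                return new Result(false, "Sorry, the summary came back empty. Please try again.");
            }
            return new Result(true, summary.trim());
        } catch (RuntimeException e) {
            return new Result(false, "Sorry, the summary could not be generated right now. Please try again later.");
        }
    }
}
```

Passing the fetcher and the model in as functions keeps the error handling in one place: whichever step fails, the caller receives a `Result` with a friendly message rather than a raw exception.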

Although this functionality is not integrated into WeChat Moments directly, it already meets my core need for “automatic summarization.” Of course, if you know a more convenient solution (like ima), feel free to share it with me!

  • Implementation Effects and Problem Analysis

1. Recognition Capability

Windsurf’s flowchart recognition capability is impressive: it correctly implemented webpage fetching, content extraction, prompt concatenation, the DeepSeek interaction, and so on. There is still room for improvement in text recognition, however. For example, when URL parsing or the DeepSeek call failed, the application did not show a friendly prompt as expected but displayed the raw error messages directly on the webpage.

I suspect this may be related to the following reasons:

  • The image exported from ProcessOn is low quality (I do not have a paid membership), which hurts recognition.

  • Windsurf’s accuracy in recognizing Chinese text still has room for improvement.

However, once I pointed out the problems clearly, Windsurf fixed them quickly, so I did not investigate the exact cause any further.

2. The Role of Prompt Optimization

To reduce the risk of changes to existing code, I optimized the prompts:

  • Improve the accuracy of descriptions.

  • Add constraints, such as requiring “new functionality” rather than modifying existing logic.

The optimized prompts guide Windsurf to treat the existing code more conservatively. The newly generated code consists mainly of new methods, which significantly reduces the risk of changes to shared files.

Appendix:

1. Flowchart


2. Effect Picture


3. Flowchart generated by Windsurf from the page code

