On July 7, at the 2023 World Artificial Intelligence Conference, Alibaba Cloud announced that its AI painting model, Tongyi Wanshi, had begun targeted testing. Tongyi Wanshi is the third product in Alibaba Cloud’s “Tongyi” model series, following Tongyi Qianwen and Tongyi Tingwu, which provide text Q&A and speech-to-text processing capabilities respectively.
The first three capabilities launched by Tongyi Wanshi are text-to-image generation, similar image generation, and image style transfer.
1. Text-to-Image Generation
The text-to-image generation page has a simple interface on the left side, with a text input box at the top where prompts can be entered. Below the text box, there are 8 optional styles (watercolor, oil painting, Chinese painting, flat illustration, anime, sketch, 3D cartoon, etc.) and a generate button.
Image | Left: Watercolor style of a night beach with moonlight reflecting on the waves; Right: Default style of a night beach with moonlight reflecting on the waves (Source: Generated by Tongyi Wanshi)
Wanshi performs very well on conventional scenery. The waves in the image sparkle, and the moonlit beach feels tranquil and peaceful. The watercolor image on the left has vibrant colors and a convincing painted look, while the default-style image on the right makes you feel as if you were standing on a beach at night!
Image | Left: Oil painting style of curry omelet rice; Right: 3D cartoon style of curry omelet rice (Source: Generated by Tongyi Wanshi)
Wanshi also does quite well with food images. The oil painting style omelet rice is colorful and very appetizing, while the cartoon style omelet rice handles details well, and even the background scenery is commendable!
Image | Left: Chinese painting style plum blossoms in winter; Right: Flat illustration style plum blossoms in winter (Source: Generated by Tongyi Wanshi)
The generated Chinese painting effect of the plum blossoms truly amazed the editor, making them momentarily believe it was a screenshot from an ancient painting! The illustration style of winter plum blossoms is also impressive in both composition and style.
Next, let’s see the ultimate challenge: how will Wanshi perform when faced with our culturally rich ancient poetry?
Image: “Picking Chrysanthemums by the Eastern Fence, Leisurely Viewing the Southern Mountain” | Left: Default style; Right: Chinese painting style (Source: Generated by Tongyi Wanshi)
Image: “Holding Hands, Growing Old Together” | Left: Default style; Right: Chinese painting style (Source: Generated by Tongyi Wanshi)
When encountering descriptive poetry, the images generated by Wanshi can accurately capture key points in the poetry, such as the chrysanthemums and the high mountains in “Picking Chrysanthemums by the Eastern Fence, Leisurely Viewing the Southern Mountain.” The generated Chinese painting style image also carries a hint of pastoral tranquility.
However, the model’s performance becomes less stable on more abstract ancient poetry. For example, “Holding Hands, Growing Old Together” originally referred to a soldier’s vow never to part in life or death, and is now commonly used to describe eternal love. The default style image generated by Wanshi seems to capture only some key information without understanding the meaning of the poetry, although the sketch style is quite fitting. That said, the available styles are limited.
2. Similar Image Generation
The similar image generation interface supports uploading images of up to 10MB in jpg, jpeg, png, or bmp format. Clicking the generate button produces 4 similar images on the right side, which can then be downloaded.
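The upload limits described above can be checked locally before submitting an image. The following is only a minimal sketch: the function name `check_upload` is hypothetical, it is not part of any Tongyi Wanshi API, and it assumes that "10MB" means 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits taken from the article: at most 10MB, in jpg, jpeg, png, or bmp format.
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable.

    Hypothetical helper for illustration only, not an official check.
    """
    problems = []
    # Compare the extension case-insensitively, so "CAT.PNG" is accepted.
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Calling `check_upload("cat.png", 2_000_000)` returns an empty list, while a 12MB gif would be reported for both its format and its size.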
Image | Left: Original image; Right: Similar image generation (Source: Generated by Tongyi Wanshi)
On the left, we input a cat image, and the image generated by Wanshi retains the cat’s fur patterns and characteristics, making it feel like it was drawn by the same artist.
Image | Left: Original image; Right: Similar image generation (Source: Generated by Tongyi Wanshi)
With food images as input, the generated similar images closely match the originals in style while being completely different in content.
Image | Left: Original image; Right: Similar image generation (Source: Generated by Tongyi Wanshi)
This time we try something more difficult: the left image is a complex ancient-style girl, and Wanshi’s result is somewhat unsatisfactory. Although the content carries over, the style is quite different from the original. It seems that Wanshi’s model training still needs further improvement for similar image generation on complex images.
3. Image Style Transfer
The style transfer interface supports inputting two images, one as the original image and one as the specified style image. The generated image will retain the content of the original image and the style of the style image.
Image | Image One: Original image; Image Two: Style image; Image Three: Generated image (Source: Generated by Tongyi Wanshi)
This test used the lotus image generated by Wanshi (Image One) and the illustration style image generated by Wanshi (Image Two) to merge styles, resulting in an illustration-style lotus image (Image Three). Overall performance is excellent, retaining most of the original image’s content while unifying the colors.
Image | Image One: Original image; Image Two: Style image; Image Three: Generated image (Source: Generated by Tongyi Wanshi)
This time, we chose two more difficult images for style merging. Wanshi merged the content of Image One with the colors of Image Two to produce Image Three. The overall style transfer in Image Three is complete and the details are well preserved, but the result is not a true Q-version (chibi) style image of the girl. We hope that users will be given more control in the future.
Image | Image One: Original image; Image Two: Style image; Image Three: Generated image (Source: Generated by Tongyi Wanshi)
This test used the official sample images, and it can be seen that Wanshi’s style transfer from the sketch-style original image (Image One) to the hand-drawn style image (Image Two) is excellent. Compared to the first test, the content retention of the original image is almost perfect, and the style is perfectly integrated. This is a demonstration image that truly represents this function.
4. Summary
Overall, I am very satisfied with the experience of using Tongyi Wanshi. Its advantages are significant, and it can meet most people’s current needs for AI text-to-image and image-to-image functions. However, the product is still in its early release phase, so some functions are not fully developed. Below is a summary of the advantages of Tongyi Wanshi and some suggestions for it.
Advantages
- Fast Generation Speed: Based on tests, complex image generation takes less than 45 seconds, while simple images take less than 30 seconds. The speed of image generation meets my needs, which is very convenient for busy users.
- Diverse Styles for Text-to-Image: Supports 8 styles, and the differences and characteristics between the styles are very significant. Whether it is anime style or realistic style, the expressiveness is good. Users can find suitable choices, and this diversity allows users to create personalized works based on their unique needs and preferences.
- High Degree of Alignment Between Similar Images and Original Images: The product accurately matches similar images with original images, retaining the features and details of the original image. Users do not need to worry about significant differences between the generated images and the original images during use.
- Style Transfer Retains Original Image Information: Wanshi can retain the information of the original image, allowing the generated images to have a new artistic style while still maintaining the characteristics of the original image. This feature makes the generated images more artistic and personalized, allowing users to feel the fusion of the work with their original creativity.
Some Suggestions
- Improve the Model’s Understanding of Abstract Phrases: When processing text, AI often struggles to understand abstract phrases, leading to results that do not match user expectations. Ancient poetry and idioms often carry more abstract meanings beyond their literal ones, and we hope that Wanshi can better understand such phrases in the future.
- Provide Keywords and Retained Words for Image Generation: We suggest giving users more control over image generation, for example through keywords and retained words. Users could specify the desired style or theme by entering keywords, and use retained words to determine which content must be preserved in the image. A background color change function would also be useful, allowing users to freely choose the most suitable background color.
- Artwork Management Library: We hope that Wanshi can provide an artwork management library. Although it currently retains 20 generation records, this is still insufficient for most users. With an artwork management library, users could classify and manage generated artworks by image, style, content, and so on. This would make it easier to organize generation records and quicker to find previously generated works.
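To make the last suggestion more concrete, here is a minimal sketch of what such an artwork management library could look like. It is purely illustrative: the `Artwork` and `ArtworkLibrary` names and their fields are hypothetical and do not describe any existing Tongyi Wanshi feature.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Artwork:
    """One generated image record (hypothetical fields)."""
    prompt: str
    style: str
    tags: set[str] = field(default_factory=set)


class ArtworkLibrary:
    """In-memory collection that can be grouped by style and searched."""

    def __init__(self) -> None:
        self._items: list[Artwork] = []

    def add(self, artwork: Artwork) -> None:
        self._items.append(artwork)

    def by_style(self) -> dict[str, list[Artwork]]:
        # Group records by style so works in the same style sit together.
        groups: dict[str, list[Artwork]] = defaultdict(list)
        for art in self._items:
            groups[art.style].append(art)
        return dict(groups)

    def search(self, keyword: str) -> list[Artwork]:
        # Match the keyword against the prompt text or any tag, ignoring case.
        kw = keyword.lower()
        return [
            art for art in self._items
            if kw in art.prompt.lower() or kw in {t.lower() for t in art.tags}
        ]
```

A user could add each generation as an `Artwork`, then call `by_style()` to browse works style by style or `search("beach")` to find earlier results quickly.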
In recent years, with the development and application of artificial intelligence technology, more and more software and platforms have begun to offer AI painting creation functions. These functions not only provide a way for those without painting skills but with creativity to express themselves but also offer designers, advertisers, and others a quick way to generate materials.
Although the current functions have certain limitations in creation, with continuous technological advancements, it is believed that they will better meet user needs and generate more diverse and exquisite images in the future. I look forward to seeing more applications of ‘Tongyi Wanshi’ in the field of artistic creation, bringing us more surprises and creativity.