Letter to 2029: Breakthroughs in Computer Vision AI

Among the countless applications of visual AI, we believe that future technological breakthroughs may come from three areas: information integration and extraction, healthcare, and autonomous driving.

Written by: Jia Jiaya, Distinguished Scientist at Tencent, Head of Tencent Youtu Lab

Dr. Dai Yurong, Director of Tencent Youtu Lab, Dr. Zheng Yafeng

Editor’s Note:

Shen Nanpeng, the cover figure of this issue, once said that investors must think about what the world will look like ten years from now. This applies not only to investors: every rational person in the foggy and chaotic present needs to look further ahead, casting an anchor into the river of time and measuring the present by the future in order to stand more firmly and steadily. To that end, China Entrepreneur has launched a special project titled “The Business Grand Book: Letters to 2029”, inviting nine entrepreneurs, scientists, economists, and artists to each write a letter to 2029, predicting the world ten years from now from their own perspectives, in the hope of benefiting readers.

In recent years, computer vision technology has developed rapidly, and the introduction of artificial intelligence in particular has greatly enhanced the capability and practicality of its algorithms. Among the countless applications of visual AI, we believe that future technological breakthroughs may come from three areas: information integration and extraction, healthcare, and autonomous driving. The AI technology layout at Tencent Youtu Lab can also be roughly divided into these three modules.

Information integration and extraction mainly refers to content analysis, including person recognition, behavior analysis, scene recognition, object detection, and semantic segmentation. These techniques extract meaningful, structured information from rich images or videos and, combined with application scenarios, generate valuable data that can be used to give users or consumers precise recommendations. This field has seen rapid progress in recent years. For example, by analyzing users’ click or search behaviors, platforms can build user profiles, allowing content services to recommend content that interests users more accurately. This is what major companies like Google and Facebook are currently doing. Before visual AI technology matured, their user profiles were based mainly on the analysis of text search records. As visual AI technology develops, more user behaviors will be extracted directly from multimedia content. Moreover, information integration and extraction will not be limited to online behaviors. With the spread of big data and 5G, a large amount of offline data will be generated. Refining this offline data will make it possible to analyze people’s behaviors more effectively. From product recommendations to urban planning, visual AI technology will be used to make people’s lives more convenient, comfortable, and safe.

The purpose of medical AI is to assist in diagnosis, reducing repetitive labor for doctors and helping extend disease screening to the grassroots level in an era of a large population and unevenly distributed physician resources. In clinical practice, the guiding principles of disease treatment are early diagnosis and early treatment, supported by disease screening, timely medical care, and precise minimally invasive treatment. These are of great significance for raising the level of medical care across society. In the next decade, intelligent consultation, intelligent appointment guidance, and automatic screening of medical images, such as automatic detection of pneumonia in X-rays and automatic analysis of cardiac structures in imaging, will significantly reduce doctors’ workloads, allowing them to focus more on the needs of critically ill patients. Furthermore, medical AI is expected to enable widespread initial screening for most diseases. Big data and intelligent analysis are likely to change the traditionally complicated medical process, and the development of virtual surgery will enrich physicians’ surgical experience, while intelligent surgical robots will perform more precise minimally invasive surgeries for various diseases.

Autonomous driving is a technology that is bound to arrive in the next decade. The core issue that needs to be addressed is environmental recognition. Currently, in real-world testing of autonomous driving, errors in environmental recognition account for over 90% of traffic accidents. To put it simply, if autonomous driving took place in a game world, where all environmental data could be fed accurately to the AI controlling the vehicle and the AI only needed to make decisions, then the AI’s decision-making would be superior to that of humans. AlphaGo’s victory over human players shows this: in a completely closed environment, AI’s decision-making abilities have already surpassed those of humans. Autonomous driving is still in the testing phase because its understanding of environmental information is incomplete, which leads to decision-making errors. However, environmental recognition will gradually improve as more driving test data is collected. Considering this, autonomous driving is undoubtedly a technology that will come to fruition. At the same time, the applications arising from autonomous driving will bring more and more convenience to people’s lives.

In the next decade, the various metrics of AI algorithms will certainly continue to rise, and computer vision algorithms will go deeper into practical applications, moving closer to real usage scenarios and achieving more precise results. Advances in hardware and software will free AI-based visual algorithms from their dependence on specific computing hardware and make them common tools with which computing devices understand the world. Today’s “multimedia computers” can record and play various media; future computers will be able to understand the meaning of the information those media carry.

The development of computer vision AI technology will inevitably have a direct impact on all aspects of our lives, including clothing, food, housing, and transportation.

Imagine that in 2029, stores automatically infer customers’ body shapes, skin tones, and ages and recommend suitable clothing combinations. Before a meal, systems automatically judge the freshness and nutritional content of food and recommend healthy meal pairings. Smart homes reach thousands of households, where voice and gestures freely control appliances and intelligent security cameras watch over children’s activities at home. In healthcare, disease checks become simpler and grassroots medical equipment becomes more complete, allowing a few healthcare workers to set up a disease screening point. For certain diseases, portable imaging devices emerge, and easy-to-use screening procedures enter ordinary households, enabling patients to screen themselves. In the future, we won’t have to worry about which department to consult for a “stomach ache”; intelligent dialogue analysis assistants can help patients narrow down the range of possible diseases, choose departments, and make optimal arrangements for examinations and medical visits, simplifying the medical process. Surgical robots will become more intelligent and refined, leading to quicker post-operative recovery and less pain from surgery.

The widespread adoption of self-driving cars will significantly reduce labor costs in the logistics industry, making B2C business easier and faster, and prices will become more reasonable as labor costs fall. Parking will no longer require searching for a spot, and long-distance bus travel will allow more rest time. Self-driving taxis will let people call a ride instantly and make it safer for women to take a taxi at night, while better route planning will reduce city traffic congestion.

AI will give computers the ability to understand the world, allowing them to better assist humans in analyzing plans and making decisions. The room for imagination in computer vision AI technology is, of course, vast. However, there is still a long way to go from technical research to practical application, and covering that distance is the direction and vision toward which all researchers in the field are working.

(Reported by: Cui Peng, China Entrepreneur)

END.

Duty Editor: Gao Huanhuan
