In-Depth Research Report on Computer Vision and Intelligent Imaging Industry

1. Computer Vision Leads the AI Industry with Vast Application Scenarios

1.1 What is Computer Vision? AI Technology that Enables Machines to “Understand” Images

Computer vision is a core research field of AI that aims to give machines human-like “vision”: perceiving and understanding images electronically so that computers can recognize and make sense of the world around them as humans do. The human brain is said to receive 80% of its information through the eyes, and 50% of brain activity relates to processing visual information, which highlights both the importance and the complexity of vision in information transmission.

Application scenarios are broad and the technological value is high. Computer vision is applied in many fields, including intelligent surveillance and facial recognition in security, identity verification in finance, product recognition in retail, autonomous driving, and smart marketing and AR effects in entertainment.

To “understand” the world, computers need two key abilities: perceptual intelligence and cognitive intelligence. With these two abilities, a computer can perceive which objects and people are in an image and recognize their expressions.

Ability 1—Perceptual Intelligence: What is in the Image

Perceptual ability lets a machine know what is in an image, primarily by classifying and recognizing local pixels in order to recognize, classify, and locate objects and people. For example, perceptual intelligence identifies objects in an image such as dogs, cats, flowers, baskets, and green leaves.

From a technical perspective, perceptual intelligence in vision consists of five core technologies: image classification, object localization, object recognition, semantic segmentation, and 3D reconstruction.

  • Image Classification: Classifying images based on their main content. This is the most basic visual task: assigning an image to one of a known set of categories, such as placing an image of a cat in the cat category. The popular basic method is to use deep convolutional neural networks (CNNs) to extract features and classify; the image is fed directly into the network, which outputs the object’s category.

  • Object Localization: Locating the image area that contains the main object for identifying the objects within that area. When different objects are present in different positions within an image, it cannot simply be classified into one category. At this point, it is necessary to identify how many categories of objects are present in the image, accurately label their positions, and frame the objects in the image.

  • Semantic Segmentation: Assigning each pixel in the image to its corresponding object category. Object detection frames the objects in an image, typically with rectangular boxes, but since real objects are rarely rectangular, it is further necessary to annotate which pixels belong to which category. This is image semantic segmentation, as shown in Fig. 3. Semantic segmentation can be viewed as a classification problem in which classification algorithms are borrowed to assign each pixel to a category.

  • Object Recognition: Locating and classifying all objects appearing in the image. This process typically includes: framing the area and then classifying the objects within it. This is a combination of image classification, object localization, and semantic segmentation.

  • 3D Reconstruction: Upgrading from 2D images to stereo vision. 3D reconstruction generally refers to restoring real 3D scenes from 2D images through processes such as image preprocessing, point cloud registration and fusion, and surface generation.
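
To make the relationship between the tasks above concrete, here is a minimal sketch of semantic segmentation treated as per-pixel classification, followed by a bounding box (object localization) derived from the resulting mask. The score maps are made-up numbers standing in for the output of a trained network, and the names `segment` and `bounding_box` are hypothetical helpers for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# scores[class_name][row][col]: hypothetical per-class scores for a 3x4 image.
ScoreMap = List[List[float]]


def segment(scores: Dict[str, ScoreMap]) -> List[List[str]]:
    """Per-pixel classification: assign each pixel the class with the highest score."""
    classes = list(scores)
    first = scores[classes[0]]
    rows, cols = len(first), len(first[0])
    return [
        [max(classes, key=lambda c: scores[c][r][col]) for col in range(cols)]
        for r in range(rows)
    ]


def bounding_box(mask: List[List[str]], label: str) -> Optional[Tuple[int, int, int, int]]:
    """Smallest rectangle (top, left, bottom, right), inclusive, covering one class."""
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == label]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))


scores = {
    "background": [[0.9, 0.8, 0.2, 0.9], [0.7, 0.1, 0.1, 0.8], [0.9, 0.9, 0.3, 0.9]],
    "cat":        [[0.1, 0.2, 0.8, 0.1], [0.3, 0.9, 0.9, 0.2], [0.1, 0.1, 0.7, 0.1]],
}
mask = segment(scores)           # one class label per pixel
box = bounding_box(mask, "cat")  # rectangle that frames the cat pixels
```

The rectangle covers background pixels as well as cat pixels, which is why the pixel-level mask carries more information than the frame alone.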

Ability 2—Cognitive Intelligence: What the Image Represents

Building on image recognition, machines also need to understand the relationships among the parts of an image and the image as a whole: to infer the connections between objects, deduce human emotions and intentions, judge the overall scene, and even make decisions. Taking a photo of a couple dining in a restaurant as an example:

1) Image Recognition: Man, Woman, Dining Table, Wine Glass, Food, Flowers, Light

2) Actions, Expressions, and Emotions of People: Eating, Smiling, Happy

3) Relationships between Parts of the Image: Man and Woman gazing at each other, Man and Woman are in a romantic relationship, Man and Woman dining at a table

4) Overall Scene Meaning: A couple on a date at a restaurant, both very happy

With both perceptual and cognitive intelligence, computers can process visual information much as the human brain does, and can even exceed human accuracy in recognizing faces, objects, and scenes; reasoning and decision-making can then be built on this basis. This capability is in high demand in security, autonomous driving, finance, and healthcare, and as computer vision matures it will be applied widely across industries.

1.2 Computer Vision Leads the AI Industry, with Deepest Applications in Security

Computer vision is the largest component of China’s AI industry, and its market is growing rapidly. According to the China Academy of Information and Communications Technology, computer vision accounted for 37% of China’s AI market in 2017. iResearch predicts that the computer vision market will reach 12 billion yuan in 2018. Globally, MarketsandMarkets reported that the AI-based computer vision market was 2.37 billion USD in 2017 and is expected to reach 25.32 billion USD by 2023, a compound annual growth rate of 47.5% over the forecast period.
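
As a quick check on the growth figure, the compound annual growth rate can be recomputed from the two endpoint values. The sketch below assumes six compounding years (2017 to 2023); the function name `cagr` is our own. Under that assumption the result lands near 48%, close to the quoted 47.5%, and the small gap presumably reflects the base year or rounding used in the original forecast.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoint values quoted above, in billions of USD.
implied = cagr(2.37, 25.32, years=2023 - 2017)
print(f"Implied CAGR: {implied:.1%}")
```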

From an investment and financing perspective, computer vision is the most favored area in the domestic primary market. According to Qianhai Capital Research Center, financing for computer vision companies in China reached 15.8 billion yuan in 2018, a 25% share that ranks first in the AI industry and far exceeds the 7.3 billion yuan raised in the United States. Autonomous driving, intelligent robots, and other AI directions also rely heavily on computer vision, so substantial investment flows into computer vision research and development.

Computer vision has broad application scenarios and significant potential for commercial monetization. Because it greatly enhances machines’ ability to perceive and understand images, it has high application value in security image analysis, financial identity verification, mobile and internet entertainment, product recognition in wholesale and retail, industrial manufacturing, advertising and marketing, autonomous driving, and medical image analysis.

  • Security Field: Fastest Implementation. Security is the most mature field for facial recognition technology, and it is also the first segment that AI vision companies generally enter. For example, intelligent analysis of surveillance videos at road checkpoints, stations, subways, and airports can detect faces in the video and compare them in real-time with images in the blacklist database, alerting when matches are found.

  • Financial Field: Widespread Use of Facial Recognition. A variety of facial recognition solutions have emerged in the financial sector. As recognition accuracy has improved, remote account opening has been widely adopted in internet finance, and major banks have begun to use facial payment and face-based withdrawals.

  • Medical Imaging: High Data Threshold. Medical imaging has a high threshold for data annotation, requiring professional doctors for annotation, and it is difficult to unify annotations for atypical cases, leading to poor data availability. Beyond annotation work, medical image analysis has stringent requirements for digitalization, data volume, clinical pathways, and corresponding detection quantities.

  • Autonomous Driving: High Technical Difficulty. Autonomous driving collects data from multiple sources such as cameras and radar, and makes decisions by combining that data to recognize vehicles, objects, roads, pedestrians, and so on. Computer vision will play a crucial role in environmental perception (what is around me) and map creation (where am I).
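
The blacklist comparison described for the security field above can be sketched as a nearest-match search over face feature vectors. This is only an illustration under stated assumptions: the three-dimensional vectors, the identifiers, the 0.9 threshold, and the function names are all hypothetical, and a real system would obtain much longer feature vectors from a trained face recognition model.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_blacklist(
    face: List[float],
    blacklist: Dict[str, List[float]],
    threshold: float = 0.9,  # assumed value; tuned per deployment in practice
) -> Optional[Tuple[str, float]]:
    """Return the best blacklist match at or above the threshold, or None."""
    best: Optional[Tuple[str, float]] = None
    for person_id, reference in blacklist.items():
        score = cosine_similarity(face, reference)
        if score >= threshold and (best is None or score > best[1]):
            best = (person_id, score)
    return best


# Hypothetical reference features for two blacklisted identities.
blacklist = {"suspect_a": [0.9, 0.1, 0.4], "suspect_b": [0.1, 0.8, 0.6]}

alert = match_blacklist([0.88, 0.12, 0.41], blacklist)  # close to suspect_a
no_alert = match_blacklist([0.5, 0.5, 0.5], blacklist)  # resembles neither
```

The threshold trades false alarms against missed matches, which is one reason recognition accuracy matters so much for deployment in this field.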

The security industry has the deepest application and the largest scale, with advertising and marketing following closely behind and expected to accelerate. Security image analysis accounted for about 67.9% of the computer vision market in 2017. Advertising and marketing, the second-largest application field, accounted for 18%; because computer vision can intelligently mine advertising content and support new marketing models, the market share of AI marketing is expected to grow faster.

Penetration is affected by data availability, algorithm difficulty, and demand elasticity, with intelligent marketing, autonomous driving, and intelligent healthcare expected to accelerate development. Currently, based on implementation progress, security and mobile internet are leading with relatively high penetration rates, while healthcare and autonomous driving are still largely in the research and testing stage and have not yet been commercialized. The security industry and mobile internet have become the first industries to apply visual AI technology due to the relative ease of obtaining portrait data and the urgent need for facial recognition, while the healthcare industry, despite having ample demand, has not yet seen large-scale applications due to the lack of systematic data and strong recognition specialization. In the future, as image data becomes more structured and technology matures, we believe that vertical markets such as intelligent marketing, autonomous driving, medical image analysis, and dynamic security will accelerate growth.

2. AI Vision Empowers the Large Video Industry, Establishing a Golden Path for Intelligent Imaging

For the large internet entertainment industry, we predict that video + AI is the future trend. Within computer vision, intelligent imaging production is an emerging model that is developing rapidly for two reasons: 1) It targets a golden track: as 5G approaches, video will become the main way information is disseminated, giving AI vast application scenarios in video; 2) It is technologically feasible: with AI, video advertising and marketing will become more precise, real-time, and intelligent, while entertainment production will become more automated, and both tracks are large in scale with room for deep application.

2.1 Targeting a Golden Track, Video Becomes the Main Information Carrier and Dissemination Method

Video presentation has become an internet trend. Over the past 20 years, the main battleground for internet information has evolved from portal websites to search engines represented by Google and Baidu, and then to social platforms like Facebook, Twitter, Weibo, and WeChat. Currently, with the rise of various PGC, UGC platforms, live broadcasts, short videos, and VR, video will gradually replace text and images as the main means of expressing and transmitting information on the internet. Just as Google and Baidu structured text, in an era where video becomes the primary means of information presentation, computer vision will also serve as an indispensable foundational technology, opening up numerous application scenarios and improving production efficiency and the convenience of life.

Currently, the products of BATT are primarily presented in video format, such as mobile information feeds (where video accounts for 80%), ByteDance (short videos on Douyin and Toutiao), and even Alibaba’s Taobao/Tmall (which has incorporated video and live broadcasts). Measured by total time spent, short and medium videos have shown explosive growth. According to QM statistics, from 2015 to April 2019 the share of short and medium videos in internet users’ total time spent rose from 9.7% to 21.1%, more than doubling; short videos in particular rose from only 0.5% to 13.4%. Growth in news is likewise driven by the fact that news products are now distributed mainly through video information feeds. Overall, major internet companies are vigorously developing short video products.

Video traffic is expected to make up about 82% of all global internet user traffic in 2021, laying the data foundation for computer vision. According to Cisco’s assessment, the total length of video uploaded to the internet in a single month in 2021 will exceed 5 million years, with 1 million minutes of new online video content created every second, and online video will account for 81.7% of all global internet user traffic. This rapidly growing volume of video contains vast amounts of information and provides substantial support for the evolution of deep learning algorithms in computer vision.
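
The two Cisco volume figures above can be cross-checked with simple arithmetic. The sketch below assumes a 30-day month and a 365-day year; the constant names are our own. Under those assumptions, 1 million minutes per second works out to just under 5 million years of video per month, so the two figures agree to within rounding (a 31-day month gives slightly over 5 million).

```python
MINUTES_OF_VIDEO_PER_SECOND = 1_000_000   # figure quoted above
SECONDS_PER_MONTH = 30 * 24 * 60 * 60     # assumes a 30-day month
MINUTES_PER_YEAR = 365 * 24 * 60          # assumes a 365-day year

minutes_per_month = MINUTES_OF_VIDEO_PER_SECOND * SECONDS_PER_MONTH
years_of_video_per_month = minutes_per_month / MINUTES_PER_YEAR

print(f"{years_of_video_per_month / 1e6:.2f} million years of video per month")
```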

5G accelerates the growth of video information flow, and the trend toward visual information continues to strengthen. 5G will bring ultra-high speed (100 times that of 4G), low latency (1/50 of 4G latency), and massive connectivity (a number of connected devices 100 times the world’s population). Looking back at how each step from 2G to 4G changed the form of media content, it is clear that as the 5G era approaches, video will increasingly become the main means of expressing information on the internet.

2.2 Maturing Image Perception Intelligence Lays a Solid Foundation for Commercialization

As discussed above, the large video industry is a golden track for the next 3-5 years, with demand expected to continue growing rapidly. At the same time, we believe that computer vision technology is also maturing, laying a solid foundation for commercialization, and existing technical capabilities can already support applications in advertising marketing, content automation, and security.

Deep learning has driven breakthroughs in computer vision algorithms. In 2012, applying deep learning to image recognition cut the error rate in the ImageNet image recognition competition from 25.8% to 16.4%, setting off a leap in algorithm development. By 2017 the image recognition error rate had fallen to 2.25%, and the accuracy of facial and object recognition had surpassed that of humans. Commercialization thus has foundational technical support, especially for perception-focused image classification and facial recognition, which have already achieved commercial value in security, identity verification, and advertising and marketing.

Video structuring technology parses images and accumulates massive amounts of usable data. Compared with text, audio, and images, video is the carrier with the largest information capacity and the highest transmission efficiency. However, because of that capacity and its nonlinear organization of information (unlike text and code, which follow standard rules), it is also the hardest to convert into linear data. A video structuring system transforms video into structured data that computers can process, using image processing, image recognition, content recognition, and semantic fusion technologies. Since data is the core resource of the internet era, this lays the foundation for the development of the large video industry.
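
One way to picture what “structured data” means here is to turn per-frame recognition results into time-stamped label records that can be stored and queried like any other table. The sketch below is an assumption about the general idea, not a description of any vendor’s system: the per-frame label sets are invented inputs standing in for a recognition model’s output, and `LabelSegment` and `structure_video` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LabelSegment:
    label: str
    start_s: float  # second at which the label first appears
    end_s: float    # second at which the label stops appearing


def structure_video(frame_labels: List[Set[str]], fps: float) -> List[LabelSegment]:
    """Merge per-frame label sets into contiguous time segments, one per label run."""
    segments: List[LabelSegment] = []
    open_since: Dict[str, int] = {}  # label -> frame index where its current run began
    # An empty sentinel frame at the end closes any runs that are still open.
    for idx, labels in enumerate(frame_labels + [set()]):
        for label in list(open_since):
            if label not in labels:
                start_idx = open_since.pop(label)
                segments.append(LabelSegment(label, start_idx / fps, idx / fps))
        for label in labels:
            open_since.setdefault(label, idx)
    return sorted(segments, key=lambda s: (s.start_s, s.label))


# Hypothetical recognition output for four frames sampled at one frame per second.
frames = [{"table"}, {"table", "wine glass"}, {"table", "wine glass"}, {"beach"}]
segments = structure_video(frames, fps=1.0)
```

Each resulting record (label, start, end) is linear data in the sense used above: it can be indexed, searched, and joined with other data, unlike the raw pixels it was derived from.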

In summary, video has gradually become the main information carrier and presentation method; thus, the video industry is inevitably a golden track. At the same time, algorithm evolution drives the increasing maturity of image perception intelligence, and video structuring technology converts nonlinear videos into linear data. The commercialization of intelligent marketing has already been realized, and the development prospects of video AI + large entertainment industry are worth looking forward to.

3. Intelligent Imaging Application Scenarios: Intelligent Marketing has Commercialized, Content Review and Automated Imaging Production Have Begun

Currently, the main field where intelligent imaging has been commercialized is advertising marketing, utilizing AI technology to innovate video advertising production models and precision marketing in scenarios; additionally, automated image production has also begun exploration, which we will detail below.

3.1 Visual AI + Marketing: Intelligent Embedded Advertising Becomes the Main Track, Live Streaming and E-commerce are Eager to Try

We can summarize the application of computer vision in advertising marketing as follows: first, there must be a buildup of underlying data (video traffic, information contained in videos), followed by information processing through intermediate technologies (how to extract and analyze information), and finally realization in upper-level applications (commercial models that monetize data). The rapid development of these two driving forces at the data and technology levels has led to vertical applications in entertainment, including embedded advertising, live marketing, and entertainment e-commerce.

3.1.1 Intelligentization of Video Advertising, Initial Scale of Intelligent Embedded Business Model

Business Model: Using Computer Vision to Increase the Price of Video Advertising Inventory. The intelligent embedding platform uses computer vision to create advertising inventory inside the videos supplied by content providers and video platforms across the network (the inventory may appear as stickers, objects, hot links, red envelopes, etc.), which advertisers and agents can then buy. The advertising revenue is shared with the content providers or video platforms. The fundamental difference from traditional DSPs and SSPs is that an intelligent embedding platform is not merely a channel provider but also a technical service provider: it discovers advertising inventory in native videos through visual AI and matches ads to it by scene. Its value therefore lies not only in connecting video traffic providers with advertisers but also in supplying incremental advertising inventory and raising advertising value through precise delivery.

Video structuring labels provide rich data for precise marketing. Building on deep learning over data about people and objects, computer vision can further distinguish scene labels, forming its own large data reservoir and closed loop. For example, the VideoAI video structuring data platform from Extreme Chain Technology relies on algorithm optimization and deep learning, and its recognition accuracy already meets the requirements of commercial applications. It has accumulated a vast database of video structuring labels, with accuracy reaching 99.6% for celebrity recognition, 99% for object recognition, 99.4% for scene recognition, and 98.8% for brand recognition.

Furthermore, through video structuring technology scanning massive videos, people, objects, and scenes are labeled. On one hand, this allows advertisers to match embedded food advertisements more effectively during real-time placements, enhancing marketing precision; on the other hand, it can also provide celebrity information for brand advertisers, helping them find more suitable spokespersons, accurately combining celebrities with products.
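
The matching step described above can be sketched as a lookup over labeled slots: each candidate slot carries the labels found in that scene, and an ad is matched to the slots whose labels overlap its keywords. Everything in this example is hypothetical (the slot identifiers, labels, keywords, and the `rank_slots` helper), and simple keyword overlap stands in for whatever scoring a production platform would actually use.

```python
from typing import Dict, List, Set, Tuple

Slot = Tuple[str, int]            # (video id, start second of the scene)
SlotIndex = Dict[Slot, Set[str]]  # slot -> labels recognized in that scene


def rank_slots(ad_keywords: Set[str], slots: SlotIndex) -> List[Tuple[Slot, int]]:
    """Rank slots by the number of labels shared with the ad; drop slots with none."""
    scored = [(slot, len(ad_keywords & labels)) for slot, labels in slots.items()]
    matched = [item for item in scored if item[1] > 0]
    return sorted(matched, key=lambda item: (-item[1], item[0]))


# Hypothetical structured labels for three scenes.
slots: SlotIndex = {
    ("drama_ep1", 120): {"office", "desk", "man"},
    ("variety_ep3", 45): {"dining table", "food", "restaurant"},
    ("variety_ep3", 300): {"ocean", "beach"},
}

food_ad = {"food", "restaurant", "kitchen"}
ranking = rank_slots(food_ad, slots)  # only the dining scene shares labels with the ad
```

The same index could be queried in the other direction, for example to list every scene in which a given celebrity label appears.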

Moreover, from a substitution perspective, paid subscription penetration on long-form online video platforms is rising faster, and changes in advertising formats are inevitable. We can verify this industry trajectory from another angle by defining the paid penetration rate as the number of paying members announced at the end of a quarter divided by that quarter’s MAUs. The paid penetration rate of iQIYI and Tencent Video is currently around 15%, compared with 40% for Netflix, which points to further increases. Once users become members they can skip pre-roll ads, which run around 60-90 seconds for major dramas and variety shows. In the long run, rising paid penetration is expected to squeeze the total duration of pre-roll ads.

3.1.2 Advantages of Embedded Advertising

In summary, we believe that compared with traditional pre-roll and embedded advertising, intelligent embedded advertising has three main advantages: 1) Significantly better conversion efficiency: it matches advertising content to suitable scenes, making marketing more precise and improving CTR; 2) Scalable production: it identifies advertising inventory intelligently and in batches, enabling effective embedding at scale; 3) Lower barriers to entry: performance-based, real-time placement lowers the entry threshold and attracts a large number of small and medium advertisers.

Advantage 1: Precise placement through scene recognition, allowing direct access to purchase pages, significantly enhancing conversion efficiency through closed-loop marketing. Intelligent embedding not only recognizes the available embedding space in videos but also judges the scenes for precise matching. For example, in a case study by Extreme Chain Technology, embedding a Changlong tourism advertisement in the drama “In the Name of the People” generated 3.81 million ad exposures, with 6,306 clicks, resulting in a CTR of 0.16%; while embedding the advertisement in a scene of a variety show featuring an ocean theme led to 2.24 million exposures and 67,000 clicks, yielding a CTR of 1.96%, an improvement of over 10 times. Additionally, scene-based embedding can also include e-commerce links, triggering user clicks to redirect to purchase pages, with an average jump rate of 21.74% for Changlong tourism e-commerce, compared to a traditional pre-roll e-commerce conversion rate of 0.3%, exceeding the latter by 72 times.
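
The rates in this case study can be recomputed from the raw counts. The sketch below uses only the numbers quoted above; the helper name `rate` is our own. The drama placement comes out at about 0.17%, in line with the quoted 0.16%. The counts given for the variety-show placement work out to roughly 3.0% rather than the quoted 1.96%, so one of those source figures may have been misreported; either way the improvement is more than tenfold. The ratio of the 21.74% jump rate to the 0.3% pre-roll conversion rate is about 72, as stated.

```python
def rate(events: int, exposures: int) -> float:
    """Share of exposures that led to the event (for clicks, this is the CTR)."""
    return events / exposures


drama_ctr = rate(6_306, 3_810_000)      # placement in the drama
variety_ctr = rate(67_000, 2_240_000)   # placement in the ocean-themed scene
uplift = variety_ctr / drama_ctr        # how many times higher the second CTR is
jump_vs_preroll = 0.2174 / 0.003        # jump rate relative to pre-roll conversion

print(f"drama CTR {drama_ctr:.3%}, variety CTR {variety_ctr:.3%}, uplift {uplift:.1f}x")
print(f"jump rate is {jump_vs_preroll:.1f}x the pre-roll conversion rate")
```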

Advantage 2: Intelligent batch recognition of advertising inventory enables large-scale embedding. No manual judgment is required; intelligent recognition splits video stream pixels, discovering incremental advertising inventory with low cost, high efficiency, and precision. Traditional embedding methods are cumbersome, while intelligent embedding can save significant labor time, improving efficiency by over a hundred times.

Advantage 3: Lower placement barriers and effect monitoring attract more small and medium advertisers. Traditional embedded advertising is generally arranged during the production of dramas and variety shows, requires contracts signed in advance, and is brand-focused. Its effectiveness depends entirely on the viewership of the drama or show, which creates a high entry barrier and restricts the video advertising market to large brand advertisers. In contrast, intelligent embedding can be deployed across the entire network without being tied to particular dramas or shows (as long as advertising slots are available). It supports real-time placement and programmatic purchasing, with real-time monitoring of exposure, clicks, and conversions, which makes it a form of performance-based advertising. Intelligent embedded advertising is priced on a CPC or CPM basis, so advertisers can top up an account as they would for search or information feed ads, and budgets of any size can be placed flexibly. For instance, the CPM at Extreme Chain Technology is mostly around 100-150 yuan, far below the threshold of traditional embedded advertising, which often exceeds a million yuan, and this attracts small and medium advertisers.

In summary, the future video advertising model is evolving, with intelligent embedded advertising set to replace traditional pre-roll ads as a significant form of video advertising. Compared to pre-roll and traditional embedded ads, intelligent embedded advertising has clear advantages: precision, scalability, and lowered barriers will all drive rapid industry development, making intelligent embedded advertising one of the most easily commercialized forms of computer vision.

3.1.3 Live Interaction and Entertainment E-commerce Improve Interaction Experience and Further Enhance Conversion Effects through Closed-loop Marketing

In addition to the main track—intelligent embedded advertising, visual AI can also be applied in intelligent marketing on PGC on-demand video platforms like iQIYI and Mango TV, as well as in live streaming platforms to enhance interaction experience. Furthermore, it can also create an entertainment e-commerce system by embedding e-commerce links, e-commerce shopping mini-programs, and IP commercial development services to achieve direct purchasing behavior and closed-loop marketing within video scenarios.

Live interaction enhances marketing experience and aids in traffic monetization. The live streaming scenario itself contains fan engagement and interactivity, where visual AI creates interactive forms like lotteries, polls, red envelopes, and card collections, integrating user interaction and content marketing within the live broadcasts.

3.2 Other Scenarios: Automatic Review, Video Information Retrieval, and Initial Steps in Content Automation Production

Intelligent marketing, represented by intelligent embedding, has achieved large-scale commercial development through visual AI. The industry is also exploring the application of AI to video content review, video information retrieval, and automated video content production. These areas are still at an early stage and have not yet established mature business models, but they are worth monitoring over the long term.

3.2.1 Video Content Automatic Review

Based on visual perception intelligence and video structuring technology, visual AI can detect images in videos, promptly identifying risks such as pornography, violence, political sensitivity, and problematic celebrities. 1) Sensitive Individuals: Utilizing facial recognition technology and celebrity image databases, computers can automatically and accurately monitor whether sensitive individuals exist in the video, and can flexibly configure blacklists, synchronously identifying sensitive and ordinary individuals. Once an ordinary person becomes sensitive, related videos can be quickly blocked. 2) Sensitive Speech and Behavior: Image cognitive intelligence assists computers in understanding video information and scenes, and performing real-time comparisons across the internet.
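
The configurable blacklist described above, including the ability to block older videos quickly once a person becomes sensitive, can be sketched with an inverted index from person to videos. This is a minimal illustration under stated assumptions: the `ReviewIndex` class and all identifiers are hypothetical, and the list of people per video stands in for the output of a facial recognition step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class ReviewIndex:
    """Tracks who appears in which video so blacklist changes apply retroactively."""

    def __init__(self) -> None:
        self.blacklist: Set[str] = set()
        self.videos_by_person: Dict[str, Set[str]] = defaultdict(set)
        self.blocked: Set[str] = set()

    def ingest(self, video_id: str, people: Iterable[str]) -> bool:
        """Index a newly reviewed video; return True if it is blocked on arrival."""
        recognized = set(people)
        for person in recognized:
            self.videos_by_person[person].add(video_id)
        if recognized & self.blacklist:
            self.blocked.add(video_id)
            return True
        return False

    def add_to_blacklist(self, person: str) -> Set[str]:
        """Mark a person as sensitive and block every indexed video they appear in."""
        self.blacklist.add(person)
        newly_blocked = self.videos_by_person.get(person, set()) - self.blocked
        self.blocked |= newly_blocked
        return newly_blocked


index = ReviewIndex()
index.ingest("v1", ["person_a", "person_b"])
index.ingest("v2", ["person_b"])
index.ingest("v3", ["person_c"])

newly_blocked = index.add_to_blacklist("person_b")                 # v1 and v2 are blocked
blocked_on_arrival = index.ingest("v4", ["person_b", "person_c"])  # caught at upload
```

Because ordinary individuals are indexed alongside sensitive ones, a blacklist update needs only a lookup rather than a re-scan of every stored video.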

Compared to manual review, AI intelligent review has clear advantages. 1) Cost Advantage: According to Tencent Finance, Kuaishou urgently recruited 3,000 content reviewers in April 2018, with the review team reaching 5,000. Douyin’s review team also has thousands of members, indicating that the demand for video content review is strong and growing with the scale of video streams. Compared to manual visual reviews, the greatest advantage of AI video review lies in automated bulk reviews, which significantly saves labor costs; 2) Real-time Updates: While reviewing content, visual AI is also learning from all internet videos, continuously improving recognition accuracy through deep learning by real-time updates of blacklists and sample libraries.

3.2.2 Video Information Retrieval and Copyright Protection

Computer vision breaks through the technical bottleneck of video retrieval, enabling the possibility of “finding videos using videos.” As previously mentioned, image perception, cognitive intelligence, and video structuring technology allow images to be parsed into linear data, thus breaking the technical bottleneck for video information retrieval. For a long time, search engines have intelligently retrieved text information, and later, with technological advancements, achieved image search through images. AI can split video frames through video structuring technology, tagging videos and frames, enabling massive video classification, information extraction, and video comparison. In the future, with technological maturity, it will also be possible to search for video sources or related videos by uploading video clips or screenshots.

Intelligent imaging empowers video copyright protection. A major pain point in video copyright protection is that traditional technology struggles with automated video comparison and infringement determination. Once image intelligent perception technology matures, “video fingerprints” based on the unique features of video multimodal characteristics will not change with media file format conversion, editing, compression, or rotation, and the intelligent media asset search engine built on this can also perform video comparison across the internet, aiding in video copyright protection.
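
To illustrate the idea of a fingerprint that survives re-encoding, here is a sketch using a simple average hash compared by Hamming distance. This is a deliberately simpler, swapped-in technique, not the multimodal video fingerprint described above: it tolerates uniform brightness shifts and mild compression noise, but not rotation or heavy editing. The 4x4 grayscale frames are made-up data, and in practice a frame would first be downscaled to a small fixed size.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel values in the range 0-255


def average_hash(frame: Frame) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the frame's mean brightness."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests the same underlying frame."""
    return bin(a ^ b).count("1")


original: Frame = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 35, 225],
    [18, 215, 20, 235],
]
# Simulated re-encoding: every pixel is slightly brighter than in the original.
recompressed: Frame = [[min(255, p + 6) for p in row] for row in original]
# A different frame with the bright and dark columns swapped.
unrelated: Frame = [
    [200, 10, 220, 30],
    [210, 15, 230, 25],
    [205, 12, 225, 35],
    [215, 18, 235, 20],
]

same = hamming(average_hash(original), average_hash(recompressed))
different = hamming(average_hash(original), average_hash(unrelated))
```

Comparing compact hashes instead of raw pixels is what makes comparison across a very large library feasible; a distance threshold then decides whether two clips are treated as the same content.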

3.2.3 Intelligent Image Production

In addition to deconstructing and analyzing existing videos, transforming images into data, and mining downstream application scenarios, visual AI technology can also penetrate upstream content production from data (core) to video (presentation mode), providing intelligent video editing, automatic generation of short videos, automated post-production effects, and information visualization services. Compared to other information media, video has the largest information capacity and the highest transmission efficiency, but at the same time, it also has the highest creation difficulty, time consumption, and cost. Intelligent image production can enhance video production efficiency and lower content production costs.

Intelligent Video Editing: For example, cutting the many highlight clips from a variety show by hand takes professional staff several hours for each hour of original footage. Intelligent imaging technology removes the dependence on professional equipment, professional editing software, and professional personnel, significantly reducing production costs. By comprehensively analyzing people, postures, actions, and motion trajectories, it can complete automatic editing and synthesis in anywhere from a few seconds to a dozen or so seconds, improving production efficiency by more than ten times.

Light Industry for Film and Television: Targeting the high-end, professional video content production market for post-production services such as special effects and 3D, we believe that as technology matures, some non-complex yet labor-intensive special effects processing tasks in professional film and television production will be replaced by visual AI, lowering post-production costs and enhancing the industrialization level of film and television.

Video Information Visualization: Built on a big data platform and visualization model technology, this service presents multi-dimensional data in forms such as data maps, timelines, bubble charts, interactive charts, and relationship diagrams, turning information into visual productions.

An application example: Intelligent image production technology empowers smart media. During the 2019 Two Sessions, Guangming Online used Yingpu Technology’s intelligent image AGC technology to produce video content based on semantic scenes, with the imagery dynamically following the speaker’s semantics, expressions, gestures, and so on, thereby presenting the work of CPPCC members visually and intelligently. This made their performance of duties more intuitive and understandable, and made the content richer and more engaging than ordinary video. For instance, when a CPPCC member spoke about remote immersive teaching in schools, the background behind them changed to an academic setting, and VR glasses were automatically placed on the speaker’s face.

4. Visual AI + Large Entertainment Leaders: Yingpu Technology and Extreme Chain Technology

Both Yingpu Technology and Extreme Chain Technology are deeply engaged in applying computer vision to the domestic large entertainment industry, and both currently count intelligent marketing among their main business areas. This indicates that advertising and marketing is the field where visual AI + large entertainment has been implemented most smoothly. At the same time, each has its own emphasis in business scope, commercial model, and product and service forms.

4.1 Yingpu Technology: Building an Intelligent Marketing Platform through Computer Vision, Exploring Automated Content Production

Yingpu Technology is a leading provider of digital media visualization technology services and a pioneer in native video marketing. Its business model encompasses two core components:

ACM Core (Automatic Content Marketing): the automated production of freely deployable advertising inventory, that is, the intelligent embedded advertising business, which has already reached scale. As described earlier, the model connects to videos from content providers or video platforms, uses video structuring technology to discover embedded advertising inventory, and places ads precisely in exchange for advertising publishing fees. Costs consist mainly of revenue sharing with upstream content providers and video platforms, plus technical costs.

AGC Core (Automatic Generated Content): the machine-automated production of video content, including automatic production of short videos, light industry for film and television, and information visualization services. These are still in the early stages of commercialization but have substantial potential as video becomes more prevalent.

The company’s intelligent marketing platform consists of two core products: “Embedded Easy” and “Video Easy.” “Embedded Easy” uses intelligent computing, overlay setup, and real-time embedding technologies to automatically scan videos for advertising inventory and embed ads that fit each scene. Its formats include stickers, logos, props, and picture-in-picture, with a focus on brand and product display. “Video Easy” is a visualization technology service platform that extends video content to enable interactive marketing with audiences through direct link URLs, voting, lottery mini-programs, e-commerce links, and similar features, with a focus on interaction and conversion.

Partnering with SenseTime Technology to Introduce Leading Technology Focusing on AI + Large Entertainment Industry. Yingpu’s underlying technical architecture integrates SenseMedia’s internet broadcast video structuring solutions and SenseAR’s augmented reality rendering platform functions, jointly focusing on AI + large entertainment industry. Both parties leverage their unique advantages in computer vision, video structuring, deep learning, big data, video advertising placement, and internet video interaction technology to provide AI imaging commercialization services in various segments of the internet, film, and video entertainment industries, exploring the multidimensional application value of visual technology and expanding commercial boundaries.

Covering Massive Video Traffic, Providing Big Data Support for Scene Marketing. The company has signed cooperation agreements with numerous content producers such as Hunan TV, Mango TV, Huashu TV, and Mars Culture, while also providing visual marketing technology services for platforms like Tencent Video, LeTV, Thunder, Sohu Video, and Baofeng Video.

Innovative Marketing Methods Win Favor from Numerous Brand Advertisers. The company has helped well-known advertisers such as Mengniu, Nestle, Huiyuan, Nippon Paint, and Blue Moon in scene marketing, with embedded cases being well recognized by advertisers. Currently, the company has a rich array of cooperative brand advertisers, and its benchmark effect aids in future customer expansion.

4.2 Extreme Chain Technology: Visual AI Empowers Scene Economy, Rich Marketing Application Matrix

Visual AI technology drives the “advertising + e-commerce + interactive entertainment” model, creating a closed loop for the video scene industry. Extreme Chain Technology is an artificial intelligence technology company centered on video AI, analyzing scenes across the internet to drive the development of a new scene economy. Its core technologies include the VideoAI video intelligence system and VideoOS video mini-program system, with primary businesses in advertising, television, and interactive entertainment. In addition to having an AI scene marketing platform (intelligent embedded advertising), Extreme Chain has also laid out video e-commerce and interactive entertainment, making its product matrix in intelligent marketing more diverse, but it has not yet penetrated upstream content production.

Technological Accumulation Positions It to Capture Video Traffic Scenes, Achieving Explosive Revenue Growth. The company states that its partners cover 65% of leading video platforms. It primarily provides them with video interactive operating systems that place advertising and e-commerce content automatically across mobile, PC, and OTT screens. The company provides AI e-commerce, interactive entertainment, scene advertising, video search, and video headlines systems for platforms such as Mango TV, iQIYI, China Blue TV, Phoenix Network, Fengxing Network, Sohu, Yizhibo, and Douyin. It serves 420 million users per month and has established deep cooperation with hundreds of brands, businesses, and supply chains, jointly building a complete closed loop for the video scene industry and achieving large-scale commercialization in the AI + video industry. According to Wall Street Journal reports, the company generated 150 million yuan in revenue in 2017 and became profitable, with revenue reaching 580 million yuan in 2018, including over 100 million yuan in December alone.

Scene marketing accurately matches audiences, and interactive formats unify brand building with performance. ASMP is the video AI scene marketing platform developed by Extreme Chain Technology. It structures massive volumes of video content and integrates products precisely with content scenes. The ASMP system first scans video scenes automatically using VideoAI (a video structuring data platform based on visual recognition) to find points within the video where interactive ads can be placed. It then automatically embeds interactive ads such as cloud maps, bubble dialogues, and video polls via VideoOS (an advertising creation program). During placement, the Video Data big data system can monitor placement effects in real time.
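The scan-then-embed flow described above can be sketched as a small matching step. This is an illustrative sketch only, not Extreme Chain's implementation: the `Scene`, `Ad`, and `Placement` types, the `plan_placements` function, the label-overlap scoring, and the minimum-duration rule are all assumptions introduced here. The scene labels are assumed to come from an upstream recognizer that is not shown.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Scene:
    """A structured video segment (hypothetical output of a scene scanner)."""
    start_s: float
    end_s: float
    labels: FrozenSet[str]  # objects or settings recognized in the segment


@dataclass(frozen=True)
class Ad:
    """An interactive ad with the scene labels it wants to appear next to."""
    name: str
    target_labels: FrozenSet[str]
    min_duration_s: float  # shortest scene the ad can be shown in


@dataclass(frozen=True)
class Placement:
    ad_name: str
    start_s: float
    overlap: int  # number of labels shared by the scene and the ad


def plan_placements(scenes: List[Scene], ads: List[Ad]) -> List[Placement]:
    """Pick, for each scene, the ad sharing the most labels with it.

    Scenes that are too short for an ad, or that share no label with any
    ad, receive no placement. Ties keep the ad listed first.
    """
    placements: List[Placement] = []
    for scene in scenes:
        duration = scene.end_s - scene.start_s
        best: Optional[Placement] = None
        for ad in ads:
            overlap = len(scene.labels & ad.target_labels)
            if overlap == 0 or duration < ad.min_duration_s:
                continue
            if best is None or overlap > best.overlap:
                best = Placement(ad.name, scene.start_s, overlap)
        if best is not None:
            placements.append(best)
    return placements


if __name__ == "__main__":
    scenes = [
        Scene(0.0, 12.0, frozenset({"kitchen", "table"})),
        Scene(12.0, 14.0, frozenset({"car"})),
        Scene(14.0, 30.0, frozenset({"street", "car"})),
    ]
    ads = [
        Ad("milk", frozenset({"kitchen", "breakfast"}), 5.0),
        Ad("auto", frozenset({"car", "street"}), 5.0),
    ]
    for placement in plan_placements(scenes, ads):
        print(placement)
```

In this sketch the monitoring step is left out. A real system would also log impressions and interactions for each placement so that effects can be tracked, as the paragraph above describes.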

4.3 Mirriad: UK Intelligent Embedded Advertising Company, Overseas Streaming Media Landscape and Business Model Hinder Embedded Advertising Monetization

Mirriad is a UK-based provider of video embedded advertising technology services. It can automatically scan videos, identify people and objects, label positions suitable for ad embedding, and embed ads in batches. Its profit model is based on revenue sharing with content providers, at a ratio of around 20%. In 2017, it partnered with Youku on embedded advertising for 弹个车 (Tangeche), which became the company’s largest single project that year.

The company was listed on the London AIM market in December 2017. Its 2017 revenue was only 874,000 pounds, so it is still at an early stage. Over half of its revenue in both 2016 and 2017 came from China. Currently, the company’s ad placements are still contract-based projects, and it has not yet achieved programmatic purchasing for real-time placement. We believe this is related to the competitive landscape and business model of overseas video platforms, as public information shows that Mirriad has not integrated with mainstream overseas video platforms.

Netflix firmly holds the leading position in overseas streaming media, and its business model does not rely on advertising monetization. Among domestic long video platforms, iQIYI, Tencent Video, and Youku Tudou form a tripartite balance, with no significant differences in active users or paid penetration rates. Overseas streaming media, by contrast, is dominated by Netflix, which significantly outpaces Amazon and Hulu in both subscriber scale and penetration rate. More importantly, overseas streaming media relies primarily on user subscription fees for revenue, leaving minimal room for advertising monetization. For instance, 97.7% of Netflix’s revenue comes from subscription fees. In contrast, over half of Mirriad’s revenue in 2017 came from China, with India accounting for 23%, because Chinese and Indian video platforms grew primarily out of free + advertising models and have low paid subscription rates.


(Report Source: Dongfang Securities; Analyst: Xiang Wenqian)
