Introduction: Because of the cold start problem in recommendation systems, recommending new videos to users is a highly challenging task. How well new videos are recommended directly affects the stability of the recommendation system's "metabolism" and the healthy development of the content ecosystem. To address this, this article introduces the iQIYI recommendation team's practical solution for cold starting new videos in the short video recommendation business, based on Generative Adversarial Networks (GANs).
01
Background
The cold start problem in recommendation systems refers to the difficulty of making good recommendations when new items or new users enter the system, since there is no historical information about the user or the item. It can easily hurt user experience and the retention of new users, making it an important problem in personalized short video recommendation.
To solve this problem, we first need to understand the components of the cold start in recommendation systems.
The cold start problem can mainly be divided into three categories: user cold start, item cold start, and system cold start.
Take the user-item rating matrix as an example, as shown in the figure below: dividing users and items each into cold and hot splits the two-dimensional matrix into four quadrants, with the first quadrant corresponding to the item cold start problem.
New video recommendation is a typical item cold start problem in a video recommendation system. On the iQIYI online video platform, large numbers of new videos, especially UGC and PUGC short videos, are produced and launched every moment. In addition, short videos are consumed quickly and usually only once, so distributing new videos efficiently and accurately to interested users is an ongoing research problem.
In terms of content attributes, new videos, compared with older videos, lack historical information, have no user-video interaction data, and may have missing or inaccurate meta attributes. Accurately describing and representing a new video is therefore the first problem to solve.
In terms of the distribution path and life cycle, a new video first receives a small amount of exploratory or validation distribution, and only after the system has validated it does it enter a phase of large-scale, unrestricted distribution. Optimizing this distribution path so that new videos reach their target users efficiently is a question of how videos and users establish connections.
From the perspective of the video recommendation system as a whole, recommending new videos is the normal "metabolism" of the system. Because short videos are consumed quickly, the system's pool of material turns over rapidly, and stably "pushing out the old and bringing in the new" is a matter of keeping fresh content flowing into the system. In addition, missing or inaccurate meta attributes of new videos, together with exploratory and validation distribution, introduce overhead and noise into the overall system, so recommending new videos accurately and efficiently is also an efficiency issue for the recommendation system.
On the supply side of new video production, how well new videos are distributed provides direct feedback to creators, influences their judgment and decisions, and both guides and serves as an indicator of the healthy development of the content ecosystem.
Thus, new video recommendations are crucial for recommendation systems. To effectively build new video recommendations based on first principles analysis, two fundamental questions need to be addressed: the first is how to represent new videos, and the second is how to establish connections between new videos and users.
First, how to represent new videos is the starting point for solving the problem. Traditional solutions mostly rely on content-based representations of new video attributes, such as category, tags, and creator; such content representations are intuitive, but the resulting features are discrete, sparse, and heavily affected by noise. Second, establishing a direct connection between new videos and users is the key to solving the problem. Traditional attribute-based approaches connect new videos to users through similarity or correlation computed between video attributes and user profile attributes. This two-stage framework is structurally simple but lacks expressive power, and splitting the information flow across the two stages causes information loss and error accumulation.
Therefore, we designed a new video recommendation solution based on Generative Adversarial Networks. The embedding vectors produced by the GAN generator carry richer information, and the neural network fits real user features, capturing the complex mapping between new videos and users. The new video features generated by the GAN are not only tied to the given new video but also resemble real user features, so the representation of new videos and the mapping from new videos to users are learned in a unified way.
02
Model Framework Design
Recommending new videos to users is a highly challenging task. To address it, this article proposes a new video cold start solution based on Generative Adversarial Networks (GAN). It captures implicit attribute-level interaction information between users and items, uses the video's attributes from multiple angles to generate the features of users who may be interested in the new video, and recommends the new video to target users based on attribute-level similarity.
The overall framework structure of the GAN model is shown in the figure below:
The GAN model's network structure consists of a generator and a discriminator. Given the attributes of a video, the generator produces the feature vector of a user who might like that video; given a user feature-video feature pair, the discriminator judges whether the user feature is real or generated, which screens for generated user features that are both close to real users and relevant to the video. Finally, the users in the user pool most similar to the generated user feature vector are selected, and the video is recommended to them, completing the cold start recall for new videos.
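To make this structure concrete, here is a minimal PyTorch sketch of the two components; the attribute list, layer sizes, and the concatenation-based merge are illustrative assumptions rather than the exact production network.

```python
import torch
import torch.nn as nn


class SubGenerator(nn.Module):
    """Maps a single video attribute (as an integer id) to a user-feature-sized vector."""

    def __init__(self, vocab_size: int, emb_dim: int, user_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.proj = nn.Sequential(nn.Linear(emb_dim, user_dim), nn.Tanh())

    def forward(self, attr_ids: torch.Tensor) -> torch.Tensor:
        # attr_ids: (batch,) integer ids of one attribute, e.g. category
        return self.proj(self.embed(attr_ids))


class Generator(nn.Module):
    """One sub-generator per attribute; a fully connected head merges their
    outputs into a single generated user feature vector u_c."""

    def __init__(self, attr_vocab_sizes, emb_dim: int = 64, user_dim: int = 128):
        super().__init__()
        self.subs = nn.ModuleList(
            [SubGenerator(v, emb_dim, user_dim) for v in attr_vocab_sizes]
        )
        self.merge = nn.Sequential(
            nn.Linear(user_dim * len(attr_vocab_sizes), user_dim),
            nn.ReLU(),
            nn.Linear(user_dim, user_dim),
            nn.Tanh(),
        )

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        # attrs: (batch, num_attrs) with one integer id column per attribute
        parts = [sub(attrs[:, k]) for k, sub in enumerate(self.subs)]
        return self.merge(torch.cat(parts, dim=-1))


class Discriminator(nn.Module):
    """Scores a (user feature, video attribute feature) pair with the
    probability that the user feature comes from a real interested user."""

    def __init__(self, user_dim: int = 128, item_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(user_dim + item_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, user_vec: torch.Tensor, item_vec: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([user_vec, item_vec], dim=-1)).squeeze(-1)
```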
The overall objective function of the GAN model is a min-max optimization: the generator tries to minimize the objective while the discriminator tries to maximize the same objective. Concretely,

$$\min_{\theta}\max_{\phi}\ \sum_{n=1}^{N}\Big(\mathbb{E}_{u^{+}\sim P(u^{+}\mid I_{n})}\big[\log D_{\phi}(u^{+},I_{n})\big]+\mathbb{E}_{u^{c}\sim P_{\theta}(u^{c}\mid I_{n})}\big[\log\big(1-D_{\phi}(u^{c},I_{n})\big)\big]\Big),$$

where φ denotes the parameters of the discriminator D and θ the parameters of the generator G; P(u+|In) is the distribution of real users who interacted with video In (the positive samples shown to the discriminator), Pθ(uc|In) is the distribution of user features produced by the generator for video In, and N is the number of videos in the training set. The optimal discriminator and generator under this objective are denoted D* and G*, respectively.
2.1 Generator
The generator adopts a multi-sub-generator structure and can be divided into two parts:
First part: multiple sub-generators, each of which takes a single attribute of the new video (such as its category, tags, or creator) as input and generates a corresponding user feature vector.
Second part:
The neural network G merges all of the user feature vectors produced by the sub-generators and, through a multi-layer fully connected network, outputs the final generated user feature vector.
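As a sketch, assuming the merge is a simple concatenation of the K sub-generator outputs (the exact fusion used in production is not spelled out here), this step can be written as

$$u^{c} \;=\; \mathrm{MLP}_{\theta}\big(\big[\,u^{c}_{1};\, u^{c}_{2};\, \dots;\, u^{c}_{K}\,\big]\big),$$

where $u^{c}_{k}$ is the user feature vector generated by the k-th sub-generator from the k-th video attribute and $[\,\cdot\,;\,\cdot\,]$ denotes vector concatenation.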
The generator is trained by minimizing the objective function: the discriminator's parameters φ are held fixed and only the terms that involve the generator are optimized, which gives

$$G^{*}=\arg\min_{\theta}\ \sum_{n=1}^{N}\mathbb{E}_{u^{c}\sim P_{\theta}(u^{c}\mid I_{n})}\big[\log\big(1-D_{\phi}(u^{c},I_{n})\big)\big].$$
In general, the generator's task is harder than the discriminator's, because the generator has to fit a probability density. This gives rise to mode collapse, a well-known problem that hurts GAN performance in which the generator produces highly similar samples. Using multiple generators with a single discriminator effectively alleviates this issue.
2.2 Discriminator
The discriminator takes two types of samples as input. One is (uc, Ic), which pairs the given video with a generated user feature, where uc is the user feature produced by the generator G and Ic is the attribute feature of the video. The other is (u+, Ic), which pairs the given video with a real user feature, where u+ is a user who showed interest in the given video according to real behavior data, and the real user feature is built from that user's real profile.
The goal of the discriminator is to distinguish (u+, Ic) from (uc, Ic): (u+, Ic) is a positive example with label 1, and (uc, Ic) is a negative example with label 0.
The output of the discriminator is the probability that the input user feature is a real user interested in the given video, which can be written as

$$D_{\phi}(u, I)=\sigma\big(f_{\phi}(u, I)\big),$$

where f_φ is the discriminator network, σ is the sigmoid function, and φ are the parameters of the discriminator.
During training of the discriminator network, the generator's parameters θ are held fixed and the optimal discriminator is obtained by maximizing the objective function:

$$D^{*}=\arg\max_{\phi}\ \sum_{n=1}^{N}\Big(\mathbb{E}_{u^{+}\sim P(u^{+}\mid I_{n})}\big[\log D_{\phi}(u^{+},I_{n})\big]+\mathbb{E}_{u^{c}\sim P_{\theta}(u^{c}\mid I_{n})}\big[\log\big(1-D_{\phi}(u^{c},I_{n})\big)\big]\Big).$$
By training on these two types of samples, the discriminator learns to screen for generated user features that are both similar to real user features and relevant to the given video.
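Putting 2.1 and 2.2 together, below is a simplified sketch of the alternating adversarial training loop, assuming PyTorch; the mini-batch format and hyperparameters are illustrative, and G and D stand for the generator and discriminator modules sketched earlier.

```python
import torch
import torch.nn as nn


def train_gan(G: nn.Module, D: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    """Alternating optimization of the min-max objective.

    `loader` is assumed to yield mini-batches of
      attr_ids      -- integer attribute ids of a video I_n (input to G)
      item_vec      -- dense attribute feature of I_n (input to D)
      real_user_vec -- profile feature u+ of a user who engaged with I_n
    """
    bce = nn.BCELoss()
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)

    for _ in range(epochs):
        for attr_ids, item_vec, real_user_vec in loader:
            # Discriminator step: maximize log D(u+, I) + log(1 - D(u_c, I)),
            # with the generator held fixed (hence .detach()).
            fake_user_vec = G(attr_ids).detach()
            d_real = D(real_user_vec, item_vec)
            d_fake = D(fake_user_vec, item_vec)
            loss_d = bce(d_real, torch.ones_like(d_real)) + bce(
                d_fake, torch.zeros_like(d_fake)
            )
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()

            # Generator step: with the discriminator fixed, push D(u_c, I)
            # toward 1 (the non-saturating surrogate for minimizing log(1 - D)).
            d_fake = D(G(attr_ids), item_vec)
            loss_g = bce(d_fake, torch.ones_like(d_fake))
            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()
```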
2.3 Recommendation Generation
After the generator produces the features of users who may like the current item, the new video can be recommended to the real users most similar to those generated users. We use cosine similarity to measure the similarity between the generated user and the users in the user pool, and use approximate nearest neighbor (ANN) search to speed up the similarity lookup.
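As an illustration of this retrieval step, here is a small example using Faiss as the ANN library (the article does not name a specific implementation, so Faiss is an assumption); cosine similarity is obtained by L2-normalizing the vectors and searching by inner product.

```python
import faiss
import numpy as np


def build_user_index(user_vectors: np.ndarray) -> faiss.Index:
    """Index the real-user feature pool. user_vectors: (num_users, dim)."""
    vecs = user_vectors.astype(np.float32)    # copy; normalize_L2 works in place
    faiss.normalize_L2(vecs)                  # unit vectors: inner product == cosine
    index = faiss.IndexFlatIP(vecs.shape[1])  # exact search baseline
    index.add(vecs)
    return index


def recall_users_for_new_video(index: faiss.Index, generated_user: np.ndarray, top_k: int = 100):
    """Return ids and scores of the top_k real users closest to the generated user feature."""
    query = generated_user.astype(np.float32).reshape(1, -1)
    faiss.normalize_L2(query)
    scores, user_ids = index.search(query, top_k)
    return user_ids[0], scores[0]
```

For a large user pool, the flat index can be swapped for an approximate one (e.g. an IVF or HNSW index) to get true ANN-level latency at the cost of slight recall loss.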
03
Online Effect
This solution has been launched in the iQIYI main APP's short video recommendation system and has delivered clear gains in both the distribution and consumption of new videos, significantly improving how quickly new videos gain traction and the freshness of the system's exposure:
1. In terms of new video distribution effects, the exposure ratio has significantly increased, with video freshness improving by 12.4%.
2. In terms of new video consumption effects, CTR has increased by 11.9%, and the average playback duration has increased by 56.5%.
04
Summary and Outlook
The above is our recent practice of the new video cold start recall solution in the iQIYI short video recommendation business. Practice has shown that the GAN-based new video cold start solution brings significant benefits to online new video recommendation. Going forward, we will continue to optimize along the following directions:
1. Optimize the fusion of attribute features generated by multiple generators, introducing attention mechanisms and other methods to make feature expression more reasonable.
2. Adopt a multi-discriminator structure that performs boosting-style discrimination and filtering at different granularities, making the generated features more diverse and avoiding model pathologies and degeneration.