CVPR 2023: New Network Cloning Technology Proposed by LV Lab

Machine Heart Report

Editor: Wang Qiang

What happens when neural networks reach 100%? What is the ultimate form of neural networks? What is a network superbody? The answers to these questions may be found in the movie “Lucy”.

In the movie, as the protagonist Lucy gradually develops her brain power, she gains the following abilities:

10%: Able to control the autonomic nervous system, improving bodily coordination and reaction speed.
30%: Able to predict the future and anticipate people’s actions, enhancing insight and judgment.
50%: Able to predict future changes by perceiving subtle changes in the surrounding environment.
70%: Able to control the movement of bodies and objects, possessing extraordinary movement and combat skills.
90%: Able to connect with the universe and time, possessing the power of inspiration and intuition.
100%: Able to achieve supernatural powers, surpassing the cognitive limits of humanity.

At the end of the movie, the protagonist gradually disappears and transforms into a form of pure energy, ultimately merging with the universe and time. In the film, the human superbody is realized by connecting with the outside world, which grants unlimited capability. Transferring this idea to the domain of neural networks: if a network can establish connections with the entire collection of networks, we can theoretically achieve a network superbody with boundless predictive capabilities.

In other words, tying a network to a fixed entity inevitably limits how far its performance can grow. Once the target network is connected to the Model Zoo, it no longer exists as a standalone entity; it becomes a superbody, a form of connection established between networks.

Above: The difference between superbody networks and entity networks. Superbody networks have no entity, representing a form of connection between networks.

The concept of the network superbody is explored in the CVPR 2023 paper “Partial Network Cloning”, in which the LV Lab at the National University of Singapore proposes a novel network cloning technology.

Link: https://arxiv.org/abs/2303.10597

01 Problem Definition

In this paper, the authors mention that utilizing this network cloning technology to achieve network de-embodiment can bring the following advantages:

Weak data dependency: Only a portion of the correction data is needed to modify some connection modules.
Low training complexity: Only some connection modules and task prediction modules need fine-tuning.
Low storage requirements: Only the connection paths of the network need to be stored, without needing to store the entire network.
Sustainable and recoverable: The connection paths can be increased or decreased without making any modifications to the Model Zoo.
Transmission-friendly: During network transmission, only the connection path information needs to be transmitted, without transmitting the entire network.

The foundation for achieving superbody networks is the rapidly expanding Model Zoo, which provides a vast number of pre-trained models. Therefore, for any task T, we can always find one or more existing models whose tasks together form the required task (in the example here, three networks are selected for connection).

As shown in the figure above, for task T, to construct the corresponding superbody network M_c, this paper proposes the following construction framework:

Step 1: Locate the most suitable body network M_t, so that the intersection T∩T_t of the task set T_t of the body network M_t and the required task set T is maximized; at this point, the body network is set as the main network;
Step 2: Select the correction networks M_s^1 and M_s^2 to supplement the missing tasks in the body network;
Step 3: Use network cloning technology to locate and connect the partial correction networks M_s^1 and M_s^2 to the body network M_t;
Step 4: Use a portion of the correction data to fine-tune the connection modules and prediction modules of the network.
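
Steps 1 and 2 above can be sketched as a selection over task sets. This is a minimal illustration, not the paper's procedure: the function name `select_networks`, the toy Model Zoo, and the greedy rule for choosing correction networks are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def select_networks(
    required: Set[str], zoo: Dict[str, FrozenSet[str]]
) -> Tuple[str, List[str]]:
    """Pick a body network and correction networks from a toy Model Zoo.

    `zoo` maps a (hypothetical) model name to the set of tasks it covers.
    """
    # Step 1: the body network is the model whose task set overlaps most
    # with the required task set T (ties go to the alphabetically first name).
    body = max(sorted(zoo), key=lambda name: len(zoo[name] & required))
    missing = set(required) - zoo[body]

    # Step 2 (assumed greedy rule): keep adding the correction network that
    # covers the most still-missing tasks until nothing is missing.
    corrections: List[str] = []
    while missing:
        candidates = [n for n in sorted(zoo) if n != body and n not in corrections]
        best = max(candidates, key=lambda n: len(zoo[n] & missing), default=None)
        if best is None or not (zoo[best] & missing):
            raise ValueError(f"Model Zoo cannot cover tasks: {sorted(missing)}")
        corrections.append(best)
        missing -= zoo[best]
    return body, corrections


if __name__ == "__main__":
    toy_zoo = {
        "animals": frozenset({"cat", "dog", "bird"}),
        "vehicles": frozenset({"car", "truck"}),
        "digits": frozenset({"0", "1"}),
    }
    print(select_networks({"cat", "dog", "car", "0"}, toy_zoo))
```

Steps 3 and 4 (cloning and fine-tuning) operate on the networks chosen here and are not modeled in this sketch.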

In summary, the network cloning technology proposed in this paper for constructing network superbody can be represented as:

Here M_s denotes the set of correction networks. A network superbody therefore takes the form of one body network plus one or several correction networks, and network cloning means cloning the required parts of the correction networks and embedding them into the body network.

Specifically, the network cloning framework proposed in this paper includes the following two technical points:

For cloning that involves P correction networks, the first technical point is key part localization, Local(∙). Since the correction networks may contain task information unrelated to the task set T, the goal of Local(∙) is to locate the parts of each correction network related to the tasks T∩T_s; the localization parameters are denoted M^ρ, and implementation details are given in Section 2.1. The second technical point is network module embedding, Insert(∙), which selects appropriate embedding points R^ρ at which to embed all correction networks; implementation details are given in Section 2.2.
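
Read together, the two operators described above suggest the following composition. This is a sketch in notation assumed from this article's description (the argument order and the indexing ρ = 1, …, P are assumptions), not the paper's exact formula:

```latex
\mathcal{M}_c \;=\; \mathrm{Insert}\Big(\mathcal{M}_t,\;
  \big\{\mathrm{Local}\big(\mathcal{M}_s^{\rho},\, M^{\rho}\big)\big\}_{\rho=1}^{P},\;
  \big\{R^{\rho}\big\}_{\rho=1}^{P}\Big)
```

Here Local(∙) extracts the task-relevant part of each correction network using its localization parameters M^ρ, and Insert(∙) embeds each extracted part into the body network M_t at position R^ρ.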

02 Method Overview

In the network cloning method section, for simplicity, we set the number of correction networks P=1 (thus omitting the superscript ρ for the correction networks), meaning we connect one body network and one correction network to build the required superbody network.

As mentioned above, network cloning involves key part localization and network module embedding. Here, we introduce an intermediate transferable module M_f to aid understanding. That is, the network cloning technology locates key parts within the correction networks to form a transferable module M_f, and then embeds the transferable module through soft connections into the body network M_t. Therefore, the goal of network cloning technology is to locate and embed transferable modules with transferability and local fidelity.

2.1 Key Part Localization of the Network

The goal of key part localization of the network is to learn a selection function M, which is defined as a mask that acts on the filters of each layer of the network. The transferable module can be represented as:

Here the correction network M_s is treated as a stack of L layers. Extracting the transferable module does not modify the correction network.
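
The idea of a mask acting on the filters of each layer can be illustrated with a small sketch. It assumes a binary (0/1) mask and represents each filter as a flat list of weights; the name `extract_transferable` and this data layout are hypothetical and chosen only for illustration.

```python
from typing import List

Filter = List[float]   # one flattened convolution filter
Layer = List[Filter]   # a layer is a list of filters


def extract_transferable(source: List[Layer], mask: List[List[int]]) -> List[Layer]:
    """Return a masked copy of the layers; the source network is left untouched."""
    if len(source) != len(mask):
        raise ValueError("mask must have one row per layer")
    module: List[Layer] = []
    for layer, layer_mask in zip(source, mask):
        if len(layer) != len(layer_mask):
            raise ValueError("mask must have one entry per filter")
        # Copy the selected filters and zero out the rest. Weights are copied,
        # never edited in place, so the correction network stays unmodified.
        module.append(
            [list(f) if keep else [0.0] * len(f) for f, keep in zip(layer, layer_mask)]
        )
    return module
```

In the article's setting the mask is learned rather than given; this sketch only shows how a fixed mask selects filters without changing the source weights.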

To obtain a suitable transferable module M_f, we locate the explicit parts of the correction network M_s that contribute most to the final predictions. Before doing so, because neural networks are black boxes and only some of the network’s predictions are needed, we use LIME (Local Interpretable Model-agnostic Explanations) to fit a local model of the correction network on the required task (details can be found in the main text of the paper).

The resulting local model is denoted G, where D_t is the training dataset corresponding to the desired partial prediction results (smaller than the original training dataset of the network).

Thus, the selection function M can be optimized through the following objective function:

In this objective, the localized key part is optimized to fit the local model G.

2.2 Network Module Embedding

When locating the transferable module M_f in the correction network, the selection function M is used to directly extract from M_s without modifying its weights. The next step is to determine the embedding position of the transferable module M_f in the body network M_t to achieve optimal cloning performance.

The embedding of the network module is controlled by the position parameter R. Following most model reuse setups, the network cloning retains the earlier layers of the body model as general feature extractors, and the network embedding process is simplified to finding the best embedding position (i.e., embedding the transferable module M_f at layer R). The process of finding the embedding can be represented as:

For detailed formula explanations, please refer to the main text. Overall, the search-based embedding includes the following points:

The search for the best position parameter R proceeds from the deeper layers of the network to the shallower layers;
After the transferable module is embedded at layer R, the superbody network still needs an Adapter A at the embedding position and a re-fine-tuned F_c layer (for classification networks), but both have a negligible number of parameters compared with the entire Model Zoo;
When establishing connections from layer L-1 down to layer 0 of the network, we roughly estimate the performance of each embedding from the value at which the loss converges in each fine-tuning run, and select the position with the lowest converged loss as the final network embedding point.
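
The search described above can be sketched as a loop over candidate positions. This is a minimal sketch under stated assumptions: `finetune_loss` is a hypothetical callback standing in for "embed at layer R, fine-tune the adapter and F_c, and return the converged loss", which is not implemented here.

```python
from typing import Callable, Dict, Tuple


def search_embedding_position(
    num_layers: int, finetune_loss: Callable[[int], float]
) -> Tuple[int, Dict[int, float]]:
    """Try positions from layer L-1 down to layer 0 and keep the lowest loss."""
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    losses: Dict[int, float] = {}
    # Deeper layers are tried first, matching the deep-to-shallow search order.
    for position in range(num_layers - 1, -1, -1):
        losses[position] = finetune_loss(position)
    # On ties, the deeper position wins because it was inserted first.
    best = min(losses, key=losses.get)
    return best, losses
```

For example, with a toy loss such as `lambda r: (r - 3) ** 2 + 0.1` and six layers, the search selects position 3. In practice each call to the callback is a fine-tuning run, so the cost of the search grows with the number of candidate layers.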

03 Practical Applications of Network Cloning Technology

The core of the network cloning technology proposed in this paper is to establish a connection path between pre-trained networks without modifying any parameters of the pre-trained networks. It can not only serve as a key technology for building network superbodies but can also be flexibly applied to various practical scenarios.

Scenario 1: Network cloning technology makes online use of the Model Zoo possible. In some resource-limited situations, users can flexibly utilize the online Model Zoo without downloading pre-trained networks locally.

It should be noted that M_t and M_s remain fixed and unchanged throughout the cloning process. Model cloning does not modify any pre-trained models and does not introduce new models. It makes any functional combination within the Model Zoo possible, and it also helps keep the Model Zoo ecosystem healthy, since establishing connections using M and R is a simple masking and localization operation that is easy to revoke. The proposed network cloning technology therefore supports building a sustainable online inference platform on top of the Model Zoo.

Scenario 2: Networks generated via network cloning have a better form of information transmission. When transmitting networks, this technology can reduce transmission delays and losses.

During network transmission, only the connection path information needs to be sent; combined with the public Model Zoo, the receiver can restore the original network. Compared with the entire cloned network, this information is very small, which reduces transmission delay. If A and F_c suffer some transmission loss, the receiver can easily repair them by fine-tuning on the dataset. Network cloning therefore provides a new and efficient form of transmission.
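
To make the size argument concrete, the sketch below counts how many values such a transmission might carry. The payload contents (mask M, position R, adapter A, and prediction layer F_c) are an assumption inferred from the symbols used in this article; the class `ClonePayload` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClonePayload:
    """Assumed contents of what is sent instead of the full cloned network."""

    mask: List[List[int]]     # M: one 0/1 entry per filter, per layer
    position: int             # R: index of the embedding layer
    adapter: List[float]      # A: adapter weights, flattened
    classifier: List[float]   # F_c: fine-tuned prediction layer, flattened

    def num_values(self) -> int:
        # Mask entries + one position index + adapter and classifier weights.
        mask_entries = sum(len(row) for row in self.mask)
        return mask_entries + 1 + len(self.adapter) + len(self.classifier)


def transmission_ratio(payload: ClonePayload, full_network_params: int) -> float:
    """Fraction of values sent relative to shipping the whole network."""
    if full_network_params <= 0:
        raise ValueError("full_network_params must be positive")
    return payload.num_values() / full_network_params
```

The receiver is assumed to already hold M_t and M_s through the public Model Zoo, which is why they do not appear in the payload.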

04 Experimental Results

We conducted experimental validation on classification tasks. To evaluate the local performance representation capability of the transferable module, we introduced a conditional similarity metric:

Where Sim_cos(∙) represents cosine similarity.
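
For reference, cosine similarity itself can be computed as in the sketch below. Only the Sim_cos(∙) building block is shown; how the article's conditional similarity metric combines it with the sub-datasets is not reproduced here, and the function name is chosen for this example.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)
```

Values near 1 indicate that two representations point in nearly the same direction, values near 0 indicate that they are close to orthogonal.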

The table above presents the experimental results on MNIST, CIFAR-10, CIFAR-100, and Tiny-ImageNet, showing that the model obtained through network cloning (PNC) has the most significant performance improvement. Moreover, performing fine-tuning on the entire network (PNC-F) does not enhance the network’s performance; on the contrary, it increases the model’s bias.

In addition, we evaluated the quality of the transferable modules (as shown in the figure). The left figure shows that the functions learned from each sub-dataset are more or less related, which indicates the importance of extracting and locating local functions from the correction networks. For the transferable modules, we calculated their similarity Sim(∙). The right figure shows that the transferable modules have high similarity with the sub-dataset to be cloned, while their relationship with the other sub-datasets is weakened (the off-diagonal areas are marked with lighter colors than in the matrix of the source network). It can therefore be concluded that the transferable modules successfully simulate the local performance on the task set to be cloned, which supports the correctness of the localization strategy.

05 Conclusion

This paper investigates a new knowledge transfer task called Partial Network Cloning (PNC), which clones parameter modules from correction networks and embeds them into body networks in a copy-paste manner. Unlike previous knowledge transfer setups (which rely on updating the parameters of the network), our method ensures that all parameters of the pre-trained models remain unchanged. The technical core of PNC lies in simultaneously performing key part localization and transferable module embedding operations, with both steps reinforcing each other.

We demonstrate outstanding results of our method on multiple datasets in terms of accuracy and transferability metrics.
