Author | Yu Xuan Su
Source | Zhihu
Address | https://zhuanlan.zhihu.com/p/434672623
This article is for academic sharing only; if there is any infringement, please contact us to delete it.
1. CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding
Positive examples are constructed by replacing words with synonyms and negative examples by replacing words with antonyms; training uses three losses: ① MLM; ② replaced-token detection, a binary 0/1 classification for each token; ③ a contrastive loss that pulls positive examples closer and pushes negative examples apart.
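The three objectives are optimized jointly; a minimal sketch of the combined loss, where the weights λ1 and λ2 are illustrative assumptions rather than values from the paper:

$$\mathcal{L} = \mathcal{L}_{\mathrm{MLM}} + \lambda_1\,\mathcal{L}_{\mathrm{detect}} + \lambda_2\,\mathcal{L}_{\mathrm{contrast}}$$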
2. Self-Guided Contrastive Learning for BERT Sentence Representations
Two BERTs construct the positive and negative examples: ① a frozen BERT (parameters not updated) applies max pooling over the sequence dimension of each transformer layer's hidden states, turning (batch, len, 768) into (batch, 1, 768), and then takes the mean over layers as the sentence representation; ② a fine-tuned BERT uses its last layer's [CLS] directly as the sentence representation. The ① and ② representations of the same sentence form a positive pair, while representations of different sentences form negative pairs.
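A minimal sketch of view ①, assuming hidden_states is the list of per-layer hidden states of shape (batch, len, 768); the BERT forward pass and training loop are omitted:

import tensorflow as tf

def frozen_view(hidden_states):
    # Max-pool each layer over the sequence dimension: (batch, len, 768) -> (batch, 768)
    pooled = [tf.reduce_max(h, axis=1) for h in hidden_states]
    # Mean over layers gives the frozen BERT's sentence representation
    return tf.reduce_mean(tf.stack(pooled, axis=0), axis=0)

# View ②: the fine-tuned BERT's last-layer [CLS] vector, e.g. last_hidden[:, 0, :]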
3. ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
Four types of data augmentation construct the positive examples: ① add random noise to the token_embedding; ② apply an index (position) shuffle to the token_embedding; ③ randomly deactivate some tokens in the token_embedding (set entire rows to zero); ④ randomly deactivate elements across the entire token_embedding.
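A rough sketch of the four augmentations on a token-embedding tensor emb of shape (batch, len, 768); the function name, noise scale, and rate are illustrative, not taken from the paper's code:

import tensorflow as tf

def augment(emb, mode, rate=0.15):
    if mode == "noise":            # ① add small random noise to the embeddings
        return emb + tf.random.normal(tf.shape(emb), stddev=0.01)
    if mode == "shuffle":          # ② index transformation: permute the token positions
        perm = tf.random.shuffle(tf.range(tf.shape(emb)[1]))
        return tf.gather(emb, perm, axis=1)
    if mode == "token_cutoff":     # ③ zero out entire rows, i.e. whole tokens
        keep = tf.cast(tf.random.uniform(tf.shape(emb)[:2]) > rate, emb.dtype)
        return emb * keep[:, :, None]
    if mode == "dropout":          # ④ element-wise deactivation over the whole embedding
        keep = tf.cast(tf.random.uniform(tf.shape(emb)) > rate, emb.dtype)
        return emb * keep
    return emb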
4. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
Two sentences from the same article are positive examples, while two sentences from different articles are negative examples.
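A minimal sketch of the pair construction, assuming docs is a list of articles, each a list of sentences; the paper's span-sampling details are omitted:

import random

def sample_pair(docs):
    # Positive pair: two sentences drawn from the same article
    doc = random.choice(docs)
    anchor, positive = random.sample(doc, 2)
    # Negative: a sentence from a different article (in practice, other in-batch examples)
    other = random.choice([d for d in docs if d is not doc])
    return anchor, positive, random.choice(other)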
5. Contrastive Instance Learning Framework for Distantly Supervised Relation Extraction
Sentences of the same relation category are grouped into a bag; a positive example for a sentence is constructed by inserting or replacing some of its unimportant words using words from its own category's bag, while the bags of all other categories provide the negative examples for that sentence.
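A rough sketch of the positive-example construction; the notion of an "unimportant word" and the bag structure are simplified here:

import random

def make_positive(sentence, same_class_bag, unimportant_words):
    tokens = sentence.split()
    # Replace one unimportant word with a word drawn from the same-class bag
    candidates = [i for i, t in enumerate(tokens) if t in unimportant_words]
    if candidates:
        tokens[random.choice(candidates)] = random.choice(same_class_bag)
    return " ".join(tokens)
# Negative examples for this sentence come from the bags of all other categories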
6. SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization
The first stage directly generates candidate summaries; the second stage uses contrastive learning, pulling the generated candidates closer to the original document, with the second-stage loss designed so that the candidates can be ranked according to their first-stage scores.
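As a reconstruction from the description below, the ranking loss is a pairwise margin loss of roughly this form (λ is a margin hyper-parameter):

$$\mathcal{L} = \sum_{i}\sum_{j>i} \max\bigl(0,\; h(s_j) - h(s_i) + (j-i)\,\lambda\bigr)$$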

D is the original document, ŝ (s_hat) is the reference (label) summary, and h is the scoring function. The specific operation is to compute the similarity between the label and each generated candidate's [CLS] representation; s_i ... s_j are the generated candidates sorted in descending order of score, so s_j's score must be lower than s_i's. Minimizing the loss above forces the ranking from i to j to go from high to low, the hope being that the model can rank candidate summaries without needing the reference.
7. Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning
Two losses: ① a cross-entropy loss for predicting the sentence's category; ② a supervised contrastive loss, where positive examples come from data of the same category and negative examples from different categories.
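For reference, the supervised contrastive term in ② typically takes the SupCon form shown below; the paper's exact formulation may differ in detail:

$$\mathcal{L}_{\mathrm{scl}} = \sum_{i=1}^{N} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \neq i} \exp(z_i \cdot z_a / \tau)}$$

where P(i) is the set of other in-batch examples sharing example i's label, z are the sentence representations, and τ is a temperature.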
8. Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Although the same sentence translated into different languages has different surface expressions, in human cognition its semantics should be the same, so multilingual versions of sentences are used to construct the positive and negative examples.
Supplementary Knowledge:
SimCSE: essentially, (self, self) pairs serve as positive examples and (self, other) pairs as negative examples for training the contrastive model; the same sentence is passed through the encoder twice, and the outputs under two different random dropout masks form the positive pair.
Loss function:
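Written out, the unsupervised SimCSE objective for sentence i in a batch of N is

$$\ell_i = -\log \frac{e^{\operatorname{sim}(h_i,\, h_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\operatorname{sim}(h_i,\, h_j^{+})/\tau}}$$

where h_i and h_i^+ are the two dropout-perturbed encodings of the same sentence, sim is cosine similarity, and τ is the temperature (0.05, matching the ×20 scaling in the code below).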

Where:
① Since training is unsupervised, no y_true is provided as input; the y_true used in the code block below is constructed artificially, as described in ③;
② In each batch every sentence appears twice: sentences a, b, c are assembled into the batch [a, a, b, b, c, c];
③ If idxs_1 is [[0,1,2,3,4,5]], then idxs_2 is [[1],[0],[3],[2],[5],[4]], and the artificially constructed y_true (row i has a 1 at the index of sentence i's dropout twin) is:
[[0,1,0,0,0,0],
 [1,0,0,0,0,0],
 [0,0,0,1,0,0],
 [0,0,1,0,0,0],
 [0,0,0,0,0,1],
 [0,0,0,0,1,0]]
import tensorflow as tf
from tensorflow.keras import backend as K

def simcse_loss(y_true, y_pred):
    # Build y_true as in ③: row i has a 1 at the index of its dropout twin
    idxs = K.arange(0, K.shape(y_pred)[0])
    idxs_1, idxs_2 = idxs[None, :], (idxs + 1 - idxs % 2 * 2)[:, None]
    y_true = K.cast(K.equal(idxs_1, idxs_2), K.floatx())
    y_pred = K.l2_normalize(y_pred, axis=1)
    # L2 normalization, so the dot product below is cosine similarity
    similarities = K.dot(y_pred, K.transpose(y_pred))
    # Similarity of each sentence in the batch with every other sentence
    similarities = similarities - tf.eye(K.shape(y_pred)[0]) * 1e12
    # Mask self-similarity with a large negative value so the softmax ignores it
    similarities = similarities * 20
    # Scale by 20 (temperature 0.05) to sharpen the softmax distribution
    loss = K.categorical_crossentropy(y_true, similarities, from_logits=True)
    # Cross-entropy over the batch: a 6-way classification for the [a, a, b, b, c, c] example
    return K.mean(loss)
In supervised SimCSE, a positive pair is (the same positive sample passed through dropout twice), while negative pairs include: ① (a positive sample after dropout, a different positive sample after dropout); ② (a positive sample after dropout, a negative sample after dropout).
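A sketch of this pair construction, assuming pos_drops[i] holds the two dropout encodings of positive sample i and neg_drops holds one dropout encoding per negative sample (names are illustrative):

def build_pairs(pos_drops, neg_drops):
    positives, negatives = [], []
    for i, (z1, z2) in enumerate(pos_drops):
        positives.append((z1, z2))            # positive pair: same sample, two dropout passes
        for k, (w1, _) in enumerate(pos_drops):
            if k != i:
                negatives.append((z1, w1))    # ① pairs with other positive samples
        for n in neg_drops:
            negatives.append((z1, n))         # ② pairs with negative samples
    return positives, negatives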
R-Drop: the same input is passed through the model twice with different dropout masks, and a KL-divergence term between the two outputs is added, pushing the results of the different dropout passes toward consistency. KL divergence is explained at: http://www.sniper97.cn/index.php/note/deep-learning/note-deep-learning/3886/
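A minimal sketch of the R-Drop consistency term with Keras backend ops, where p and q are the softmax outputs from the two forward passes and the weight alpha is an illustrative hyper-parameter (this term is added to the usual task loss):

from tensorflow.keras import backend as K

def rdrop_consistency(p, q, alpha=4.0):
    # Symmetric KL divergence between the two dropout outputs
    kl_pq = K.sum(p * (K.log(p + 1e-12) - K.log(q + 1e-12)), axis=-1)
    kl_qp = K.sum(q * (K.log(q + 1e-12) - K.log(p + 1e-12)), axis=-1)
    return alpha * K.mean(kl_pq + kl_qp) / 2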