Step-by-Step Distillation: New Method for Small Models to Rival Large Models

Machine Heart Report (editor: Rome). Large language models have astonishing capabilities, but their size often makes them very expensive to deploy. Researchers from the University of Washington, in collaboration with the Google Cloud AI Research Institute and Google Research, have proposed a solution to this problem, introducing the Distilling Step-by-Step paradigm to …
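The core idea behind Distilling Step-by-Step, per the underlying paper, is to prompt a large teacher LLM for chain-of-thought rationales along with labels, then fine-tune a small model on both in a multi-task fashion. Below is a minimal sketch of that combined objective, assuming a PyTorch setup; the function name step_by_step_loss, the tensor shapes, and rationale_weight are illustrative assumptions rather than the authors' code.

import torch.nn.functional as F

def step_by_step_loss(label_logits, label_targets,
                      rationale_logits, rationale_targets,
                      rationale_weight: float = 1.0,
                      ignore_index: int = -100):
    # Logits have shape (batch, seq_len, vocab); targets have shape
    # (batch, seq_len), with ignore_index marking padded positions.
    # Task 1: predict the task label from the input.
    label_loss = F.cross_entropy(
        label_logits.flatten(0, 1), label_targets.flatten(),
        ignore_index=ignore_index)
    # Task 2: reproduce the teacher LLM's chain-of-thought rationale.
    rationale_loss = F.cross_entropy(
        rationale_logits.flatten(0, 1), rationale_targets.flatten(),
        ignore_index=ignore_index)
    # The small model trains on both signals; the rationale task serves as
    # extra supervision and is not needed at inference time.
    return label_loss + rationale_weight * rationale_loss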

Knowledge Distillation in Neural Networks – Hinton 2015

Distilling the Knowledge in a Neural Network. Geoffrey Hinton, Oriol Vinyals, Jeff Dean (Google Inc., Mountain View). Abstract: A simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then average their predictions [3]. Unfortunately, …
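The recipe the abstract alludes to, compressing an ensemble (or one cumbersome model) into a smaller "distilled" model, trains the student to match temperature-softened teacher outputs in addition to the ground-truth labels. A minimal PyTorch-style sketch follows; the mixing weight alpha and the temperature value are assumed hyperparameters, not prescriptions from the paper.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 2.0, alpha: float = 0.5):
    # student_logits, teacher_logits: (batch, num_classes); targets: (batch,).
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescaling suggested in the paper so gradients stay comparable
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard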