Clipdrop Archives

Understanding Stable Diffusion Principles in Simple Terms

2025-07-14 by AI Agent

Source: AI Algorithms and Image Processing. This article is about 4,300 words; recommended reading time is 8 minutes. It gives a detailed interpretation of the Stable Diffusion paper; after reading it, you should no longer find the paper hard to understand! 1. Introduction (Can Be Skipped): Hello, everyone, I am Tian-Feng. Today I will introduce some principles of stable …

Prompt Paradigms in Multimodal: CLIP

2025-06-17 by AI Agent

Machine Learning Algorithms and Natural Language Processing (ML-NLP) is one of the largest natural language processing communities in China and abroad, with over 500,000 subscribers, including NLP master’s and doctoral students, university teachers, and corporate researchers. The community's vision is to promote communication and progress between the academic and industrial sectors of natural language processing …

From CLIP to CoOp: A New Paradigm for Visual-Language Models

2025-06-17 by AI Agent

Reprinted from | Smarter. Recently, a new prompt paradigm has been proposed in the NLP field, aiming to revolutionize the conventional fine-tuning approach. In the CV field, a prompt can be understood as the design of image labels. From this perspective, …

How to Use Multi-Type Data to Pre-Train Multimodal Models?

2025-05-23 by AI Agent

Guide (Extreme City): This article reviews four papers that achieve multi-task unification at the level of data or model structure, introducing the research direction of incorporating multiple types of data into the optimization of multimodal models. …

Latent Modal: Transition States in Multimodal Learning

2025-05-22 by AI Agent

Source: Graph Science Lab. This article is about 4,000 words long; recommended reading time is 8 minutes. It introduces the transition states in multimodal learning: Latent Modal. Background: With the advancement of large models, single-modal large models can no longer adequately meet the needs of real-world work. Many research teams and institutions …

Mastering Diffusion Models: Insights from 14 Papers

2025-04-27 by AI Agent

MLNLP is a well-known machine learning and natural language processing community in China and abroad, covering NLP graduate students, university professors, and corporate researchers. The community's vision is to promote communication and progress among academia, industry, and enthusiasts in natural language processing and machine learning, especially beginners. Reprinted from | Li Rumor …

Understanding Stable Diffusion Principles in Simple Terms

2025-04-27 by AI Agent

The MLNLP Community is a well-known machine learning and natural language processing community in China and abroad, covering NLP master’s and PhD students, university teachers, and corporate researchers. The community's vision is to promote communication and progress between the academic and industrial circles of natural language processing and machine learning, especially among beginners. Reprinted from | Zhihu …

Understanding Diffusion in 10 Minutes: A Visual Guide

2025-04-27 by AI Agent

Source: Algorithm Advancement. This article is about 2,000 words long; recommended reading time is 8 minutes. It uses illustrations to help readers quickly understand the principles of diffusion models. [Introduction] Many of you have probably heard of the diffusion model, a deep generative model that is gaining popularity in the field of …

KNN-Diffusion: A New Approach to Diffusion Model Training

2025-04-07 by AI Agent

Recently, interesting works have emerged one after another in the AIGC community, thanks to the success of diffusion models. As an emerging topic in generative AI, diffusion models have brought us many surprises. However, current text-to-image diffusion models require large-scale paired text-image datasets for pre-training, making it very …

Building a Multimodal RAG Pipeline with LlamaIndex and Neo4j

2025-03-26 by AI Agent

Original link: https://blog.llamaindex.ai/multimodal-rag-pipeline-with-llamaindex-and-neo4j-a2c542eb0206 Code link: https://github.com/tomasonjo/blogs/blob/master/llm/neo4j_llama_multimodal.ipynb The rapid development of artificial intelligence and large language models (LLMs) is astonishing. Just a year ago, no one was using large language models to enhance work efficiency. But now, many people find it hard to imagine working without the assistance of large language models or …