Prompt Design and Fine-tuning of Large Language Models

This article introduces prompt design, supervised fine-tuning (SFT) of large language models, and the application of LLMs in the Mobile Tmall AI shopping assistant project. It covers the basic principles behind how ChatGPT "speaks", as well as the "Agent" concept. In summary, the process can be broken down into the following steps. Preprocess text: the input text for ChatGPT must first be preprocessed. … Read more
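The preprocessing step mentioned above amounts to mapping raw text to token IDs before the model sees it. Below is a toy sketch of that idea; real systems use subword tokenizers such as BPE, and all names here are illustrative, not the actual ChatGPT pipeline.

```python
# Toy illustration of text preprocessing: mapping input text to token IDs.
# Real LLMs use subword (BPE) tokenizers; this whitespace version is a sketch.

def build_vocab(corpus):
    """Assign a unique integer ID to each whitespace-separated token."""
    vocab = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab, unk_id=-1):
    """Map text to token IDs; unknown words fall back to unk_id."""
    return [vocab.get(w, unk_id) for w in text.split()]

vocab = build_vocab("the cat sat on the mat")
print(encode("the mat sat", vocab))  # [0, 4, 2]
```

The model itself only ever consumes these integer IDs; detokenization reverses the mapping on the output side.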

Overview of Prompt Engineering

Abbreviation key (task column): CR: Commonsense Reasoning; QA: Question Answering; SUM: Summarization; MT: Machine Translation; LCP: Linguistic Capacity Probing; GCG: General Conditional Generation; CKM: Commonsense Knowledge Mining; FP: Fact Probing; TC: Text Classification; MR: Mathematical Reasoning; SR: Symbolic Reasoning; AR: Analogical Reasoning; Theory: Theoretical Analysis; IE: Information Extraction; D2T: Data-to-Text; TAG: Sequence Tagging; SEMP: Semantic … Read more

Interpretation of QWen2.5 Technical Report

Paper link: https://arxiv.org/pdf/2412.15115 GitHub code: https://github.com/QwenLM/Qwen2.5 The technical report for the Qwen2.5 series of large language models from Alibaba Cloud has been released, covering improvements in model architecture, pre-training, post-training, evaluation, and more. Here we provide a brief interpretation. Summary: 1. Core Insights 1.1. Model Improvements ● Architecture and Tokenizer: The Qwen2.5 series includes dense … Read more

Understanding Qwen2.5 Technical Report: 18 Trillion Token Training

Introduction The development of large language models (LLMs) is advancing rapidly, with each major update potentially bringing significant performance gains and expanded application scenarios. Against this backdrop, the latest Qwen2.5 series models released by Alibaba have attracted widespread attention. This technical report provides a detailed overview of the development process, innovations, and performance of … Read more

In-Depth Analysis of Word2Vec Principles

Overview of this article: 1. Background Knowledge. Word2Vec is a language model that learns semantic knowledge from large amounts of text data in an unsupervised manner and is widely used in natural language processing. Word2Vec is a tool for generating word vectors, and word vectors are closely related to language models. Therefore, we … Read more
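The core of Word2Vec's unsupervised learning is turning raw text into (center, context) training pairs. The sketch below shows skip-gram pair generation under a small illustrative window; the function name and corpus are hypothetical, not part of any specific library.

```python
# Minimal sketch of skip-gram training-pair generation, the data-preparation
# step behind Word2Vec. Window size and corpus are illustrative only.

def skipgram_pairs(tokens, window=1):
    """Return (center, context) pairs for each token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip pairing a word with itself
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["I", "like", "NLP"]))
# [('I', 'like'), ('like', 'I'), ('like', 'NLP'), ('NLP', 'like')]
```

A model trained to predict the context word from the center word (or vice versa, in CBOW) ends up with dense vectors in which semantically similar words lie close together.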

RestGPT Framework: Controlling Real-World Applications via RESTful APIs

©PaperWeekly Original · Author | Yifan Song Affiliation | PhD Student, Institute of Computational Linguistics, Peking University Research Area | Natural Language Processing Paper Title: RestGPT: Connecting Large Language Models with Real-World RESTful APIs Paper Link: https://arxiv.org/abs/2306.06624 Code Link: https://github.com/Yifan-Song793/RestGPT Research Background Large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated many powerful … Read more

Multi-Agent Collaboration Mechanisms: A Review of Large Language Models

With the latest advances in large language models (LLMs), agentic artificial intelligence (Agentic AI) has made significant progress in real-world applications, moving toward intelligent agents built on multiple LLMs that achieve perception, learning, reasoning, and collaborative action. These LLM-based multi-agent systems (MASs) enable groups of agents to collaborate … Read more

2025 AI Engineering Advancement Guide: Unlocking 10 Core Areas with 50 Must-Read Papers!

Hello everyone, I am Mu Yi, an internet technology product manager who continuously focuses on the AI field, a top 2 undergraduate in China, a top 10 CS graduate student in the US, and an MBA. I firmly believe that AI is the “power-up” for ordinary people, which is why I created the WeChat public … Read more

Essential Papers for AI Engineers in 2025

Part 1: Cutting-Edge Large Language Models. The GPT series includes the papers for GPT1, GPT2, GPT3, Codex, InstructGPT, and GPT4; these papers are straightforward and clear. GPT3.5, 4o, o1, and o3, by contrast, are covered mainly by release announcements and system cards. GPT1 https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf GPT2 https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf GPT3 https://arxiv.org/pdf/2005.14165 Codex https://arxiv.org/abs/2107.03374 InstructGPT https://arxiv.org/pdf/2203.02155 GPT4 https://arxiv.org/abs/2303.08774 Claude and Gemini … Read more

Microsoft Open Sources The Phi Series: Technological Evolution, Capability Breakthroughs, And Future Prospects

1. Introduction In recent years, the parameter scale of large language models (LLMs) has grown exponentially; these models demonstrate strong general intelligence and have achieved breakthroughs in numerous natural language processing tasks. However, such large models come with high training costs, … Read more