Essential Papers for AI Engineers in 2025

Part 1: Cutting-Edge Large Language Models

GPT Series includes the GPT1, GPT2, GPT3, Codex, InstructGPT, and GPT4 papers. These are straightforward, clear reads. GPT3.5, 4o, o1, and o3 shipped mainly as launch announcements and system cards rather than full papers.

GPT1 https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
GPT2 https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
GPT3 https://arxiv.org/pdf/2005.14165
Codex https://arxiv.org/abs/2107.03374
InstructGPT https://arxiv.org/pdf/2203.02155
GPT4 https://arxiv.org/abs/2303.08774

Claude and Gemini Series To understand the competition, check the Claude 3 and Gemini 1 papers. The latest iterations are Claude 3.5 Sonnet and Gemini 2.0 Flash/Flash Thinking; Gemma 2 is also worth a look.

Claude3 https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
Gemini1 https://arxiv.org/abs/2312.11805
Claude 3.5 Sonnet https://www.latent.space/p/claude-sonnet
Gemini 2.0 Flash https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/#gemini-2-0-flash
Flash Thinking https://ai.google.dev/gemini-api/docs/thinking-mode
Gemma2 https://arxiv.org/abs/2408.00118

LLaMA Series includes the Llama 1, Llama 2, and Llama 3 papers, essential for understanding the leading open-weight models. Mistral 7B, Mixtral, and Pixtral can be viewed as branches on the Llama family tree.

Llama 1 https://arxiv.org/abs/2302.13971
Llama 2 https://arxiv.org/abs/2307.09288
Llama 3 https://arxiv.org/abs/2407.21783
Mistral 7B https://arxiv.org/abs/2310.06825
Mixtral https://arxiv.org/abs/2401.04088
Pixtral https://arxiv.org/abs/2410.07073

DeepSeek Series includes the DeepSeek V1, Coder, MoE, V2, and V3 papers, showcasing the work of a leading (relatively) open model lab.

DeepSeek V1 https://arxiv.org/abs/2401.02954
Coder https://arxiv.org/abs/2401.14196
MoE https://arxiv.org/abs/2401.06066
V2 https://arxiv.org/abs/2405.04434
V3 https://github.com/deepseek-ai/DeepSeek-V3

Apple Intelligence Papers This covers the foundation models behind the AI features shipping on every Mac and iPhone.

https://arxiv.org/abs/2407.21075

There are also many non-frontier LLMs worth using and learning from. In particular, BERT-style models are underrated as workhorse classification models; see ModernBERT for the state of the art. AI2 (Olmo, Molmo, OlmOE, Tülu 3, Olmo 2), Grok, Amazon Nova, Yi, Reka, Jamba, Cohere, Nemotron, Microsoft Phi, and HuggingFace SmolLM are also notable, though most rank lower on leaderboards or lack papers. Alpaca and Vicuna are of historical significance, while Mamba 1/2 and RWKV may matter in the future. If time permits, read the scaling-law literature: Kaplan, Chinchilla, Emergence/Mirage, and post-Chinchilla laws.

ModernBERT https://buttondown.com/ainews/archive/ainews-modernbert-small-new-retrieverclassifier/
Grok https://github.com/xai-org/grok-1

Part 2: Benchmarking and Evaluation

MMLU Papers serve as the main knowledge benchmark, alongside GPQA and BIG-Bench. In 2025, frontier labs mainly use the harder variants: MMLU Pro, GPQA Diamond, and BIG-Bench Hard.

MMLU https://arxiv.org/abs/2009.03300
GPQA https://arxiv.org/abs/2311.12022
BIG-Bench https://arxiv.org/abs/2206.04615
MMLU Pro https://arxiv.org/abs/2406.01574
GPQA Diamond https://arxiv.org/abs/2311.12022
BIG-Bench Hard https://arxiv.org/abs/2210.09261

MuSR Papers evaluate long-context capability, alongside LongBench, BABILong, and RULER. These target the “Lost in the Middle” problem and related tasks such as “Needle in a Haystack” (a minimal test harness is sketched after the links below).

MuSR https://arxiv.org/abs/2310.16049
LongBench https://arxiv.org/abs/2412.15204
BABILong https://arxiv.org/abs/2406.10149
RULER https://www.latent.space/p/gradient
Lost in The Middle https://arxiv.org/abs/2307.03172
Needle in a Haystack https://github.com/gkamradt/LLMTest_NeedleInAHaystack
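For intuition, a minimal needle-in-a-haystack harness looks like the sketch below; `call_llm` is a hypothetical stand-in for whichever model API you use, and the needle/filler strings are placeholders:

```python
def build_haystack(needle: str, filler: str, n_fillers: int, depth: float) -> str:
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end) inside filler text."""
    chunks = [filler] * n_fillers
    chunks.insert(int(depth * n_fillers), needle)
    return "\n".join(chunks)

def needle_in_haystack_eval(call_llm, depths=(0.0, 0.25, 0.5, 0.75, 1.0), n_fillers=2000):
    needle = "The secret passphrase is 'blue-giraffe-42'."
    question = "What is the secret passphrase? Answer with the passphrase only."
    filler = "The sky was a uniform shade of grey that afternoon."
    results = {}
    for depth in depths:
        context = build_haystack(needle, filler, n_fillers, depth)
        answer = call_llm(f"{context}\n\n{question}")  # call_llm: str -> str, supplied by you
        results[depth] = "blue-giraffe-42" in answer
    return results  # map of needle depth -> retrieved or not; plot this against depth
```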

MATH Papers compile math competition problems. Frontier labs focus on the hardest subset (MATH level 5) and on related competition sets: AIME, FrontierMath, and AMC10/AMC12.

MATH https://arxiv.org/abs/2103.03874
AIME https://www.kaggle.com/datasets/hemishveeraboina/aime-problem-set-1983-2024
FrontierMath https://arxiv.org/abs/2411.04872
AMC10/AMC12 https://github.com/ryanrudes/amc

IFEval Papers are the leading instruction-following evaluation and the only external benchmark adopted by Apple. MT-Bench can also be viewed as a form of instruction-following eval.

IFEval https://arxiv.org/abs/2311.07911
adopted by Apple https://machinelearning.apple.com/research/introducing-apple-foundation-models
MT-Bench https://arxiv.org/abs/2306.05685

ARC AGI Challenge is a well-known abstract-reasoning “IQ test” benchmark that has held up longer than many quickly saturated benchmarks.

ARC AGI https://arcprize.org/arc

Related Courses and Content We cover many of these benchmarks in Benchmarks 101 and Benchmarks 201, while our Carlini, LMArena, and Braintrust episodes explore private, arena, and product evals (see also the LLM-as-Judge and Applied LLMs posts). Benchmarks are also closely tied to datasets.

Benchmarks 101 https://www.latent.space/p/benchmarks-101
Benchmarks 201 https://www.latent.space/p/benchmarks-201
Carlini https://www.latent.space/p/carlini
LMArena https://www.latent.space/p/lmarena
LLM-as-Judge https://hamel.dev/blog/posts/llm-judge/

Part 3: Prompts, ICL, and Chain of Thought

Note: The GPT-3 paper (“Language Models are Few-Shot Learners”) introduced In-Context Learning (ICL), which is closely related to prompting. We also consider prompt injection required background knowledge; see Lilian Weng’s and Simon W’s writing.

Prompt Report Papers
A review of prompt-related papers (also with podcast content)
Prompt Report https://arxiv.org/abs/2406.06608 https://www.latent.space/p/learn-prompting
Chain-of-Thought Papers
This is one of several papers that established the concept of “Chain of Thought”; related ideas include “Scratchpads” and “Let’s Think Step By Step” (the zero-shot variant is sketched after the links below).
Chain of Thought https://arxiv.org/abs/2201.11903 Scratchpads https://arxiv.org/abs/2112.00114 Let’s Think Step By Step https://arxiv.org/abs/2205.11916
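As an illustration, here is a minimal sketch of the two-stage zero-shot CoT prompting described in “Let’s Think Step By Step”; `complete` is a placeholder for whatever text-completion call you use:

```python
def zero_shot_cot(complete, question: str) -> str:
    """Two-stage zero-shot chain-of-thought prompting."""
    # Stage 1: elicit reasoning by appending the trigger phrase.
    reasoning = complete(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: extract the final answer conditioned on the generated reasoning.
    answer = complete(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer is"
    )
    return answer
```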
Tree of Thought Papers
Introduces look-ahead and backtracking for prompting (also with podcast content)
https://arxiv.org/abs/2305.10601 https://www.latent.space/p/shunyu
Prompt Tuning Papers
If the goal can be achieved via prefix-tuning, by adjusting the decoding process (e.g., through entropy control), or by representation engineering, an explicit prompt may not be necessary at all.
https://aclanthology.org/2021.emnlp-main.243 https://arxiv.org/abs/2101.00190 https://arxiv.org/abs/2402.10200 https://vgel.me/posts/representation-engineering https://github.com/xjdr-alt/entropix
Automatic Prompt Engineering Papers
It is increasingly clear that humans are poor zero-shot prompters and that prompts can be optimized by LLMs themselves. The most notable implementation is the DSPy paper/framework (a generic optimization loop is sketched after the links below).
https://arxiv.org/abs/2211.01910 https://arxiv.org/abs/2310.03714
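To make the idea concrete, here is a minimal, generic prompt-optimization loop (this is not DSPy’s actual API; `propose` and `score` are hypothetical helpers standing in for an LLM rewrite call and a dev-set metric):

```python
def optimize_prompt(propose, score, seed_prompt: str, rounds: int = 5, k: int = 8) -> str:
    """Hill-climb over prompts: an LLM proposes variants, a dev-set metric keeps the best."""
    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        # Ask the LLM to rewrite the current best prompt in k different ways.
        candidates = [propose(best_prompt) for _ in range(k)]
        for candidate in candidates:
            s = score(candidate)  # e.g. downstream task accuracy on a held-out dev set
            if s > best_score:
                best_prompt, best_score = candidate, s
    return best_prompt
```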

One caveat for Part 3: in this area, reading scattered papers tends to be less effective than following practical guides. We recommend Lilian Weng’s and Eugene Yan’s writing, Anthropic’s prompt engineering tutorial, and the AI engineer workshops.

Part 4: Retrieval-Augmented Generation (RAG)
Introduction to Information Retrieval
Recommending a textbook is a bit unfair, but the point is that RAG is fundamentally an information retrieval (IR) problem, and IR has a 60-year history that includes seemingly “boring” techniques such as TF-IDF, BM25, FAISS, and HNSW (a toy lexical retriever is sketched after the links below).
https://nlp.stanford.edu/IR-book/information-retrieval-book.html https://en.wikipedia.org/wiki/Information_retrieval#History https://en.wikipedia.org/wiki/Tf%E2%80%93idf https://en.wikipedia.org/wiki/Okapi_BM25 https://github.com/facebookresearch/faiss https://arxiv.org/abs/1603.09320
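To make the “boring” baseline concrete, here is a toy lexical retriever built on scikit-learn’s TF-IDF; the documents are placeholders, and in practice you would swap in BM25 or a FAISS/HNSW vector index:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "BM25 is a classic lexical ranking function from the probabilistic IR tradition.",
    "FAISS and HNSW provide approximate nearest-neighbor search over dense vectors.",
    "Chunking long documents improves retrieval granularity in RAG pipelines.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)  # sparse (n_docs, vocab) matrix

def retrieve(query: str, k: int = 2):
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]          # indices of the k best-scoring docs
    return [(docs[i], float(scores[i])) for i in top]

print(retrieve("nearest neighbor search for embeddings"))
```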
2020 Meta RAG Papers
This paper coined the term “RAG”. One of the original authors has since founded Contextual and proposed RAG 2.0. The “base configuration” of modern RAG includes HyDE, chunking, reranking, and multimodal data (a HyDE sketch follows the links below), all of which are presented better elsewhere.
https://arxiv.org/abs/2005.11401 https://contextual.ai/introducing-rag2/ https://docs.llamaindex.ai/en/stable/optimizing/advanced_retrieval/query_transformations/ https://research.trychroma.com/evaluating-chunking https://cohere.com/blog/rerank-3pt5 https://www.youtube.com/watch?v=i2vBaFzCEJw https://www.youtube.com/watch?v=TRjq7t2Ms5I&t=152s https://www.youtube.com/watch?v=FDEmbYPgG-s https://www.youtube.com/watch?v=DId2KP8Ykz4
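As one example of that base configuration, HyDE fits in a few lines; `generate`, `embed`, and `index` are hypothetical stand-ins for an LLM call, an embedding model, and any vector store:

```python
def hyde_retrieve(generate, embed, index, query: str, k: int = 5):
    """HyDE: embed a hypothetical answer instead of the raw query, then retrieve."""
    hypothetical_doc = generate(
        f"Write a short passage that plausibly answers the question:\n{query}"
    )
    query_vector = embed(hypothetical_doc)  # the fake answer lives closer to real answers
    return index.search(query_vector, k)    # standard dense retrieval from here on
```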
MTEB: Large-Scale Text Embedding Benchmark Papers
This is the current industry standard for embedding evaluation, though it has known issues. Many embedding models have papers of their own (OpenAI, Nomic Embed, Jina v3, cde-small-v1, etc.), and Matryoshka embeddings are gradually becoming standard (a truncation sketch follows the links below).
https://arxiv.org/abs/2210.07316 https://news.ycombinator.com/item?id=42504379 https://www.youtube.com/watch?v=VIqXNRsRRQo https://huggingface.co/blog/matryoshka
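The core trick of Matryoshka embeddings is that a prefix of the full vector remains a usable embedding; a minimal numpy sketch (this assumes the embedding model was actually trained with a Matryoshka objective):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize for cosine similarity."""
    shortened = vec[:dim]
    return shortened / np.linalg.norm(shortened)

full = np.random.randn(1024)            # stand-in for a 1024-d Matryoshka embedding
small = truncate_embedding(full, 256)   # 4x cheaper to store and search
```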
GraphRAG Papers
This is Microsoft’s work on adding knowledge graphs to RAG, now open-sourced. In 2024 it became one of the most popular trends in RAG, alongside ColBERT/ColPali/ColQwen (more in the vision section).
https://arxiv.org/pdf/2404.16130 https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/ https://buttondown.com/ainews/archive/ainews-graphrag/ https://www.youtube.com/watch?v=knDDGYHnnSI https://github.com/stanford-futuredata/ColBERT
RAGAS Papers
This is a simple RAG evaluation framework recommended by OpenAI. Also worth referencing are Nvidia’s FACTS framework and “Extrinsic Hallucinations in LLMs”, Lilian Weng’s survey of the causes and evaluation of hallucination (see also Jason Wei on recall vs. precision).
https://arxiv.org/abs/2309.15217 https://x.com/swyx/status/1724490887147978793 https://arxiv.org/abs/2407.07858v1 https://lilianweng.github.io/posts/2024-07-07-hallucination/ https://x.com/_jasonwei/status/1871285864690815053
RAG was at the core of applied AI engineering in 2024, so it pays to absorb a large amount of industry material and practical experience. LlamaIndex (course) and LangChain (video) invest the most in educational resources. It is also important to follow the ongoing “RAG vs. long context” debate.
Part 5: Agents
SWE-Bench Papers (our podcast)
Adopted by Anthropic, Devin, and OpenAI, this may currently be the most prominent agent benchmark (as opposed to WebArena or SWE-Gym). Although technically a coding benchmark, it tests agents more than raw LLMs. Also worth watching: SWE-Agent, SWE-Bench Multimodal, and the Konwinski Prize.
https://arxiv.org/abs/2310.06770 https://www.latent.space/p/iclr-2024-benchmarks-agents#%C2%A7section-b-benchmarks https://www.latent.space/p/claude-sonnet https://openai.com/index/introducing-swe-bench-verified/ https://x.com/jiayi_pirate/status/1871249410128322856 https://arxiv.org/abs/2405.15793 https://arxiv.org/abs/2410.03859 https://kprize.ai/
ReAct Papers (our podcast)
ReAct opened the long line of research on tool use and function-calling LLMs, which now includes Gorilla and the BFCL leaderboard (a minimal ReAct loop is sketched after the links below). Toolformer and HuggingGPT are of historical significance.
https://arxiv.org/abs/2210.03629 https://www.latent.space/p/shunyu https://gorilla.cs.berkeley.edu/ https://gorilla.cs.berkeley.edu/leaderboard.html https://arxiv.org/abs/2302.04761 https://arxiv.org/abs/2303.17580
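The ReAct pattern itself is a short loop of thought, action, and observation; below is a hedged sketch in which `llm` and `tools` are placeholders, and real implementations parse structured output or use native function calling:

```python
def react_agent(llm, tools: dict, task: str, max_steps: int = 8) -> str:
    """Interleave reasoning and tool calls until the model emits a final answer."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript + "Thought:")       # model thinks, then picks an action
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            # Expected format in this sketch: "Action: tool_name[argument]"
            action = step.split("Action:")[-1].strip()
            name, arg = action.split("[", 1)
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return transcript  # ran out of steps; return the trace for debugging
```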
MemGPT Papers
This is one of the notable approaches to emulating long-running agent memory, since adopted by ChatGPT and LangGraph. Similar ideas have been reinvented in nearly every agent system, from MetaGPT to AutoGen to Smallville.
https://arxiv.org/abs/2310.08560 https://openai.com/index/memory-and-new-controls-for-chatgpt/ https://langchain-ai.github.io/langgraph/concepts/memory/#episodic-memory https://arxiv.org/abs/2308.00352 https://arxiv.org/abs/2308.08155 https://github.com/joonspk-research/generative_agents
Voyager Papers
This is Nvidia’s work on three cognitive-architecture components (curriculum, skill library, sandbox) for improving agent performance. More abstractly, the skill library/curriculum can be seen as a form of agent workflow memory.
https://arxiv.org/abs/2305.16291 https://arxiv.org/abs/2309.02427 https://arxiv.org/abs/2409.07429
Anthropic’s Research on Building Effective Agents
This is an excellent review from the end of 2024, emphasizing workflow patterns such as prompt chaining, routing, parallelization, orchestration, and evaluation/optimization. OpenAI’s Swarm is also worth a look.
https://www.anthropic.com/research/building-effective-agents https://github.com/openai/swarm
Frontier Overview of Agent Design in 2024
We covered many of 2024’s cutting-edge results in agent design at NeurIPS. Note that we skipped the debate over the definition of “agent”; if you really need one, my version can be used.
https://www.latent.space/p/2024-agents https://www.youtube.com/watch?v=wnsZ7DuqYp0
Part 6: Code Generation
The Stack Papers
This is the original open dataset twin of “The Pile”, focused on code; it kicked off a line of excellent code-generation research running from The Stack v2 to StarCoder.
https://arxiv.org/abs/2211.15533 https://huggingface.co/datasets/bigcode/the-stack-v2 https://arxiv.org/abs/2402.19173
Open Code Model Papers
Options include DeepSeek-Coder, Qwen2.5-Coder, and CodeLlama. Many consider Claude 3.5 Sonnet the best code model, but it has no paper.
https://arxiv.org/abs/2401.14196 https://arxiv.org/abs/2409.12186 https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/ https://www.latent.space/p/claude-sonnet
HumanEval/Codex Papers
This benchmark is saturated but remains essential knowledge for the coding field. SWE-Bench is now the more famous coding benchmark, but it is more expensive and evaluates agents rather than raw models. Modern alternatives include Aider, Codeforces, BigCodeBench, LiveCodeBench, and SciCode (the pass@k metric HumanEval introduced is sketched after the links below).
https://arxiv.org/abs/2107.03374 https://aider.chat/docs/leaderboards/ https://arxiv.org/abs/2312.02143 https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard https://livecodebench.github.io/ https://buttondown.com/ainews/archive/ainews-to-be-named-5745/
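HumanEval’s headline metric, pass@k, uses the unbiased estimator from the Codex paper: sample n completions per problem, count the c that pass the unit tests, and compute 1 - C(n-c, k)/C(n, k). A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k draws from n samples (c of them correct) passes."""
    if n - c < k:
        return 1.0  # too few failures for any k-subset to be all-wrong
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 pass the tests
print(pass_at_k(n=200, c=37, k=1))   # 0.185
print(pass_at_k(n=200, c=37, k=10))  # substantially higher
```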
AlphaCodeium Papers
Google published AlphaCode and AlphaCode2, which perform extremely well on competitive programming problems; AlphaCodium, in turn, shows how flow engineering can squeeze more performance out of any given base model.
https://arxiv.org/abs/2401.08500 https://news.ycombinator.com/item?id=34020025 https://x.com/RemiLeblond/status/1732419456272318614
CriticGPT Papers
It is well known that LLM-generated code can contain bugs and security issues. OpenAI trained CriticGPT to spot them, and Anthropic uses SAEs to identify the model features behind such problems; either way, it is a concern you should be aware of.
https://criticgpt.org/criticgpt-openai/ https://arxiv.org/abs/2412.15004v1 https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html#safety-relevant-code
Code generation is another frontier area where much of the research has shifted from academia to industry; practical engineering advice on code generation and code agents (like Devin) now appears more in industry blogs and talks than in papers.
https://www.youtube.com/watch?v=Ve-akpov78Q https://www.youtube.com/watch?v=T7NWjoD_OuY&t=8s
Part 7: Vision
Non-LLM vision work remains important
For example, the YOLO line of papers (now up to v11, though its development history is worth tracking); meanwhile, transformer-based detectors such as DETR are increasingly surpassing YOLO.
https://arxiv.org/abs/1506.02640 https://github.com/ultralytics/ultralytics https://news.ycombinator.com/item?id=42352342 https://arxiv.org/abs/2304.08069
CLIP Papers
This is the first successful ViT (Vision Transformer) work from Alec Radford. CLIP has since been superseded by approaches like BLIP/BLIP2 and SigLIP/PaliGemma, but it remains foundational knowledge.
https://arxiv.org/abs/2103.00020 https://arxiv.org/abs/2010.11929 https://arxiv.org/abs/2201.12086 https://arxiv.org/abs/2301.12597 https://www.latent.space/i/152857207/part-vision
MMVP Benchmark (LS Live)
It quantifies some important weaknesses of CLIP. Multimodal counterparts of MMLU (MMMU) and SWE-Bench also exist.
https://arxiv.org/abs/2401.06209 https://www.latent.space/p/2024-vision https://arxiv.org/abs/2311.16502 https://arxiv.org/abs/2410.03859
Segment Anything Model and SAM 2 Papers (our podcast). A very successful foundation model for image and video segmentation, often used together with GroundingDINO.
https://arxiv.org/abs/2304.02643 https://arxiv.org/abs/2408.00714 https://latent.space/p/sam2 https://github.com/IDEA-Research/GroundingDINO
Early Fusion Research
In contrast to cheap “late fusion” approaches like LLaVA, early fusion covers methods such as Flamingo, Meta’s Chameleon, Apple’s AIMv2, and Reka Core. In fact, there are at least four distinct research directions in the vision-language model (VLM) field.
https://arxiv.org/abs/2304.08485 https://www.latent.space/p/neurips-2023-papers https://huyenchip.com/2023/10/10/multimodal.html https://arxiv.org/abs/2405.09818 https://arxiv.org/abs/2411.14402 https://lilianweng.github.io/posts/2022-06-09-vlm/
Many leading VLM research results are no longer published
The most recent batch we got was the GPT-4V system card and its derivative papers. We recommend hands-on experience with the vision capabilities of current models such as 4o (including fine-tuning 4o vision), Claude 3.5 Sonnet/Haiku, Gemini 2.0 Flash, and o1. Other models include Pixtral, Llama 3.2, Moondream, and QVQ.
https://cdn.openai.com/papers/GPTV_System_Card.pdf https://arxiv.org/abs/2309.17421 https://blog.roboflow.com/gpt-4o-object-detection/ https://buttondown.com/ainews/archive/ainews-llama-32-on-device-1b3b-and-multimodal/ https://www.youtube.com/watch?v=T7sxvrJLJ14
Part 8: Speech
Whisper Papers
This is the successful ASR (Automatic Speech Recognition) model from Alec Radford. Whisper v2, v3, distil-whisper, and v3 Turbo are open weights but have no papers.
https://arxiv.org/abs/2212.04356 https://news.ycombinator.com/item?id=33884716 https://news.ycombinator.com/item?id=38166965 https://github.com/huggingface/distil-whisper https://amgadhasan.substack.com/p/demystifying-openais-new-whisper
AudioPaLM Papers
This was our last look at Google’s speech thinking before PaLM gave way to Gemini. Also see Meta’s speech work in the Llama 3 paper.
AudioPaLM https://arxiv.org/abs/2306.12925 Llama 3 https://arxiv.org/abs/2407.21783
NaturalSpeech Papers
This is one of the few leading TTS (Text-to-Speech) approaches, with v3 released recently.
https://arxiv.org/abs/2205.04421 https://arxiv.org/abs/2403.03100
Kyutai Moshi Papers
This is an impressive full-duplex speech-text open-weights model with high-profile demos. Also see Hume OCTAVE.
Moshi https://github.com/kyutai-labs/moshi https://www.youtube.com/watch?v=hm2IJSKcYvo https://www.hume.ai/blog/introducing-octave
OpenAI Real-Time API: The Missing Manual
Similarly, cutting-edge general model work has not been published, but we have tried to document relevant content of the real-time API.
https://www.latent.space/p/realtime-api
It is currently worth diversifying beyond the big labs
You can try Daily, LiveKit, Vapi, AssemblyAI, Deepgram, Fireworks, Cartesia, ElevenLabs, and others; see “Current State of the Speech Field in 2024”. Although NotebookLM’s speech model has not been published, we obtained what is so far the most detailed account of its modeling process.
Since Gemini 2.0 itself supports speech and vision natively, these modalities are expected to converge further in 2025 and beyond.
Part 9: Image/Video Diffusion
Latent Diffusion Papers
This is essentially the Stable Diffusion paper. Also see the SD2, SDXL, and SD3 papers. The original team has since moved on to BFL Flux [schnell|dev|pro].
https://arxiv.org/abs/2112.10752 https://stability.ai/news/stable-diffusion-v2-release https://arxiv.org/abs/2307.01952 https://arxiv.org/abs/2403.03206 https://github.com/black-forest-labs/flux
DALL-E / DALL-E-2 / DALL-E-3 Papers
OpenAI’s image generation models
https://arxiv.org/abs/2102.12092 https://arxiv.org/abs/2204.06125 https://cdn.openai.com/papers/dall-e-3.pdf
Imagen / Imagen 2 / Imagen 3 Papers
Google’s image generation models. Also see Ideogram.
https://arxiv.org/abs/2205.11487 https://deepmind.google/technologies/imagen-2/ https://arxiv.org/abs/2408.07009 https://www.reddit.com/r/singularity/comments/1exsq4d/introducing_ideogram_20_our_most_advanced/
Consistency Models Papers
The LCM (Latent Consistency Models) refinement of this work powered the viral “fast drawing” moment of December 2023. The line of work has since been updated with sCM.
https://arxiv.org/abs/2303.01469 https://arxiv.org/abs/2310.04378 https://www.latent.space/p/tldraw https://arxiv.org/abs/2410.11081
Sora Blog Post
Text-to-video. There is of course no paper, aside from the DiT paper (from the same author), but Sora was still one of the most important releases of the year, with many open-weights competitors such as OpenSora. Lilian Weng’s survey is also worth referencing.
https://openai.com/index/sora/ https://arxiv.org/abs/2212.09748 https://artificialanalysis.ai/text-to-video/arena?tab=Leaderboard https://arxiv.org/abs/2412.00131 https://lilianweng.github.io/posts/2024-04-12-diffusion-video/
We also strongly recommend getting familiar with ComfyUI (episode upcoming). Text diffusion, music diffusion, and autoregressive image generation are niche but on the rise.
Part 10: Fine-tuning
LoRA/QLoRA Papers
This is the de facto standard for parameter-efficient fine-tuning, whether on local models or on 4o (confirmed on the podcast). FSDP+QLoRA is an excellent case study (a minimal LoRA layer is sketched after the links below).
https://arxiv.org/abs/2106.09685 http://arxiv.org/abs/2305.14314 https://www.latent.space/p/cosine https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html
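The core of LoRA fits in a few lines of PyTorch: freeze the pretrained weight W and learn a low-rank update BA scaled by alpha/r. A minimal sketch (omitting dropout and the exact initialization the paper uses):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = base(x) + (alpha / r) * x @ A^T @ B^T, with the base weight frozen."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)  # wraps an existing projection layer
```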
DPO Papers
This is a popular alternative to PPO (if slightly weaker), now supported by OpenAI as preference fine-tuning (the loss is sketched after the links below).
https://arxiv.org/abs/2305.18290 https://arxiv.org/abs/1707.06347 https://platform.openai.com/docs/guides/fine-tuning#preference
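The DPO objective itself is compact: push up the policy-vs-reference log-ratio on chosen completions relative to rejected ones. A minimal sketch, assuming the log-probabilities have already been summed over completion tokens:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """-log sigmoid(beta * [(log pi/ref on chosen) - (log pi/ref on rejected)])"""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy batch of summed sequence log-probabilities
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
```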
ReFT Papers
Rather than fine-tuning a small number of weight layers, ReFT fine-tunes interventions on the model’s hidden representations.
https://arxiv.org/abs/2404.03592
Orca 3/AgentInstruct Papers
See the synthetic-data picks from NeurIPS; this is a very good way to generate fine-tuning data.
https://www.microsoft.com/en-us/research/blog/orca-agentinstruct-agentic-flows-can-be-effective-synthetic-data-generators/ https://www.latent.space/p/2024-syndata-smolmodels
RL/Reinforcement Fine-tuning Papers
How much reinforcement fine-tuning goes into o1 is debated, but “Let’s Verify Step By Step” and Noam Brown’s various public talks give hints about how it works.
https://www.interconnects.ai/p/openais-reinforcement-finetuning https://arxiv.org/abs/2305.20050 https://x.com/swyx/status/1867990396762243324
We recommend working through the whole process via the Unsloth notebooks and HuggingFace’s “How to Fine-tune Open LLMs”. This is obviously an endless rabbit hole that, at the extreme, overlaps with the work of research scientists.
https://github.com/unslothai/unsloth https://www.philschmid.de/fine-tune-llms-in-2025
https://www.latent.space/p/2025-papers#%C2%A7section-frontier-llms
