Understanding Vision Transformers with Code
Source: Deep Learning Enthusiasts

This article details the Vision Transformer (ViT) introduced in "An Image is Worth 16×16 Words". Since the Transformer architecture was proposed in the 2017 paper "Attention is All You Need", Transformer models have quickly risen to prominence in natural language processing.
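Before walking through the model, here is a minimal sketch (not the article's own code) of the idea behind the paper's title: an image is cut into fixed-size 16×16 patches, and each flattened patch plays the role of a "word" fed to the Transformer. The 224×224 input size and PyTorch usage are assumptions for illustration.

```python
import torch

# Assumed example input: one 3-channel 224x224 image
image = torch.randn(1, 3, 224, 224)   # (batch, channels, height, width)
patch_size = 16

# Unfold height and width into non-overlapping 16x16 patches
patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
# -> (1, 3, 14, 14, 16, 16): a 14x14 grid of patches per channel

# Flatten each patch into a vector, giving one "word" per patch
patches = patches.contiguous().view(1, 3, -1, patch_size * patch_size)
patches = patches.permute(0, 2, 1, 3).flatten(2)

print(patches.shape)  # (1, 196, 768): 196 patch "words", each 768-dimensional
```

In the full ViT, each of these 196 flattened patches is then linearly projected to the model dimension before positional embeddings and the Transformer encoder are applied, as the article describes below.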