Overview of Subword Tokenization Methods in Neural NLP

This article provides an overview of subword tokenization methods in neural natural language processing. It first explains the out-of-vocabulary (OOV) problem caused by the closed vocabulary of neural network-based NLP models, then introduces three common solutions: Byte-Pair Encoding (BPE), WordPiece, and Unigram. Before …