Youdao Cloud Note Supports OCR Recognition: Convert Images to Text, PDF to Word

If you are not yet sure what the new OCR feature can do for you, you have probably run into the following situations, and OCR is the “magic remedy” for each of them. In work or research, dealing with large numbers of paper documents, magazines, and PDF materials makes it difficult to … Read more

OCR Image Recognition Using Python

During data collection, you often encounter images whose text can be viewed but not copied. Extracting that text by hand can take a significant amount of work. For example, given the price table for a property development, how can one find the houses with lower unit prices? It is hard to tell with the naked eye. Can we … Read more

Eight Common Open Source OCR Tools

Author | Chen Xiaobing; Reviewed by | Chong Lou. OCR (Optical Character Recognition) is a technology that automatically converts text in images into editable text. Various vendors currently provide OCR APIs for different scenarios. There are also several open-source OCR frameworks and tools that support customization and training, allowing developers to flexibly … Read more

RAGFlow: Next-Gen RAG Engine Based on OCR and Document Parsing

1. Introduction: In the current wave of artificial intelligence, Retrieval-Augmented Generation (RAG) has become a hot topic in research and application due to its unique advantages. RAG combines the generative capabilities of large language models (LLMs) with efficient information retrieval systems, providing users with a … Read more

Alibaba’s 7B Multimodal Document Understanding Model Achieves New SOTA

Contributed by the mPLUG team; QbitAI | WeChat Official Account. New SOTA in multimodal document understanding! Alibaba’s mPLUG team has released its latest open-source work, mPLUG-DocOwl 1.5, which proposes a series of solutions to four major challenges: high-resolution image text recognition, general document structure understanding, instruction following, and external knowledge incorporation. Without further ado, let’s take a … Read more

How to Handle Table Data in RAG Knowledge Base Documents?

When developing a RAG system, the data formats in the knowledge base can be diverse, and most of the content is unstructured. For example, PDF documents in the knowledge base are likely to contain tables, and handling them needs special attention to ensure that the table information can be correctly … Read more

Recent Advances in Document Image Rectification: Introducing Transformer Framework and Polar Representation

2025/1/22, TextIn.com. TextIn: Focused on Intelligent Text Recognition for 18 Years. In the article “Overview of Document Digital Capture and Intelligent Processing: Image Distortion Correction Technology”, we introduced the development and representative approaches of document image correction technology. As the demand for intelligent document processing continues to grow, document image de-distortion technology is … Read more

The Rise of Deepfake: What Is Synthetic Data Used For?

Author | Astasia Myers; Translator | Sambodhi; Editor | Vincent. AI Frontline introduction: On the 4th of this month, we published an article titled “AI Startups Competing to Commercialize Deepfake”. We believe readers have already recognized that Deepfake is a double-edged sword: used correctly, it benefits society, but when misused, it can lead … Read more