01
Table and Chart Detection – Accurate Line-Level Text Detection and Recognition in Any Language
Surya is a multilingual document OCR toolkit. It provides accurate line-level text detection and text recognition, supports table and chart detection, and handles a wide range of document types and languages. It gained nearly 2k stars within just 3 days of being open-sourced.
Project Source Code

Surya Toolkit
Installation and Usage
Installation: run pip install surya-ocr. The model weights are downloaded automatically the first time you run Surya.
Detection: You can use the following command to detect text lines in images, PDFs, or folders containing images/PDFs. This will output a JSON file containing the detected bounding boxes, and you can optionally save the page images with the bounding boxes.
surya_detect DATA_PATH --images
- DATA_PATH can be an image, PDF, or a folder of images/PDFs
- --images saves the page images and detected text lines (optional)
- --max specifies the maximum number of pages to process if you do not want to process everything
- --results_dir specifies a directory to save results instead of the default directory
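The command and options above can also be assembled from a script. The following is a minimal sketch, assuming the options are spelled --images, --max, and --results_dir as on the command line; the helper name build_surya_detect_cmd and the example paths are hypothetical and not part of Surya.

```python
import shlex
import subprocess
from typing import List, Optional


def build_surya_detect_cmd(
    data_path: str,
    save_images: bool = False,
    max_pages: Optional[int] = None,
    results_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the surya_detect command line as an argument list."""
    cmd = ["surya_detect", data_path]
    if save_images:
        cmd.append("--images")  # also save page images with detected lines
    if max_pages is not None:
        cmd += ["--max", str(max_pages)]  # cap the number of pages processed
    if results_dir is not None:
        cmd += ["--results_dir", results_dir]  # override the default output folder
    return cmd


if __name__ == "__main__":
    # Hypothetical input folder and output directory.
    cmd = build_surya_detect_cmd("scans/", save_images=True, max_pages=10, results_dir="out/")
    print(shlex.join(cmd))
    # Uncomment to run the tool (requires surya-ocr to be installed):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems when paths contain spaces.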
Performance and Limitations
Surya is designed to work with any language and is built specifically for document OCR. It may not work well on photographs or other non-document images, and it performs poorly on handwritten text.
Model | Time (s) | Time per page (s) | Precision |
---|---|---|---|
surya | 52.6892 | 0.205817 | 0.844426 |
tesseract | 74.4546 | 0.290838 | 0.631498 |
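To make the comparison concrete, the benchmark figures can be turned into an implied page count, a speed ratio, and a precision gap. This is only a sketch of the arithmetic using the numbers reported above; the variable and function names are illustrative.

```python
# Benchmark figures copied from the comparison above.
results = {
    "surya": {"total_s": 52.6892, "per_page_s": 0.205817, "precision": 0.844426},
    "tesseract": {"total_s": 74.4546, "per_page_s": 0.290838, "precision": 0.631498},
}


def implied_pages(row):
    """Total time divided by time per page gives the number of pages benchmarked."""
    return round(row["total_s"] / row["per_page_s"])


def speedup(fast, slow):
    """How many times longer the slower tool took over the whole benchmark."""
    return slow["total_s"] / fast["total_s"]


pages = {name: implied_pages(row) for name, row in results.items()}
ratio = speedup(results["surya"], results["tesseract"])
gain = results["surya"]["precision"] - results["tesseract"]["precision"]

print(pages)
print(f"{ratio:.2f}x faster, precision +{gain:.3f}")
```

Both rows imply a benchmark of about 256 pages. On these figures Surya is roughly 1.41 times faster than Tesseract, with precision about 0.21 higher.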
02
Photo and Handwriting Recognition – Transformer-Based Optical Recognition Model TrOCR

- Project Source Code
- TrOCR Paper
- trocr-base-printed
- TrOCR Homepage
Handwriting Recognition
!pip install transformers

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
from IPython.display import display

# Load the handwritten-text checkpoint and its matching processor
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-large-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-large-handwritten")

def show_image(pathStr):
    # Open the image as RGB, display it in the notebook, and return it
    img = Image.open(pathStr).convert("RGB")
    display(img)
    return img

def ocr_image(src_img):
    # Preprocess the image, generate token IDs, and decode them to text
    pixel_values = processor(images=src_img, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

hw_image = show_image("handwriting_sample.png")  # hypothetical path; point this at your own image
ocr_image(hw_image)
--> Output → Dean Sister, but her you, feel bad for years,
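The single-image function above extends naturally to several images at once, since the processor accepts a list of images and batch_decode returns one string per image. The following is a minimal sketch under that assumption; ocr_batch and load_trocr are hypothetical helper names, and the processor and model are passed in as arguments so the function does not depend on global state.

```python
from typing import List, Sequence


def ocr_batch(images: Sequence, processor, model) -> List[str]:
    """Recognize text in several PIL images with one generate() call."""
    if not images:
        return []
    # The processor resizes and normalizes every image into one batched tensor.
    pixel_values = processor(images=list(images), return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return [t.strip() for t in texts]


def load_trocr(checkpoint: str = "microsoft/trocr-large-handwritten"):
    """Load the processor and model; imported lazily because the download is large."""
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    return processor, model
```

A typical call would be processor, model = load_trocr() followed by ocr_batch([img1, img2], processor, model), where img1 and img2 are RGB PIL images.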
03
The Most Reliable Open Source OCR Tool – CuneiForm Cross-Platform Open Source OCR Tool
CuneiForm is one of the most reliable open-source OCR tools available today. It specializes in converting scanned documents and images into editable text, with a focus on precise results across a range of input sources and output formats. The tool supports multiple languages and runs on various operating systems.
Tool Features
CuneiForm is known for its accuracy in recognizing text from scanned images. It can generate reliable OCR results even for complex documents.
Flexible input and output. CuneiForm accepts various input formats, such as TIFF and JPEG, and can output recognized text in formats like TXT, HTML, and PDF.
Accurate Open Source OCR Tool – EasyOCR Editor

User-friendly software package. EasyOCR lives up to its name: it is easy to install and use, and it is aimed at developers, particularly those working in computer vision.
Versatile text processing. EasyOCR is trained on a diverse dataset and handles a variety of text styles, including different fonts and orientations.
PyTorch dependency. EasyOCR is built on PyTorch, which some users see as a limitation, since the dependency can complicate integration with other workflows or environments.
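As an illustration of how little code a basic EasyOCR call takes, here is a minimal sketch. It assumes the easyocr package is installed and relies on readtext returning one (bounding box, text, confidence) entry per detected region. The helper names read_image and keep_confident and the 0.5 threshold are illustrative choices, not part of EasyOCR.

```python
from typing import List, Sequence, Tuple

# One detection: (four [x, y] corner points, recognized text, confidence score)
Detection = Tuple[Sequence[Sequence[float]], str, float]


def keep_confident(detections: Sequence[Detection], min_conf: float = 0.5) -> List[str]:
    """Return the recognized strings whose confidence is at least min_conf."""
    return [text for _box, text, conf in detections if conf >= min_conf]


def read_image(path: str, languages: Sequence[str] = ("en",)) -> List[Detection]:
    """Run EasyOCR on one image; the import is lazy because it pulls in PyTorch."""
    import easyocr

    reader = easyocr.Reader(list(languages))  # downloads model weights on first use
    return reader.readtext(path)
```

A typical call would be keep_confident(read_image("receipt.jpg")), where the file name is a placeholder. Keeping the filtering separate from the model call makes the post-processing easy to test without PyTorch installed.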
Other Noteworthy OCR Tools
- DocTR
- PaddleOCR
- MMOCR
- Tesseract
Introduction: Focused on multimodal large models and computer vision, learn AI with Mark.