Molecular Image Recognition with Python and MolScribe

Scientific Illustration Business Consultation

Customer Service (17857127498, same WeChat number)
Please indicate ‘Scientific Illustration’ when adding friends
Email: [email protected]

Interpreting Ideas Through Images · Creating Art with Technology

Sphere Studio, Making Science More Colorful
A few days ago, I shared a website that intelligently recognizes molecular structures: after you upload an image, it automatically converts it into a molecular file.
However, I have to open this website and wait for the recognition to finish every time, which is cumbersome, especially when the internet connection is poor. Is it possible to run the recognition on my own computer?
Of course! For example, you can install software such as KingDraw, which has built-in molecular structure recognition…
However, this feels quite basic.
As a designer who has already learned to use Blender's Python for plugin development, how could I pass up a Python-based molecular image recognition tool? So I searched online and found a program called MolScribe. It also has its own web version which, unsurprisingly, runs at a turtle's pace.

ref: J. Chem. Inf. Model. 2023, 63, 7, 1925–1934

Since a ready-made recognition tool already exists, there must be a way to integrate it into my Mol3DStruct plugin. To do this, I first installed Python. Yes, I've been using Blender's built-in Python for half a year, yet I still didn't have a standalone version installed on my computer.
However, MolScribe claims to support only Python versions above 3.7 and below 3.11, so I downloaded version 3.10.6 for testing. I need to understand how this recognition program works before I can add it to the Blender plugin.
At this point, a question suddenly occurred to me: Blender's built-in Python isn't version 3.11 or above, is it? If it were, all this effort would be wasted.
I opened Blender's Python console window and carefully checked the version:
3.10.13! Thankfully, thankfully.
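For readers following along, the same check works in a standalone interpreter or in Blender's Python console. The following is only a minimal sketch: it assumes the version window quoted above means "at least 3.7 and below 3.11", and the helper name `is_supported` is made up for illustration.

```python
import sys

# Version window quoted in the post, read here as: at least 3.7, below 3.11.
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 11)

def is_supported(version_info):
    """Return True if (major, minor, ...) falls inside the assumed window."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE

print(sys.version)                     # full version string of this interpreter
print(is_supported(sys.version_info))  # True only inside the window above
```

Comparing tuples rather than strings avoids the trap where "3.10" sorts before "3.7" alphabetically.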
Next, I can confidently study how to use MolScribe.
Following its instructions on GitHub, I first need to install MolScribe. Press Win+R to open the Run window, type cmd, and press Enter.
Then type pip install MolScribe and press Enter to run it.
This installs a number of packages into the Lib/site-packages folder under the Python installation path. These are all third-party libraries used for molecular recognition.
Next comes a large model that the developers have already trained, referred to as the MolScribe checkpoint. It is hosted on the Hugging Face website, so the instructions ask us to install one more library, huggingface_hub, for downloading it.

The huggingface_hub library is installed the same way as above: type pip install huggingface_hub and press Enter.
That completes the preparation; now we can open the Python application.
First, we import a few external libraries: torch, the MolScribe class from molscribe, and hf_hub_download from huggingface_hub. All of them were installed in the steps above.
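Before importing anything heavy, it can help to confirm that the libraries are visible to the interpreter you are actually running, which matters later when Blender's bundled Python comes into play. This is a rough sketch using only the standard library. The list `REQUIRED` and the helper `missing_modules` are illustrative names, and the module list is my assumption about what the later steps import (cv2 is the import name of opencv-python).

```python
import importlib.util

# Top-level module names used later in this walkthrough (assumed list).
REQUIRED = ["torch", "molscribe", "huggingface_hub", "rdkit", "cv2"]

def missing_modules(names):
    """Return the names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# An empty list means every module was found; this does not import them,
# so a broken install can still fail at import time.
print("Missing:", missing_modules(REQUIRED))
```

Running the same snippet in Blender's console and in a standalone Python makes it obvious when a package was installed into one but not the other.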
Next, we download the model trained by the developers, which is saved automatically to the C:/Users/***/.cache folder. The ckpt_path variable is simply the path to the downloaded file. The first time you run this step, a download progress display appears; the file is 1.13 GB. After downloading, you can move it elsewhere to avoid taking up space on the C drive.
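The download-then-relocate step could be scripted roughly as follows. Treat it as a sketch under assumptions: the repository id "yujieq/MolScribe" is what I recall from the MolScribe README rather than something stated in this post, and both function names are invented for illustration. The copy helper is plain standard library; only `download_checkpoint` needs huggingface_hub and a network connection.

```python
import shutil
from pathlib import Path

def relocate_checkpoint(cached_path, target_dir):
    """Copy a downloaded checkpoint into target_dir and return the new path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(cached_path).name
    if not destination.exists():
        # copy2 follows symlinks, so the real file content is copied.
        shutil.copy2(cached_path, destination)
    return destination

def download_checkpoint(target_dir):
    """Download the MolScribe checkpoint, then copy it out of the cache."""
    # Imported lazily so relocate_checkpoint works without huggingface_hub.
    from huggingface_hub import hf_hub_download
    # Repo id is an assumption (from the MolScribe README); filename is from the post.
    ckpt_path = hf_hub_download("yujieq/MolScribe", "swin_base_char_aux_1m.pth")
    return relocate_checkpoint(ckpt_path, target_dir)
```

A call such as `download_checkpoint('S:/Python')` would leave a copy in that folder; the cached copy under .cache can then be deleted by hand to free space on the C drive.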
Then we call the recognition functionality in MolScribe. Here we directly create an instance of the MolScribe class, passing the model path mentioned above as one of the arguments. Note that I have already moved swin_base_char_aux_1m.pth elsewhere, so I changed the path (as shown below).

>>> path = 'S:/Python/swin_base_char_aux_1m.pth'

>>> model = MolScribe(path, device=torch.device('cpu'))

Now that we have the model, we also need an image to recognize, again given as a full path including the filename.

>>> img_file = 'C:/Users/24430/Desktop/mol_images/test_2.png'

How do we recognize it? It's a one-liner.

>>> output = model.predict_image_file(img_file)
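For repeated use, the interactive steps can be wrapped into one function. The sketch below is not part of MolScribe; the name `recognize` and its `device` parameter are illustrative. It checks that both files exist before paying the cost of loading the model, and defers the heavy imports until then.

```python
from pathlib import Path

def recognize(image_path, ckpt_path, device="cpu"):
    """Return MolScribe's prediction dictionary for one image file."""
    image_path, ckpt_path = Path(image_path), Path(ckpt_path)
    # Fail early with a clear message before loading the 1.13 GB model.
    for p in (image_path, ckpt_path):
        if not p.is_file():
            raise FileNotFoundError(f"File not found: {p}")
    # Heavy imports are deferred until the inputs are known to exist.
    import torch
    from molscribe import MolScribe
    model = MolScribe(str(ckpt_path), device=torch.device(device))
    return model.predict_image_file(str(image_path))
```

Note that this reloads the model on every call; when recognizing many images it would be better to create the model once and reuse it, as in the interactive session.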

The output is a dictionary, which by default does not include the 'atoms' and 'bonds' items. That's fine, because we only need the 'molfile' item. If you do want 'atoms' and 'bonds', pass return_atoms_bonds=True to predict_image_file().
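A quick way to see which of these items a given result contains is a small inspection helper. This is a sketch with stand-in dictionaries rather than real MolScribe output: the helper name is made up, the dictionary contents are placeholders, and it assumes 'atoms' and 'bonds' are list-like so that `len()` applies.

```python
def summarize_prediction(output):
    """Report which of the items discussed above are present in a result."""
    summary = {"has_molfile": "molfile" in output}
    for key in ("atoms", "bonds"):
        # Expected only when return_atoms_bonds=True was passed.
        summary[f"n_{key}"] = len(output[key]) if key in output else None
    return summary

# Stand-in dictionaries with made-up contents, shaped like the two cases.
default_like = {"molfile": "..."}
detailed_like = {"molfile": "...", "atoms": [{}, {}], "bonds": [{}]}
print(summarize_prediction(default_like))
print(summarize_prediction(detailed_like))
```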
To view the result:
>>> block = output['molfile']
>>> print(block)
This prints the molecular data in mol format.
Exporting it as a molecular file is also easy; just use the Mol-related functions in the RDKit library. Specifically:

>>> from rdkit import Chem

>>> m = Chem.MolFromMolBlock(block)

>>> Chem.MolToMolFile(m, 'C:/Users/24430/Desktop/mol_images/test_2.mol')

Open the file and check that it matches the data printed above.
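Since the 'molfile' item is already text in mol format, one alternative is to write it to disk directly, without the RDKit round trip. This is a sketch of that alternative, not what the post does; `save_molblock` is a made-up name, and the only sanity check is that the text contains the standard "M  END" terminator. The RDKit route has the advantage of actually parsing the block: `Chem.MolFromMolBlock` returns None when parsing fails, which is worth checking before writing.

```python
from pathlib import Path

def save_molblock(block, path):
    """Write mol-format text to path unchanged and return the path."""
    if "M  END" not in block:
        raise ValueError("text does not look like a complete mol block")
    path = Path(path)
    path.write_text(block, encoding="utf-8")
    return path
```

Called as `save_molblock(block, 'test_2.mol')`, it produces a file whose contents are exactly what `print(block)` showed.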
With the mol file, we can open it in Blender using the Mol3DStruct plugin. The molecule recognized from the image seems to have overly long bond lengths.
But that's okay: with the Mol3DStruct plugin, the molecular image recognition doesn't even need to be completely accurate, because atoms and chemical bonds can be easily modified after importing. Here, for example, you can select all vertices or edges in Edit Mode and scale them to an appropriate size.
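The scaling could also be done on the mol text itself before importing. The following is a sketch of that idea, not how Mol3DStruct works: it assumes the block uses the fixed-width V2000 layout (counts on the fourth line, coordinates in three 10-character columns, atom indices in two 3-character columns), and the target mean bond length of 1.5 is my own choice, roughly a carbon-carbon single bond in ångströms.

```python
import math

def rescale_molblock(block, target=1.5):
    """Scale coordinates in a V2000 mol block so the mean bond length is target."""
    lines = block.splitlines()
    counts = lines[3]  # fourth line holds the atom and bond counts
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_rows = range(4, 4 + n_atoms)
    coords = [tuple(float(lines[i][k:k + 10]) for k in (0, 10, 20))
              for i in atom_rows]
    lengths = []
    for i in range(4 + n_atoms, 4 + n_atoms + n_bonds):
        a, b = int(lines[i][0:3]) - 1, int(lines[i][3:6]) - 1  # 1-based indices
        lengths.append(math.dist(coords[a], coords[b]))
    if not lengths or sum(lengths) == 0:
        return block  # nothing to measure, leave the block untouched
    factor = target * len(lengths) / sum(lengths)
    for row, (x, y, z) in zip(atom_rows, coords):
        scaled = f"{x * factor:10.4f}{y * factor:10.4f}{z * factor:10.4f}"
        lines[row] = scaled + lines[row][30:]  # keep symbol and flags as they were
    return "\n".join(lines) + "\n"
```

A uniform scale keeps all angles and proportions intact, which is the same effect as scaling every vertex together in Edit Mode.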
After scaling, click on Smart Display, then click Update Conformation, and a three-dimensional molecular stick model will be created.
I tried a few molecular images, and the results were quite good:

Overall, there weren't any major issues. You just need to install some Python libraries and make some space for the large model. After all, advanced features come at a price.
Next, I set about integrating this into my Mol3DStruct plugin. It took me an entire day, mainly because installing Python libraries in Blender is somewhat different from installing them for a standalone Python. I ended up working through the libraries almost one by one, but I won't go into detail about the process.
The most troublesome part was a library called cv2 (opencv-python). By default, pip installs the latest version, 4.9.0.80, but in my setup this version would not work with Blender's Python 3.10, so I kept getting import errors.
When you run into problems, of course, you have to find a way to solve them. But don't forget that, compared to professional programmers, I'm still a beginner!
Even worse, there was no one to ask, so I had to dig through various posts looking for the cause. In the end, I simply tried different versions of the cv2 library one by one and finally found one that worked: opencv_python-4.5.4.58-cp310-cp310-win_amd64.whl.
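One common way to make sure a package lands in the interpreter you are actually running, and at a pinned version, is to invoke pip through that interpreter. The sketch below shows the idea and is not necessarily what I did at the time. The function names are illustrative, and it assumes that inside Blender `sys.executable` points at Blender's bundled Python, which I believe holds for recent releases but is worth verifying on your version.

```python
import subprocess
import sys

def pip_install_command(package, version=None, python=sys.executable):
    """Build a pip command bound to one specific interpreter."""
    requirement = f"{package}=={version}" if version else package
    return [python, "-m", "pip", "install", requirement]

def install(package, version=None):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.call(pip_install_command(package, version))

# Only prints the command; call install(...) to actually run it.
print(pip_install_command("opencv-python", "4.5.4.58"))
```

Pinning the version with `==` is what avoids pip pulling in the newest release by default.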
After that, things went much more smoothly: mainly exporting and reading mol files, converting them to stick models, and so on, all of which I am quite familiar with. That is how the image recognition modeling feature came together.
The molecular image AI recognition modeling feature is expected to be released in the next version of Mol3DStruct. Are you all looking forward to it?
For friends who know nothing about the Mol3DStruct plugin, please check the earlier post published today; hurry up and get on board. That's all for today's sharing; thank you all for reading~


P.S. Our new book ‘3D Scientific Illustration’ is officially online and can be purchased on Dangdang or JD.com.

[JD New Book Best-Selling Monthly List No. 3] Computer and Internet/Software Engineering and Software Methodology

Dangdang Purchase Link:

http://product.dangdang.com/29610422.html

JD Purchase Link:

https://item.jd.com/14092196.html

Book Case File Download Address:

Link: https://pan.baidu.com/s/1I2jDwDWxebRICusYzxuogQ?pwd=9787

Extraction Code: 9787

Sphere—Bridging Science and Art.

Hangzhou Sphere Technology Co., Ltd. has long focused on the field of scientific visualization, providing professional services in scientific image design and scientific animation production. The company has a professional team of master’s and doctoral graduates covering biology, chemistry, materials, and art design, with a strong foundation in both science and art. Since 2016, it has provided design services to hundreds of universities and research institutions worldwide, including the Chinese Academy of Sciences, MIT, Stanford, ETH Zurich, EPFL, Tsinghua University, Peking University, Cornell University, etc., gaining international recognition. The company adheres to the purpose of interpreting ideas through images and creating art with technology, with works selected by journals such as Science, Nature, Cell, JACS, and Angew, and has been invited to produce animations for them. To better serve researchers, the company also offers long-term training courses in scientific illustration. Additionally, we have the largest WeChat public account for scientific illustration teaching in China, ‘3D Scientific Illustration,’ and a Q&A community, providing detailed tutorials and consultation services. We have provided professional illustration lectures and training for institutions such as Zhejiang University, Fudan University, Wuhan University, Tongji University, Shanghai University of Science and Technology, and the Dalian Institute of Chemical Physics, Chinese Academy of Sciences, with a total audience of over 100,000.
