Reprinted from PaperWeekly
©PaperWeekly Original · Author: Su Jianlin
Unit: Zhuiyi Technology
Research Direction: NLP, Neural Networks

"Profit and loss problems", "age problems", "tree planting problems", "cows eating grass problems", "profit problems"... have you ever been tortured by the many types of math word problems in elementary school? No worries: machine learning models can now solve these problems for us. Let's see what grade level they can reach!
This article provides a baseline for solving elementary-school math word problems (Math Word Problem): a Seq2Seq model trained on the ape210k dataset [1] that directly generates executable mathematical expressions. The Large version of the model reaches 73%+ accuracy, surpassing the result reported in the ape210k paper.
Data Processing
Let's first look at what the ape210k dataset looks like:
{
"id": "254761",
"segmented_text": "小 王 要 将 150 千 克 含 药 量 20% 的 农 药 稀 释 成 含 药 量 5% 的 药 水 . 需 要 加 水 多 少 千 克 ?",
"original_text": "小王要将150千克含药量20%的农药稀释成含药量5%的药水.需要加水多少千克?",
"ans": "450",
"equation": "x=150*20%/5%-150"
}
(Translation: Xiao Wang wants to dilute 150 kg of pesticide with a 20% concentration into a solution with a 5% concentration. How many kilograms of water need to be added?)
{
"id": "325488",
"segmented_text": "一 个 圆 形 花 坛 的 半 径 是 4 米 , 现 在 要 扩 建 花 坛 , 将 半 径 增 加 1 米 , 这 时 花 坛 的 占 地 面 积 增 加 了 多 少 米 * * 2 .",
"original_text": "一个圆形花坛的半径是4米,现在要扩建花坛,将半径增加1米,这时花坛的占地面积增加了多少米**2.",
"ans": "28.26",
"equation": "x=(3.14*(4+1)**2)-(3.14*4**2)"
}
(Translation: A circular flower bed has a radius of 4 meters. The bed is to be expanded by increasing the radius by 1 meter. By how many square meters does the area of the flower bed increase?)
However, some preprocessing is needed, because the equation field in ape210k is not always directly eval-able. For example, 150*20%/5%-150 is not a legal Python expression. The preprocessing I did is as follows:
- Percentages like a% are uniformly replaced with (a/100);
- Mixed fractions like a(b/c) are uniformly replaced with (a+b/c);
- True fractions written as (a/b) in the problem have their parentheses removed, becoming a/b;
- Ratios written with a colon : are uniformly replaced with /.
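The replacements above can be sketched with a few regular expressions. This is a minimal illustration, and normalize_equation is a hypothetical helper name; the true-fraction rule is omitted here because it touches the problem text rather than the equation.

```python
import re

def normalize_equation(eq):
    # a% -> (a/100), including decimal percentages like 12.5%
    eq = re.sub(r'(\d+(?:\.\d+)?)%', r'(\1/100)', eq)
    # Mixed fractions a(b/c) -> (a+b/c)
    eq = re.sub(r'(\d+)\((\d+)/(\d+)\)', r'(\1+\2/\3)', eq)
    # Ratios a:b -> a/b
    eq = eq.replace(':', '/')
    return eq

print(normalize_equation('x=150*20%/5%-150'))
# x=150*(20/100)/(5/100)-150
```

After this normalization, the right-hand side of the first example above evals to 450, matching the reference answer.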
Model Overview
There is not much to say about the model itself: it simply takes original_text as input and equation as output, and trains a Seq2Seq model on the "BERT+UniLM" architecture. If you have any doubts about the model, please read "From Language Models to Seq2Seq: Transformers Are Like a Play, It All Depends on the Mask".
Cloud Disk Link:
https://pan.baidu.com/s/1Xp_ttsxwLMFDiTPqmRABhg
The Large model's result is significantly higher than the 70.20% reported in the ape210k paper, Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems [4], indicating that our model is a fairly strong baseline.
Standard Output
From a modeling perspective, our task is essentially complete, as the model only needs to output the expression, and during evaluation, it only needs to check whether the result of eval-ing the expression matches the reference answer.
However, from a practical perspective, we also need to further standardize the output, determining whether to output a decimal, integer, fraction, or percentage based on different problems. This requires us to: 1) decide when to output which format; 2) convert the results according to the specified format.
The first step is relatively simple; generally, we can judge based on keywords in the problem or the equation. For example, if the expression contains decimals, the output is generally a decimal; if the problem asks "how many vehicles", "how many items", or "how many people", the output should be an integer; if it directly asks "what fraction" or "what percent", the output is a fraction or percentage respectively.
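The keyword rules above can be sketched as a small dispatch function. The keyword lists here are illustrative assumptions, not the author's exact rules; a real implementation would match the dataset's Chinese phrasing (e.g. 百分之几 for "what percent", 几分之几 for "what fraction").

```python
def decide_format(question, equation):
    # Hypothetical rule set for step 1: choosing the output format.
    if '百分之几' in question:   # "what percent"
        return 'percent'
    if '几分之几' in question:   # "what fraction"
        return 'fraction'
    # "how many vehicles / items / people" -> integer answers
    if any(k in question for k in ('多少辆', '多少个', '多少人')):
        return 'integer'
    # Decimals in the expression generally mean a decimal answer
    if '.' in equation:
        return 'decimal'
    return 'decimal'

print(decide_format('这批货物的利润是百分之几?', 'x=1/4'))          # percent
print(decide_format('增加了多少平方米?', 'x=3.14*25-3.14*16'))      # decimal
```

In practice such rules would be tuned against the dataset; the point is only that a handful of keyword checks covers most cases.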
The more difficult cases are those that require rounding, such as "Each cake costs 7.90 yuan; at most how many cakes can you buy with 50 yuan?", which requires taking the floor of 50/7.90; other problems require rounding up instead. However, I was surprised to find that there are no rounding problems in ape210k, so this issue does not arise here. If we encounter a dataset with rounding requirements and the rules are hard to judge, the most direct method is to include the rounding symbols in the equation and let the model predict them.
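For reference, the two rounding directions look like this in Python (a generic illustration, not code from this article's pipeline):

```python
import math

# Rounding down: at 7.90 yuan per cake, 50 yuan buys at most 6 cakes
price, budget = 7.90, 50
print(math.floor(budget / price))  # 6

# Rounding up: packing 50 items at 8 per box needs 7 boxes
items, per_box = 50, 8
print(math.ceil(items / per_box))  # 7
```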
The second step looks a bit more complex, mainly because of fractions. Many readers may not know how to keep the results of operations as fractions: for example, eval('(1+2)/4') gives 0.75 (in Python 3), but sometimes we want the result as the fraction 3/4.
In fact, keeping results as fractions falls under the category of CAS (Computer Algebra System), i.e. symbolic rather than numerical computation. Fortunately, Python has such a tool, SymPy [5], which achieves exactly what we want. See the example below:
from sympy import Integer
import re

r = (Integer(1) + Integer(2)) / Integer(4)
print(r)  # Output is 3/4 instead of 0.75

equation = '(1+2)/4'
print(eval(equation))  # Output is 0.75

# Raw strings are needed so that \1 is a backreference rather than the
# character \x01. Note that \d+ only wraps integer tokens; decimals such
# as 3.14 would need separate handling.
new_equation = re.sub(r'(\d+)', r'Integer(\1)', equation)
print(new_equation)  # Output is (Integer(1)+Integer(2))/Integer(4)
print(eval(new_equation))  # Output is 3/4
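Combining the two steps, a formatter can turn the symbolic result into the final answer string. render below is a hypothetical helper, not the article's actual code; the percent branch assumes the value is an exact rational so that value * 100 prints cleanly.

```python
from sympy import Integer, Rational

def render(value, fmt):
    # Hypothetical step-2 formatter (illustration only).
    if fmt == 'fraction':
        return str(value)             # e.g. 3/4
    if fmt == 'percent':
        return f'{value * 100}%'      # Rational(3, 4) -> '75%'
    if fmt == 'integer':
        return str(int(value))
    return str(float(value))          # default: decimal

print(render(Rational(3, 4), 'fraction'))  # 3/4
print(render(Rational(3, 4), 'percent'))   # 75%
print(render(Integer(450), 'integer'))     # 450
```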
Article Summary
This article introduced a Seq2Seq baseline for math word problems: using "BERT+UniLM" to convert the problem directly into an eval-able expression, along with some experience in standardizing the output. With UniLM built on BERT Large, we reached 73%+ accuracy, surpassing the result of the original paper.
References
[1] The ape210k dataset
[4] Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems
[5] SymPy: https://www.sympy.org/