How to Combat Image Recognition Models in Cybercrime

The full text is 8,799 words; estimated reading time is about 25 minutes. We recommend saving it for later reading.

A captcha's answer inherently carries semantic meaning, which forms a natural dividing line between humans and automated programs. Whatever method is used to crack a captcha, the answer must first be identified.
Through years of confrontation with cybercriminals, we have found that defending by restricting their access to captcha images not only reduces operational costs and minimizes disruption to customers, but also lets us keep the initiative in the ongoing battle. After 11 years of practical experience in combating cybercrime, we can summarize two main ways cybercriminals obtain captcha answers: exhaustive image cracking and model-based image cracking. The difference between the two is, in essence, one of efficiency.
In the previous issue, we discussed how to counter cybercriminals who exhaustively crack captchas by traversing verification image resources.

In this issue, we focus on model-based cracking and its central point of attack and defense: image recognition model countermeasures. We analyze it from the perspectives of the attacker (cybercriminals), the attacked (customers), and the defender (the technology provider) to fully understand this point of contention.

Points of Attack and Defense


In the previous issue, on exhaustive traversal of verification images, we mentioned manual captcha solving. Manual solving, also known as anti-CAPTCHA, is the practice of cybercriminals forwarding captcha challenges to solving platforms where real people complete them, bypassing captchas through human crowdsourcing.


However, manually entering captcha answers is slow and costly. In the entire cheating chain, manual solving accounts for over 60% of the time. For example, for a batch of 300,000 verification images, cybercriminals need 8.33 hours to download the batch, while obtaining the answers through low-cost manual solving takes 208.33 hours. Moreover, combining image traversal with manual solving requires continuous investment of time and money: once the verification provider updates the image set, cybercriminals must download a new batch and pay for manual solving all over again.
So how can these time and cost expenditures be reduced and the share of manual solving minimized? Cybercriminals have devised a method: automating the recognition of image answers. After obtaining a batch of image answers by traversing verification image resources, they train a model to automate the attack.


Compared with image traversal plus manual solving, which is defeated by automatic updates to the image set, cybercriminals only need to build one accurate image recognition model. Even if the verification provider later updates the image set dynamically, as long as the style of the icons stays the same, the model can still identify the images, greatly improving efficiency.

Before the update: Image A


After the update: Image B

For example, if cybercriminals train a model that recognizes Image A, it will keep working after the verification provider updates the image set to Image B. As long as the icon style is unchanged, the model continues to recognize each new image set. This lets cybercriminals skip scraping and downloading image sets entirely, sharply reducing the time and money spent on manual solving and improving attack efficiency.

Benefits of Image Recognition Models from the Cybercriminal Perspective


Using automated methods to recognize image answers is not strictly necessary for attackers. But if they can cut the share of manual solving and rely mostly on automated recognition, the attack time shrinks dramatically and the recognition of image answers becomes sustainable. So how do cybercriminals obtain an accurate, efficient model, and how do they use it for automated attacks?


Attack Principles

Let’s first understand the principles of the image recognition model. Currently, the model most commonly used for image recognition is CNN.

A CNN (Convolutional Neural Network) is a deep learning model used for image processing and analysis tasks. Its basic principle is to extract features from data and then use these features to recognize and classify the data. CNNs are applied to complex recognition tasks such as image analysis and behavior recognition, and related architectures have also been used in natural language processing. Here is an introduction to the basic building blocks of a CNN (a minimal code sketch follows the list):

1. Convolutional Layer: The convolutional layer uses a series of filters (convolutional kernels) to perform convolution operations on the input image. The filters slide over the image and compute the convolution results at each position, capturing local spatial information of the image. The convolution operation can be implemented through matrix multiplication and addition.

2. Activation Function: The output of the convolutional layer often goes through a nonlinear activation function, such as ReLU (Rectified Linear Unit), to introduce nonlinear transformations and increase the model’s expressiveness.

3. Pooling Layer: The pooling layer is used to reduce the size of the feature map and extract the main features. Common pooling operations include max pooling and average pooling. Pooling reduces the number of parameters, improves computational efficiency, and enhances the model’s translation invariance.

4. Fully Connected Layer: The output from the pooling layer is connected to the output layer through a fully connected layer for final classification or regression. The fully connected layer flattens the feature map into a vector and performs calculations through matrix multiplication and addition.

5. Loss Function: The training process of CNN optimizes the model by minimizing the loss function. Common loss functions include cross-entropy loss and mean squared error loss, which measure the difference between the model’s predictions and the true labels.

6. Backpropagation: CNN uses the backpropagation algorithm to compute the gradient of the loss function with respect to the model parameters and updates the parameters through gradient descent. Backpropagation uses the chain rule to propagate the error from the output layer back to the input layer to update the parameters of both the convolutional and fully connected layers.
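To make the pipeline concrete, here is a minimal sketch of such a CNN in PyTorch. The layer sizes, input resolution, and 10-way output are illustrative assumptions, not the architecture of any particular attacker’s model.

```python
# Minimal CNN sketch (PyTorch): convolution -> ReLU -> pooling -> fully connected.
# All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # activation function
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)       # 64x64 input -> 32x16x16 feature map
        x = torch.flatten(x, 1)
        return self.classifier(x)  # logits; softmax is applied inside the loss

model = SmallCNN()
loss_fn = nn.CrossEntropyLoss()    # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3, 64, 64)      # a dummy batch of 8 RGB images
y = torch.randint(0, 10, (8,))
loss = loss_fn(model(x), y)
loss.backward()                    # backpropagation
optimizer.step()                   # gradient descent update
```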


Attack Process

From the perspective of cybercriminals, let’s walk through the complete process of training and deploying an image recognition model:

Step 1: Cybercriminals initiate an attack targeting the web registration scenario of a gaming company.


Step 2: Cybercriminals frequently send requests to the page interface to obtain verification image addresses.


Step 3: Batch download and store the verification images; roughly 50,000 to 100,000 images are required.
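A minimal sketch of what steps 2 and 3 amount to, assuming a hypothetical endpoint that returns a fresh captcha image URL per request; the endpoint URL and JSON field names are invented for illustration.

```python
# Minimal sketch of batch-downloading captcha images.
# The endpoint URL and JSON field names are hypothetical.
import pathlib
import requests

API = "https://example.com/api/captcha/get"   # hypothetical endpoint
out = pathlib.Path("captcha_images")
out.mkdir(exist_ok=True)

for i in range(50_000):                        # batch size from the text
    meta = requests.get(API, timeout=10).json()
    img = requests.get(meta["image_url"], timeout=10).content  # hypothetical field
    (out / f"{i:06d}.png").write_bytes(img)
```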


Step 4: Manually label the answers in the verification images.


Step 5: Build the model’s network structure and write the model training code.


Step 6: Train the model using the labeled verification images and answer coordinates (training requires a GPU).
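As a rough illustration of step 6, here is a hedged training-loop sketch that regresses click coordinates with a network of the same shape as the SmallCNN sketch above. The dataset layout, image size, and normalization are assumptions; dummy tensors stand in for the labeled captcha set.

```python
# Sketch of a training loop for click-coordinate regression (step 6).
# Dataset layout, image size, and normalization are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"  # GPU strongly preferred

model = nn.Sequential(                   # same shape as the SmallCNN sketch above
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),          # output: normalized (x, y) click point
).to(device)

# Dummy tensors standing in for the labeled captcha set.
images = torch.randn(256, 3, 64, 64)
coords = torch.rand(256, 2)              # answer coordinates, scaled to [0, 1]
loader = DataLoader(TensorDataset(images, coords), batch_size=32, shuffle=True)

loss_fn = nn.MSELoss()                   # regression loss on coordinates
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    for x, y in loader:
        loss = loss_fn(model(x.to(device)), y.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```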

Model training process: training loss and validation loss decrease steadily as training iterates, eventually converging; training accuracy climbs toward 100%, while validation accuracy converges around 90%.


Step 7: Once a sufficiently accurate recognition model is trained, cybercriminals can obtain answer coordinates directly from the model during subsequent captcha cracking.


Attack Methods

To reduce attack costs and improve cracking efficiency, cybercriminals typically use a large collection of captcha image samples to label answers, build a model network, train the model, and test it, thereby obtaining sustainable recognition of image answers. The most common approaches to cracking captcha images are classification models, similarity models, and plain object detection with enumeration.

1. Classification Model:

In the classification approach, cybercriminals detect icon positions and classify both the target elements and the prompt labels. The model first encodes the features of each detected target element and prompt icon to obtain feature representations, then computes the probability of each possible category with the softmax function and assigns the image to the highest-probability known category.

The classification model’s full pipeline runs through icon extraction, feature encoding, and probability calculation: for example, it may judge the target element to be a “switch” with probability 0.91 and the prompt label to be a “switch” with probability 0.86, and therefore classify both as the switch category.


The cracking mechanism of the classification model: during training, the model is made to cover all existing icon elements comprehensively, with at least 50,000 training images. Once deployed, it can quickly and accurately locate the correct answer coordinates whenever it encounters an icon element seen during training.


In practice, the classification model automatically labels the categories, such as “book-open” and “add-on,” for each of the four icon elements, quickly locating the image answers.

2. Similarity Model:

In the similarity approach, cybercriminals detect icon positions, encode the features of the detected target elements and prompt labels, and then compute the similarity between the two; pairs with high similarity are taken to be the same type of icon.
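A minimal sketch of the matching step, using cosine similarity over embedding vectors; the 4-dimensional embeddings here are toy stand-ins for real feature encodings.

```python
# Cosine similarity between icon embeddings (toy 4-d vectors as stand-ins).
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

prompt = np.array([0.9, 0.1, 0.4, 0.2])        # embedding of a prompt label
targets = {
    "icon_A": np.array([0.88, 0.12, 0.35, 0.25]),
    "icon_B": np.array([0.05, 0.95, 0.60, 0.10]),
}
# Pick the target element most similar to the prompt label.
best = max(targets, key=lambda k: cosine_sim(prompt, targets[k]))
print(best)  # icon_A: highest cosine similarity, treated as the same icon type
```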


The similarity model does not need to know what the three icon elements actually are; it simply computes the similarity between prompt labels and target elements to determine the click order that passes the captcha.

3. Object Detection:

In the object detection approach, cybercriminals detect and record icon positions and then enumerate click orders until they find the correct one. This is the cheapest attack method. Freely available object detection models can already locate target elements accurately; the attacker then picks a random click order and sends the request. If verification succeeds, the correct answer has been found; otherwise, they keep trying. If a captcha has three target elements, at most six attempts are needed to find the correct answer.


If there are three target elements in the background image, there are A(3,3) = 3×2×1 = 6 possible click orders, so each random attempt succeeds with probability 1/6, and at most six attempts are needed to find the correct answer.
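A sketch of the enumeration step with itertools; verify_order is a hypothetical stand-in for submitting a click order to the captcha endpoint and receiving pass/fail.

```python
# Enumerate all click orders over three detected target elements.
# verify_order() is hypothetical: it would submit the order and return pass/fail.
from itertools import permutations

def verify_order(order: tuple) -> bool:
    return order == ("B", "A", "C")   # pretend this is the server's hidden answer

elements = ("A", "B", "C")            # positions found by an object detector
for order in permutations(elements):  # at most 3! = 6 attempts
    if verify_order(order):
        print("correct order:", order)
        break
```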


Profit Methods

From the cybercriminals’ perspective, why does model cracking cost less than manual solving? Downloading a batch of 300,000 verification images takes 8.33 hours; obtaining the answers through manual solving then costs about 1.4 cents (0.014 yuan) per image, 4,200 yuan in total, and takes 208.33 hours. Cybercriminals can then build their own answer database and pass verification smoothly for a month as long as the image set is unchanged, but once the verification provider dynamically updates the image set, they must spend another 4,200 yuan and roughly nine days on manual solving.
In contrast, an image recognition model saves the time and money spent on manual solving. Cybercriminals only need the 8.33 hours to download and store the batch of images; they then train a model, iterating on the network structure and using GPUs to speed up training until the model is effective. Afterwards, recognizing a brand-new image takes only 0.1 to 0.2 seconds, and even if the verification provider dynamically updates the image set, the model stays effective as long as the icon style is unchanged. At 0.1 seconds per image, model recognition is roughly 25 times faster than manual solving.
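The numbers above, worked through as a sanity check, assuming 300,000 images and the per-image figures quoted:

```python
# Sanity-check the cost comparison quoted above.
images = 300_000
manual_hours = images * 2.5 / 3600   # ~2.5 s per image by manual solving
manual_cost  = images * 0.014        # 0.014 yuan (1.4 cents) per image
model_hours  = images * 0.1 / 3600   # ~0.1 s per image by model inference

print(f"manual: {manual_hours:.2f} h, {manual_cost:.0f} yuan")  # 208.33 h, 4200 yuan
print(f"model:  {model_hours:.2f} h")                           # 8.33 h
print(f"speedup: {manual_hours / model_hours:.0f}x")            # 25x
```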


A 25-fold efficiency gain may look like a mere difference in metrics, but it poses a grave threat to the attacked party in three respects:
1. As efficiency rises, the time and money threshold for attacks drops sharply. Cybercriminal groups flood in, and a business goes from being targeted by one or two attackers to seven or eight groups at once, greatly increasing both the probability and the scale of losses, potentially paralyzing the entire business.
2. Higher efficiency brings repeated threats. Even if the verification provider successfully defends with a first dynamic update of the image set, cybercriminals can return with renewed vigor once their models are trained, making the attacks hard to contain.
3. With lower costs and higher profits, cybercriminals’ willingness to keep cracking grows. High profits push them to develop new model-training techniques, making cybercrime harder to combat.
For cybercriminals, the largest cost of a recognition model is the technical cost of tuning, training, and testing. Once a model is trained, no continuous investment of time and money is needed, which slashes attack costs and expands their “profit space.” When manual-solving attacks fail, they usually switch strategy and train image recognition models for automated, efficient cracking.

How Companies Can Respond to Cybercriminal Image Recognition Models


As cybercriminals reduce the share of manual solving and switch to cheaper, more efficient automated model cracking, what problems do the attacked customers face? Let’s switch to the customer’s perspective and follow the course of an attack.


Problem Occurrence


First time period: On the morning of March 10th, the technology provider’s customer success team, during routine security monitoring, found abnormal data for a gaming company (Company H): the average number of captcha interactions reached nearly 30,000 per hour.
Second time period: Because Company H’s business interface was protected by captchas, cybercriminals could not attack the interface directly and had to crack the captcha first. At this stage the captcha played its protective role: from 9 AM to 12 PM on March 10th, interaction and pass rates gradually fell back to normal levels.
Third time period: At 2 PM on March 10th, the data turned abnormal again. Cybercriminals were continuously requesting captchas, traversing the verification image resources. The technology provider immediately applied a dynamic image set update strategy for the customer; at 9 PM on March 10th, images of the same material and style went live and the dynamic update took effect. With the cybercriminals’ existing answer database invalidated, the pass rate dropped significantly and the data returned to normal.
Fourth time period: At 12 PM on March 11th, request and interaction volumes surged again. We speculated that, after the first answer database was invalidated, cybercriminals adjusted their method, switching from image traversal plus manual solving to automated cracking with image recognition models. Judging from the pass rate, their model’s success rate against the technology provider’s captcha was still only around 50%. Even so, a 50% pass rate deals a heavy blow to the customer’s business, and it caused significant losses to Company H’s revenue and reputation.

Problem Localization

This prolonged confrontation shows that the struggle with cybercriminals is not a one-time affair but a continuous “war of attrition.” The technology provider’s security experts analyzed the attack methods and characteristics in each time period: after being intercepted by captchas, cybercriminals first traversed the verification images, continuously requesting captchas to scrape and exhaust the online image material (third time period); once the dynamic image set update took effect and their answer database was invalidated, they turned to training image recognition models to identify image answers automatically (fourth time period).
1. In the fourth time period, at the moment the dynamic image set update went live, real-time statistics showed no fluctuation in the abnormal traffic, ruling out the possibility that cybercriminals were relying on a pre-built answer database.
2. Analysis of the captcha logs showed that, even after the dynamic update, different requests for the same new image submitted completely identical answer coordinates, and the answers were submitted very rapidly.
3. Research into cracking forums and communities confirmed that some cybercriminals do train recognition models with convolutional neural networks and use them to obtain verification image answers.
This confirmed the speculation that model cracking was behind the fourth time period. At that point, it became imperative to deploy counter-recognition measures and make the attackers’ recognition programs exhibit visual discrepancies.
When companies encounter such problems, how can they tell whether the cracking is exhaustive or model-based? We categorize the different symptoms the two methods produce on the business side for quick problem localization:


If the first dynamic update of the image set is effective and cybercriminals do not launch a second wave of attacks, the problem can be localized as ① image exhaustive cracking (image traversal + manual solving).

If the first dynamic update is effective only briefly, cybercriminals successfully launch a second wave hours later, and the data rebounds immediately after recovery, the problem can be localized as ② image model cracking (image traversal + automated image recognition), which requires new solutions.

How the Technology Provider Responds to Cybercriminal Image Recognition Models


Faced with cybercriminals’ continual switching of attack strategies, which seriously threatens customer businesses, how should the defender respond to model cracking and limit its efficiency? Against image model cracking, the technology provider has drawn on service experience across 360,000 customers to distill a mature visual-discrepancy countermeasure plan.


Defense Strategy

1. Visual Discrepancy

How did the technology provider’s security experts arrive at the idea of visual discrepancies? Humans experience optical illusions, often perceiving images that contradict reality because of perspective and environmental changes. Similarly, the image recognition models trained by cybercriminals are fixed data models, and deliberate interference can exploit their limitations to produce analogous “visual discrepancies.” Classic optical illusions illustrate the human side of the phenomenon.

In the first issue, on traversing verification image resources, the defender only needed to update the image set once an hour to thwart cybercriminals: the defender’s dynamic updates outpaced the attackers’ ability to rebuild their answer database. But once attackers train image recognition models, recognizing a brand-new image takes only 0.1 to 0.2 seconds, faster even than the image sets can be updated. The defensive effect fades, and the exchange easily devolves into a tug of war. Defenders must therefore go further and cause the attackers’ recognition programs to exhibit visual discrepancies, making it impossible to identify the answers in verification images.

2. Model Analysis

To analyze how the image recognition models work: identifying the positions of target elements with an object detection model involves two tasks:

(1) predicting the bounding box (bbox) of each target element;

(2) predicting whether a given location is background or a target element, i.e., binary classification. Visual deception of object detection models focuses primarily on this second task: tricking the model into predicting target elements as background.

Typically, the industry uses object detection models with a single foreground class (num_classes=1) to locate target elements. Take the YOLOv3 object detection model as an example: it takes a 416×416 input image, passes it through the Darknet backbone and the YOLO detection head, and outputs feature maps of sizes 13×13×3×(5+num_classes), 26×26×3×(5+num_classes), and 52×52×3×(5+num_classes) (three anchors per scale), covering target elements of different sizes.
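A quick check of those output shapes for num_classes=1; the constant 5 is four box coordinates plus one objectness score, with three anchor boxes per grid cell.

```python
# YOLOv3 output shapes for a 416x416 input with a single foreground class.
num_classes = 1
anchors_per_scale = 3
per_anchor = 5 + num_classes   # 4 bbox coords + 1 objectness + class scores

for stride in (32, 16, 8):     # the three detection scales
    grid = 416 // stride
    print(f"{grid}x{grid}x{anchors_per_scale * per_anchor}")
# 13x13x18, 26x26x18, 52x52x18
```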


Humans cannot imagine images they have never seen, and neither can CNN models. For a category of icon never seen in training, softmax will force a prediction into one of the known categories, which is necessarily wrong. This is an inherent flaw of classification models.

As an example, suppose the model encoder outputs the 10-dimensional vector [-4.28, 2.97, -0.39, 5.25, -7.57, -3.43, 8.64, 2.63, 6.30, 0.68] for an “espresso” image, one logit per candidate category. The softmax function normalizes the elements into values between 0 and 1 that sum to one, so each can be read as the probability of the corresponding category; here the probability of the “espresso” class is 0.8798.
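That figure is easy to reproduce; a quick check, assuming “espresso” corresponds to the largest logit (8.64):

```python
# Reproduce the softmax probability quoted above.
import numpy as np

logits = np.array([-4.28, 2.97, -0.39, 5.25, -7.57, -3.43, 8.64, 2.63, 6.30, 0.68])
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
print(round(float(probs.max()), 4))  # ~0.88, matching the 0.8798 quoted above
```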


Similarity models only need to compute the similarity between target elements and prompt icons and decide whether they are the same thing from the similarity score, compensating for this flaw of classification models. Training a similarity model is also closer to how humans learn new things: when a child learns to recognize a new object such as a “cat,” the process is not inductive reasoning but similarity estimation, matching against the most similar image seen before.


Having analyzed the similarity model, we can design icon styles for defense. Applying different kinds of contour interference to target image elements restricts the model’s ability to match them.
Consider the prediction results of a similarity model on two icon-captcha styles. In the first, the icon style is distinctive: two identical icons have high similarity and different icons have low similarity, so the recognition model easily matches the three target elements to the three prompt labels. In the second, a floral shell is applied to all characters, which interferes with feature encoding and makes randomly chosen pairs of icons score high similarity; accurately matching the three target elements to the three prompt labels becomes difficult.
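A toy illustration of the contour-interference idea with Pillow: compositing every glyph inside the same flower-shaped silhouette so the outer contour no longer distinguishes icons. The flower shape, colors, and sizes are invented for illustration; a real pipeline would use larger vector fonts and richer masks.

```python
# Toy contour-interference sketch (Pillow): wrap each glyph in an identical
# flower-shaped silhouette so outer contours stop discriminating between icons.
from PIL import Image, ImageDraw, ImageFont

def floral_icon(char: str, size: int = 96) -> Image.Image:
    icon = Image.new("RGBA", (size, size), (0, 0, 0, 0))
    draw = ImageDraw.Draw(icon)
    # Flower silhouette: five overlapping petals around a center disk.
    for cx, cy in [(48, 20), (20, 40), (76, 40), (30, 74), (66, 74)]:
        draw.ellipse((cx - 18, cy - 18, cx + 18, cy + 18), fill=(230, 180, 60, 255))
    draw.ellipse((28, 28, 68, 68), fill=(240, 200, 90, 255))
    # Draw the glyph inside the shared silhouette (toy-sized default font).
    font = ImageFont.load_default()
    draw.text((40, 40), char, fill=(40, 40, 40, 255), font=font)
    return icon

floral_icon("5").save("floral_5.png")  # every icon now shares the same outer contour
```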


Defense Methods

To cause visual discrepancies in cybercriminals’ models and limit their matching of image elements, the technology provider keeps four types of stored images and has generated as many as fifteen visual-discrepancy countermeasure variants in real confrontations.
1. Countermeasures for Classification Models:
As described above, a classification model relies on covering all existing icon elements during training; once deployed, it can locate answer coordinates only for elements it has seen.
To counter classification-model cracking, we periodically replace the image elements wholesale, so that previous classification models can no longer identify the answers. The effect is illustrated below:

Before the update


After the update

After the old image elements are replaced, some elements can no longer be recognized by the original classification model; even if the model can still locate the elements’ positions, it cannot correctly match their order against the prompt labels.
2. Countermeasures for Similarity Models:
Similarity models go a step beyond classification models: they extract the prompt labels from the prompt box and scan their contours to match them against the image.
To counter similarity-model cracking, we apply contour-interference processing to the image elements and test the result against these similarity models.
(1) Floral Type:
Floral verification images wrap each element inside a floral pattern, preventing similarity models from recognizing and locating answers directly from the elements’ outer contours. The effect against similarity-model cracking is illustrated below:

Example 1:


Example 2:


In Example 1, the similarity model can locate the positions of all three answer elements in the prompt words, but the recognized order is wrong (correct order: 53P; recognized order: P53).
In Example 2, the similarity model locates only two answer elements (5 and P) and fails to locate the third (3).
(2) Honeycomb Type:
Honeycomb verification images build on the floral type by adding judgments about the relative positions and counts of elements: cybercriminals must analyze the shapes, quantities, and relative positions of elements within the grid to determine the correct answer, which greatly raises the difficulty of recognition and positioning for their models. The effect against similarity-model cracking is illustrated below:

Example 3:



Example 4:

In Example 3, the similarity model can locate the positions of all four answer elements in the prompt words, but the recognized order is wrong (correct order: ✖↓▲●; recognized order: ●✖↓▲).
In Example 4, the similarity model locates only one answer position (●) and fails to locate the other three (✖↓▲).
(3) Geometric Type:
Geometric verification images use basic elements as templates, randomly combining templates of different sizes into various shapes. This prevents models from locating answers directly through contours, and the random combinations also interfere with the ordering of prompt words in the prompt box.

Verification image


First Similarity Model


Second Similarity Model


Above, the first similarity model locates all four answer positions but gets the order wrong (correct order: ↓η⚙🔀; recognized order: ↓🔀⚙η).
The second similarity model locates two answer positions (η⚙), while two remain unlocated (↓🔀).
(4) Mixing Multiple Types:
By flexibly combining the floral, honeycomb, and geometric types, we can generate as many as fifteen distinct visual-discrepancy countermeasure variants, maximizing the difficulty of cracking, as shown below:
[Examples of mixed-type verification images]

In the first issue, we noted that other verification providers on the market update their image sets only once a month; they cannot intercept even the first wave of exhaustive cracking, let alone the second wave of image model cracking, and lack any capacity for model countermeasures.
In contrast, the technology provider updates verification image sets every hour, effectively intercepting the first wave of exhaustive cracking, and also applies contour-interference processing to induce model visual discrepancies, pre-storing four types of images and generating up to fifteen visual-discrepancy countermeasure variants, effectively limiting models’ matching of image elements and blocking the second wave of image model cracking.


Effective Defense

With four rich image types and fifteen visual-discrepancy countermeasure variants, the technology provider can rapidly produce thousands of images in a short time to counter the ever-emerging recognition models used by cybercriminals. For each customer facing a different model-cracking situation, we only need to adjust the corresponding parameters under the customer’s account in the management backend and switch to a different type of verification image. Image production is batched and efficient, achieving minute-level response from production to online deployment.
After detailed communication with Company H’s security officer, we reviewed the whole attack-and-defense process and found that the initial dynamic image set updates had not changed the icon style, so even a freshly updated batch of images remained recognizable to the cybercriminals’ trained model. Therefore, at 4 PM on March 14th, 2023, we created and deployed a batch of entirely new verification icon materials for Company H’s account, replacing all corresponding image sets with the floral type.


After the update to the high-difficulty icon materials, the models that had been recognizing image answers were rendered ineffective against the new batch. Cybercriminals would have to retrain from scratch: collecting images, labeling answers, building the network, training, and testing. The technical threshold and uncertainty of model training are high, so each retraining imposes significant cost.


At 10 AM on March 15th, after several failed attempts against the new verification images, the cybercriminals ceased their attacks, and verification volume returned to normal business levels. We continued to monitor the data for an hour afterward and observed no further attacks. The new materials had a decisive interception effect on the cracking, and Company H’s business returned to normal.


Record of conversations between the technology provider and the security officer of Company H.


More Defense Methods

This discussion has covered only position recognition for image understanding; behind it lie other effective defense methods and a variety of visual-discrepancy misdirection algorithms, solutions accumulated over eleven years of combating cybercrime.
1. Applying AIGC to Limit Model Recognition
Beyond the four stored image types and their counter-model capability, the technology provider keeps exploring the balance between security and visual, artistic experience.
SD light-and-shadow text is an artistic image style, more resistant to recognition than artistic QR codes, created primarily with Stable Diffusion’s ControlNet plugin. Applied to text-selection verification, it makes character strokes highly ambiguous, often producing characters that humans can still read but that recognition models misread or reject because of malformed strokes, causing visual discrepancies in cybercriminal recognition models and preventing them from passing verification.
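As a hedged sketch of this kind of workflow using the diffusers library: a black-and-white text mask conditions a ControlNet so the glyph is hidden in light and shadow. The checkpoint IDs, font path, prompt, and parameter values are illustrative assumptions, not the provider’s actual setup.

```python
# Hedged sketch: light-and-shadow text with Stable Diffusion + ControlNet.
# Checkpoint IDs, font path, and parameters are illustrative assumptions.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image, ImageDraw, ImageFont

# 1. Render the target character as a black-and-white conditioning mask.
mask = Image.new("RGB", (512, 512), "black")
draw = ImageDraw.Draw(mask)
font = ImageFont.truetype("NotoSansSC-Regular.otf", 320)  # assumed CJK font on disk
draw.text((96, 96), "冰", font=font, fill="white")        # "ice"

# 2. Let a brightness-style ControlNet hide the glyph in light and shadow.
controlnet = ControlNetModel.from_pretrained(
    "ioclab/control_v1p_sd15_brightness", torch_dtype=torch.float16  # assumed checkpoint
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "clouds and city buildings at dusk, dramatic light and shadow",
    image=mask,                         # conditioning image carrying the glyph
    controlnet_conditioning_scale=0.8,  # lower = glyph is harder for models to read
    num_inference_steps=30,
).images[0]
image.save("light_shadow_text.png")
```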


These examples show the technology provider’s light-and-shadow text applied in captchas. On close inspection, the characters “ice,” “grab,” and “iron” are formed by illusions of clouds and building light and shadow; human eyes can identify them, but cybercriminal recognition models fail because of the malformed strokes and thus cannot pass verification.


The generation process adjusts SD light-and-shadow text parameters step by step to find the best balance between security and user experience. The technology provider is currently working to solve high-quality batch image generation and continues to explore artistic creation combining AIGC with captchas, expected to be officially deployed in the next version update.


Black-and-white light-and-shadow text raises the recognition difficulty for cybercriminal models even further.

2. Tracking Cybercriminal Algorithm Dynamics
Combining data feedback from confrontations with intelligence gathered from major forums and communities, we test and analyze cybercriminals’ cracking algorithms and adjust verification images according to those algorithms’ implementation logic and principles.
3. Proactively Updating Image Set Models for Business Activities
When clients run promotions or marketing events, we can fully update the client’s image set models minutes or even seconds before the event starts, catching cybercriminals off guard; by the time they have retrained their cracking models or scraped new images for traversal, the business activity has already ended.

Conclusion

The captcha answer is a step no cybercriminal can bypass; ultimately, the speed of online image updates and the images’ resistance to model training are decisive. Updating our image set models, in a sense, follows the route of making cracking “unprofitable so that attackers abandon the effort.” Update frequency and volume must still be managed carefully: updating too often can actually help cybercriminals acquire all the resources quickly. Currently, our image set updates achieve minute-level response from production to online publication.
The technology provider now serves over 360,000 clients worldwide. Standing at the front line of cybercriminal attacks, it exhibits the strongest security linkage effects among security providers and has formed high-standard service solutions from its accumulated image countermeasures. Clients no longer need to worry about cybercriminals using image recognition models to crack captcha answers automatically and trigger cascading business security issues.


The Technology Provider’s Patent Wall

Besides exhaustive cracking and model cracking of images, cybercriminals also attack at the protocol layer. In the next issue, we will continue the analysis with the next point of contention: protocol cracking and its recognition, introducing how cybercriminals forge requests, responses, and parameters to call communication interfaces directly and achieve their goals.
“The Cybercrime Defense Path” offers a God’s-eye view of the ongoing battle against cybercrime. If you find it interesting, feel free to share; if there are other aspects of cybercrime defense you care about, leave a comment or contact the technology provider’s editor.

Add Eva on WeChat to join the technology provider’s reader group.

Related Reading:

[Cybercrime Defense Path 01] How to Respond to Cybercrime in Verification Image Resource Traversal

[Cybercrime Archive 02] Midnight Emergency Deployment

Friends who love technology and security are following the technology provider.
