Cursor's Rise: The Future of AI Programming

The B2B markets in China and the US are fundamentally different.

Author: Liu Yangnan

Editor: Hai Yao

Image Source: Wenxin Yige

In 2021, Microsoft launched GitHub Copilot, which quickly became the most sought-after AI tool in programming.

GitHub Copilot can automatically generate complete functions from the context a user provides, such as function names, comments, and surrounding code. This earned it a reputation as a "game changer" in the programming world.

Its impressive performance came from the model underneath: OpenAI's Codex, a 12-billion-parameter descendant of GPT-3 fine-tuned specifically for coding tasks. It was the first time a large Transformer-based model had truly made its mark in the coding field.

GitHub Copilot ignited global enthusiasm for AI programming. Amid this wave, four MIT undergraduates who hoped to change how software is built founded a company named Anysphere in 2022.

Anysphere once openly challenged Microsoft, naming it as its main competitor. Anysphere co-founder Michael Truell said that although Microsoft's Visual Studio Code dominates the integrated development environment (IDE) market, Anysphere saw an opportunity to offer a different kind of product.

Michael Truell (far right)

Microsoft probably did not expect that in less than three years, this little-known team would shake up the industry and spark a new global wave of enthusiasm for AI programming. Within just four months, the company became a unicorn valued at $2.5 billion.

How Cursor Stood Out

In August 2024, Tesla's former AI director Andrej Karpathy tweeted several times in praise of a code editor named "Cursor," claiming it had decisively surpassed GitHub Copilot.

In the same month, the company behind Cursor, Anysphere, completed a $60 million Series A financing round, with a valuation of $400 million.


Cursor's brilliance lies in features such as multi-line editing, cross-file context completion, Q&A, and next-action prediction. Developers can simply keep pressing the Tab key to carry out code modifications across an entire file. Cursor's results are also more accurate and faster, with almost no noticeable delay.

Anyone who programs knows how significant this is.

“Cross-file multiple completions and predictions are very subtle needs that developers might find hard to express accurately themselves, but once they use it, they feel very ‘smooth’,” said Zhang Hailong, founder and CEO of Gru.

Tom Yedwab, who has decades of development experience, also shared in an article that the Tab completion feature aligns best with his daily coding habits and saves the most time. “This tool feels like it reads my mind, predicting my next actions, allowing me to focus less on code details and more on building the overall architecture,” wrote Tom Yedwab.

The key to Cursor's success was not a high technical barrier. It was the team's early recognition of a subtle new need and its willingness to bet on a path no one had taken before.

Cursor is built on top of VS Code, a free, open-source, cross-platform code editor developed by Microsoft that ships with basic code completion features.

Previously, developers extended VS Code's functionality by writing plugins, but VS Code's plugin mechanism has many limitations. When handling large projects, for example, some plugins can slow down code indexing and analysis. Complex plugins can also be cumbersome to configure, requiring users to edit configuration files by hand, which raises the barrier to entry.

To eliminate these limitations, the Cursor team took a bold approach. Rather than building a conventional VS Code plugin, they heavily modified VS Code's own source code. This let them support multiple AI models at the underlying level and optimize the entire IDE experience through extensive engineering work.

Zhang Hailong said that in Cursor's early days, many practitioners, himself included, were skeptical. The path was extremely difficult and ran against industry consensus. VS Code's internal architecture is complex, spanning modules for code editing, syntax analysis, code indexing, and the plugin system, and different VS Code versions can differ from one another, so maintaining compatibility while heavily modifying it was a real concern. Embedding multiple AI models into the editor raised further challenges in how the models and the editor interact:

- how to pass code context to the model effectively
- how to process the model's output and apply it to the code
- how to minimize the latency of code generation
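To make the context-passing problem concrete, here is a minimal Python sketch of one common way to assemble a completion prompt. It always keeps the code around the cursor, then adds snippets from other files in order of relevance until a token budget runs out. This is not Cursor's implementation. `Snippet`, `build_prompt`, the relevance `score`, the word-count token estimate, and the `<prefix>/<suffix>/<middle>` markers are all hypothetical stand-ins; real fill-in-the-middle tokens and tokenizers differ by model.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    path: str     # file the snippet came from
    text: str     # the code or text itself
    score: float  # hypothetical relevance score (e.g. from symbol overlap)


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_prompt(prefix: str, suffix: str, snippets: list[Snippet], budget: int) -> str:
    """Assemble a fill-in-the-middle style prompt within a token budget.

    The code before and after the cursor is always kept; cross-file snippets
    are added in descending relevance order while they still fit.
    """
    used = approx_tokens(prefix) + approx_tokens(suffix)
    parts = []
    for snip in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = approx_tokens(snip.text)
        if used + cost > budget:
            continue  # too large; a smaller, less relevant snippet may still fit
        parts.append(f"# file: {snip.path}\n{snip.text}")
        used += cost
    context = "\n\n".join(parts)
    # Hypothetical markers telling the model where the missing code goes.
    return f"{context}\n\n<prefix>{prefix}<suffix>{suffix}<middle>"
```

A greedy budget like this trades some recall for speed. Fewer context tokens mean a faster model call, which matters for Tab-style completion where any delay is noticeable.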

Solving these problems required a complex system of engineering optimizations. In 2023 alone, Cursor shipped three major version updates and nearly 40 feature iterations. This was a significant test of patience for both the R&D team and the company's investors.

Ultimately, Silicon Valley once again proved its ability to nurture disruptive innovation. Cursor’s success is a classic Silicon Valley startup template: a group of obsessive tech geeks, harboring grand visions, bravely venturing into uncharted territory with the support of Silicon Valley’s mature VC system, facing countless doubts, and being the first to take the plunge, ultimately achieving remarkable success through their product.

“This is the charm of entrepreneurship; even such an ‘unreliable’ project made it,” Zhang Hailong remarked.

Recently, Anysphere announced that it had closed a $100 million Series B round at a valuation of $2.6 billion. According to Sacra's estimates, Cursor's annual recurring revenue (ARR) reached $65 million by November 2024, a year-on-year increase of 6,400%. Anysphere, founded in 2022, has only 12 employees.

Copilot Has Found Its Footing; Agents Are Still Searching

Cursor is not the first product to break into the AI programming field.

In March 2024, Devin, touted as the "world's first AI programmer," emerged and set off a new wave of industry enthusiasm for AI programming.

Devin is an autonomous Agent with full-stack skills. It can learn on its own, build and deploy applications end to end, debug its own code, and even train and fine-tune its own AI models. The company behind it, Cognition AI, is also a star "dream team" in the AI space.

However, the initial announcement of Devin was just a demo, and developers could not experience it hands-on. It wasn’t until December 11, 2024, that Devin officially launched, with a monthly subscription fee of $500. In contrast, Cursor’s monthly subscription fee of $20 seems much more affordable.


Compared to the widespread love for Cursor, developers’ evaluations of Devin have been contentious. Some believe Devin performs excellently in handling code migration and generating pull requests (PRs), significantly reducing repetitive work for developers; however, some users point out that Devin still requires substantial human intervention when dealing with complex business logic, especially when project documentation is insufficient or code quality is poor.

Zhang Hailong stated that the fundamental reason for the difference in reputation between Cursor and Devin lies in the different failure rates and costs of failure experienced by developers using the products.

Currently, the failure rate in Copilot scenarios is relatively low: accuracy on the corresponding benchmark, HumanEval, is approaching 100%. By contrast, scores on SWE-bench, the benchmark for Agent scenarios, are still below 60%.

Moreover, the results of AI's work must be reviewed and accepted by humans. The interaction model of Copilot-type products makes it cheap for developers to review AI-generated results, and cheap to modify or reject them when they fail. For Agent-type products, both the cost of confirming results and the cost of fixing them after a failure are significantly higher.

The different trajectories of Cursor and Devin reflect, to a large extent, the current status of Copilot and Agent product forms in general scenarios.

Cursor represents Copilot, where AI and humans need to work synchronously, with humans in the lead and AI as support.

Currently, Copilot is the form that has truly achieved product-market fit. Copilot can be embedded as a plugin in IDEs like VS Code to help human developers with a wide range of coding tasks. Since GitHub Copilot appeared, users have gradually grown accustomed to this collaborative mode, and the arrival of GPT-3.5 turned Copilot from a demo into a usable product.

However, Zhang Hailong has written about the "hidden worries" of Copilot-type products. "The real moat is VS Code. VS Code has evolved from a simple editor into a platform. Users can easily migrate from GitHub Copilot to Cursor because both live inside VS Code, and their habits, experience, features, and plugins are exactly the same. Cursor also shows that Copilot products have no 'data flywheel': the data you can access is also available to large models and has already become part of the model."

Agents, in contrast, are a new species spawned by GPT-3.5, a concept far more likely to excite entrepreneurs and VCs. Devin represents the Agent form, in which AI and humans work asynchronously. The AI takes more initiative and can autonomously handle part of the decision-making and execution.

Zhang Hailong believes that Agents present opportunities for entrepreneurs. However, he is not optimistic about the all-encompassing Agent vision advocated by Devin, “Doing everything means doing nothing well; specialized field Agents have higher application value.”

However, because the Agent concept is still at an early stage, companies are exploring in many directions. The environments Agents operate in and the limits of their capabilities remain unclear, and entrants are spread across code generation, code completion, unit test generation, defect detection, and other areas.

Gru chose to focus on unit testing. Before launching its product, Gru went through a period of trial and error, experimenting with automatic file generation, bug fixing, end-to-end (E2E) testing, and other directions. None of these moved forward, held back by limited model capabilities and by the difficulty of iterating on and maintaining the generated software.

Ultimately, Gru identified unit testing as a common but overlooked need. Zhang Hailong said many developers dislike writing unit tests because the work is tedious, and for projects with low quality requirements, unit testing is not a must-have in software engineering. From a technical standpoint, however, Gru believes AI products must maintain continuity of both business context and engineering context, and unit testing is the area that depends least on either and best matches current model capabilities.

However, whether Copilot or Agent, both are means and not ends, and the two are not mutually exclusive but will coexist to solve different problems.

For many individual developers and some small and medium-sized enterprises, general products like Cursor or some open-source models may suffice to meet most needs; however, for many large enterprises and complex business scenarios in different fields, it is difficult to simply meet the requirements through a specific “Copilot” or “Agent” type of general product, necessitating stronger domain service capabilities from technology vendors.

And therein lies the opportunity for domestic AI programming companies.

Domestic Opportunities in Vertical Fields

Looking back at 2024, AI programming is undoubtedly one of the hottest investment directions in Silicon Valley, having produced unicorns like Cursor, Poolside, Cognition, Magic, Codeium, and Replit.

In China, by contrast, major internet companies and large model vendors have mostly launched their own "code models," but very few startups have thrived. According to Silicon Star, Qiji Chuangtan invested in six AI programming startups last year, and nearly all of them collapsed soon after. Most of the more than ten code-focused teams that briefly surfaced last year have exited the market this year.

After the emergence of ChatGPT, Qingliu Capital looked at dozens of projects in the AI programming track but ultimately only invested in one, Silicon Heart Technology (referred to as “aiXcoder”).


For domestic AI programming projects, many opinions suggest that the products are relatively “shallow.” “Developers in the community complain that many products generate code in minutes, but they spend half a day or more debugging,” said Liu Daoquan, founder and CEO of Shizhi AI.

This shallowness reflects differences between the Chinese and US B2B markets that have built up over many years. Zhang Hailong attributed it to three factors. First, the US has a large number of junior programmers and higher labor costs, so companies benefit significantly from adopting AI products to cut costs. Second, the US SaaS market has already validated the product-led growth (PLG) model, so customers are willing to pay for general-purpose products. Third, exit paths in the overseas B2B market are clear and investor interest is strong. The logic for first-round funding is well established and angel investors are highly active, so startups can almost always raise a first round to validate their ideas.

Zhang Hailong has himself spent many years in the domestic B2B market, working in open-source communities and SaaS. In his view, the technological wave of large models will not change the current state of the domestic B2B market. "The only difference may be the technology being sold. In the cloud computing era, we sold cloud services; now, with AI, we sell AI," he said.

So this time, he decided to go overseas. Gru is Zhang Hailong's fourth startup but his first in Silicon Valley, and on arrival he was struck by a strong sense of unfamiliarity. "For the first time, I physically felt that I didn't know anyone," he said. Throughout 2024, he spent half his time in Silicon Valley, socializing actively, attending all kinds of events, and trying to meet as many people as possible in as short a time as possible.

In September 2024, Gru launched Gru.ai, which ranked first with a score of 45.2% on SWE-bench Verified, the benchmark released by OpenAI. Zhang Hailong clearly felt that having a product made it easier to be accepted in Silicon Valley.

In the domestic B2B market, however, the age-old problems persist. "B2B in China is quite difficult. The sales cycle is long, the ultimate buyers are mostly large enterprises, and sometimes a large enterprise won't buy just because your product is good," said Liu Daoquan. Qingliu Capital investment manager Fu Rui added, "Many enterprises have extensive internal security and compliance requirements. Out of concern about information leakage, they cannot use cloud-based products and need code tools deployed locally."

Domestic AI programming companies therefore have to stay grounded and solve specific problems in individual industries.

"In actual deployments, models must take business continuity into account. Judging by evaluation results, domestic code models have improved, but specific application scenarios still require case-by-case analysis," Liu Daoquan said. Conversations with one industrial manufacturing company revealed that some software systems used in industrial settings are written not in common languages like Python or C++ but with specialized coding tools, which requires technology vendors to adapt their products accordingly.

This is not a unique demand for industrial scenarios; every industry has its own characteristics, and each company has specific business logic and engineering systems, which requires AI programming companies to have stronger domain service capabilities.

After studying dozens of companies, Fu Rui found that: “For various software development needs, the functionalities of AI programming, in addition to code generation, at least include search, defect detection and repair, testing, and a series of tasks; besides functionalities, it is also necessary to consider how to integrate these capabilities with the clients’ business logic, allowing the model to possess deeper domain knowledge, which indeed has a high threshold.”

Therefore, Qingliu Capital is more optimistic about the approach of deeply coupling models and products with the private knowledge, data, and software development frameworks of enterprises, having invested in aiXcoder in September 2023.

"For this validated demand, aiXcoder is the team that fits best both technically and commercially. Many core members of its commercial team have more than ten years of experience serving large enterprise clients in China and abroad, giving them deep insight into customers and the market. In the second quarter of 2023, they proposed a 'domain-oriented' implementation plan, which holds that AI programming should be deeply coupled with enterprises' private knowledge, data, and software development frameworks. Based on actual deployment results, this strategy has won recognition from many leading enterprise clients," Fu Rui said.

aiXcoder, incubated by the Software Engineering Research Institute of Peking University, was one of the earliest teams in the world to apply deep learning to code generation and understanding, and one of the first to build deep learning into programming products. The team has published more than 100 papers in top international journals and conferences, including several of the earliest and most-cited papers in intelligent software engineering.

aiXcoder's business partner and president Liu Dexin said that in private B2B deployments, general large models have never seen the enterprise's private data. As a result, they are not deeply integrated with the enterprise's internal business needs, industry standards, software development frameworks, or operating environments, and domain background knowledge such as requirements analyses and design documents is never incorporated into model training. The code they generate or complete therefore lacks specificity and reliability at the level of business logic.

The outcome of this is that the accuracy and usability of large models in enterprise deployment applications fall below expectations. “Many large models perform commendably in general scenarios or mainstream evaluation sets, achieving accuracy rates of up to 30%, but during internal deployment in enterprises, the accuracy rates often plummet to below 10%. Conventional fine-tuning methods also struggle to meet the expectations of enterprises. Therefore, acquiring and mastering ‘domain knowledge’ is the key to the successful deployment of AI programming systems in enterprises. Addressing domain-specific issues for enterprise clients is our differentiated value proposition,” Liu Dexin stated.

To address the aforementioned pain points, aiXcoder conducts targeted incremental training based on the various internal data provided by enterprises—this includes code, business documents, requirement documents, design documents, testing documents, as well as industry-specific terminology and process specifications, and technical standards and regulations. In addition to model training, it also integrates with multiple Agents, RAG, software development tools, and an “engineering-oriented prompt system” that aligns with the enterprises’ software development frameworks, thereby enhancing code generation quality and overall R&D capabilities.

In terms of delivery, Liu Dexin stated that domain-oriented solutions are not equivalent to traditional highly customized project-based delivery. aiXcoder extracts capabilities and tools with general value from the clients’ personalized needs, forming standardized products and processes to deliver to clients; at the same time, aiXcoder maintains high-frequency communication with clients through regular meetings, not only assisting clients in solving periodic issues but also continuously iterating products based on the clients’ genuine needs.

The AI Industry Has Cried Wolf Too Many Times

From a results-oriented perspective, there may be no single optimal answer to whether to target small businesses or large enterprises, whether to train models, or whether to build a Copilot or an Agent. Each choice depends on clients' actual needs and the startup team's resources.

Whichever path they take, AI programming companies share a straightforward goal: improving software development efficiency. However, the market is still at an early stage, and correctly guiding client needs is a challenge every company entering the field must face.

Zhang Hailong admitted that the biggest challenge right now is getting clients to recognize the value of specialized Agents. "Even in Silicon Valley, many potential clients react to new AI products with skepticism rather than excitement, because the AI sector has cried wolf too many times and produced many unusable demos." Gru is currently investing a great deal of effort in working with clients to build a reputation among seed users, which will form the foundation for large-scale commercialization later on.

For the domestic market, the demand side of AI programming systems also needs to clarify its needs and the capability boundaries of models. “Currently, AI programming systems driven by large models have promising prospects for enhancing software productivity,” Liu Dexin stated. “To truly realize the value of this technology in an enterprise environment, it is essential to deeply integrate the code large model with the enterprises’ domain knowledge and continuously iterate and validate it within specific business scenarios.”

In fact, the development of large models has reached a point where market sentiment has largely returned to rationality, but noise still exists. For example, in 2024, information related to large model tenders is often seen, but some of that data may be misleading.

"Overseas, the division of labor across the ecosystem is relatively clear, but in China many B2B projects end up as tenders, with many companies competing fiercely for the bids," Liu Daoquan said. Yet in the AI programming field, publicly available tender information shows that even several large companies have not won many orders.

The reason is that winning a bid does not equate to the model or product being successfully implemented.

On one hand, the personnel responsible for procurement in many purchasing parties are often not the same individuals using the products, which can lead to a disconnect between procurement decisions and actual business needs. On the other hand, these implementations often rely on standardized products paired with fine-tuning methods, without in-depth domain-specific training and adaptation to the enterprises’ business scenarios and internal logic, which may lead programmers to find the results unsatisfactory during use.

An industry insider revealed that tenders for hardware orders often reach millions of yuan, while pure software orders, such as intelligent software development and code assistants, generally come to around 300,000 yuan. Many enterprises find that what they bought does not solve their problem and must look for more suitable vendors, wasting resources.

However, after filtering out the falsehoods, some consensus is beginning to form. More and more enterprises are realizing that decoupling products from model capabilities is the trend.

In the first half of 2024, Zhang Hailong recognized that as model capabilities become stronger, the programming abilities of various models will converge, and products should no longer be tailored to model capabilities but should be designed to be “model-agnostic.” “Starting from the first half of 2024, we basically stopped making specific optimizations for different models and instead focused on enhancing the capabilities of our product architecture, allowing any model on the market to be integrated as long as it passes our benchmark tests,” Zhang Hailong stated.
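One way to picture a "model-agnostic" architecture is a narrow interface that every model must implement, plus a benchmark gate that decides which models may be plugged in. The Python sketch below assumes that design. `CodeModel`, `BenchmarkCase`, `ModelRegistry`, the toy models, the arithmetic test cases, and the 0.5 pass threshold are all hypothetical and do not describe Gru's actual product.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class CodeModel(Protocol):
    """The only surface product code depends on (hypothetical interface)."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class BenchmarkCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the model's output is acceptable


class ModelRegistry:
    """Admits any model whose benchmark pass rate meets the threshold."""

    def __init__(self, cases: list[BenchmarkCase], threshold: float) -> None:
        self.cases = cases
        self.threshold = threshold
        self.models: dict[str, CodeModel] = {}

    def evaluate(self, model: CodeModel) -> float:
        # Fraction of benchmark cases whose output passes its check.
        if not self.cases:
            return 0.0
        passed = sum(1 for c in self.cases if c.check(model.complete(c.prompt)))
        return passed / len(self.cases)

    def register(self, model: CodeModel) -> bool:
        if self.evaluate(model) >= self.threshold:
            self.models[model.name] = model
            return True
        return False


# Toy models for illustration only.
class EchoModel:
    name = "echo"

    def complete(self, prompt: str) -> str:
        return prompt  # repeats the prompt, so it fails the arithmetic checks


class AdderModel:
    name = "adder"

    def complete(self, prompt: str) -> str:
        a, b = prompt.split("+")
        return str(int(a) + int(b))


cases = [
    BenchmarkCase("1+2", lambda out: out == "3"),
    BenchmarkCase("10+5", lambda out: out == "15"),
]
registry = ModelRegistry(cases, threshold=0.5)
```

Because product code talks only to `complete`, a newly released model only has to pass the benchmark to be swapped in. No model-specific changes to the product are needed.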

Liu Dexin also emphasized: “Enterprise clients should fully recognize the importance of business continuity and should not be bound by any single large model vendor. Currently, merely procuring standardized products is unlikely to genuinely meet the large model deployment needs of enterprise clients. Enterprises need to achieve decoupling in large models, data layers, domain-specific and engineering aspects, allowing them to flexibly choose models and service providers that better fit their needs. The key is to effectively address the practical issues of software development in enterprises.”

Offering a third-party industry perspective, Liu Daoquan believes that integrating models will be only one part of how the industry deploys AI in the future. "There is still a considerable distance from models to applications. If technology vendors can standardize the first 95 to 99 kilometers of the road from model to application, the application side can handle the remaining 1 to 5 kilometers themselves."