Model Native Operating Systems: Opportunities, Challenges, and Prospects

Starting from the needs of users, applications, and the system itself in the era of artificial intelligence, this article analyzes the development dilemmas of operating systems on the “plug-in model” evolution path and proposes building model-native operating systems through full-stack collaborative design of “model-system-chip.” It then discusses the opportunities and challenges involved, along with preliminary explorations by industry and the author team.

Generative AI, represented by ChatGPT, is profoundly transforming human society. Large models, with their exceptional capabilities in language understanding, content generation, and logical reasoning, have become a new technological paradigm and an important driving force for development. Major economies and technology giants worldwide are making strategic moves: the U.S. and the European Union are investing over a hundred billion dollars to promote research and development, while companies such as Microsoft, Nvidia, Apple, and Google compete comprehensively across model architectures, systems, and chips.

However, integrating AI technology into the operating system, the core hub of a computer system, still faces many challenges. Current mainstream solutions, such as Microsoft’s Windows Copilot and Apple’s Apple Intelligence, integrate the model as a plug-in service. While this approach can be implemented quickly, structural issues such as the probabilistic nature of models, the complexity of the software and hardware stack, and the fragmentation across model, system, and chip leave current model-based intelligent applications with poor controllability, high development difficulty, and low computational efficiency. The resulting ecosystem is strong at “poetry” but weak at “action” (good at generating content, poor at reliably completing tasks), fragmented, and limited in intelligence.

This article focuses on the needs of users, applications, and systems in the AI era, analyzing the development dilemmas of operating systems under the evolution path of “plug-in” AI, and proposes building model-native operating systems through full-stack collaborative design of “model-system-chip.” This new type of operating system will reconstruct interaction paradigms, interface abstractions, execution modes, and security mechanisms, breaking down data barriers between intelligent applications, promoting organic collaboration among multiple models, optimizing resource supply to enhance operational efficiency, and ultimately achieving the organic unity of the probabilistic intelligence of models and the deterministic rules of operating systems. This article will explore the opportunities and challenges faced by model-native operating systems and introduce relevant preliminary explorations by the industry and the author team.

Demand and Challenges of Operating Systems in the AI Era

Algorithms, data, and computing power are hailed as the three engines of AI development, and the ubiquitous operating system is the key link that connects and drives all three, serving as an important foundation for AI to move toward general intelligence and empower various industries. From the operating system’s perspective, “models,” including large models, small models, and traditional machine learning methods, are evolving into a basic underlying capability. Operating systems in the AI era therefore urgently need native support for models to meet the intelligence demands of users, applications, and the system itself, transforming execution efficiency, intelligence levels, interaction paradigms, privacy protection, and system security.

Execution Efficiency

With the development of generative AI, model parameter scales are growing exponentially and demand for computing power is surging. To provide stronger computing support, heterogeneous, mixed-precision computing systems built from new computing hardware such as GPUs and NPUs have become the infrastructure of current AI scenarios, and mainstream chip manufacturers have launched heterogeneous computing chips. Heterogeneous computing also enables new application scenarios such as virtual digital humans, which often require GPU, NPU, and CPU hardware to work collaboratively to support real-time interaction and intelligent behavior. Under this trend, the resources managed by the operating system must shift from traditionally discrete domains to a heterogeneous, integrated architecture, bringing two challenges: (1) heterogeneous integration makes programming models and frameworks increasingly complex, requiring the operating system to provide unified, easy-to-use programming abstractions that lower development thresholds; (2) different hardware has distinct performance characteristics and applicable scenarios; for example, GPUs and NPUs suit high-throughput computation over large-scale data, while CPUs suit low-latency processing of small-scale data, requiring the operating system to intelligently plan task distribution and coordinate scheduling across heterogeneous hardware to improve overall execution efficiency.
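To make the scheduling challenge concrete, the sketch below is a minimal, hypothetical illustration (the device list, task fields, and cost thresholds are invented for this example, not drawn from any real system) of how an operating system might match task characteristics to heterogeneous hardware:

```python
from dataclasses import dataclass
from enum import Enum

class Device(Enum):
    CPU = "cpu"   # low-latency, small-scale work
    GPU = "gpu"   # high-throughput batched compute
    NPU = "npu"   # energy-efficient neural inference

@dataclass
class Task:
    name: str
    flops: float            # estimated compute cost (hypothetical metric)
    batch_size: int
    latency_sensitive: bool

def dispatch(task: Task) -> Device:
    """Pick a device by matching task shape to hardware strengths."""
    if task.latency_sensitive and task.flops < 1e8:
        return Device.CPU    # small, urgent work stays on the CPU
    if task.batch_size >= 8:
        return Device.GPU    # large batches exploit GPU throughput
    return Device.NPU        # steady neural workloads go to the NPU

tasks = [
    Task("ui_gesture_decode", flops=5e6, batch_size=1, latency_sensitive=True),
    Task("image_batch_embed", flops=2e11, batch_size=32, latency_sensitive=False),
    Task("wakeword_detect", flops=1e9, batch_size=1, latency_sensitive=False),
]
for t in tasks:
    print(f"{t.name} -> {dispatch(t).value}")
```

A real scheduler would also account for power budgets, memory residency, and contention, but the core decision, routing each task to the hardware whose characteristics it matches, is the same.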

Intelligence Level

The rapid development of large models is reshaping intelligent terminal devices, bringing models with billions of parameters onto them. Leading global smartphone, personal computer (PC), and automobile manufacturers are actively exploring the deployment of large models on mobile and on-board chips to gain better privacy protection and the advantages of local execution. Google’s AICore can run the roughly 2-billion-parameter Gemini Nano model on smartphones, Apple plans to integrate a local large model with 3 billion parameters into iOS 18, and Huawei is embedding AI capabilities into terminals through HarmonyOS native intelligence. Facing this trend, operating systems must innovate so that larger-scale, higher-intelligence models can run efficiently on terminal devices. However, existing model compression methods such as quantization and pruning often cause significant performance degradation. This bottleneck requires operating systems to deeply understand model characteristics and explore more targeted optimizations rather than relying solely on simple compression and quantization.
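As a toy illustration of why naive compression degrades performance (the weight distribution and outlier values below are invented for the demo), symmetric int8 quantization lets a few outlier weights inflate the quantization scale, which coarsens the representation of all ordinary weights:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(1024, 1024)).astype(np.float32)
w[0, :8] = 0.5   # a few outlier weights, common in trained transformers

q, scale = quantize_int8(w)
err = np.abs(w - q.astype(np.float32) * scale).mean()
print(f"step size {scale:.5f} vs typical weight ~0.02; mean abs error {err:.6f}")
```

The quantization step here (about 0.004) is roughly a quarter of a typical weight’s magnitude, exactly the kind of precision loss that motivates model-aware, system-level optimization instead of blind compression.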

Interaction Paradigm

Large models are driving a revolutionary change in human-computer interaction, moving users from traditional graphical interfaces to natural spatio-temporal interactions such as voice, gestures, and eye movements. Many industry players are actively exploring this new form of interaction: OpenAI’s GPT-4o supports real-time voice conversations and can act as an intelligent tutor for homework assistance; Zhipu AI’s AutoGLM can understand and execute complex cross-application instructions, including “online shopping,” “hotel booking,” and even “sending red envelopes.” Foreseeably, as digitization and intelligence deepen, large models will become an important gateway for human-computer interaction, even dynamically adjusting the interface to user habits and context. However, current models still struggle to understand user interfaces: interface elements vary widely in resolution, proportion, and layout and contain large numbers of fine-grained texts and buttons, making it difficult for existing models to accurately grasp interface semantics and limiting the effectiveness of natural-language interaction.

Privacy Protection and System Security

With the widespread application of large models, security and privacy issues have become increasingly prominent. Recently, the European Data Protection Board (EDPB) and the Italian privacy regulator conducted strict reviews of OpenAI’s services, with Italy even temporarily banning ChatGPT. Privacy protection for large models spans multiple layers: (1) at the data level, models may leak sensitive information during training and inference, and model parameters are themselves important digital assets; (2) at the system level, complex software and hardware stacks inevitably contain security vulnerabilities; (3) at the behavioral level, the uncertainty of model outputs may lead to unauthorized operations or deviations from expectations; (4) at the audit level, current systems focus excessively on performance metrics and lack systematic behavioral auditing and constraint mechanisms. This calls for a multi-dimensional security protection system that achieves comprehensive privacy protection across data, systems, behaviors, and audits.

Current Development Dilemmas of the “Plug-in” AI Evolution Path

The AI development path represented by Microsoft’s Windows Copilot provides intelligent services in the operating system in a “plug-in” manner. Windows 11 includes a dedicated Copilot window, which users can quickly invoke through taskbar icons, shortcut keys, or a dedicated button. Copilot can not only hold text conversations but also perform diverse tasks such as adjusting system settings, assisting daily operations, and offering writing suggestions. However, while integrating AI as a plug-in service into an existing operating system enables quick deployment and lets users experience intelligent services early, it fundamentally faces four core dilemmas: poor computing power, poor intelligence, poor appeal, and poor security. These dilemmas hinder the deep integration of AI technology with the operating system and prevent the delivery of higher-level intelligent services to users.

Poor Computing Power: Unable to Fully Utilize the Computing Power of Heterogeneous Hardware

Computing hardware is currently iterating rapidly and diversifying. On one hand, mainstream GPU hardware is continuously upgraded; on the other, various AI chips offer more choices in energy consumption and cost-effectiveness along with workload-specific optimizations. Client-side scenarios such as smartphones and AI PCs generally adopt hybrid computing architectures, striving to raise absolute computing power while reducing overall power consumption. Even so, large models deployed on the client side currently hover around 7 billion parameters, significantly limiting system intelligence. As intelligent applications proliferate, computing hardware is no longer monopolized by a few applications, requiring operating systems to schedule and manage heterogeneous computing resources in a unified, efficient, and collaborative way so that the hardware’s absolute computing power translates into application and system intelligence. The dilemma of plug-in intelligent services is that, lacking native operating system support, they cannot effectively utilize and manage hybrid, heterogeneous computing resources and may even trigger competition and conflict over them.

Poor Intelligence: Intelligent Technology Is Difficult to Integrate Deeply with the Operating System

Integrating intelligent technology as a plug-in provides only one-way, limited intelligent support, making deep integration with the operating system’s modules difficult, limiting system intelligence, and obstructing the transition from “poetry” to “action.” Plug-in intelligent technology struggles to leverage the operating system’s underlying data and resources, greatly restricting its intelligence level. For example, Windows Copilot cannot directly access deep system data and thus fails to integrate deeply with the system; it has even been downgraded to a Progressive Web App (PWA). In addition, plug-in intelligent technology suffers from data silos and high response latency, which further amplify the probabilistic issues of models. These problems severely limit the intelligence of system services such as resource allocation, task scheduling, and memory management, making deep system-level optimization difficult.

Poor Appeal: Failed to Break Through the Fixed Interaction Modes of Traditional Operating Systems

The fixed interaction logic of traditional operating systems cannot satisfy the diverse preferences of users or the interaction needs of different scenarios. On one hand, existing interaction logic is statically designed by developers and offers all users the same experience, precluding personalization. On the other hand, the interaction logic of different applications is entirely independent; even a simple user task may span multiple applications, forcing users to switch repeatedly between them. These fixed interaction modes significantly limit how deeply intelligent technology can be woven into human-computer interaction. For example, Apple’s Ferret-UI uses a multimodal large model to understand and operate the user interface (UI) to complete tasks, but, constrained by fixed interaction modes, it is still forced to switch frequently among multiple applications, resulting in low task completion rates, long completion times, and high inference costs.

Poor Security: Risks of System Security Vulnerabilities and Privacy Leaks

Current intelligent technologies carry inherent risks such as lack of interpretability and output uncertainty, and the execution modes and interaction methods of intelligent services are still evolving rapidly, with reliability far below that of operating systems. In recent years, incidents of confidential information leaked through ChatGPT have occurred repeatedly, reflecting the security challenges facing large models and intelligent services. Integrating intelligent services as plug-ins leaves them outside the system’s overall security protection mechanisms, potentially introducing vulnerabilities and enlarging the operating system’s attack surface, inevitably affecting system security and stability. If plug-in intelligent services are allowed to access the system and process data directly, they can easily be exploited as attack entry points and springboards; conversely, if they are simply isolated, the user data they rely on may face privacy leak risks.

The above dilemmas fundamentally stem from the failure of plug-in intelligent service solutions to achieve organic integration of AI and operating systems, ultimately limiting the overall level of intelligence. To truly unleash the potential of AI, deeper changes are needed at the architectural level of operating systems.

Breaking the Dilemma: Model Native Operating Systems

Models, as a basic underlying capability, will inevitably play an important role in the evolution of operating systems in the AI era, but how operating systems should integrate with models remains an important open question. Figure 1 compares different development paths. Figure 1 (a) represents the incremental route, which plugs large models into existing operating systems as applications, as described earlier. Its advantage is minimal change to existing operating systems, allowing rapid deployment; its downside is the division between models and operating systems, which restricts the models’ capabilities. Figure 1 (b) represents the radical route, which replaces the operating system entirely with a large model, fully exploiting the model’s intelligent capabilities; however, this approach disrupts the existing software landscape and ecosystem, relies excessively on the model itself, and lacks both the ability to interact with the physical world and guarantees of output determinism.

Figure 1 Comparison of System Architectures for Different Development Paths of Integration between Operating Systems and Models in the Intelligent Era

Unlike the previous two technical routes, we propose exploring model-native operating systems via an integration route, as shown in Figure 1 (c). The integration route rests on the idea of mutual advancement between models and operating systems: the system is natively designed for models, and models are natively adapted for the system. By exploring how system services control models, how resources are supplied to models, and how model capabilities are integrated into the system, multi-level deep integration of models and operating systems can be achieved. Through full-stack collaborative design of “model-system-chip,” core elements such as interface abstraction, execution modes, operational efficiency, and security mechanisms can be reconstructed, achieving the organic unity of probabilistic intelligence and deterministic rules, raising the intelligence level of systems in smartphones, PCs, general-purpose robots, and intelligent manufacturing, and providing users with smoother, smarter, and more personalized services. Below we elaborate our thoughts on model-native operating systems along six dimensions.

Thought One: Intelligent Interaction Paradigm (Towards Interaction Paradigm Innovation)

Large models are reshaping the interaction between operating systems and users. Traditional operating systems adopt interaction designs aimed directly at users, who control hardware through command lines, graphical interfaces, voice, and other interfaces. In the era of model-native systems, users will increasingly control devices through interactions with intelligent agents. However, the interaction framework of current operating systems is mismatched with intelligent agents on multiple levels. At the interface level, existing frameworks express functional semantics mainly through graphical interfaces, while large models still fall short in the accuracy and certainty of UI understanding; at the logic level, traditional frameworks require developers to statically design each application’s interaction logic, making it hard for large models to provide dynamically personalized interaction experiences; at the capability level, the fragmentation of data and functions across applications prevents operating systems from leveraging large models to provide cross-application intelligent services.

To address these challenges, model-native operating systems need to innovate collaboratively on multiple levels. At the interaction interface level, new types of operational interfaces oriented toward models should be designed so that models can accurately and efficiently invoke system functions; at the interaction logic level, a development framework supporting multi-modal generative interaction logic should be provided, allowing developers to dynamically construct natural spatio-temporal interactions using large models; at the interaction capability level, a data intercommunication mechanism of “system-model-application” should be established to break down barriers between applications and support context-based intelligent interactions.
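A minimal sketch of what such a model-facing operational interface might look like is shown below; the IntentBroker service, capability names, and registration scheme are hypothetical assumptions for illustration, not an existing API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Capability:
    """A function an application registers with the system for model use."""
    name: str
    description: str            # machine-readable semantics for the model
    handler: Callable[..., str]

class IntentBroker:
    """Hypothetical system service routing model intents to app capabilities."""
    def __init__(self):
        self._caps: Dict[str, Capability] = {}

    def register(self, cap: Capability) -> None:
        self._caps[cap.name] = cap

    def describe(self) -> List[str]:
        # What a system model sees instead of pixels: typed, named actions
        return [f"{c.name}: {c.description}" for c in self._caps.values()]

    def invoke(self, name: str, **kwargs) -> str:
        return self._caps[name].handler(**kwargs)

broker = IntentBroker()
broker.register(Capability(
    "calendar.create_event",
    "Create an event given a title and an ISO-8601 time",
    lambda title, time: f"event '{title}' at {time} created",
))
print(broker.describe())
print(broker.invoke("calendar.create_event",
                    title="standup", time="2025-01-06T09:00"))
```

Because every capability carries explicit semantics, the model can plan over named actions rather than guessing at pixels, and the broker becomes a natural place to enforce the “system-model-application” data intercommunication described above.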

Thought Two: Innovative System Abstraction Interfaces (Towards Improving Execution Efficiency)

With the integration of models and operating systems, traditional operating system abstractions face challenges. Existing interfaces are designed for imperative services, making it difficult to meet the needs of intelligent agents: (1) They cannot fully express application semantics or achieve end-to-end intelligent optimization; (2) They lack native intelligent capabilities, increasing development burdens and limiting cooperation among intelligent agents. At the same time, existing interfaces struggle to expose underlying system services and heterogeneous hardware capabilities, leading to performance losses.

To address these challenges, operating systems must introduce intelligent abstractions that support dynamic, flexible expression ranging from single commands to end-to-end requirements. At the system level, multi-level system service interfaces should be provided, with high-level interfaces serving traditional applications and programs and low-level interfaces optimizing high-performance intelligent applications, balancing application intelligence against system real-time requirements. In addition, historical behavior and model intelligence should be combined with the new possibilities of generative interfaces to achieve self-optimization and intelligent evolution.
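The multi-level interface idea can be sketched as two cooperating layers; both classes and the latency-based placement rule below are invented assumptions, not a real API:

```python
class LowLevelRuntime:
    """Low-level interface: explicit placement for expert, real-time callers."""
    def submit_kernel(self, device: str, kernel: str, args: tuple) -> str:
        return f"[{device}] ran {kernel}{args}"

class HighLevelService:
    """High-level interface: declarative requests; the system picks the plan."""
    def __init__(self, rt: LowLevelRuntime):
        self.rt = rt

    def infer(self, model: str, prompt: str, max_latency_ms: int) -> str:
        # The system, not the app, chooses the device from the stated goal
        device = "gpu" if max_latency_ms < 100 else "npu"
        return self.rt.submit_kernel(device, f"{model}.decode", (prompt,))

svc = HighLevelService(LowLevelRuntime())
print(svc.infer("assistant-7b", "summarize my notes", max_latency_ms=300))
print(svc.infer("assistant-7b", "transcribe live audio", max_latency_ms=50))
```

Traditional applications state end-to-end goals through the high layer, while performance-critical agents drop to the low layer: the balance between application intelligence and system real-time behavior described above.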

Thought Three: Inherent Intelligence in the System (Towards Improving Intelligence Levels)

The capabilities of current intelligent agents are confined to individual applications, with each application achieving intelligence through its own plug-in model. For example, users expect intelligent agents to schedule appointments automatically, which requires collaboration across multiple applications such as calendar, meetings, and email. However, current domain-specific models exhibit significant deficiencies in adaptability and robustness. A typical case is today’s smartphone intelligent assistants: they can execute simple shopping tasks, but when faced with untrained pop-ups and other distractions, their operation is easily derailed.

Therefore, model-native operating systems should possess inherent intelligence, organically integrating the capabilities of intelligent agents from different applications to achieve seamless coordination and complex collaboration across applications. This requires constructing a universal foundational model for operating systems, providing system-level intelligent agent services, and achieving deep integration of model capabilities and system functions to support seamless collaboration across applications; simultaneously, an efficient continuous learning mechanism should be established to explore new training methods for inherent intelligence in operating systems, achieving efficient and low-cost continuous training and fine-tuning to better meet users’ diverse intelligent needs.

Thought Four: Intelligent Knowledge Storage (Towards Improving Intelligence Levels)

The intelligence level of models is closely tied to knowledge-based data, yet existing storage systems are designed around data rather than knowledge, making it difficult to meet large models’ needs for knowledge generation, management, and utilization. In the vertical dimension, existing systems emphasize access to raw data and lack efficient support for semantic hierarchies and knowledge representation; in the horizontal dimension, the fuzzy inputs of model applications couple multi-modal, multi-type storage systems more tightly, while heterogeneous data representations impede horizontal data flow across applications.

To address these challenges, at the vertical level, systems should make full use of software-hardware collaborative mechanisms for knowledge storage, providing programming abstractions that evolve with the hardware; at the horizontal level, systems should achieve data intercommunication and collaboration among different storage systems through more tightly integrated data structures and communication mechanisms, giving models cross-application data fusion capabilities, making data collaboration in multi-modal tasks more efficient, and thus enhancing model intelligence in complex scenarios.
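A toy sketch of knowledge-oriented storage follows (all names and the tag-based index are invented for illustration): items are indexed by extracted semantics rather than raw paths, so retrieval can cross application boundaries:

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class KnowledgeItem:
    source_app: str        # horizontal dimension: which app produced it
    raw: str               # vertical dimension: the underlying raw data
    tags: Set[str]         # extracted semantic layer above the raw bytes

class KnowledgeStore:
    """Hypothetical store indexing knowledge (tags) above raw data."""
    def __init__(self):
        self.items: List[KnowledgeItem] = []

    def put(self, app: str, raw: str, tags: Set[str]) -> None:
        self.items.append(KnowledgeItem(app, raw, tags))

    def query(self, want: Set[str]) -> List[KnowledgeItem]:
        # Cross-application retrieval by shared semantics, not file paths
        return [i for i in self.items if want & i.tags]

store = KnowledgeStore()
store.put("mail", "Flight BA123 on Friday 10:00", {"travel", "schedule"})
store.put("calendar", "Team sync Friday 09:00", {"work", "schedule"})
for item in store.query({"schedule"}):
    print(item.source_app, "->", item.raw)
```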

Thought Five: Efficient Computing Power Supply (Towards Improving Execution Efficiency)

Models place extremely high demands on computing power. However, when supplying computing power to models, a model-native operating system cannot consider the models’ needs alone; it must weigh multiple interrelated indicators such as computing power, memory, power consumption, and intelligence level. Today’s heterogeneous computing hardware provides critical support for intelligent applications, but existing operating systems still manage computing hardware as special-purpose peripherals, and intelligent applications rely mainly on bespoke adaptation to use specific hardware, leading to severe fragmentation and significantly limiting the development and popularization of intelligent applications.

To address these challenges, optimization is needed at multiple levels. At the model level, research should focus on reducing parameter counts while maintaining model performance, exploring methods for dynamically adjusting model structures, and devising targeted fine-tuning and lightweight designs for complex scenarios such as intelligent agents. At the system level, efficient inference solutions should be explored for resource-constrained client scenarios, including model sparsification and speculative inference tailored to the operating system’s target tasks; meanwhile, new computation offloading schemes should be designed to exploit the resources of more powerful devices while preserving user data privacy and remaining transparent to applications. At the hardware level, heterogeneous computing architectures should be explored that provide dedicated processing units for different task types and achieve collaborative scheduling and unified management to improve overall efficiency.
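Of the system-level techniques named above, speculative inference is the easiest to sketch. The toy below shows only the control flow, with random stand-ins replacing the real draft and target models: a cheap draft proposes several tokens, and the expensive model accepts or rejects them in order:

```python
import random
random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def draft_next(ctx):
    """Cheap proposer (stand-in for a small draft model)."""
    return random.choice(VOCAB)

def target_accepts(ctx, token):
    """Expensive verifier (stand-in for the large target model)."""
    return random.random() < 0.7

def speculative_decode(prompt, n_tokens=8, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        drafts = [draft_next(out) for _ in range(k)]   # k cheap guesses
        for tok in drafts:                             # verify left to right
            if target_accepts(out, tok):
                out.append(tok)      # accepted: target cost amortized over k
            else:
                out.append(random.choice(VOCAB))       # resample, end round
                break
    return out

print(" ".join(speculative_decode(["the"])))
```

When the draft model often agrees with the target model, several tokens are produced per expensive verification pass, which is why the technique suits resource-constrained client devices.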

Thought Six: System Security and Reliability (Towards Privacy Protection and System Security)

As the application of large model systems becomes increasingly widespread, the challenges regarding privacy security and reliability are also becoming more prominent. In terms of data privacy, on one hand, large models may leak privacy data and sensitive information when accessing and generating data; on the other hand, the parameters of large models themselves are also important digital assets that need protection. In terms of reliability, the outputs of large models possess uncertainty, which may lead to unauthorized operations or deviations from expected behaviors; current systems often focus on enhancing performance and accuracy metrics while lacking effective behavioral auditing mechanisms and constraint technologies.

To address these challenges, in terms of data security, full-lifecycle encryption should protect large model parameters and data across “storage-transmission-computation,” access permissions should be strictly controlled as data flows, and a lightweight, customized trusted AI software stack should be built to reduce the attack surface and strengthen overall system security; in terms of reliability, an inherent security auditing mechanism oriented toward intelligent applications should be introduced, dynamically constraining model behavior through flexible rules and effectively enhancing determinism while preserving model intelligence.
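The rule-based behavioral constraint idea can be sketched as a policy gate in front of every model-initiated action; the rule set and action vocabulary below are invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Action:
    kind: str          # e.g. "file.read", "net.send", "pay.transfer"
    target: str
    amount: float = 0.0

Rule = Callable[[Action], bool]   # returns True if the action is allowed

def payment_cap(limit: float) -> Rule:
    return lambda a: not (a.kind == "pay.transfer" and a.amount > limit)

def deny_path_prefix(prefix: str) -> Rule:
    return lambda a: not (a.kind.startswith("file.")
                          and a.target.startswith(prefix))

class Auditor:
    """Hypothetical audit gate: every model action passes the rules first."""
    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.log: List[Tuple[Action, bool]] = []   # full trail for audits

    def allow(self, action: Action) -> bool:
        ok = all(rule(action) for rule in self.rules)
        self.log.append((action, ok))
        return ok

auditor = Auditor([payment_cap(100.0), deny_path_prefix("/etc")])
print(auditor.allow(Action("pay.transfer", "shop", 25.0)))   # True
print(auditor.allow(Action("file.read", "/etc/shadow")))     # False
```

Deterministic rules bound the model’s probabilistic behavior, and the retained log provides the systematic auditing the text calls for.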

Current Explorations and Practices

Model-native operating systems, as an emerging technological paradigm, are becoming the key to transitioning large AI models from “poetry” to “action.” Although there has not yet been a mature product that fully realizes this design concept, preliminary explorations have already begun in academia and industry both domestically and internationally.

Andrej Karpathy, a co-founder of OpenAI, proposed the LLM OS concept in November 2023, radically replacing the operating system with a large language model (LLM as OS). AIOS, proposed by Rutgers University, is a preliminary realization of the LLM OS concept. However, such explorations face two main limitations: first, they propose an entirely new interaction paradigm and software development model, making effective integration with the existing software ecosystem difficult; second, they treat models as black boxes, without optimizing inference efficiency, intelligence levels, and other aspects. As a middle-ground solution, SWE-agent proposed the Agent-Computer Interface (ACI), which raised the solve rate of large-model agents on software engineering tasks from 3.8% to 18%, validating the importance of collaboration between operating systems and large models.

Beyond innovations in system architecture and programming interfaces, industry is actively exploring operating system agent technologies. Systems such as Anthropic’s Computer Use in the U.S. and Zhipu AI’s AutoGLM in China deeply integrate cloud-based large models with graphical user interfaces (GUIs) to automate interface operations, demonstrating near-human operational capabilities in office applications, social media, online shopping, and other scenarios. However, current plug-in multimodal large model solutions still struggle with dynamic UIs and complex operations, and the network latency and privacy risks introduced by reliance on cloud-based models urgently need resolution.

To support large model inference on edge devices, industry has proposed various edge inference frameworks, such as ExecuTorch, llama.cpp, and MLC LLM. However, these systems make limited use of heterogeneous computing resources, and edge devices can typically run only small models of no more than 10 billion parameters, restricting the achievable intelligence of model-native operating systems. To address this, Shanghai Jiao Tong University launched the PowerInfer series of work, which first explored the “model-system-chip” collaborative design concept and achieved significant acceleration of large models on edge devices. At the model level, it adopts the Turbo Sparse method to increase model sparsity; at the system and chip levels, it optimizes inference design with techniques such as heterogeneous computing scheduling and neuron-cluster pipelining. Experiments show that PowerInfer-1 achieves 11.7x acceleration on personal computers and can run models with 175 billion parameters, while PowerInfer-2 lets smartphones smoothly run models with 47 billion parameters, a performance improvement of 27.8x.
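PowerInfer’s published design exploits activation sparsity with learned neuron predictors; the toy below illustrates only the underlying principle (an oracle predictor stands in for the learned one, and the layer sizes are arbitrary): with a ReLU feed-forward layer, skipping the neurons that would be inactive reproduces the dense result at a fraction of the compute:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_ff = 64, 256
W1 = rng.normal(size=(d_ff, d_in)).astype(np.float32)
W2 = rng.normal(size=(d_in, d_ff)).astype(np.float32)

def ffn_dense(x):
    h = np.maximum(W1 @ x, 0.0)        # ReLU: many neurons output zero
    return W2 @ h

def ffn_sparse(x, active):
    # Compute only the neurons a predictor marked as likely active
    h = np.maximum(W1[active] @ x, 0.0)
    return W2[:, active] @ h

x = rng.normal(size=d_in).astype(np.float32)
active = np.flatnonzero((W1 @ x) > 0)  # oracle predictor, for the demo only
print("active fraction:", len(active) / d_ff)
print("max deviation from dense:",
      np.abs(ffn_dense(x) - ffn_sparse(x, active)).max())
```

In a real system the predictor is imperfect, and the gain comes from keeping “hot” neurons on fast hardware while cold ones stay in slower memory; the sketch only shows why skipping inactive neurons is loss-free in principle.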

Beyond edge inference performance, model-native operating systems must also schedule intelligent applications efficiently at the hardware accelerator level. Commercial GPUs currently lack effective preemptive scheduling mechanisms, leaving real-time tasks a dilemma: either monopolize GPU resources, raising costs, or wait for low-priority tasks to finish, hurting performance. The Shanghai Jiao Tong University team was the first to implement task preemption within a hundred microseconds on commercial GPUs along with fine-grained resource sharing, improving overall throughput by 7.7x compared with dedicating the GPU to real-time tasks and reducing latency interference by 99% compared with naively sharing the GPU among multiple tasks. As hardware accelerators proliferate, system-level support has become a key competitive factor; the team further proposed a universal abstraction for accelerator scheduling and a multi-layer hardware model that can rapidly support preemptive scheduling across accelerator architectures, brands, and generations while enabling hardware-agnostic scheduling policies.
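The value of preemptive accelerator scheduling can be seen in a toy model (this is not the team’s mechanism, whose preemption is far finer-grained; here preemption happens only at kernel boundaries): a real-time job submitted mid-run cuts ahead of a running batch job:

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

_tie = count()   # FIFO tie-breaker among equal priorities

@dataclass(order=True)
class Job:
    priority: int                                      # lower = more urgent
    seq: int = field(default_factory=lambda: next(_tie))
    name: str = field(default="", compare=False)
    kernels_left: int = field(default=1, compare=False)

class PreemptiveScheduler:
    """Toy: rescheduling at every kernel boundary approximates preemption."""
    def __init__(self):
        self.queue = []

    def submit(self, name: str, priority: int, kernels: int) -> None:
        heapq.heappush(self.queue,
                       Job(priority, name=name, kernels_left=kernels))

    def step(self) -> None:
        job = heapq.heappop(self.queue)
        print(f"run one kernel of {job.name} (prio {job.priority})")
        job.kernels_left -= 1
        if job.kernels_left:
            heapq.heappush(self.queue, job)    # urgent arrivals can cut in

sched = PreemptiveScheduler()
sched.submit("batch_embedding", priority=5, kernels=3)
sched.step()                                   # the batch job starts...
sched.submit("voice_assistant", priority=0, kernels=2)
while sched.queue:                             # ...but the real-time job preempts
    sched.step()
```

Without the re-queue at kernel boundaries, the real-time job would wait for all three batch kernels to finish, which is precisely the latency interference the measured system eliminates.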

Conclusion

Model-native operating systems are an important development direction for operating systems in the AI era, requiring theoretical innovation in the deep integration of models and systems. This integration is not a simple functional overlay but requires comprehensive innovation from architecture to implementation, thereby promoting the intelligent evolution of operating systems. Although academia and industry have made a series of preliminary explorations in architecture design, performance optimization, and scheduling mechanisms, to realize truly usable product-grade systems, further in-depth research is still needed in areas such as interaction paradigm innovation, model inference efficiency, system integration levels, and ecosystem evolution strategies.

(The content of this article is an expansion based on the report “Some Thoughts on Model Native Operating Systems” presented at the 2024 China Computer Conference (CNCC2024) at the forum on “Basic Software for Large Models.”)

Chen Haibo

CCF Fellow, Vice Chair of the System Software Committee, Executive Member of the Open Source Development Committee. Distinguished Professor at Shanghai Jiao Tong University, ACM/IEEE Fellow. His main research areas include operating systems, distributed systems, and machine learning systems. [email protected]

Xia Yubin

CCF Outstanding Member, Executive Member of the System Software Committee, Executive Member of the Open Source Development Committee, Assistant Director of the Education Working Committee. Professor at Shanghai Jiao Tong University. His main research areas include operating systems and architecture. [email protected]

Chen Rong

CCF Outstanding Member, Editorial Board Member of CCCF. Professor at Shanghai Jiao Tong University. His main research areas include operating systems, distributed systems, and intelligent computing systems. [email protected]

Other Authors: Wang Zhaoguo, Mi Zeyu, Gu Jinyu
