The Disruption of Graphics Technology by AI in Three Steps

Jensen Huang and Mark Zuckerberg recently discussed the value of AI technology at SIGGRAPH. SIGGRAPH, long a leading conference in graphics technology, now seems to have been largely taken over by AI, a trend that started last year…
During this year’s SIGGRAPH (Special Interest Group on Computer Graphics and Interactive Techniques) conference, Jensen Huang (NVIDIA CEO) and Mark Zuckerberg (Meta CEO) said that every business and individual will have their own AI agent in the future. Mark compared AI agents for businesses and individuals to email, websites, and social networks: he believes AI agents will become a similar kind of identity for businesses and individuals.
Jensen Huang offered his own AI, “Jensen AI,” as an example: “We inject everything I’ve written and said into Jensen AI and fine-tune it on how I answer questions.” “Over time, as it reasons and accumulates experience through use, Jensen AI can become a truly outstanding personal assistant and companion, answering questions and offering ideas.”
The importance of generative AI is evidently no longer a matter of debate. However, it is important to note that this discussion took place at SIGGRAPH. Historically, SIGGRAPH has focused on graphics and imaging technology, and the significant “invasion” by AI technology began last year — and as soon as it was “infiltrated,” AI took center stage.
Jensen Huang stated at the beginning of the discussion that SIGGRAPH is now an important conference spanning computer graphics, AI, robotics, simulation, and other technologies. This year, all 20 papers NVIDIA published at SIGGRAPH relate to generative AI and simulation. Although SIGGRAPH has existed far longer than NVIDIA, the changing role of NVIDIA’s GPU products to some extent reflects the shift in the themes discussed at SIGGRAPH.
As it happens, the academic and technical presentations at SIGGRAPH largely focus on the intersection of AI and computer graphics, especially AI for Graphics. Drawing on NVIDIA’s product announcements at this year’s SIGGRAPH and the two fireside chats featuring Jensen Huang, this article looks at the intersection of AI and the graphics world, and how AI technology is having a profound impact on graphics.
From the perspective of graphic designers, artists, and developers actually using generative AI tools, we can roughly summarize these efforts into three steps.
First, Bring Generative AI to OpenUSD
Based on our history of attending NVIDIA media events, if we were to list the ways AI has helped or transformed graphics technology, we could identify at least the following two major directions:
(1) From the Perspective of Content Presentation: image enhancement technologies such as AI super-resolution (represented by DLSS), frame generation, and ray reconstruction. Jensen Huang previously mentioned that in gaming today, for every 8 pixels a player sees, only 1 pixel is rendered, while the other 7 are generated by AI (a rough breakdown of this ratio is sketched below).
As a result, the frame rate and clarity of games and professional visual design work have improved thanks to AI. Take the recently popular game "Black Myth: Wukong": without AI super-resolution and frame generation, most players would probably be unable to run a game of such outstanding visual quality smoothly.
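As a rough, illustrative calculation of that 1-in-8 figure (the specific 4x and 2x factors below are assumptions about the configuration, not an official NVIDIA breakdown): if super-resolution renders at a quarter of the output resolution and frame generation inserts one AI frame for every rendered frame, only 1 in 8 displayed pixels is traditionally rendered.

```python
# Illustrative arithmetic only; the 4x / 2x factors are assumptions, not official NVIDIA numbers.
upscale_factor = 4    # e.g. rendering at 1080p and upscaling to 4K: 4 output pixels per rendered pixel
frame_gen_factor = 2  # frame generation: one AI-generated frame per traditionally rendered frame

displayed_per_rendered = upscale_factor * frame_gen_factor
print(f"Only 1 in {displayed_per_rendered} displayed pixels is traditionally rendered")  # -> 1 in 8
```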
(2) From the Perspective of Content Creation: leveraging generative AI for 3D content generation. Digital artists in China have already begun using Stable Diffusion for texture generation, and some CG creators have built new workflows using text-to-image and image-to-3D to help generate certain 3D models for animation, completely overturning the traditional workflow of concept art, orthographic views, and then modeling.
NVIDIA’s RTX Remix tool for game modders also includes features that fall into this category: beyond adding ray tracing and DLSS to old games, it uses generative AI to infer modern materials from dated game assets and replace the originals with high-definition, enhanced textures.
Another representative application for 3D content generation is NVIDIA Picasso, released last year, which offers a text-to-3D service (based on the Edify 3D model) that can generate fairly detailed 3D geometry from textual descriptions.
It is worth mentioning that the 3D models generated by Picasso, like the content in NVIDIA’s Omniverse platform, are expressed in the USD format. USD can be thought of as the HTML of 3D graphics: a standard for describing 3D data, first proposed by Pixar and widely used in 3D animation and CG.
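To make the "HTML of 3D" analogy concrete, here is a minimal sketch using USD's open-source Python API (the pxr module); the file name and prim paths are arbitrary examples:

```python
# Minimal OpenUSD example using Pixar's Python API (pip install usd-core).
from pxr import Usd, UsdGeom

# A USD "stage" is roughly analogous to an HTML document for a 3D scene.
stage = Usd.Stage.CreateNew("hello_world.usda")

# Define a transform and a sphere beneath it (arbitrary example paths).
xform = UsdGeom.Xform.Define(stage, "/Hello")
sphere = UsdGeom.Sphere.Define(stage, "/Hello/World")
sphere.GetRadiusAttr().Set(0.5)

stage.SetDefaultPrim(xform.GetPrim())
stage.GetRootLayer().Save()

# .usda is a human-readable text format, much like HTML markup.
print(stage.GetRootLayer().ExportToString())
```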
In August last year, the Alliance for OpenUSD (AOUSD) was established under the Linux Foundation’s Joint Development Foundation, with founding members including Pixar, Adobe, Apple, Autodesk, and NVIDIA. OpenUSD allows design tools from different vendors to adopt a unified language for expressing the graphics world, making content interoperable. It is easy to see why Omniverse, as a metaverse platform, and increasingly collaborative design tools would choose OpenUSD.
From a certain perspective, OpenUSD has also become an open interface between different ecosystems. For example, NVIDIA announced early this year a collaboration with Apple to stream RTX-rendered images to Vision Pro, and the Omniverse SDK for Vision Pro developers has entered an early access phase. NVIDIA has also worked with developers such as Kantana, PTC, and Rockwell Automation to enable physically accurate AR rendering on Vision Pro…
In other words, because OpenUSD is a language for expressing 3D graphics, generative AI built on it can address a wider range of problems. This year, NVIDIA specifically mentioned applying OpenUSD to two new fields: robotics and computational fluid dynamics (CFD) simulation; the latter presumably relates to industrial design and engineering.
Regarding robotics, NVIDIA has built a “connector” between OpenUSD and URDF (Unified Robot Description Format), the most widely used robot model format, making it compatible with OpenUSD, although this work is still in its early stages. Regarding CFD simulation, the results are naturally rendered in the OpenUSD format. These two efforts are of considerable value for advancing the OpenUSD and Omniverse ecosystem.
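NVIDIA has not detailed the internals of that connector here, but the basic idea of mapping a robot description onto OpenUSD can be sketched as follows. This is a simplified illustration using only the standard library and the USD Python API, not NVIDIA's actual implementation:

```python
# Conceptual sketch: map URDF links onto USD prims. Not NVIDIA's connector;
# a real converter would also handle joints, meshes, inertia, materials, etc.
import xml.etree.ElementTree as ET
from pxr import Usd, UsdGeom

URDF_EXAMPLE = """
<robot name="demo_bot">
  <link name="base_link"/>
  <link name="arm_link"/>
  <joint name="arm_joint" type="revolute">
    <parent link="base_link"/>
    <child link="arm_link"/>
  </joint>
</robot>
"""

robot = ET.fromstring(URDF_EXAMPLE)
stage = Usd.Stage.CreateNew("demo_bot.usda")
root = UsdGeom.Xform.Define(stage, f"/{robot.get('name')}")

# One Xform prim per URDF link; joints would then define the transform hierarchy.
for link in robot.findall("link"):
    UsdGeom.Xform.Define(stage, f"/{robot.get('name')}/{link.get('name')}")

stage.SetDefaultPrim(root.GetPrim())
stage.GetRootLayer().Save()
```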
More importantly, NVIDIA first introduced several Omniverse Cloud APIs at last year’s SIGGRAPH, including ChatUSD, RunUSD, and DeepSearch. ChatUSD is a conversational AI that can generate USD Python code based on natural language dialogue, creating objects in a 3D scene and placing them within that scene; RunUSD checks the compatibility and usability of the USD content written by developers; DeepSearch, as the name suggests, can be used to search for 3D asset data…
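For reference, the kind of USD Python code such an assistant produces looks roughly like the following. This is a hand-written illustration of scene authoring (creating objects and placing them), not actual ChatUSD output; the file name, prim paths, and dimensions are arbitrary:

```python
# Hand-written example of "create objects and place them in a scene" USD Python code,
# the sort of snippet a ChatUSD-style assistant is meant to generate on request.
from pxr import Usd, UsdGeom, Gf

stage = Usd.Stage.CreateNew("table_scene.usda")  # arbitrary example file

# A cube standing in for a table top, scaled and positioned via the common transform API.
table = UsdGeom.Cube.Define(stage, "/Scene/Table")
UsdGeom.XformCommonAPI(table.GetPrim()).SetScale(Gf.Vec3f(2.0, 0.1, 1.0))
UsdGeom.XformCommonAPI(table.GetPrim()).SetTranslate(Gf.Vec3d(0.0, 1.0, 0.0))

# A bottle stand-in placed on top of the table.
bottle = UsdGeom.Cylinder.Define(stage, "/Scene/Bottle")
UsdGeom.XformCommonAPI(bottle.GetPrim()).SetScale(Gf.Vec3f(0.1, 0.3, 0.1))
UsdGeom.XformCommonAPI(bottle.GetPrim()).SetTranslate(Gf.Vec3d(0.5, 1.4, 0.0))

stage.GetRootLayer().Save()
```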
This year at SIGGRAPH, there were three important releases corresponding to these, so their capabilities no longer exist solely as Omniverse Cloud APIs: the newly released models are now available as NIM (NVIDIA Inference Microservices), currently in preview on the NVIDIA API Catalog.
These three new generative AI tools are: USD Code, a generative AI model that understands the OpenUSD language, including geometry, materials, physics, and spatial elements, and that can answer questions about OpenUSD and generate OpenUSD Python code; USD Search, which can search large 3D and image databases using natural language or image input; and USD Validate, which checks file compatibility against OpenUSD standards and releases and generates RTX path-traced renders (via the Omniverse Cloud APIs).
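Because these capabilities are exposed as NIM microservices with standard web APIs, calling them from an application is meant to look roughly like an ordinary chat-completion request. The sketch below is hypothetical: the endpoint URL, model name, and response shape are placeholders chosen to illustrate the pattern, not documented values from the NVIDIA API Catalog.

```python
# Hypothetical sketch of calling a USD Code-style NIM over its REST API.
# The URL, model name, and payload fields below are illustrative placeholders,
# not NVIDIA's documented endpoint.
import os
import requests

NIM_ENDPOINT = "https://example-nim-host/v1/chat/completions"  # placeholder URL
API_KEY = os.environ.get("NIM_API_KEY", "")

payload = {
    "model": "usd-code-example",  # placeholder model identifier
    "messages": [
        {"role": "user",
         "content": "Write USD Python code that creates a cube and places it 2 units above the ground plane."}
    ],
    "temperature": 0.2,
}

resp = requests.post(
    NIM_ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # the generated OpenUSD Python snippet
```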
Then Package It into the NIM Container
These three models and capabilities are also provided to developers as NIM microservices. We have explained what NIM is several times before: beyond the AI model itself, a NIM packages the dependencies, software stack, and related optimizations, and exposes standard API interfaces to developers.
In summary, NIM is a comprehensive solution for developers to easily deploy generative AI: “a simple microservice that can be embedded into existing product platforms to deliver differentiated generative AI features and a strong competitive advantage.” It spares businesses and developers the messy work around AI and significantly reduces the complexity of deployment; in NVIDIA’s words, it delivers plug-and-play AI.
Therefore, for enterprises or industries, the value of NIM is simply to enable generative AI technology to be quickly and truly put into production.
Moreover, NVIDIA has repeatedly emphasized NIM’s performance in its promotion. For Hugging Face’s inference-as-a-service built on NVIDIA NIM running on DGX Cloud, throughput is said to be up to 5 times that of deployments without NIM (the preliminary figure given at Computex this year was 3 times).
Currently, NVIDIA has released more than 100 NIM microservices, including models from Google, Meta, Microsoft, Mistral, and others, spanning different fields and modalities. Newly released NIMs include models such as Llama 3.1, NeMo Retriever, and Mistral NeMo 12B. For Llama 3.1, NVIDIA provided the corresponding NIM the day after the model’s release.
During the media Q&A session, Kari Briski (NVIDIA’s Vice President of Generative AI Software Product Management) said that NVIDIA pays attention to which models are popular in the community, as well as their licensing terms and training data; at the same time, it decides which NIMs to release based on NVIDIA’s own vertical focus areas, its business units, and customer needs.
“Hugging Face has 750,000 models, and developers find it challenging to determine which ones are of high quality, which can be used for business production, and which require commercial licenses,” Kari mentioned, “However, with NIM, developers can trust that these have been thoroughly reviewed and optimized.”
The previously mentioned USD Code, USD Search, and USD Validate, as 3D development models meant to advance AI for Graphics, are naturally part of this batch of NIM microservice releases.
At the same time, NVIDIA also previewed upcoming NIM related to OpenUSD and 3D development, including:
  • USD Layout — to compose multiple OpenUSD scenes based on text input;
  • USD SmartMaterial — to intelligently predict and apply real materials for 3D objects;
  • fVDB Mesh Generation — to generate OpenUSD-based meshes from point-cloud data, rendered via the Omniverse APIs;
  • fVDB Physics Super-Res — to perform AI super-resolution on a single frame or a sequence of frames, producing high-resolution, OpenUSD-based physics simulations;
  • fVDB NeRF-XL — to generate large-scale NeRF using Omniverse API…
At this year’s SIGGRAPH, the NIM releases related to graphics and imaging also included an upgrade to the Edify NIM for text-to-image. Getty Images updated its API service accordingly: it offers not only higher resolution and better image quality but also closer adherence to prompts, allowing parameters such as focal length and depth of field to be controlled from the prompt.
The Edify 3D NIM, the microservice for the previously mentioned text-to-3D model, has also officially become commercially available. Built on it, Shutterstock’s generative 3D service has entered commercial use; this service was only previewed at GTC in March.
In addition to generating 3D content, Shutterstock’s service now also provides the ability to generate lighting and 360° backgrounds for 3D scenes — primarily based on the Edify 360 HDRi model. This feature is now available for early access.
With these new models, NIMs, and commercial services being released, one gets the feeling that, supported by so many AI for Graphics technologies, ordinary people may soon be able to do 3D design quickly without much professional knowledge of 3D graphics. This mirrors AI’s application in non-graphics fields, where it seems everyone can draw, program, or compose music.
“Aside from generative AI, I can’t recall any technology that has influenced individuals, businesses, industries, and even different fields of science — climate, biology, physics — at such a rapid pace: in any field we face, generative AI is at the core of fundamental change.” This is what Jensen Huang said during the fireside chat with Mark Zuckerberg. And AI for Graphics is clearly a part of this.
Next, Make AI More Controllable
Applying generative AI to content creation and related industries can certainly make work faster and less labor-intensive. However, one major issue with AI is that its results are hard to control. For example, using Stable Diffusion to create an image may not yield the outcome the artist expected.
Digital artists we interviewed previously generally described this process as “pulling gacha”: whether you get the desired result is left to chance. The rapid output of high-performance GPUs can significantly reduce the cost of each pull, but the results remain uncontrollable.
During SIGGRAPH, in a fireside chat with Lauren Goode, a senior writer for Wired, Jensen Huang cited an example of generating an image with the Edify.Image model: the text prompt described a wooden table under afternoon light, with a bottle of Coca-Cola and fried chicken beside it. On the presentation slide, the output looked quite good, but the details were uncontrollable: how many pieces of fried chicken, how the Coca-Cola bottle was placed, from what angle, and so on.
To address the controllability issue, NVIDIA proposed a solution that first utilizes multimodal information from text, images, and videos to generate 3D scenes in Omniverse Cloud (including using Edify.3D for text-to-3D operations). Jensen Huang described Omniverse as “a place that can combine different modal data to form content output.”
In the Omniverse design environment, various assets can be aggregated, added, or modified. Designers and artists thus have complete autonomy — more importantly, compared to a purely text-to-image process, this solution allows for team collaboration.
The scene can then be rendered in 2D and used as input for Edify.Image or other generative models, possibly combined with additional text prompts, ultimately generating images that are more polished and truly aligned with the requirements. In such images, specific factors, including the relationship between the Coca-Cola and the fried chicken, the framing, and so on, are genuinely satisfied and controllable.
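NVIDIA's own pipeline here uses Edify and the Omniverse Cloud APIs, which are not publicly scriptable in the same way, but the underlying idea of conditioning an image model on a 2D render plus a text prompt can be illustrated with an open-source image-to-image pipeline. The checkpoint name and parameters below are just one plausible configuration, not the workflow WPP actually uses:

```python
# Illustrative "render-conditioned" generation using an open-source image-to-image pipeline
# (a stand-in for the Edify/Omniverse workflow described above, not NVIDIA's actual stack).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any Stable Diffusion checkpoint works here
    torch_dtype=torch.float16,
).to("cuda")

# A 2D render exported from the composed 3D scene fixes layout, framing, and object placement.
scene_render = Image.open("omniverse_render.png").convert("RGB").resize((768, 512))

result = pipe(
    prompt="a wooden table in warm afternoon light, a bottle of Coca-Cola next to fried chicken, photorealistic",
    image=scene_render,
    strength=0.35,        # low strength keeps the 3D layout; higher values give the model more freedom
    guidance_scale=7.5,
).images[0]
result.save("final_image.png")
```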
Currently, WPP has begun to adopt this workflow, with global advertising campaigns for Coca-Cola among its earliest use cases. It makes sense that the design industry was among the first to be swept up by generative AI; it has now reached the point of restructuring itself and exploring new AI-related workflows.
Jensen Huang described this process as somewhat similar to RAG (retrieval-augmented generation). RAG is generally seen as a conventional way to make generative AI more controllable, but this example is perhaps more accurately termed “3D augmented generation.”
Following this line of thought, not only in the graphics and imaging field but also when generative AI enters production or work processes, adding intermediate nodes to perform “augmented generation” will be key to achieving controllable final outputs and reducing model uncertainty in the future.
In conclusion: First, comprehensively promote the OpenUSD standard and ecosystem, applying this format and standard in the design field, digital twins, and the metaverse — “OpenUSD is the first format that aggregates multimodal data from different tools, allowing for interaction between different tools and ecosystems, and serving as a gateway to the virtual world” — this is Jensen Huang’s summary of OpenUSD.
Extending OpenUSD to robotics and industrial control fields reflects this statement. Then, based on OpenUSD, develop generative AI models and technologies that can contribute to graphics and potentially more fields: including understanding and validating the USD format, enabling 3D asset searches, and various AI that directly generates 3D content.
Finally, provide these AIs in the form of NIM microservices to developers and enterprise customers, lowering the barriers to AI development while accelerating the comprehensive commercial application and production of generative AI. Ultimately, based on the established AI models and NIM microservices, strive to create more controllable and collaborative AI through workflow innovation.
NVIDIA’s logic is quite clear and grounded. Over the past six months, we have also spoken with many digital artists, graphic designers, and upstream model developers in the tool chain, and in that short time the transformation AI has brought to this field has been both profound and rapid.
While many people still question whether generative AI can be used in production, it is not only top conferences like SIGGRAPH that treat it as a primary topic; many industry participants have already begun to use it to generate revenue and create real value.
