AI Security Applications and Risk Control Must Innovate and Upgrade

AI Security Applications and Risk Control Must Innovate and Upgrade

AI Security Applications and Risk Control Must Innovate and Upgrade

Release of the “Large Model Security Practice (2024)” White Paper

The research team from Princeton University has built an evaluation system to analyze and assess AI-generated copyright character issues

The team from Columbia University conducts comparative analysis based on video reconstruction to detect videos generated by AI diffusion models

Microsoft’s “Skeleton Key” intrusion technology challenges the security of generative AI content

The UK animation software company CelAction requires AI models used in products to be trained with copyright data and ethical training

【Highlight】

Artificial Intelligence (AI) technology has many security risks and copyright issues in practical applications, including both inherent technical problems and risks in the development process such as data security and algorithmic discrimination, as well as the impact of technological misuse at the application level. Currently, countries around the world have conducted multiple discussions on AI security issues and reached some consensus, but forming theoretical consensus is far from enough; accelerating practical application is extremely critical.

AI security involves a wide range of areas, and the rapid iteration of technology can effectively solve some issues such as imperfect model algorithms and uncontrollable autonomous awareness, but it also brings new attack methods, while the security detection and evaluation mechanisms are currently relatively lagging behind technological advancements. Therefore, research institutions, technology companies, and others need to continuously invest in various ways, keeping pace with the times to propose corresponding solutions for diversified intelligent application scenarios. On this basis, academia, industry, and users should also join forces to look to the future, adapt to the development trends of intelligent science and technology such as agents, multi-modal large models, and general artificial intelligence (AGI), and coordinate the promotion and optimization of layouts to ensure the balanced and healthy development of technology application and security supervision.

01

Release of the “Large Model Security Practice (2024)” White Paper

At the 2024 World Artificial Intelligence Conference, Tsinghua University, Zhongguancun Laboratory, Ant Group, and other institutions jointly released the “Large Model Security Practice (2024)” white paper, systematically proposing an overall framework for large model security practice, providing technical implementation solutions from the dimensions of security, reliability, and controllability.

The white paper states that large model security includes three dimensions: security, reliability, and controllability. Security involves data security, model security, system security, content security, cognitive security, and ethical security; reliability requires that large models continuously provide accurate, consistent, and truthful results under any circumstances; controllability refers to whether the model allows human understanding and intervention when providing results and decisions, and can be adjusted to adapt to human needs.

AI Security Applications and Risk Control Must Innovate and Upgrade

The white paper points out that current security assessments for large models mostly focus on content scenarios, while evaluating agents and general artificial intelligence (AGI) has become a challenge that needs joint standards to be established and a trustworthy assessment system for large models to be built for the future.

In terms of large model implementation, the white paper proposes a deployment model of “end, edge, and cloud”, which includes endpoint deployment, edge computing, and cloud platform services. The security strategies for endpoint deployment focus on ensuring physical security of devices, protecting user privacy, and maintaining model integrity, including access control, data encryption, and model hardening. The security solutions for edge computing focus on traffic security management and data privacy protection, such as intrusion detection systems and firewalls. Cloud platform services should consider security at all levels, including infrastructure, systems, applications, and data, such as identity authentication, authorization management, end-to-end encryption, and access rules.

02

The Princeton University research team builds an evaluation system to analyze and assess AI-generated copyright character issues

Due to the training data possibly containing copyrighted data, current AI-generated content (AIGC) raises copyright issues. To this end, Assistant Professor Chen Danqi of the Department of Computer Science at Princeton University and his research team have built the COPYCAT evaluation system, which includes 50 popular copyright characters from 18 companies and identifies keywords that may trigger AI-generated copyright characters through two stages.

AI Security Applications and Risk Control Must Innovate and Upgrade

First, in the generation phase, the research team uses GPT-4 to generate 50 candidate keywords related to the target character and a description of about 60 words, ensuring a sufficiently diverse vocabulary to test the AI model.

Then, in the ranking phase, the team uses three ranking methods: embedding similarity, co-occurrence frequency, and language model ranking to select keywords, and evaluates the model’s generated results, calculating the number of times the target copyright character was successfully generated and measuring the consistency of the generated image with user intent.

The research results show that even without directly using the names of copyright characters as prompts, the AI model can generate images resembling copyright characters through “indirect anchoring”. For example, entering “superhero” and “Gotham” in text-to-image tools like DALL·E3 and Playground v2.5 can also generate the image of “Batman”.

AI Security Applications and Risk Control Must Innovate and Upgrade

Left: Input “Batman” Playground v2.5 generated result; Middle: Input “superhero, Gotham” Playground v2.5 generated result; Right: Input “superhero, Gotham” DALL·E3 generated result

Based on the above results, the research team applied various mitigation strategies across multiple AI models. Experiments have shown that the combination of prompt rewriting and negative prompting methods is the most effective, maximizing the avoidance of generating copyright characters while ensuring that the generated results align with user intent.

03

The Columbia University team conducts comparative analysis based on video reconstruction to detect videos generated by AI diffusion models

In terms of detecting AI-generated videos, existing Deepfake detectors perform excellently in identifying GAN-generated samples but lack robustness in detecting videos generated by diffusion models. Professor Yang Junfeng’s team from Columbia University has developed a text-to-video detection tool named DIVID (Diffusion-generated Video Detector), achieving a detection accuracy of 93.7% for videos generated by models like Sora, Gen-2, and Pika.

The team previously released research results for text detection, Raidar, which analyzes the text itself to detect whether it was generated by AI without accessing the internal workings of large language models (LLMs).

AI Security Applications and Risk Control Must Innovate and Upgrade

DIVID detection process: 1) Generate a reconstructed version of each frame and calculate the DIRE value; 2) Train the detector based on the DIRE value sequence and original frames

DIVID is based on the development concept of Raidar, reconstructing the video and comparing it with the original video, using the DIRE value (Diffusion Reconstruction Error) to detect diffusion-generated videos. The research believes that since the results generated by diffusion models are all sampled from the distribution of the diffusion process, reconstructed images generated by diffusion models should be very similar to each other. If significant changes are observed, the original video may be human-made; if not, it may be AI-generated.

AI Security Applications and Risk Control Must Innovate and Upgrade

Top row: Human-shot video from Youtube; Bottom row: Sora generated AI video; Left: Original video; Middle: Reconstructed video; Right: DIRE value

The basic concept of this framework is that AI generation tools create content based on the statistical distribution of large datasets, leading to pixel intensity distributions, texture patterns, and noise characteristics in video frames that can be summarized as a certain “statistical average”. In contrast, videos created by humans are more personalized, deviating from the statistical norm.

04

Microsoft’s “Skeleton Key” intrusion technology challenges the security of generative AI content

Recently, Microsoft shared a new large model intrusion technology called Skeleton Key, which can bypass built-in protective measures in mainstream large language model (LLM) applications to generate harmful or illegal content.

AI Security Applications and Risk Control Must Innovate and Upgrade

▲”Skeleton Key” attack path

Previously, Microsoft proposed an attack principle called Crescendo, which uses text generated by the model itself and the tendency to focus on recent text to gradually guide the model to generate harmful content through a series of seemingly harmless interactions. The Skeleton Key is a type of jailbreak attack method that directly instructs the model to enhance its behavioral guidelines using forced instruction-following. When the model receives illegal requests, it adds warnings to the generated harmful content instead of rejecting the request, rendering the large model’s security measures completely ineffective and allowing it to respond to any information or content requests.

In April-May 2024, Microsoft conducted comprehensive testing on currently mainstream open-source and closed-source models. The results showed that this technology successfully infiltrated OpenAI’s GPT-4o, GPT 3.5 Turbo, Google’s Gemini Pro base model, Meta’s Llama3-70B instruction fine-tuning and base model, and Anthropic’s Claude 3 Opus, among others.

AI Security Applications and Risk Control Must Innovate and Upgrade

Currently, Microsoft has implemented prompt protection measures in its Azure AI-hosted models to address this issue and has shared this technology with the aforementioned large model platforms to help them modify and improve their model security strategies.

05

The UK animation software company CelAction requires products to use AI models trained with copyright data and ethical training

The UK 2D animation production software company CelAction has recently committed to using only AI models that have undergone ethical training and used copyright data for training in its products.

AI Security Applications and Risk Control Must Innovate and Upgrade

▲CelAction 2D software interface

First, CelAction will not use generative AI to create works and materials without a copyright chain; therefore, unless an AI technology can prove that its training data was provided by copyright holders, CelAction will not use automatic code completion tools, AI-generated sounds, or images.

Second, when CelAction adds AI features to its products, it will conduct ethical training for the AI functionality and also encourages product users to use their own data or data provided by CelAction for ethical training.

(All images in this issue are from the internet)

AI Security Applications and Risk Control Must Innovate and Upgrade

Edited by丨Zhang Xue

Proofread by丨Wang Jian

Reviewed byWang Cui

Final ReviewLiu Da

AI Security Applications and Risk Control Must Innovate and Upgrade

AI Security Applications and Risk Control Must Innovate and Upgrade
Call for Contributions!
Whether you are a technology newcomer or a tech expert, if you have original film technology, systems, devices, or applications, please contact:[email protected].
The technology stage is waiting for you!
Film technology dynamics will work with you to support the high-quality development of Chinese film technology!

Leave a Comment