How To Build A Secure Machine Learning Environment

Children’s Day is approaching, but this year’s June 1 is a little different. As the pandemic eases, many regions have announced that schools at every stage of basic education will gradually reopen after June 1, with strict access control and temperature checks required for safety. Now that pandemic prevention and control has become routine, schools urgently need a way to verify student identity while safeguarding student health. An integrated system that combines facial recognition with temperature measurement is currently one of the most practical answers to schools’ joint prevention and control needs.

Efficient, intelligent solutions like this depend on emerging technologies such as big data, artificial intelligence, and machine learning. The pandemic has been a “black swan” event, and it has made people appreciate the practical value of “technology for epidemic prevention”.

Machine learning and big data are closely related: they depend on and reinforce each other. The rapid development and popularity of machine learning in recent years owe much to the availability of big data. Machine learning generates value by training on data, and the more real data it has, the faster a model’s accuracy can improve. For machine learning, big data is indispensable.

How to protect data throughout the entire machine learning process?

The typical workflow of a machine learning model is complex and can be divided into three parts. The first part is generating sample data, which is further divided into data acquisition, data cleaning, and data transformation. The second part is model training, which consists of training and evaluation. The third part is deployment, including monitoring, inference, and bias detection.

Simplifying machine learning is the ultimate goal of Amazon SageMaker. The workflow of a machine learning project is extremely complex, and the debugging and configuration required at each stage make such projects difficult to implement. Amazon SageMaker aims to help enterprises and developers quickly build, train, and deploy machine learning models, so that users can start training simply by providing data.

Data runs through the entire machine learning project and must be a primary object of protection. As recognition of unstructured data such as voice and images becomes widespread, the accuracy with which machine learning carries out user commands has become crucial, and this also makes data sources a target for attackers. One such attack is the adversarial attack, in which an attacker makes subtle modifications to the input data so that the model makes an incorrect decision, which in turn leads to further erroneous behavior.
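To make the idea concrete, here is a minimal sketch of an adversarial perturbation against a toy linear classifier. The weights, input, and step size are made-up values chosen for illustration; real attacks target far larger models, but the principle of a small, targeted change flipping the decision is the same.

```python
# Toy linear classifier: predict 1 when w . x + b > 0, otherwise 0.
# All numbers are illustrative assumptions, not taken from a real model.

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def perturb(w, x, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score."""
    return [xi - epsilon * sign(wi) for wi, xi in zip(w, x)]


w = [0.5, -0.25, 1.0]
b = -0.25
x = [0.5, 0.5, 0.25]

x_adv = perturb(w, x, epsilon=0.125)

print(predict(w, b, x))      # decision on the clean input
print(predict(w, b, x_adv))  # decision after the small perturbation
```

No feature moves by more than 0.125, yet the predicted class changes, which is why the integrity of data sources needs protecting along with their confidentiality.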

Amazon SageMaker helps users build a secure machine learning environment that meets their security needs for data storage, transmission, training, and more. First, a secure VPC must be established to give Amazon SageMaker a secure network environment. This mainly involves the following points:

First: Gateway settings. Set up internet gateways and NAT gateways so that users’ workloads can run in private subnets.

Second: Build multiple subnets across Availability Zones. The goal is high availability for Amazon SageMaker.

Third: Use VPC endpoints. These let resources reach AWS services without going through the public internet, reducing the number of resources directly exposed to it.

Fourth: Security group settings. Use security groups to control inbound and outbound access for resources in the VPC.
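The four points above can be sketched as the network settings of a SageMaker training job. This is a minimal sketch, not a complete request: the subnet and security group IDs and the region are hypothetical, and the helper functions are illustrative names rather than part of any AWS SDK. `VpcConfig` and `EnableNetworkIsolation` are parameters of the SageMaker `CreateTrainingJob` API.

```python
# Hypothetical IDs throughout; replace them with values from your own account.

def build_vpc_config(subnets_by_az, security_group_ids):
    """Return a VpcConfig fragment, requiring subnets in at least two AZs."""
    if len(subnets_by_az) < 2:
        raise ValueError("provide private subnets in at least two Availability Zones")
    if not security_group_ids:
        raise ValueError("provide at least one security group")
    subnets = [subnet for az in sorted(subnets_by_az) for subnet in subnets_by_az[az]]
    return {"Subnets": subnets, "SecurityGroupIds": list(security_group_ids)}


def endpoint_service_names(region):
    """Service names for VPC endpoints that keep SageMaker and S3 traffic off the internet."""
    return [
        f"com.amazonaws.{region}.{name}"
        for name in ("sagemaker.api", "sagemaker.runtime", "s3")
    ]


vpc_config = build_vpc_config(
    {"us-east-1a": ["subnet-aaaa1111"], "us-east-1b": ["subnet-bbbb2222"]},
    ["sg-cccc3333"],
)

# Fragment to merge into the remaining CreateTrainingJob parameters.
# EnableNetworkIsolation blocks outbound network calls from the training container.
training_network_settings = {"VpcConfig": vpc_config, "EnableNetworkIsolation": True}

print(training_network_settings)
print(endpoint_service_names("us-east-1"))
```

The validation is deliberately strict: rejecting a single-AZ layout at build time is cheaper than discovering the availability gap in production.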

Second, Amazon SageMaker protects data by encrypting it both at rest and in transit. For data at rest, Amazon SageMaker typically relies on AWS Key Management Service (KMS), which covers encrypted storage in services such as S3, EBS, and CodeCommit Git repositories. For data in transit, all inter-network traffic supports TLS 1.2 encryption, and all requests to the Amazon SageMaker API and console are made over secure (SSL) connections. Together these ensure that machine learning model artifacts and other system artifacts are encrypted both in transit and at rest.

Finally, for Jupyter Notebook instances and for the Docker containers used to process data, train models, and host models, AWS allows developers to specify AWS KMS keys. By default, Amazon SageMaker encrypts data with a service-managed key, but developers can also create and use their own customer managed keys (CMKs).
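As a rough illustration of specifying your own key, the sketch below builds the encryption-related parts of a training job request. The key ARN, bucket path, and instance settings are hypothetical, and the helper function is an illustrative name; `KmsKeyId`, `VolumeKmsKeyId`, and `EnableInterContainerTrafficEncryption` are parameters of the SageMaker `CreateTrainingJob` API.

```python
# Hypothetical key ARN and bucket; substitute your own customer managed key.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"


def build_encrypted_training_settings(kms_key_arn, s3_output_path):
    """Return the CreateTrainingJob fragments that control encryption."""
    if not s3_output_path.startswith("s3://"):
        raise ValueError("model output must be written to an S3 path")
    return {
        # Encrypts model artifacts written to S3 (data at rest).
        "OutputDataConfig": {"S3OutputPath": s3_output_path, "KmsKeyId": kms_key_arn},
        # Encrypts the storage volume attached to each training instance.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 2,
            "VolumeSizeInGB": 50,
            "VolumeKmsKeyId": kms_key_arn,
        },
        # Encrypts traffic between containers in a distributed job (data in transit).
        "EnableInterContainerTrafficEncryption": True,
    }


settings = build_encrypted_training_settings(KMS_KEY_ARN, "s3://example-bucket/models/")
print(settings["OutputDataConfig"])
```

A notebook instance accepts a key in the same spirit through the `KmsKeyId` parameter of `CreateNotebookInstance`.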

Team Collaboration and Clear Personnel Permission Levels

Because machine learning projects involve complex and tedious processes, access permissions need to be clearly defined for developer teams. A machine learning project is typically completed by a team, yet many enterprises and teams are still focused on how to apply machine learning and pay little attention to the security issues involved. The developer team needs to define which people correspond to which security levels so that the project can run safely.

For example, some internet startups lack operational experience with machine learning. After creating a virtual machine, they log directly in to Jupyter Notebook and start writing and running code. None of the data in the notebook is protected by clear isolation levels, which can easily create serious security risks for the project.

Amazon SageMaker addresses such issues by making permission levels explicit. It enforces basic permission configuration while still making internal collaboration easy. When a creator opens a notebook, it runs by default in their own Amazon SageMaker VPC, and the creator must configure permissions before anyone else can access it. Amazon SageMaker Notebooks, the elastic notebook feature built into Amazon SageMaker, can also launch Jupyter Notebook with one click, making collaboration within developer teams easier.

A more important innovation in Amazon SageMaker is the decoupling of programming from training. Because the two are separated, each user can work in a single container or on a single machine and develop programs with minimal computing resources. This not only saves resources but also keeps each user’s data securely isolated.

For enterprises or developer teams lacking experience in machine learning, Amazon SageMaker provides more detailed solutions. These mainly include the following four points:

First: Use AWS’s managed policies.

Second: Grant each role only the minimum permissions it needs, avoiding excessive permissions.

Third: Enable multi-factor authentication for sensitive operations.

Fourth: Define policy conditions and API-level permissions to strengthen machine learning security.
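Points two through four can be sketched as a single IAM policy document. This is an illustrative policy under assumptions, not a recommended baseline: the statement names and the choice of actions are hypothetical, while `aws:MultiFactorAuthPresent` and `sagemaker:VpcSubnets` are existing IAM condition keys.

```python
import json


def build_training_role_policy():
    """Return an IAM policy document as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Least privilege: only the actions this role actually needs.
                "Sid": "AllowTrainingOnly",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:DescribeTrainingJob",
                ],
                "Resource": "*",
            },
            {
                # Policy condition: refuse training jobs that do not specify VPC subnets.
                "Sid": "DenyTrainingOutsideVpc",
                "Effect": "Deny",
                "Action": "sagemaker:CreateTrainingJob",
                "Resource": "*",
                "Condition": {"Null": {"sagemaker:VpcSubnets": "true"}},
            },
            {
                # Sensitive operations require multi-factor authentication.
                "Sid": "DenyDeletesWithoutMfa",
                "Effect": "Deny",
                "Action": [
                    "sagemaker:DeleteNotebookInstance",
                    "sagemaker:DeleteEndpoint",
                ],
                "Resource": "*",
                "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
            },
        ],
    }


policy = build_training_role_policy()
print(json.dumps(policy, indent=2))
```

Explicit `Deny` statements take precedence over any `Allow`, so the VPC and MFA requirements hold even if a broader managed policy is attached to the same role.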

Amazon also hopes SageMaker will help enterprises put machine learning into production quickly. Tianjin Huai Technology has already begun using the service. Established in 2015, the startup has faced many challenges as its revenue has grown. Its development team uses Amazon SageMaker to bring artificial intelligence and industry-specific innovations into its products and services, addressing problems in data processing, overseas market expansion, and business transformation. Meanwhile, the technology and privacy protection certifications provided by AWS have helped the company avoid privacy leaks and potential security risks.

Shared Responsibility Model Creates a Safer Environment for Machine Learning

As mentioned above, big data and machine learning are interdependent, and cloud computing has also played a decisive role in the development of machine learning. Cloud computing provides the storage that big data requires, and through key technologies such as distributed computing it also supplies the high computing power that complex machine learning algorithms urgently need. As a fully managed AWS service, Amazon SageMaker helps enterprises and developers quickly build, train, and deploy models in the cloud.

For enterprises and developers, network security issues are unavoidable. Amazon SageMaker follows the shared responsibility model: AWS operates, manages, and controls the components from the host operating system and virtualization layer down to the physical security of the facilities in which the service runs, which significantly reduces the operational burden on customers.

Defending against attacks requires a dynamic security assurance system, and logs are a key tool for building one. Amazon CloudWatch can monitor Amazon SageMaker in real time, collecting raw data and processing it into readable metrics. Amazon CloudWatch Logs can also monitor the information in log files, and users can set thresholds so that they are notified, or an action is taken, when a metric reaches a specified value.
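A threshold of this kind can be sketched as the parameters of a CloudWatch alarm on a SageMaker endpoint. The endpoint name, variant name, SNS topic ARN, and threshold value are hypothetical, and the helper function is an illustrative name; the keys are parameters of the CloudWatch `PutMetricAlarm` API, and `Invocation5XXErrors` is an endpoint metric in the `AWS/SageMaker` namespace.

```python
def build_endpoint_error_alarm(endpoint_name, variant_name, sns_topic_arn, threshold=5.0):
    """Return PutMetricAlarm parameters that notify when server errors exceed a threshold."""
    return {
        "AlarmName": f"{endpoint_name}-5xx-errors",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation5XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",
        "Period": 300,            # evaluate the metric in 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],  # notification target, e.g. an SNS topic
    }


alarm = build_endpoint_error_alarm(
    "example-endpoint",
    "AllTraffic",
    "arn:aws:sns:us-east-1:111122223333:example-alerts",
)
print(alarm["AlarmName"])

# With AWS credentials configured, the alarm could then be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Building the parameters separately from the API call keeps the alarm definition easy to review and to store in version control.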

In addition, good log monitoring tools work best alongside an efficient, intelligent threat detection service. Amazon GuardDuty is a threat detection service that continuously monitors for malicious activity and unauthorized behavior in the AWS cloud, protecting users’ AWS accounts and workloads. Amazon GuardDuty also includes a machine learning engine that can quickly identify potential threats and prioritize them. Combining log monitoring with intelligent threat detection can significantly shorten users’ response times.

Globally, Amazon SageMaker is one of the most commercially mature machine learning platform services, with a well-established framework, integrations, and ecosystem, and tens of thousands of enterprises worldwide use AWS to run machine learning algorithms. On April 30, 2020, Amazon SageMaker officially entered the Chinese market, launching in the AWS China (Beijing) Region operated by Sinnet and the AWS China (Ningxia) Region operated by NWCD.
