An Efficient Secure KNN Classification Protocol

The new issue of high-level papers is here

This issue introduces the paper

An efficient secure k nearest neighbor classification protocol with high-dimensional features

Published in the INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS

The authors are Yang Ruidi and her advisor, Professor Sun Maohua

Part 01

Author Introduction

An Efficient Secure KNN Classification Protocol

Sun MaohuaINTERVIEW

Born in 1986 in Shandong, China. Graduated from Beijing University of Posts and Telecommunications in July 2013, receiving a Ph.D. in Engineering. Currently, he is the party secretary of the Department of Data Science and Big Data Technology. His main research directions include secure multi-party computation, information security, and applied cryptography. He teaches courses such as “Management Information Systems” and “Network Programming (Programming)”.

An Efficient Secure KNN Classification Protocol

Yang RuidiINTERVIEW

Born in 1997 in Beijing, China. Bachelor of Science, currently a master’s student in Software Engineering at the School of Management Engineering, Capital University of Economics and Business. Her main research directions include secure multi-party computation and privacy protection technology.

Part 02

Journal Information

The paper An efficient secure k nearest neighbor classification protocol with high-dimensional features was published in the INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2020, Issue 35, Page 11 (JCR Zone 1)

Part 03

Abstract

The k Nearest Neighbor (kNN) classification algorithm is a prediction model widely used for real-life applications such as healthcare, finance, computer vision, personalized recommendations, and precision marketing. The arrival of the data explosion era has resulted in a significant increase in feature dimension, which has also raised privacy concerns over the available samples and unlabeled data in machine learning applications. In this paper, we present a secure low communication overhead kNN classification protocol that can handle high-dimensional features represented in real numbers. First, to manage feature values given in real numbers, we develop a specific data conversion algorithm used in the chosen fully homomorphic scheme. This conversion algorithm is generic and applicable to other algorithms that need to handle real numbers using the fully homomorphic scheme. Second, we present a privacy-preserving Euclidean distance protocol (PPEDP), which computes the Euclidean distance between two points represented in real numbers in a high-dimensional space. Then, based on the novel PPEDP and oblivious transfer, we propose a new classification approach, the efficient secure kNN classification protocol (ESkNN), with low communication overhead, suitable for a sample set with high-dimensional features and real number feature values. Moreover, we implement ESkNN in C++. Experimental results show that ESkNN is several orders of magnitude faster than existing works and scales up to 18,000 feature dimensions in a memory-limited environment.

Part 04

Innovative Points and Significance

#Innovation Points

In this paper, we propose a secure, low-communication overhead kNN classification protocol that can handle high-dimensional features given in real numbers.

#Significance

① We propose a new data conversion algorithm that can handle real numbers and can be used with the BGV homomorphic scheme. This conversion algorithm is generic and applicable to other algorithms that need to handle real numbers with the BGV homomorphic scheme.

② We propose a privacy-preserving Euclidean distance protocol (PPEDP) based on the BGV homomorphic scheme. By using the novel data conversion algorithm, PPEDP can compute the Euclidean distance between two points given in real numbers in a high-dimensional space.

③ We propose a new classification method, the efficient secure kNN (ESkNN) classification protocol based on PPEDP and OT. Similarly, our ESkNN can handle real numbers. We use the HElib and EMP toolkits to construct the protocol, making our building block PPEDP usable in other applications.

Part 04

Conclusion

In this article, we present a privacy-preserving Euclidean distance protocol (PPEDP) that allows us to compute the Euclidean distance between two points given in real numbers in a high-dimensional space without worrying about privacy leakage.

Based on PPEDP and OT, we designed the classification protocol ESkNN. As shown in the figure, we compared our classification protocol ESkNN with OCkNN and PPkNN by varying k. Experimental results show that our ESkNN is faster than existing works when handling real numbers, especially when the sample dataset has more features than samples.
An Efficient Secure KNN Classification Protocol

Part 06

Experience and Insights from Yang Ruidi

Through a year of graduate study, I have transitioned from a clueless freshman to the graduation stage. In this fleeting year, I have opened the door to the world of scientific research, attempted to write and submit papers for the first time, participated in international conferences, and spoken at them… so many firsts have enriched my graduate life, giving me a new understanding of “scientific research”—a term both familiar and strange.

As the century-old elder Bingxin said: “Reading is good, good at reading, and reading good books.” Even at the graduate stage, reading and thinking remain an essential part of scientific research. Upon entering school, my advisor recommended professional books and papers, requiring me to read extensively each week and write reading notes. Through reading related to information security, I gradually gained a certain understanding of this unfamiliar field, focusing on the current state of research and previous works, which provided me with new insights. Additionally, through reading numerous papers, I mastered some useful skills: familiarizing myself with paper formats, learning various methods to search and download foreign papers, improving my English reading ability, and gaining more understanding of foreign journals and universities… all of these were practical experiences gained from extensive reading. Meanwhile, many websites now offer numerous video courses, such as those on the Computer Society’s official website and Chinese university MOOC, which are very beneficial for research in target fields. Only by paying attention to accumulation regularly can we achieve great progress when needed.

Scientific research must not be wasted at any time, and one must clarify their goals. Excellence comes from diligence, while negligence leads to failure; action stems from thought, while carelessness leads to ruin. I have a habit of summarizing weekly, reflecting on the work done that week, and setting plans for the next week. Scientific research avoids superficial understanding; it is not enough to just finish reading; the important part is “digestion”—one must think deeply, learn useful knowledge, and summarize it.

Lastly, I would like to express my gratitude to my graduate advisor, Professor Sun Maohua, who has guided my exploration in the target field like a bright lamp. Professor Sun’s encouragement and assistance have been an indispensable part of my graduate career.

Conclusion

Thanks to the scholarship recipient Yang Ruidi for her sharing. I hope everyone can gain something from this sharing and keep up the good work!

An Efficient Secure KNN Classification Protocol

Leave a Comment