Hello everyone! Today I want to share a very useful tool with you: Pinecone, together with its Python client. Vector search has become a key technology in AI and machine learning, and Pinecone provides a managed, cloud-hosted vector search service. By the end of this post, you will know how to use Python to work with Pinecone and run efficient vector retrieval.
1. Introduction to Pinecone and Environment Setup
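First, we need to install the Pinecone Python client. The examples in this post use the pinecone.init style of the 2.x client (later releases replaced it with a different interface), so we pin the version:
# Install pinecone-client using pip (run this in a terminal, not in Python)
pip install "pinecone-client<3"
# Import necessary libraries (in your Python script)
import pinecone
import numpy as np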
Tip: Make sure you have registered an account on the Pinecone official website and obtained an API key, which is necessary for subsequent operations.
2. Initializing the Pinecone Client
Let’s see how to connect to the Pinecone service:
# Initialize the Pinecone client
pinecone.init(
    api_key="your-api-key",  # Replace with your API key
    environment="us-west1-gcp"  # Use the environment shown for your project in the Pinecone console
)
# Create or connect to an index
index_name = "product-search"
dimension = 384  # Vector dimension

# Check if the index exists
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=dimension,
        metric="cosine"  # Similarity calculation method
    )

# Connect to the index
index = pinecone.Index(index_name)
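Rather than typing the key directly into the init call, one common option is to read it from an environment variable. Here is a minimal sketch of that idea. The helper name load_api_key and the variable name PINECONE_API_KEY are my own choices for illustration, not something the client requires.

```python
import os


def load_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a convention chosen for this sketch (an assumption).
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script."
        )
    return api_key


# Usage with the init call from this section (commented out so the sketch
# runs without a Pinecone account):
# pinecone.init(api_key=load_api_key(), environment="us-west1-gcp")
```

This keeps the key out of the script itself, so the file can be shared without leaking credentials.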
3. Uploading and Managing Vector Data
Now let’s add some vector data:
# Generate example vector data
def generate_random_vector(dim=384):
    # tolist() converts the NumPy array into plain Python floats
    return np.random.random(dim).tolist()

# Batch upload vectors
vectors_with_metadata = [
    (
        f"vec_{i}",  # Vector ID
        generate_random_vector(),  # Vector data
        {"category": "electronics", "price": 299.99}  # Metadata
    )
    for i in range(5)
]

# Use the upsert method to upload data
index.upsert(vectors=vectors_with_metadata)
Tip: In practical applications, vectors usually come from embeddings of images, texts, or other data. Here we use random vectors as an example.
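With only five vectors, a single upsert call is enough. For larger datasets, the upload can be split into batches. The sketch below shows one way to do that with the standard library. The helper name chunked and the batch size of 100 are assumptions for illustration; tune the size to your vector dimension and metadata.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Usage with the index from this section (commented out so the sketch
# runs offline):
# for batch in chunked(vectors_with_metadata, batch_size=100):
#     index.upsert(vectors=batch)
```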
4. Vector Retrieval Operations
Let’s see how to perform vector retrieval:
# Execute vector retrieval
query_vector = generate_random_vector()
search_results = index.query(
    vector=query_vector,
    top_k=3,  # Return the 3 most similar results
    include_metadata=True  # Include metadata
)

# Process the retrieval results
for match in search_results['matches']:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']:.4f}")
    print(f"Metadata: {match['metadata']}\n")
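To get a feel for what the score means with the cosine metric, here is a small local sketch that computes cosine similarity by hand and ranks a few vectors by brute force. It only illustrates the idea of top_k retrieval; it is not how Pinecone performs the search internally, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def local_top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 3
) -> List[dict]:
    """Score every stored vector against the query and keep the k best."""
    scored = [
        {"id": vec_id, "score": cosine_similarity(query, vec)}
        for vec_id, vec in vectors.items()
    ]
    scored.sort(key=lambda match: match["score"], reverse=True)
    return scored[:k]


# Tiny 2-dimensional example
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
for match in local_top_k([1.0, 0.0], toy_vectors, k=2):
    print(f"ID: {match['id']}  Score: {match['score']:.4f}")
```

A score close to 1 means the two vectors point in nearly the same direction, and a score near 0 means they are almost orthogonal.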
5. Advanced Features: Metadata Filtering
Pinecone supports filtering queries based on metadata:
# Use metadata filter for retrieval
filtered_results = index.query(
    vector=query_vector,
    top_k=3,
    include_metadata=True,
    filter={
        "category": "electronics",
        "price": {"$lte": 300}  # Price less than or equal to 300
    }
)
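To make the filter syntax more concrete, the sketch below evaluates the same kind of filter dictionary against a metadata record locally. It is a simplified stand-in written for this post, not Pinecone's implementation: it covers only the plain-equality shorthand and a handful of comparison operators, and it assumes that a record missing a filtered field does not match.

```python
from typing import Any, Callable, Dict

# Comparison operators handled by this sketch
_OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every condition in flt."""
    for field, condition in flt.items():
        if field not in metadata:
            return False  # Simplifying assumption of this sketch
        value = metadata[field]
        if isinstance(condition, dict):
            for op, target in condition.items():
                if op not in _OPERATORS:
                    raise ValueError(f"unsupported operator: {op}")
                if not _OPERATORS[op](value, target):
                    return False
        elif value != condition:  # A bare value means equality
            return False
    return True


example_filter = {"category": "electronics", "price": {"$lte": 300}}
print(matches_filter({"category": "electronics", "price": 299.99}, example_filter))
print(matches_filter({"category": "electronics", "price": 350.0}, example_filter))
```

All conditions in the dictionary must hold at the same time, which is why the second record is rejected even though its category matches.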
Important Notes:
- Vector dimension must be determined when creating the index and cannot be changed later.
- It is recommended to use batch uploads instead of single inserts for better efficiency.
- Be careful to protect your API key: do not hard-code it in scripts that you share or commit to version control.
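Because the dimension is fixed once the index exists, it can help to check your vectors before uploading them. The following is a minimal sketch under the assumption that records are (id, values, metadata) tuples like the ones used earlier in this post; the function name find_dimension_mismatches is hypothetical.

```python
from typing import Any, List, Sequence, Tuple

Record = Tuple[str, Sequence[float], Any]


def find_dimension_mismatches(records: Sequence[Record], dimension: int) -> List[str]:
    """Return the IDs of records whose vector length differs from dimension."""
    return [rec[0] for rec in records if len(rec[1]) != dimension]


# Example: one vector has 383 values instead of 384
records = [
    ("vec_ok", [0.1] * 384, {"category": "electronics"}),
    ("vec_bad", [0.1] * 383, {"category": "electronics"}),
]
bad_ids = find_dimension_mismatches(records, dimension=384)
print(f"Records with the wrong dimension: {bad_ids}")
```

Catching a mismatch locally gives a clearer error message than waiting for the upload to be rejected.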
6. Index Management and Maintenance
Regular maintenance of your index is important:
# Get index statistics
stats = index.describe_index_stats()
print(f"Total number of vectors: {stats['total_vector_count']}")
# Delete specific vectors
index.delete(ids=["vec_1", "vec_2"])
# Clear the index (removes every vector, so it is left commented out here)
# index.delete(delete_all=True)
Today’s Python learning journey ends here! Remember to practice coding. Through practice, you will find that Pinecone is not only easy to use but also helps you quickly build powerful vector search applications. Happy learning, and may your Python skills improve rapidly!