Pinecone-Python: A Cloud Solution for Vector Search

Hello everyone! Today I want to share a very useful Python library: the Pinecone client. Vector search has become a key technology in AI and machine learning, and Pinecone offers it as a managed cloud service. By the end of this post, you will know how to use Python to create a Pinecone index, upload vectors, and run similarity queries against it.

1. Introduction to Pinecone and Environment Setup

First, we need to install the Pinecone Python client:

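# Install pinecone-client using pip (run this in a shell, not in Python).
# The examples in this post use the pinecone.init() interface, which was
# removed in pinecone-client 3.0, so pin an earlier release.
pip install "pinecone-client<3"

# Import necessary libraries
import pinecone
import numpy as np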

Tip: Make sure you have registered an account on the Pinecone official website and obtained an API key, which is necessary for subsequent operations.

2. Initializing the Pinecone Client

Let’s see how to connect to the Pinecone service:

# Initialize the Pinecone client
pinecone.init(
    api_key="your-api-key",  # Replace with your API key
    environment="us-west1-gcp"  # Use the environment shown for your project in the Pinecone console
)

# Create or connect to an index
index_name = "product-search"
dimension = 384  # Vector dimension; must match the vectors you upload

# Create the index only if it does not exist yet
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=dimension,
        metric="cosine"  # Similarity calculation method
    )

# Connect to the index
index = pinecone.Index(index_name)
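To make the metric="cosine" setting above more concrete, here is a minimal sketch of what cosine similarity measures. It is plain Python and does not touch Pinecone; the function name cosine_similarity is my own choice for illustration, not part of the client.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# A scaled copy points in the same direction, so its similarity is about 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors share no direction at all.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Vectors pointing in opposite directions score at the other extreme.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

The point to notice is that only direction matters: doubling a vector's length does not change its score, which is one reason cosine is a common choice for text embeddings.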

3. Uploading and Managing Vector Data

Now let’s add some vector data:

# Generate example vector data
def generate_random_vector(dim=384):
    # tolist() converts NumPy floats into plain Python floats
    return np.random.random(dim).tolist()

# Build (id, vector, metadata) tuples for a batch upload
vectors_with_metadata = [
    (
        f"vec_{i}",  # Vector ID
        generate_random_vector(),  # Vector data
        {"category": "electronics", "price": 299.99}  # Metadata
    )
    for i in range(5)
]

# Use the upsert method to upload data
index.upsert(vectors=vectors_with_metadata)

Tip: In practical applications, vectors usually come from embeddings of images, texts, or other data. Here we use random vectors as an example.
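Five vectors fit comfortably in one request, but a real dataset usually has to be split into several upsert calls. The helper below is one possible way to do that. The name chunked and the batch size of 100 are assumptions for illustration, so check the request limits that apply to your own account.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists containing at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical records standing in for (id, vector, metadata) tuples.
records = [
    (f"vec_{i}", [0.0, 0.0, 0.0], {"category": "electronics"})
    for i in range(5)
]
batches = list(chunked(records, batch_size=2))
batch_sizes = [len(batch) for batch in batches]

# With a live index, each batch would be sent as its own request:
# for batch in chunked(records, batch_size=100):
#     index.upsert(vectors=batch)
```

Because chunked is a generator, it never holds more than one batch in memory at a time, which helps when the vectors are produced lazily.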

4. Vector Retrieval Operations

Let’s see how to perform vector retrieval:

# Execute vector retrieval
query_vector = generate_random_vector()
search_results = index.query(
    vector=query_vector,
    top_k=3,  # Return the 3 most similar results
    include_metadata=True  # Include metadata
)

# Process the retrieval results
for match in search_results['matches']:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']:.4f}")
    print(f"Metadata: {match['metadata']}\n")
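If you want a mental model of what a top_k query returns, the sketch below ranks a handful of stored vectors against a query by exact cosine similarity and keeps the best few. This is a brute-force illustration only. It is not how Pinecone works internally, and the name brute_force_query and the sample vectors are made up for the example.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_query(stored, query, top_k):
    """Score every stored vector and return the top_k best matches."""
    scored = (
        {"id": vector_id, "score": cosine_similarity(vector, query)}
        for vector_id, vector in stored.items()
    )
    # nlargest returns the matches sorted from highest to lowest score.
    return heapq.nlargest(top_k, scored, key=lambda match: match["score"])


# Hypothetical two-dimensional vectors, small enough to check by hand.
stored = {
    "vec_a": [1.0, 0.0],
    "vec_b": [0.7, 0.7],
    "vec_c": [0.0, 1.0],
    "vec_d": [-1.0, 0.0],
}
matches = brute_force_query(stored, query=[1.0, 0.1], top_k=3)
match_ids = [match["id"] for match in matches]
```

The result has the same shape as the matches printed above: a list of IDs with scores, ordered from most to least similar.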

5. Advanced Features: Metadata Filtering

Pinecone supports filtering queries based on metadata:

# Use metadata filter for retrieval
filtered_results = index.query(
    vector=query_vector,
    top_k=3,
    include_metadata=True,
    filter={
        "category": "electronics",
        "price": {"$lte": 300}  # Price less than or equal to 300
    }
)
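The filter dictionary can look a little cryptic at first, so here is a small local sketch of how the one above is read: every top-level field must match, a plain value means equality, and a nested dictionary holds comparison operators. The function matches_filter is my own simplified re-implementation for illustration, not Pinecone code. It covers only six comparison operators and assumes a record without the field never matches.

```python
import operator

# Comparison operators understood by this sketch.
_OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches_filter(metadata, flt):
    """Return True if metadata satisfies every condition in flt (implicit AND)."""
    for field, condition in flt.items():
        if field not in metadata:
            return False  # Simplifying assumption: missing fields never match
        value = metadata[field]
        if isinstance(condition, dict):
            for op_name, operand in condition.items():
                if op_name not in _OPERATORS:
                    raise ValueError(f"unsupported operator: {op_name}")
                if not _OPERATORS[op_name](value, operand):
                    return False
        elif value != condition:
            return False
    return True


flt = {"category": "electronics", "price": {"$lte": 300}}
cheap_gadget = matches_filter({"category": "electronics", "price": 299.99}, flt)
pricey_gadget = matches_filter({"category": "electronics", "price": 350.0}, flt)
cheap_book = matches_filter({"category": "books", "price": 20.0}, flt)
```

Only the first record passes, because it is the only one that satisfies both conditions at once.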

Important Notes:

The vector dimension is fixed when the index is created and cannot be changed afterwards, so it must match the output size of your embedding model.
Upload vectors in batches rather than one at a time; fewer requests means less network overhead.
Protect your API key: do not hard-code it in source files or commit it to version control.
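One common way to keep the key out of your code is to read it from an environment variable at startup. The sketch below assumes the variable is called PINECONE_API_KEY, and load_api_key is a hypothetical helper name. Adjust both to whatever your deployment uses.

```python
import os


def load_api_key(env=None, name="PINECONE_API_KEY"):
    """Return the API key stored under name, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before connecting")
    return key


# A plain dict stands in for the real environment in this example.
api_key = load_api_key({"PINECONE_API_KEY": "example-key"})

# In a real script you would call load_api_key() with no arguments and pass
# the result to the client instead of a hard-coded string.
```

Failing early with a readable error is friendlier than letting the first request fail with an authentication error.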

6. Index Management and Maintenance

Once an index is in use, you will want to check how many vectors it holds and remove the ones you no longer need:

# Get index statistics
stats = index.describe_index_stats()
print(f"Total number of vectors: {stats['total_vector_count']}")

# Delete specific vectors by ID
index.delete(ids=["vec_1", "vec_2"])

# Delete every vector in the index (irreversible, so it is left commented out)
# index.delete(delete_all=True)

That's it for today's Python learning journey! Remember to practice: working through these examples yourself is the quickest way to get comfortable with Pinecone and start building your own vector search applications. Happy learning, and may your Python skills improve rapidly!