Retrieving from NebulaGraph with a GPT2 Model in the Langchain-Chatchat Project

In the official example, chain = NebulaGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True) is used to query the NebulaGraph database. This article walks through the idea and implementation of replacing ChatOpenAI with GPT2, leaving performance aside for now. ChatGLM2 is not used here because it loads slowly and is inconvenient to debug, but swapping GPT2 out for ChatGLM2 is equally straightforward.

1. Retrieving NebulaGraph with ChatOpenAI

1. Implementation of NebulaGraph_OpenAI.py
Without an OpenAI API key and a proxy, the example below cannot actually be run; it is shown for reference:

"""
Example of connecting Langchain to NebulaGraph
"""
from langchain.chat_models import ChatOpenAI
from langchain.chains import NebulaGraphQAChain
from langchain.graphs import NebulaGraph

graph = NebulaGraph(
    space="basketballplayer",
    username="root",
    password="nebula",
    address="172.21.31.166",
    port=9669,
    session_pool_size=30,  # Set connection pool size
)
print(graph.get_schema)

chain = NebulaGraphQAChain.from_llm(  # Create a Q&A chain from the language model
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)
chain.run("Who played in The Godfather II?")

2. Default prompt of NebulaGraphQAChain

The default prompt introduces the task, gives an example of the nGQL dialect, lists the graph schema, and states the output constraints, as follows:

> Entering new NebulaGraphQAChain chain...
Generated nGQL:
Task: Generate NebulaGraph Cypher statement to query a graph database.

Instructions:

First, generate cypher then convert it to NebulaGraph Cypher dialect (rather than standard):
1. It requires explicit label specification only when referring to node properties: v.`Foo`.name
2. Note explicit label specification is not needed for edge properties, so it's e.name instead of e.`Bar`.name
3. It uses double equals sign for comparison: `==` rather than `=`
For instance:
diff
< MATCH (p:person)-[e:directed]->(m:movie) WHERE m.name = 'The Godfather II'
< RETURN p.name, e.year, m.name;
---
> MATCH (p:`person`)-[e:directed]->(m:`movie`) WHERE m.`movie`.`name` == 'The Godfather II'
> RETURN p.`person`.`name`, e.year, m.`movie`.`name`;

Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
Node properties: [{'tag': 'player', 'properties': [('name','string'), ('age', 'int64')]}, {'tag': 'team', 'properties': [('name','string')]}]
Edge properties: [{'edge': 'follow', 'properties': [('degree', 'int64')]}, {'edge':'serve', 'properties': [('start_year', 'int64'), ('end_year', 'int64')]}]
Relationships: ['(:player)-[:follow]->(:player)', '(:player)-[:serve]->(:team)']

Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is:
player100'age is what?
Full Context:
{}
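
If the default template does not fit, the nGQL-generation prompt can be overridden when building the chain. A minimal sketch, assuming a LangChain version in which NebulaGraphQAChain.from_llm accepts an ngql_prompt argument (the parameter name may differ across versions):

from langchain.prompts import PromptTemplate
from langchain.chains import NebulaGraphQAChain

# Custom nGQL-generation prompt; {schema} and {question} are filled in by the chain
ngql_prompt = PromptTemplate(
    input_variables=["schema", "question"],
    template=(
        "Task: Generate a NebulaGraph Cypher (nGQL) statement to query a graph database.\n"
        "Use only the relationship types and properties in this schema:\n{schema}\n"
        "Return only the nGQL statement, without explanations.\n"
        "The question is:\n{question}"
    ),
)

chain = NebulaGraphQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, ngql_prompt=ngql_prompt, verbose=True
)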

2. Retrieving NebulaGraph with GPT2

1. Implementation of NebulaGraph_GPT2.py
Simply replace ChatOpenAI(temperature=0) with a custom GPT2(), as shown below:

"""
Example of connecting Langchain to NebulaGraph
"""
from langchain.chains import NebulaGraphQAChain
from langchain.graphs import NebulaGraph
from examples.GPT2 import GPT2

graph = NebulaGraph(  # Connect to NebulaGraph
    space="basketballplayer",
    username="root",
    password="nebula",
    address="172.24.211.214",
    port=9669,
    session_pool_size=30,  # Set connection pool size
)
print(graph.get_schema)  # Get the schema of the graph

chain = NebulaGraphQAChain.from_llm(  # Create a Q&A chain from the language model
    GPT2(), graph=graph, verbose=True
)
chain.run("player100'name is what?")  # Run the Q&A chain
chain.run("player100'age is what?")  # Run the Q&A chain

2. Implementation of GPT2.py

The main task is to inherit LangChain's LLM base class and implement the _call(self, prompt: str, stop: Optional[List[str]] = None) -> str method, as shown below:

import time
import logging
import requests
from typing import Optional, List, Dict, Mapping, Any

import langchain
from langchain.llms.base import LLM
from langchain.cache import InMemoryCache

logging.basicConfig(level=logging.INFO)
# Enable the LLM cache: if the same question is asked again, the cached answer is returned immediately without calling the model, which saves time
langchain.llm_cache = InMemoryCache()


class GPT2(LLM):
    # Model service URL
    url = "http://127.0.0.1:8595/chat"

    @property  # This decorator turns a method into a property
    def _llm_type(self) -> str:
        return "gpt2"

    def _construct_query(self, prompt: str) -> Dict:
        """
        Construct request body
        """
        query = {
            "human_input": prompt
        }
        return query

    @classmethod  # This decorator turns a method into a class method
    def _post(cls, url: str, query: Dict) -> Any:
        """
        POST request
        """
        _headers = {"Content_Type": "application/json"}
        with requests.session() as sess:  # This with statement creates an object in this block, which will be automatically destroyed after executing the block
            resp = sess.post(url, json=query, headers=_headers, timeout=60)
        return resp

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        """
        Note: This method is used to call the model, with parameters prompt and stop, where prompt is user input and stop is the end flag.
        """
        query = self._construct_query(prompt=prompt)  # Construct request body
        resp = self._post(url=self.url, query=query)  # Post request

        if resp.status_code == 200:  # Check if the request is successful
            resp_json = resp.json()  # Get the return result
            predictions = resp_json['response']  # Get the response field in the return result
            return predictions  # Return model result
        else:
            return "Request to the model service failed"

    @property  # This decorator turns a method into a property
    def _identifying_params(self) -> Mapping[str, Any]:
        """
        This method retrieves identifying parameters
        """
        _param_dict = {
            "url": self.url
        }
        return _param_dict


if __name__ == "__main__":
    llm = GPT2()  # Instantiate GPT2 class
    while True:  # Loop so the user can keep entering questions
        human_input = input("Human: ")  # Get user input
        begin_time = time.time() * 1000  # Get current time

        response = llm(human_input, stop=["you"])  # Call the model
        end_time = time.time() * 1000  # Get current time
        used_time = round(end_time - begin_time, 3)  # Calculate model call time
        logging.info(f"GPT2 process time: {used_time}ms")  # Print model call time

        print(f"GPT2: {response}")  # Print model return result

3. Implementation of GPT2_Flask.py

This mainly wraps GPT2 into an API using Flask, as shown below:

import os
import json
import torch
from flask import Flask
from flask import request
from transformers import GPT2LMHeadModel, GPT2Tokenizer

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Specify GPU, 0 means using the first GPU

pretrained_model_name_or_path = "L:/20230713_HuggingFaceModel/gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
model = GPT2LMHeadModel.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True).half().cuda()
model.eval()

app = Flask(__name__)


@app.route("/", methods=["POST", "GET"])
def root():
    return "Welcome to gpt2 model"


@app.route("/chat", methods=["POST"])
def chat():
    data_seq = request.get_data()  # Get request data
    data_dict = json.loads(data_seq)  # Convert request data to dictionary
    human_input = data_dict["human_input"]  # Get human_input field from request data

    # response, _ = model.chat(tokenizer, human_input, history=[])  # ChatGLM can use this method

    # Encode the input text into tokens
    input_ids = tokenizer.encode(human_input, return_tensors="pt")
    input_ids = input_ids.cuda()
    # Perform model inference
    with torch.no_grad():  # Disable gradient tracking during inference to save memory and compute
        output = model.generate(input_ids, max_length=50, num_return_sequences=1)  # Generate model output, max_length indicates the maximum length of generation, num_return_sequences indicates the number of generated sequences
    output = output.cuda()
    # Decode the generated tokens into a string; skip_special_tokens=True drops special tokens, clean_up_tokenization_spaces=True cleans up tokenization spaces
    response = tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)

    result_dict = {  # Construct return result
        "response": response
    }
    result_seq = json.dumps(result_dict, ensure_ascii=False)  # Convert return result to json string
    return result_seq  # Return result


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8595, debug=False)
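
Once GPT2_Flask.py is running, the service can be smoke-tested on its own, independently of LangChain. A minimal sketch that posts to the /chat endpoint defined above (host and port match the defaults in the code):

import requests

# Send a question to the local GPT2 service and print the reply
resp = requests.post(
    "http://127.0.0.1:8595/chat",
    json={"human_input": "player100'age is what?"},
    timeout=60,
)
print(resp.status_code)
print(resp.json()["response"])  # the "response" field assembled in chat()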

Because general-purpose LLMs convert text to nGQL purely through prompting, the results are not professional-grade; I believe future development should focus on specialized LLMs acting as agents to accomplish this task.

References:
[1] https://huggingface.co/gpt2
[2] Using the LLMs module to connect a custom large model: https://blog.csdn.net/zhaomengsen/article/details/130585397
[3] https://github.com/ai408/Langchain-Chatchat/blob/master/examples/NebulaGraph_GPT2.py
[4] https://github.com/ai408/Langchain-Chatchat/blob/master/examples/GPT2.py
[5] https://github.com/ai408/Langchain-Chatchat/blob/master/examples/GPT2_Flask.py
