In the official example, chain = NebulaGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
is used to retrieve data from the NebulaGraph database. This article presents the idea and implementation of replacing ChatOpenAI with GPT2, setting performance aside for now. ChatGLM2 was not used here because it loads slowly and is inconvenient to debug, but swapping GPT2 for ChatGLM2 is just as straightforward.
1. Retrieving NebulaGraph with ChatOpenAI
1. Implementation of NebulaGraph_OpenAI.py
Without an OpenAI API key (and, where necessary, a proxy), the example below cannot actually be run:
"""
Example of connecting Langchain to NebulaGraph
"""
from langchain.chat_models import ChatOpenAI
from langchain.chains import NebulaGraphQAChain
from langchain.graphs import NebulaGraph
graph = NebulaGraph(
    space="basketballplayer",
    username="root",
    password="nebula",
    address="172.21.31.166",
    port=9669,
    session_pool_size=30,  # Set connection pool size
)
print(graph.get_schema)
chain = NebulaGraphQAChain.from_llm(  # Create a Q&A chain from the language model
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)
chain.run("Who played in The Godfather II?")
2. Default prompt of NebulaGraphQAChain
The built-in prompt introduces the task, gives an example, lists the graph schema, and states the constraints, as shown below:
> Entering new NebulaGraphQAChain chain...
Generated nGQL:
Task: Generate NebulaGraph Cypher statement to query a graph database.
Instructions:
First, generate cypher then convert it to NebulaGraph Cypher dialect (rather than standard):
1. It requires explicit label specification only when referring to node properties: v.`Foo`.name
2. Note explicit label specification is not needed for edge properties, so it's e.name instead of e.`Bar`.name
3. It uses double equals sign for comparison: `==` rather than `=`
For instance:
diff
< MATCH (p:person)-[e:directed]->(m:movie) WHERE m.name = 'The Godfather II'
< RETURN p.name, e.year, m.name;
---
> MATCH (p:`person`)-[e:directed]->(m:`movie`) WHERE m.`movie`.`name` == 'The Godfather II'
> RETURN p.`person`.`name`, e.year, m.`movie`.`name`;
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
Node properties: [{'tag': 'player', 'properties': [('name','string'), ('age', 'int64')]}, {'tag': 'team', 'properties': [('name','string')]}]
Edge properties: [{'edge': 'follow', 'properties': [('degree', 'int64')]}, {'edge':'serve', 'properties': [('start_year', 'int64'), ('end_year', 'int64')]}]
Relationships: ['(:player)-[:follow]->(:player)', '(:player)-[:serve]->(:team)']
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.
The question is:
player100'age is what?
Full Context:
{}
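As a concrete illustration of what the chain should produce for this schema, the question about player100's age corresponds to roughly the following nGQL. This is a hand-written sketch of mine, not output from the article's run, and it assumes the LangChain NebulaGraph wrapper exposes a query() method for running raw statements:
# Hand-written nGQL following the dialect rules in the prompt above (a sketch, not generated output)
ngql = (
    "MATCH (v:`player`) WHERE id(v) == 'player100' "
    "RETURN v.`player`.`age` AS age;"
)
print(graph.query(ngql))  # assumes NebulaGraph.query() is available in your LangChain version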
2. Retrieving NebulaGraph with GPT2
1. Implementation of NebulaGraph_GPT2.py
Simply replace ChatOpenAI(temperature=0) with a custom GPT2(), as shown below:
"""
Example of connecting Langchain to NebulaGraph
"""
from langchain.chains import NebulaGraphQAChain
from langchain.graphs import NebulaGraph
from examples.GPT2 import GPT2
graph = NebulaGraph(  # Connect to NebulaGraph
    space="basketballplayer",
    username="root",
    password="nebula",
    address="172.24.211.214",
    port=9669,
    session_pool_size=30,  # Set connection pool size
)
print(graph.get_schema)  # Get the schema of the graph
chain = NebulaGraphQAChain.from_llm(  # Create a Q&A chain from the language model
    GPT2(), graph=graph, verbose=True
)
chain.run("player100'name is what?")  # Run the Q&A chain
chain.run("player100'age is what?")  # Run the Q&A chain
2. Implementation of GPT2.py
The main task is to subclass LLM and implement its _call(self, prompt: str, stop: Optional[List[str]] = None) -> str method, as shown below:
import time
import logging
import requests
from typing import Optional, List, Dict, Mapping, Any
import langchain
from langchain.llms.base import LLM
from langchain.cache import InMemoryCache
logging.basicConfig(level=logging.INFO)
# Enable the LLM cache: if the same question is asked again, the cached answer is returned without calling the model, saving time
langchain.llm_cache = InMemoryCache()
class GPT2(LLM):
    # Model service URL
    url = "http://127.0.0.1:8595/chat"

    @property  # This decorator turns a method into a property
    def _llm_type(self) -> str:
        return "gpt2"

    def _construct_query(self, prompt: str) -> Dict:
        """
        Construct the request body
        """
        query = {
            "human_input": prompt
        }
        return query

    @classmethod  # This decorator turns a method into a class method
    def _post(cls, url: str, query: Dict) -> Any:
        """
        Send a POST request
        """
        _headers = {"Content-Type": "application/json"}
        with requests.session() as sess:  # The session is closed automatically when this block exits
            resp = sess.post(url, json=query, headers=_headers, timeout=60)
            return resp

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        """
        Call the model: prompt is the user input, stop is a list of stop sequences.
        """
        query = self._construct_query(prompt=prompt)  # Construct the request body
        resp = self._post(url=self.url, query=query)  # Send the POST request
        if resp.status_code == 200:  # Check whether the request succeeded
            resp_json = resp.json()  # Parse the returned JSON
            predictions = resp_json["response"]  # Read the response field of the result
            return predictions  # Return the model output
        else:
            return "Model request failed"

    @property  # This decorator turns a method into a property
    def _identifying_params(self) -> Mapping[str, Any]:
        """
        Return the identifying parameters of this LLM
        """
        _param_dict = {
            "url": self.url
        }
        return _param_dict

if __name__ == "__main__":
    llm = GPT2()  # Instantiate the GPT2 class
    while True:  # Keep prompting the user for input
        human_input = input("Human: ")  # Get user input
        begin_time = time.time() * 1000  # Record the start time (ms)
        response = llm(human_input, stop=["you"])  # Call the model
        end_time = time.time() * 1000  # Record the end time (ms)
        used_time = round(end_time - begin_time, 3)  # Compute the call duration
        logging.info(f"GPT2 process time: {used_time}ms")  # Log the call duration
        print(f"GPT2: {response}")  # Print the model output
3. Implementation of GPT2_Flask.py
This mainly wraps GPT2 into an HTTP API using Flask, as shown below:
import os
import json
import torch
from flask import Flask
from flask import request
from transformers import GPT2LMHeadModel, GPT2Tokenizer
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Specify GPU, 0 means using the first GPU
pretrained_model_name_or_path = "L:/20230713_HuggingFaceModel/gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
model = GPT2LMHeadModel.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True).half().cuda()
model.eval()
app = Flask(__name__)
@app.route("/", methods=["POST", "GET"])
def root():
return "Welcome to gpt2 model"
@app.route("/chat", methods=["POST"])
def chat():
data_seq = request.get_data() # Get request data
data_dict = json.loads(data_seq) # Convert request data to dictionary
human_input = data_dict["human_input"] # Get human_input field from request data
# response, _ = model.chat(tokenizer, human_input, history=[]) # ChatGLM can use this method
# Encode the input text into tokens
input_ids = tokenizer.encode(human_input, return_tensors="pt")
input_ids = input_ids.cuda()
# Perform model inference
with torch.no_grad(): # This with statement creates an object in this block, which will be automatically destroyed after executing the block
output = model.generate(input_ids, max_length=50, num_return_sequences=1) # Generate model output, max_length indicates the maximum length of generation, num_return_sequences indicates the number of generated sequences
output = output.cuda()
# Decode the generated tokens into strings, skip_special_tokens=True means skipping special characters, clean_up_tokenization_spaces=True means cleaning up tokenization spaces
response = tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
result_dict = { # Construct return result
"response": response
}
result_seq = json.dumps(result_dict, ensure_ascii=False) # Convert return result to json string
return result_seq # Return result
if __name__ == "__main__":
app.run(host="0.0.0.0", port=8595, debug=False)
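Once GPT2_Flask.py is running, the service can be smoke-tested on its own before wiring it into LangChain. A minimal sketch, assuming the service is reachable at 127.0.0.1:8595 as configured above:
import requests

# Send a test question to the /chat endpoint served by GPT2_Flask.py
resp = requests.post(
    "http://127.0.0.1:8595/chat",
    json={"human_input": "player100'age is what?"},
    timeout=60,
)
print(resp.json()["response"])  # The text generated by the model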
Because general-purpose LLMs translate text into nGQL purely through prompting, they are not specialists at this task; I believe future development should focus on dedicated LLMs acting as agents to accomplish it.
References:
[1] https://huggingface.co/gpt2
[2] Using the LLMs module to connect a custom large model: https://blog.csdn.net/zhaomengsen/article/details/130585397
[3] https://github.com/ai408/Langchain-Chatchat/blob/master/examples/NebulaGraph_GPT2.py
[4] https://github.com/ai408/Langchain-Chatchat/blob/master/examples/GPT2.py
[5] https://github.com/ai408/Langchain-Chatchat/blob/master/examples/GPT2_Flask.py