How to Make NPCs ‘Live’? Use CrewAI to Crack New Virtual Dialogue!

The dialogue analysis uses output from: Injecting Life into NPCs with CrewAI

Analysis

  • Simulation 1: A group of software engineers, computer scientists, and computer engineers
  • Conclusion
  • Support

Analysis Methods

  • Extracted Features
  • Split global_conversations.txt
  • Sentiment, Topic, Vocabulary Diversity, Emotion
  • Self-Similarity
  • Notebook

Background

Previously, in my articles Injecting Life into NPCs Using Agents and Injecting Life into NPCs with CrewAI, I discussed why I am interested in simulating two-dimensional societies. Let's analyze a short set of conversations produced by the dialogue party simulator. Note that this comes before the NPCs have any way to interact with the world.

Population of Software Engineers, Computer Scientists, and Computer Engineers

We added NPCs with backgrounds in related fields and gave them initial tasks that should make it easy for them to talk with one another.

NPCs

The non-player characters in the simulation:

Carly Cummings — Computer Scientist, Initial Task: Apply Bitwise Operations

Katherine Jones — Computer Engineer, Initial Task: Create Half Adder

Ashley Brown — Electrical Engineer, Initial Task: Apply Boolean Logic

Conclusion

  • The three NPCs stay consistently on topic; more precisely, they rarely stray from their initial topics.
  • Within individual conversations the vocabulary is impressively diverse, yet the responses echo the earlier discussion, which is evident when comparing similarity between the texts.
  • Topic analysis reveals two topic groups. Note that this may be because two of the participants never interacted with each other; the bidirectional dialogue pairs are Carly <-> Katherine and Carly <-> Ashley.

Support

Text

The dialogue text can be found at the beginning of this notebook

Sentiment Analysis

Remember: -1 is negative, 0 is neutral, +1 is positive
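As a quick illustration of the scale (these sample sentences are made up, not taken from the NPC transcript), TextBlob's polarity can be checked directly:

from textblob import TextBlob

# Illustrative sentences only; not from the actual simulation output
for text in ["I hate this bug.", "The query returns rows.", "I love this design!"]:
    print(f"{text} -> {TextBlob(text).sentiment.polarity:+.2f}")
# Expect roughly negative, near-zero, and positive polarities in [-1, +1]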

The analysis shows that the conversations are mostly neutral, with some moderately positive exchanges.

[Chart: sentiment polarity for each conversation]

Topics

Identified two topic groups:

Topic 0 includes: ‘queries’, ‘caching’, ‘query’, ‘optimization’, ‘database’, ‘efficient’, ‘strategy’, ‘performance’, ‘retrieval’, ‘data’

Topic 1 includes: ‘enhancement’, ‘like’, ‘algorithm’, ‘data’, ‘efficient’, ‘boolean’, ‘operation’, ‘design’, ‘cpu’, ‘logic’

[Chart: topic distribution across conversations]

Vocabulary Diversity

Remember: This is the ratio of unique words to all words. In other words, as vocabulary diversity approaches 1 (maximum uniqueness), more words are unique; as vocabulary diversity approaches 0 (minimum uniqueness), more words are repeated.
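As a toy illustration (the sentence is made up), the ratio is computed exactly as in the lexical_diversity() function below:

import nltk
nltk.download('punkt')

# 8 tokens, 7 unique ("the" repeats), so the type-token ratio is 7/8 = 0.875
words = nltk.word_tokenize("the cache returns the query result very quickly")
print(len(set(words)) / len(words))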

We can see from this chart that the use of words in each dialogue has medium to high uniqueness.

[Chart: vocabulary diversity for each conversation]

Emotion Recognition

Using Hugging Face Transformers for emotion recognition: emotion_pipeline = pipeline('sentiment-analysis', model='bhadresh-savani/distilbert-base-uncased-emotion')

The emotions identified in the conversations are surprise and joy.

[Chart: detected emotions for each conversation]

Semantic Similarity

Remember: 1 indicates complete similarity, while -1 indicates complete dissimilarity
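As a reminder of the math, cosine similarity is the dot product of two embeddings divided by the product of their norms. A minimal numpy sketch with toy vectors (real BERT embeddings are 768-dimensional):

import numpy as np

# Toy 3-d "embeddings" for illustration only
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.1, 2.9])

cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # near 1.0: the vectors point in almost the same direction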

The conversations clearly stay on similar topics; the lowest pairwise cosine similarity is about 0.8, which is still quite similar.

[Heatmap: pairwise semantic similarity between conversations]

Analysis Methods

Extracted Features

  1. Sentiment Analysis: Quick polarity assessment using TextBlob.
  2. Topic Modeling: Using LDA to identify topics; preprocess the data for better results (e.g., stop-word removal and lemmatization; a sketch follows this list).
  3. Vocabulary Diversity: Simple type-token ratio.
  4. Emotion Recognition: Using a BERT-based model from Hugging Face to identify emotions beyond simple sentiment.
  5. Semantic Similarity: Basic example of embedding similarity using BERT, indicating contextual alignment between dialogue turns.
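The topic-modeling code below relies on CountVectorizer's built-in English stop-word list and does not lemmatize. If you want to go further, here is a minimal preprocessing sketch with NLTK (the preprocess() helper is an illustration, not part of the original pipeline):

import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess(text):
    # Lowercase, tokenize, drop stop words and non-alphabetic tokens, lemmatize the rest
    tokens = nltk.word_tokenize(text.lower())
    return ' '.join(lemmatizer.lemmatize(t) for t in tokens
                    if t.isalpha() and t not in stop_words)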

Split global_conversations.txt

Each utterance follows a "who is talking to whom" pattern, so we split on that pattern, discarding the speaker and audience labels.

import re

def split_conversation(raw_conversation):
    # Each utterance looks roughly like "<person talking> (talking to <person>): ..."
    # Split on that pattern, keeping only the utterance text
    lines = re.split(r'^.*?\(talking to.*?\):', raw_conversation, flags=re.MULTILINE)
    # Drop empty entries and strip whitespace from each line
    return [line.strip() for line in lines if line.strip()]
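For example, on a transcript in that format the splitter returns only the utterance texts (the sample lines here are illustrative, not from the actual run):

raw_conversation = (
    "Carly Cummings (talking to Katherine Jones): How would you cache these queries?\n"
    "Katherine Jones (talking to Carly Cummings): I would start with an LRU cache."
)
print(split_conversation(raw_conversation))
# ['How would you cache these queries?', 'I would start with an LRU cache.']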

Emotion Themes and Sentiment

import nltk
nltk.download('punkt_tab')
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('vader_lexicon')
import gensim
import numpy as np
import pandas as pd
from textblob import TextBlob
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from nltk.corpus import stopwords
import transformers
from transformers import pipeline, BertTokenizer, BertModel
from gensim import corpora, models
from collections import Counter
import networkx as nx

def sentiment_analysis(conversations):
    sentiments = []
    for conversation in conversations:
        blob = TextBlob(conversation)
        sentiments.append(blob.sentiment.polarity)
    return sentiments

def topic_modeling(conversations, num_topics=2):
    cv = CountVectorizer(stop_words='english')
    dtm = cv.fit_transform(conversations)
    lda = LatentDirichletAllocation(n_components=num_topics, random_state=0)
    lda.fit(dtm)
    topic_results = lda.transform(dtm)
    topic_words = {}
    for i, topic in enumerate(lda.components_):
        topic_words[f"Topic {i}"] = [cv.get_feature_names_out()[j] 
                                     for j in topic.argsort()[-10:]]
    return topic_words

def lexical_diversity(conversations):
    diversities = []
    for conversation in conversations:
        words = nltk.word_tokenize(conversation)
        diversity = len(set(words)) / len(words) #"set of words"/"num words"
        diversities.append(diversity)
    return diversities

### Use Hugging Face Transformers for emotion recognition
emotion_pipeline = pipeline('sentiment-analysis', 
                            model='bhadresh-savani/distilbert-base-uncased-emotion')

def detect_emotions(conversation_texts):
    return emotion_pipeline(conversation_texts)

# 'conversations' holds the utterances returned by split_conversation() above
sentiments = sentiment_analysis(conversations)
topics = topic_modeling(conversations)
diversities = lexical_diversity(conversations)
emotions = detect_emotions(conversations)

print(f'Sentiments: {sentiments}')
print(f'Topics: {topics}')
print(f'Lexical Diversities: {diversities}')
print(f'Emotions: {emotions}')

Self-Similarity

We initialize a BERT tokenizer and model from Hugging Face Transformers, using the pre-trained 'bert-base-uncased' weights, to compute semantic similarity across a list of texts. The get_embeddings() function tokenizes a sentence, runs it through BERT, and averages the token vectors to produce a single embedding. The calculate_cosine_similarity() function computes the cosine similarity between two embedding vectors, quantifying their semantic similarity. The semantic_similarity() function obtains embeddings for all dialogues and fills a matrix with pairwise cosine similarities; when filling the matrix, self-comparisons are set to 1.0. This matrix forms the basis of a heatmap, where yellow indicates more similarity and blue indicates less.

from transformers import BertTokenizer, BertModel
import torch
import numpy as np
import matplotlib.pyplot as plt

### Init BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def get_embeddings(sentence):
    inputs = tokenizer(sentence, return_tensors='pt', truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze()

def calculate_cosine_similarity(embed1, embed2):
    cos_sim = np.dot(embed1, embed2) / (np.linalg.norm(embed1) * np.linalg.norm(embed2))
    return cos_sim

def semantic_similarity(conversations):
    embeddings = [get_embeddings(conversation).numpy() for conversation in conversations]
    similarities = np.zeros((len(conversations), len(conversations)))

    for i in range(len(conversations)):
        for j in range(len(conversations)):
            if i != j:
                similarities[i][j] = calculate_cosine_similarity(embeddings[i], embeddings[j])
            else:
                similarities[i][j] = 1.0 # Similarity of a sentence with itself

    return similarities

### Calculate semantic similarities for the entire conversation set
similarities = semantic_similarity(conversations)

### Semantic similarity as a heatmap
plt.figure(figsize=(8, 6))
plt.imshow(similarities, cmap='viridis', interpolation='nearest')
plt.colorbar(label='Cosine Similarity')
plt.xticks(ticks=range(len(conversations)), labels=[f"Conv-{i+1}" for i in range(len(conversations))], rotation=45)
plt.yticks(ticks=range(len(conversations)), labels=[f"Conv-{i+1}" for i in range(len(conversations))])
plt.title('Semantic Similarity Heatmap Among Conversations')
plt.xlabel('Conversation Index')
plt.ylabel('Conversation Index')
plt.show()

Notes

If you found this article insightful, please consider liking it: it supports the author and helps others discover it. And don't forget to subscribe for more in-depth articles on innovative technologies and developments in artificial intelligence. Thank you for reading!

