Generate Captions Using Python and OpenCV


View the code on GitHub
https://github.com/antoneev/66DaysOfData/tree/main/captionSuggestionsUsingLyrics
View the deployed application
https://share.streamlit.io/antoneev/66daysofdata/main/captionSuggestionsUsingLyrics/app.py

Table of Contents

  • Introduction
  • Color Detection
  • Object Detection
  • Similar Word Suggestions
  • Lyrics Genius API
  • Main Function
  • Streamlit
  • Deployment
  • Resources

Introduction

The purpose of this project is to suggest captions for images using song lyrics. A problem many people face today is the lack of witty, clever, or insightful captions.
With the rise of social media, everything needs a caption. From photos with grandma to cute pictures with dogs, most of us don’t have the right words to caption our photos, which makes us wait weeks to post, and sometimes never post at all. Other times, we simply post without a caption.
This project attempts to solve that problem by using object and color detection to find elements in photos. The algorithm then searches the selected artist’s lyrics, fetched from the Lyrics Genius API, for these elements and returns the matching lines.

Color Detection

The first step in the pipeline is color detection. Color detection is done using K-Means clustering, OpenCV, and the colors.csv file.
K-Means Clustering
  1. After opening the file, converting it to the RGB color space, and resizing it, K-means is applied. K-means returns the “n main” colors found in the image, where n is chosen by the user.
  2. The edge case of an all-black image is also handled: if the user uploads a black image and tells the system to search for 10 colors, the system will return a single black.
def palette(clusters):
    width = 300
    palette = np.zeros((50, width, 3), np.uint8) ### Blank 50 x width canvas
    steps = width/clusters.cluster_centers_.shape[0] ### Width of each color band
    for idx, centers in enumerate(clusters.cluster_centers_):
        palette[:, int(idx*steps):(int((idx+1)*steps)), :] = centers ### Fills the band with the cluster center color
    return palette
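For context, the clusters object passed to palette() is a fitted scikit-learn KMeans model. A minimal sketch of the clustering step, assuming an RGB image array img and a user-chosen number of colors (the function name here is an illustration, not the project’s actual code):

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(img, n_colors):
    ### Flatten the HxWx3 image into a list of RGB pixels
    pixels = img.reshape(-1, 3)
    ### Fit KMeans; cluster_centers_ then holds the n main colors
    clusters = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)
    return clusters

### Example: a synthetic half-red, half-blue image
img = np.zeros((10, 10, 3), np.uint8)
img[:, :5] = (255, 0, 0)
img[:, 5:] = (0, 0, 255)
clusters = dominant_colors(img, 2)
```

With two cleanly separated colors, the two cluster centers land on pure red and pure blue, which is exactly what palette() then draws as bands.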
OpenCV
  1. After this value is set, the algorithm detects the n main colors in the image. Then we save the original image and the palette side by side in compare_img().
  2. Next, we use all_colors_in_img() to return a list of all colors in the image. This list contains many duplicates.
  3. Thus, np.unique() is used to keep only the unique values.
def compare_img(img_1, img_2):
    f, ax = plt.subplots(1, 2, figsize=(10,10))
    ax[0].imshow(img_1)
    ax[1].imshow(img_2)
    ax[0].axis('off')
    ax[1].axis('off')
    f.tight_layout()
    #plt.show()
    f.savefig('outputImgs/colorDetected.png')
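The all_colors_in_img() helper is not shown above; a minimal sketch of the idea (the exact signature is an assumption), flattening the image to pixels and deduplicating with np.unique():

```python
import numpy as np

def all_colors_in_img(img):
    ### Flatten the HxWx3 image to a list of RGB triples
    pixels = img.reshape(-1, 3)
    ### Keep only the unique rows (axis=0 deduplicates whole RGB triples)
    return np.unique(pixels, axis=0)

### Example: a 2x2 image containing only two distinct colors
img = np.array([[[255, 0, 0], [255, 0, 0]],
                [[0, 0, 255], [255, 0, 0]]], np.uint8)
unique_colors = all_colors_in_img(img)
```

Note that np.unique(..., axis=0) both deduplicates and sorts the rows, so the same color never appears twice in the result.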
Colors.csv
  1. Then predict_color() is called. This function receives the RGB color channel returned by all_colors_in_img() and uses colors.csv to find the closest named color.
  2. Because certain lookups return specific shades such as “Alice Blue”, the return_root_color() function reduces them to their root color, e.g. “blue”.
  3. Additionally, colors whose names end with a parenthesized qualifier are handled by returning the word before the parentheses. This works because the root color is always the last word in the color phrase.
def recognize_color(color_index):
    ### Column names of colors.csv (assumed layout of the standard color-names dataset)
    index = ["color", "color_name", "hex", "R", "G", "B"]
    csv = pd.read_csv('files/colors.csv', names=index, header=None)

    R = color_index[0]
    G = color_index[1]
    B = color_index[2]
    minimum = 10000
    for i in range(len(csv)):
        ### Manhattan distance between the input color and each named color
        d = abs(R - int(csv.loc[i,"R"])) + abs(G - int(csv.loc[i,"G"])) + abs(B - int(csv.loc[i,"B"]))
        if d <= minimum:
            minimum = d
            cname = csv.loc[i,"color_name"]
    return cname
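return_root_color() is described above but not shown; a hedged sketch of the stated rule (strip any parenthesized qualifier, then take the last word of the color phrase):

```python
def return_root_color(color_name):
    ### Drop a trailing parenthesized qualifier, e.g. "Green (Crayola)" -> "Green"
    name = color_name.split('(')[0].strip()
    ### The root color is the last word of the phrase, e.g. "Alice Blue" -> "blue"
    return name.split()[-1].lower()

print(return_root_color('Alice Blue'))       # -> blue
print(return_root_color('Green (Crayola)'))  # -> green
```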

Object Detection

After color detection, the system then begins object detection. Object detection is done using the OpenCV dnn_DetectionModel function.
  1. We first load the configuration and model files (noted in resources).
  2. Then we load the coco.names file, which lists all object names.
  3. Next, we configure the parameters of our model.
  4. Then, we pass the image to our detection loop. It loops through each detected object, draws a bounding box around it, and places the object’s name on the photo. Each unique object is then appended to a list.
  5. This for loop is contained within a try-except to handle the case where no objects are found.
font_scale = 1 ### Font size
font = cv.FONT_HERSHEY_PLAIN ### Font type
try:
    for ClassInd, conf, boxes in zip(ClassIndex.flatten(), confidence.flatten(), bbox):
        cv.rectangle(img,boxes,(255,0,0),2) ### Draws a box around the detected object
        ### Writes the object's name above the box
        cv.putText(img,classLabels[ClassInd-1],(boxes[0]+10,boxes[1]-5), font, fontScale=font_scale, color=(0,255,0), thickness=1)
        ### Checks if the object is already in the list
        if classLabels[ClassInd-1] not in ListofObjects and classLabels[ClassInd-1] != 'person':
            ### Adds the new object to the list
            ListofObjects.append(classLabels[ClassInd-1])
except:
    return print('Object Detection exited...') ### Exits if no objects were found
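The model setup that produces ClassIndex, confidence, and bbox above is not shown. A sketch of that configuration using OpenCV’s dnn_DetectionModel; the file names are the usual SSD MobileNet v3 tutorial files and are assumptions here, so substitute the config and model files noted in the resources:

```python
import cv2 as cv

### Assumed file names; download locations are noted in the resources
config = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
weights = 'frozen_inference_graph.pb'
model = cv.dnn_DetectionModel(weights, config)

### Load the coco.names file, which lists all object names (one per line)
with open('coco.names') as f:
    classLabels = f.read().rstrip('\n').split('\n')

### Standard input parameters for this model
model.setInputSize(320, 320)
model.setInputScale(1.0 / 127.5)
model.setInputMean((127.5, 127.5, 127.5))
model.setInputSwapRB(True)

### Run detection; confThreshold filters out low-confidence hits
img = cv.imread('input.jpg')
ClassIndex, confidence, bbox = model.detect(img, confThreshold=0.5)
```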

Similar Word Suggestions

Next, we call similar word suggestions. Similar word suggestions are generated only for objects found in the image; when similar words were looked up for colors, the results were largely irrelevant, so colors are excluded.
  1. Once the algorithm starts, it first loads the GloVe file found in the resources section.
  2. Next, the algorithm iterates through each found object to find similar words. The find_closest_embeddings() function returns the closest words. Each of these is then saved into a dictionary; a dictionary is chosen so each similar word can be tracked alongside its corresponding object.
  3. Finally, this is wrapped in a try-except to handle, for example, the case where the user has 3 objects: if the algorithm does not find similar words for object 2, it skips object 2 and continues as expected.
def find_closest_embeddings(embedding):
    return sorted(embeddings_dict.keys(),
                  key=lambda word: spatial.distance.euclidean(embeddings_dict[word], embedding))
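The embeddings_dict used above is built by parsing the GloVe text file, one word and its vector per line. A self-contained sketch, using a tiny inline vocabulary in place of the real GloVe file and numpy’s norm in place of scipy.spatial:

```python
import numpy as np
from io import StringIO

def load_glove(handle):
    ### Each line: a word followed by its vector components
    embeddings_dict = {}
    for line in handle:
        values = line.split()
        embeddings_dict[values[0]] = np.asarray(values[1:], dtype='float32')
    return embeddings_dict

def find_closest_embeddings(embeddings_dict, embedding):
    ### Sort the whole vocabulary by Euclidean distance to the query vector
    return sorted(embeddings_dict.keys(),
                  key=lambda word: np.linalg.norm(embeddings_dict[word] - embedding))

### Tiny stand-in for the real GloVe file (2-d vectors for illustration)
tiny_glove = StringIO("dog 1.0 0.0\npuppy 0.9 0.1\ncar 0.0 1.0\n")
embeddings_dict = load_glove(tiny_glove)
closest = find_closest_embeddings(embeddings_dict, embeddings_dict['dog'])
```

Here “puppy” ranks ahead of “car” for the query “dog”, which is the behavior the algorithm relies on when suggesting similar words for each detected object.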

Lyrics Genius API

After this, our algorithm starts searching for songs by the selected artist. Lyrics Genius wraps the genius.com API and returns JSON information about the artist.
The LyricGenius file is divided into 2 parts.
Part 1:
  1. This part contains the findArtistSongs() function, which only runs on the first iteration of the loop.
  2. It creates a JSON file containing the n songs chosen by the user. The UI limits this to 20 songs.
  3. It then saves this JSON file in the format artistFile = 'Lyrics_' + artistFileName + '.json'. This ensures there are no errors when opening the file regardless of user input; it is safer than the default naming since we cannot predict what the user will type.
  4. Additionally, search.save_lyrics(artistFile, overwrite=True) is used. Setting overwrite=True is important to suppress the overwrite confirmation prompt, which would otherwise block the algorithm from running.
  5. Next, the file is moved to a folder. This cannot be done inside .save_lyrics(), which will not place the file in the expected folder.
  6. Then the newly created JSON is read, looping through the songs and lyrics and saving them to a CSV file.
def findArtistSongs(api, artist, maxSongs, sortBy):
    try:
        #search = api.search_artist(artist, max_songs=maxSongs, sort=sortBy) ### Selects songs in ASC order of song name
        search = api.search_artist(artist, max_songs=maxSongs) ### Random selection of songs

        artistFileName = re.sub(r'[^A-Za-z0-9]+', '', artist) ### Removes all non-alphanumeric characters from the string
        artistFile = 'Lyrics_' + artistFileName + '.json' ### File name used instead of the default to ensure consistency when the artist name contains unusual characters
        search.save_lyrics(artistFile, overwrite=True) ### Creates the JSON file; overwrite=True overrides a JSON with the same name
        shutil.move(artistFile, "outputLyrics/" + artistFile) ### Moves the file as a personal preference so the JSON is visible on git rather than deleted
        print('JSON Created ...')

        Artist = json.load(open("outputLyrics/" + artistFile))  ### Loads the JSON file

        ### Loops through each song while calling the collectSongData function
        for i in range(maxSongs):
            collectSongData(Artist['songs'][i])
        updateCSV_file(file)  ### Updates the CSV by calling the updateCSV_file function

        return artistFile
    except:
        artistFile = 'Timeout: Request timed out: HTTPSConnectionPool'
        return artistFile
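collectSongData() and updateCSV_file() are referenced above but not shown. A hedged sketch of what they plausibly do, accumulating (title, lyrics) pairs and writing them to the CSV that search_csv_file() later reads (column 0 = title, column 1 = lyrics, matching the line[0]/line[1] accesses in Part 2):

```python
import csv

songRows = []  ### Accumulates one (title, lyrics) row per song

def collectSongData(song):
    ### Pulls the title and full lyrics out of one song entry from the JSON
    songRows.append([song['title'], song['lyrics']])

def updateCSV_file(file):
    ### Writes every collected song as a CSV row
    with open(file, 'w', newline='') as f_obj:
        writer = csv.writer(f_obj, delimiter=',')
        writer.writerows(songRows)

### Example with a fake song entry
collectSongData({'title': 'Demo Song', 'lyrics': 'first line\nsecond line'})
updateCSV_file('songs.csv')
```

The csv module quotes the multiline lyrics field automatically, so the full song survives the round trip and can be split back into lines with splitlines().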
Part 2
  1. All iterations of the loop reach this part. It first loops through each element.
  2. Each element is passed to search_csv_file(), which searches for the element in each song’s lyrics.
  3. If matching lyrics are found, they are saved to a dictionary. The key is composed of the song name, the searched element, and the lyric line number. A dictionary is used so the UI can display each match along with this required information.
def search_csv_file(file,currentElement):
    with open(file) as f_obj:
        reader = csv.reader(f_obj, delimiter=',')
        for line in reader:  ### Iterates through the rows of the csv
            allSongs = line[1].splitlines() ### Splits the song into lines to be read one by one
            for i in range(len(allSongs)):
                if currentElement in allSongs[i].lower(): ### Lowercases the line for case-insensitive search
                    print('Song: ',line[0],'| Lyrics: ',allSongs[i])
                    keyName = line[0] + '_' + currentElement + '_' + str(i) ### Makes the key unique
                    allLyrics[keyName.upper()] = allSongs[i] ### Uppercases the key (personal preference)
    return print('Element Search completed...') ### Indicates the element search completed

Main Function

As this project was originally created without a UI, the main function handles all the calls. It also contains the informative print statements, hence its multiple if/else statements.
It is worth highlighting how information is passed to the LyricGenius file.
for i in range(len(AllItems)):
    ### Eliminates creating the same JSON file n times
    if i == 0:
        artistFile = lyricsGenius.findArtistSongs(api, artist, maxSongs, sortBy)
        if artistFile == 'Timeout: Request timed out: HTTPSConnectionPool':
            break
    for j in range(len(AllItems[i])):
        currentElement = AllItems[i][j]
        print('\nSearching for element:', currentElement)
        lyricsGenius.main(currentElement)

Streamlit

Streamlit makes deploying applications very easy. However, this application is not dynamic: Streamlit’s buttons and UI elements do not have required-field handling built in, so the field requirements are enforced with if/else statements, which also drive the visible UI validation.
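Since the widgets don’t enforce required fields themselves, the checks can be factored into a plain function that the if/else statements call; a sketch with hypothetical field names, not the project’s actual code:

```python
def validate_inputs(artist, max_songs, uploaded_img):
    ### Returns a list of messages to display in the UI; empty means valid
    errors = []
    if not artist or not artist.strip():
        errors.append('Please enter an artist name.')
    if not 1 <= max_songs <= 20:  ### UI limit of 20 songs
        errors.append('Number of songs must be between 1 and 20.')
    if uploaded_img is None:
        errors.append('Please upload an image.')
    return errors

### In the Streamlit script each message would be shown with st.error()
errors = validate_inputs('', 25, None)
```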

Deployment

To deploy your application, you need to register for Streamlit sharing and enter the invitation queue, which is quite fast. I did have to use an external service such as Dropbox to host larger files that are not on GitHub so that the application can load everything it needs.