Idea: Take a podcast (with transcripts), analyze the language in it, and display visuals to match up with the audio.
Why: People keep telling me to listen to all sorts of different podcasts, but I usually just drift off and stop paying attention when I try to listen to one.
Slightly More Details: I specifically used The Adventure Zone (TAZ), a Dungeons and Dragons podcast, because it has a large and active fan base (more details in the Process section). Also, tabletop games in general tend to involve a lot of describing things with words, leaving the visuals up to the players’ imaginations.
Process: I actually managed to use quite a few different techniques that we learned in class! I shall talk about them in the order in which they appear in the code.
- Regular Expressions: The transcript contained a lot of noise, mostly interjections like “umm” and “uhh” as well as square brackets demarcating actions such as [laughter]. I used regular expressions to remove all of these before doing anything else with the text. I also used them to separate lines where someone was talking normally vs. speaking in character while roleplaying.
- NLTK (POS stuff): analyzePrepositions() is one of the most important functions in the entire program. I iterate through every line in the transcript that isn’t explicit roleplaying and POS-tag it. If a sentence contains a preposition, I try to find the noun before it and the noun that follows it (e.g. “You manage to hoist this wolf into the air” extracts ‘wolf into air’). From these nouns I create a staging object with self.name (the first noun), another object it is located relative to (the noun following the preposition), and where it sits relative to that object (the preposition, e.g. ‘above’). Each of these objects is then stored in a dictionary.
- API (Tumblr): I use the Tumblr API to find images to represent the staged objects. Originally I tried Twitter, but most posts there were text-based. findMedia() takes in the noun and its part of speech and searches Tumblr for posts tagged with that noun. I get the URL of the photo from the API response and use urllib to download it.
- JSON: I dump all the info from my staged objects into a JSON file and load that file into Processing. Wow, learning how to use JSON in Processing was intense. There are a lot of JSON-specific methods, and a list of JSON objects is its own special JSON-array type. Most of my time in Processing was spent figuring out how to use JSON instead of working on visual stuff 🙁
- Processing: Processing pretty much just reads data from the master JSON file. It draws each image, using the file location stored in the JSON, at the coordinates also stored in the JSON. The majority of the code here is just making sure the index references line up correctly.
- For example, you can see that it processes ‘crate near center’: the dog is in a crate, so it sort of makes sense, but it’s also a bit silly.
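The regex cleanup described in the Regular Expressions bullet might look something like this (a minimal sketch; the exact filler words and the [action] bracket convention are assumptions based on the description, not the writeup's actual code):

```python
import re

def clean_line(line):
    """Strip filler interjections and bracketed actions from a transcript line.
    The filler patterns here are illustrative guesses."""
    # Remove bracketed stage directions like [laughter]
    line = re.sub(r"\[[^\]]*\]", "", line)
    # Remove standalone interjections like "um", "umm", "uh", "uhh"
    line = re.sub(r"\b(u+m+|u+h+)\b[,.]?\s*", "", line, flags=re.IGNORECASE)
    # Collapse the leftover whitespace
    return re.sub(r"\s{2,}", " ", line).strip()

print(clean_line("Umm, you enter the room [laughter] and, uhh, see a wolf."))
# "you enter the room and, see a wolf."
```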
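The preposition-extraction idea behind analyzePrepositions() could be sketched like this. The function works on an already POS-tagged token list (the shape nltk.pos_tag returns), using the Penn Treebank tags NLTK uses; the function name and exact matching strategy are my illustration, not the original code:

```python
def extract_relations(tagged):
    """For each preposition (tag IN), find the nearest noun before and after it.
    Returns (noun, preposition, anchor_noun) triples."""
    relations = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "IN":
            continue
        # nearest noun (NN, NNS, NNP, ...) before and after the preposition
        before = next((w for w, t in reversed(tagged[:i]) if t.startswith("NN")), None)
        after = next((w for w, t in tagged[i + 1:] if t.startswith("NN")), None)
        if before and after:
            relations.append((before, word, after))
    return relations

# "You manage to hoist this wolf into the air", tagged by hand for the demo:
tagged = [("You", "PRP"), ("manage", "VBP"), ("to", "TO"), ("hoist", "VB"),
          ("this", "DT"), ("wolf", "NN"), ("into", "IN"), ("the", "DT"), ("air", "NN")]
print(extract_relations(tagged))  # [('wolf', 'into', 'air')]
```

Each resulting triple maps directly onto a staging object: the first noun becomes self.name, the last becomes the relative anchor, and the preposition describes the spatial relationship.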
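The Tumblr step in findMedia() might be sketched as follows. Pulling a photo URL out of a tagged-post response follows the Tumblr API v2 photo-post shape (photos → original_size → url); the pytumblr client setup in the comment, the credentials, and the helper name are placeholders, not the writeup's actual code:

```python
import urllib.request

def first_photo_url(posts):
    """Return the URL of the first photo found in a list of Tumblr post dicts."""
    for post in posts:
        for photo in post.get("photos", []):
            url = photo.get("original_size", {}).get("url")
            if url:
                return url
    return None

# With real credentials the flow would look roughly like:
#   import pytumblr
#   client = pytumblr.TumblrRestClient("CONSUMER_KEY", ...)
#   posts = client.tagged("wolf")
#   url = first_photo_url(posts)
#   urllib.request.urlretrieve(url, "wolf.jpg")  # download the image locally

# Mocked response for illustration:
posts = [{"type": "text"},
         {"type": "photo",
          "photos": [{"original_size": {"url": "https://example.com/wolf.jpg"}}]}]
print(first_photo_url(posts))  # https://example.com/wolf.jpg
```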
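Dumping the staged objects to JSON for Processing could look like this sketch. The field names beyond self.name (timestamp, image path) are assumptions based on the descriptions above:

```python
import json

class StagedObject:
    def __init__(self, name, relative_to, preposition, timestamp, image_path):
        self.name = name
        self.relative_to = relative_to  # the noun after the preposition
        self.preposition = preposition  # e.g. "into", "near", "above"
        self.timestamp = timestamp      # assumed field, for syncing with audio
        self.image_path = image_path    # local file downloaded from Tumblr

objects = [StagedObject("wolf", "air", "into", 42.0, "imgs/wolf.jpg")]

# vars(obj) turns each object into a plain dict that json can serialize;
# Processing can then read the file back with loadJSONArray().
with open("staging.json", "w") as f:
    json.dump([vars(o) for o in objects], f, indent=2)
```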
Various Setbacks: The visualization itself was the hardest part. I had planned to use a p5 library for Python, but that library is still in development and didn’t support critical functions like image() and text(). Figuring out when to draw what on the screen was very difficult to wrap my head around; right now I’m using a rough amalgamation of math based on the number of lines in the transcript, the number of seconds in the podcast, and the frameRate/frameCount in Processing. It’s also much harder to derive meaning programmatically from verbs than from nouns. And JSON is a very elegant way of storing data, but a little difficult to get started with as a complete beginner.
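That rough timing amalgamation could be sketched like this: map the current Processing frame to a transcript line by assuming lines are spread evenly across the episode's duration. This proportional mapping and all the names in it are my illustration of the idea, not the actual formula used:

```python
def line_for_frame(frame_count, frame_rate, total_seconds, num_lines):
    """Which transcript line should be on screen at this frame?"""
    elapsed = frame_count / frame_rate            # seconds into the sketch
    fraction = min(elapsed / total_seconds, 1.0)  # how far through the episode
    return min(int(fraction * num_lines), num_lines - 1)

# 30 fps, a 3600-second episode, 1200 transcript lines:
print(line_for_frame(0, 30, 3600, 1200))      # 0
print(line_for_frame(54000, 30, 3600, 1200))  # halfway through: line 600
```

The even-spread assumption is exactly what makes the sync rough: real lines of dialogue don't take equal amounts of airtime.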
Current Bugs: Some objects seem to be missing: there should be 6 objects for each timestamp in the JSON, but not many timestamps have the full 6, and some are missing entirely. findMedia() stopped working for proper nouns. I still need to sync the audio with the text/images, though this is likely related to the missing objects. Images overlap a lot; it might help to compute the relative positioning in Processing and take image size into account for proportions.
Number One Feature Improvement: I originally intended to use Tumblr only for TAZ-specific nouns (basically all of the proper nouns) and use Google Images for more commonplace nouns. Tumblr can have some… weird images for certain search terms, and other times the relationship between the picture and its tags is dubious at best. Splitting up findMedia() like this would lead to more consistent and less nonsensical results. Unfortunately, I didn’t really have the time to learn another new API.
What I Learned: The tumblr API, a deeper understanding of dictionaries probably, storing object data from Python into JSONs, downloading images from websites with code, importing JSONs into Processing, getting data from JSONs in Processing. JSON. JSON. JSON.
Conclusions: It’s actually quite hard to say whether my project met my expectations. On the one hand, I ran into a lot of issues and the final result is not nearly as comprehensive as what I was imagining in my head. On the other hand, pretty much everyone I told my idea to said it was either “really difficult” or “quite ambitious” (both essentially meaning I was in over my head, with varying levels of encouragement), and I’m honestly kind of amazed at how far I made it. This is the most code I’ve ever written for a single project, and even though it’s not extremely well polished, the bugs are very few. Also see the What I Learned section above. I feel a lot more confident in programming after this project (and I comment my code a bit more now).
Interesting Stuff To Consider: My intention with this project was to create a tool for visualizing podcasts. However, while I was programming analyzePrepositions() and trying to figure out how to determine where objects were in relation to one another, I started wondering how my project could work as a tool for directors and actors to use while staging a play. I’m curious to see how an implementation of my project that goes through a script and visualizes it according to the stage directions would work.