The first thing I tried for this project was using NLTK's RegexpParser to chunk selected grammatical structures (VB TO VB in particular). The example I found used the tagged sentences from the Brown corpus, which preserve the sentence structure: the words are POS-tagged, but each sentence stays together as its own unit. This makes it easier to parse longer grammatical structures like VB TO VB in context. So I spent quite some time trying to tag my own text in a similar form, but I was unable to maintain that [(sentence with tags)] structure.
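To make the structure concrete, here is a minimal sketch of what "sentences of tagged words" look like and how a VB TO VB pattern can be found in them. It deliberately uses plain Python instead of RegexpParser so it runs without any corpus downloads; the two hand-tagged sentences and the helper name `find_verb_to_verb` are made up for illustration. With NLTK, a chunk grammar along the lines of `VTV: {<VB.*><TO><VB>}` would likely play the same role.

```python
# Hand-tagged stand-in for a tagged corpus (illustrative data, not from the project):
# a list of sentences, each sentence a list of (word, tag) pairs.
tagged_sents = [
    [("He", "PRP"), ("wanted", "VBD"), ("to", "TO"), ("sleep", "VB"), (".", ".")],
    [("The", "DT"), ("room", "NN"), ("was", "VBD"), ("quiet", "JJ"), (".", ".")],
]


def find_verb_to_verb(sent):
    """Return every (verb, 'to', verb) word triple found in one tagged sentence."""
    matches = []
    # Slide a window of three tokens across the sentence.
    for i in range(len(sent) - 2):
        (w1, t1), (w2, t2), (w3, t3) = sent[i:i + 3]
        # Any verb tag (VB, VBD, VBZ, ...) followed by TO followed by any verb tag.
        if t1.startswith("VB") and t2 == "TO" and t3.startswith("VB"):
            matches.append((w1, w2, w3))
    return matches


# Because sentences are kept separate, a match can never span two sentences.
for sent in tagged_sents:
    print(find_verb_to_verb(sent))
```

The point of the nested list is that tagging happens per sentence, so the sentence boundaries survive the tagging step.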
So instead I used a CFG to generate some sentences from Kafka’s Metamorphosis in a way similar to what we had done in class (generative-grammar.py), but I wanted to get rid of all the lists/dictionaries. So I used template sentences instead, like the ones we used for our Mad Libs sketch. I wanted to see what could be generated if these sentences were put together.
I used four template sentences: (adj, noun, verb to verb), (det, noun, verb, adj), (prep, det, adj, noun), and (adj, adj, noun, verb, adj). I filled each one in and joined them together to produce a text. Unfortunately, the final product made no sense at all. Clearly my template sentences need more work.
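A rough sketch of how the four templates could be filled inside a single loop is below. It is not the project's code: the word pools are invented placeholders (the real ones would come from the tagged Kafka text), the names `words`, `templates`, `fill` and `generate` are hypothetical, and "verb to verb" is read here as two verb slots around a literal "to".

```python
import random

# Hypothetical word pools; in the project these would come from the tagged text.
words = {
    "adj": ["monstrous", "pale", "quiet"],
    "noun": ["insect", "room", "sister"],
    "verb": ["wanted", "crawled", "slept"],
    "det": ["the", "a"],
    "prep": ["under", "behind"],
}

# The four templates described above, one list of slots per sentence.
templates = [
    ["adj", "noun", "verb", "to", "verb"],
    ["det", "noun", "verb", "adj"],
    ["prep", "det", "adj", "noun"],
    ["adj", "adj", "noun", "verb", "adj"],
]


def fill(template, rng):
    """Replace each slot with a random word; unknown slots (like "to") stay literal."""
    return " ".join(rng.choice(words[s]) if s in words else s for s in template)


def generate(rng):
    """Fill every template once and join the results into one text."""
    return " ".join(fill(t, rng).capitalize() + "." for t in templates)


print(generate(random.Random(0)))
```

Keeping the templates in one list means the loop does not care how many there are, so they could later be swapped for generated tag sequences without touching `fill`.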
The intention was to spend some more time splitting the original text into sentences, POS-tagging those sentences, and storing the resulting tag sequences as n-grams so I could markovify them. The n-gram sentences could then be used to produce a text that would hopefully make more sense. (Maybe this can be the workflow for the final?)
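As a sketch of that planned workflow, the snippet below builds a bigram Markov model over POS-tag sequences and generates a new tag sequence that could serve as a sentence template. Everything here is an assumption for illustration: the three tag sequences are invented rather than taken from the text, bigrams stand in for whatever n-gram size ends up being used, and `build_model` and `generate_tags` are hypothetical names.

```python
import random
from collections import defaultdict

# Hypothetical tag sequences, one per sentence; real ones would come from
# POS-tagging each sentence of the original text.
tag_sents = [
    ["DT", "NN", "VBD", "JJ"],
    ["DT", "JJ", "NN", "VBD", "TO", "VB"],
    ["PRP", "VBD", "TO", "VB"],
]


def build_model(sequences):
    """Map each tag to the list of tags that followed it (bigram model)."""
    model = defaultdict(list)
    for seq in sequences:
        # Pad with start/end markers so the model learns how sentences begin and end.
        padded = ["<s>"] + seq + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            model[prev].append(nxt)
    return model


def generate_tags(model, rng, max_len=20):
    """Walk the model from the start marker until the end marker or max_len tags."""
    tags, current = [], "<s>"
    while len(tags) < max_len:
        current = rng.choice(model[current])
        if current == "</s>":
            break
        tags.append(current)
    return tags


model = build_model(tag_sents)
print(generate_tags(model, random.Random(0)))
```

Each generated tag sequence could then be filled with words of the matching tags, in the same way the hand-written templates are filled.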
One of the main issues I had (aside from the fact that the text makes no sense) was generating these different sentences from different templates inside a loop. The position of all the indents, fors and ifs is very important, and because the template sentences can be replaced by the n-gram sentences, I intend to keep working on that code.