November 03, 2021

Generating Text from NYTimes Headlines

Tags: fall-2021, itp, markov-chain, p5js, programming-from-a-to-z, text-generation, web-experiments

This week, I tried using the New York Times’ Most Popular API to generate headlines with a Markov chain. The API returns abstracts of the 20 most viewed articles on nytimes.com over the past 30 days. For the text generation, I used the method from shiffman.net/a2z/markov/.
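As a rough illustration of the data-gathering step, here is a minimal sketch of pulling abstracts from the Most Popular API. It is not the code from the sketch linked in this post. The endpoint path, the `results` array, and the `abstract` field are assumptions based on the public API documentation, and the helper names (`buildMostViewedUrl`, `extractAbstracts`, `fetchAbstracts`) are hypothetical.

```javascript
// Assumed endpoint shape for the "most viewed" list of the Most Popular API.
// `period` is the number of days to look back (the post uses 30).
function buildMostViewedUrl(period, apiKey) {
  const base = "https://api.nytimes.com/svc/mostpopular/v2/viewed";
  return `${base}/${period}.json?api-key=${encodeURIComponent(apiKey)}`;
}

// Pull the non-empty abstracts out of a parsed response.
// Assumes a `results` array whose items carry an `abstract` string.
function extractAbstracts(response) {
  const results = Array.isArray(response.results) ? response.results : [];
  return results
    .map((article) => (article.abstract || "").trim())
    .filter((text) => text.length > 0);
}

// Fetch and return the abstracts. Needs a real API key and network access.
async function fetchAbstracts(apiKey, period = 30) {
  const res = await fetch(buildMostViewedUrl(period, apiKey));
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return extractAbstracts(await res.json());
}
```

In a p5.js sketch, `loadJSON()` could replace the `fetch` call, with `extractAbstracts` applied to the loaded object.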

I used these abstracts as the “seed” text for the generator. I ran this experiment as a trial before applying Markov chain text generation to my previous project, WordEater.
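To make the generation step concrete, here is a minimal sketch of a character-level Markov chain in the spirit of the A2Z example. It is an illustration, not the exact code from the sketch. The function names (`buildModel`, `generate`, `pick`) and the injectable `random` parameter are hypothetical, added so the behavior can be checked deterministically.

```javascript
// Build a character-level Markov model of the given order.
// Each n-gram maps to the list of characters that followed it in the source.
function buildModel(lines, order) {
  const ngrams = new Map();
  const beginnings = [];
  for (const line of lines) {
    if (line.length < order) continue;
    // Remember how each source line starts, so output starts plausibly too.
    beginnings.push(line.substring(0, order));
    for (let i = 0; i + order <= line.length; i++) {
      const gram = line.substring(i, i + order);
      if (!ngrams.has(gram)) ngrams.set(gram, []);
      // Record the next character, unless this gram ends the line.
      if (i + order < line.length) ngrams.get(gram).push(line.charAt(i + order));
    }
  }
  return { ngrams, beginnings, order };
}

// Pick one element using a random function that returns a value in [0, 1).
function pick(list, random) {
  return list[Math.floor(random() * list.length)];
}

// Generate up to maxLength characters by repeatedly sampling a next character.
function generate(model, maxLength, random = Math.random) {
  if (model.beginnings.length === 0) return "";
  let output = pick(model.beginnings, random);
  while (output.length < maxLength) {
    const current = output.substring(output.length - model.order);
    const choices = model.ngrams.get(current);
    // Dead end: this gram was only ever seen at the end of a line.
    if (!choices || choices.length === 0) break;
    output += pick(choices, random);
  }
  return output;
}
```

With the abstracts as `lines`, a call such as `generate(buildModel(lines, 4), 80)` would produce one candidate headline. The order `4` and length `80` are placeholder values, not the settings used here.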

The results are below:

link to the p5.js sketch

Here are some of the results:

Running p5.js sketch to generate words from today’s NYTimes headlines

Takeaways

Because the source text (abstracts from only 20 articles) was too small, much of the generated text was similar or identical to the original text.
I tried to address this by reducing the n-gram order and the length of the generated text.
After reducing both, more of the output was ‘original’, but some of it made less sense. Shortening the text also made the sentences feel somewhat truncated.
Most important lesson: get more data if you want to make a plausible model.
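
One way to put a number on the first takeaway is to count how many generated strings appear verbatim in the source. The helper below is a hypothetical sketch for that check and was not part of the original experiment.

```javascript
// Fraction of generated strings that occur verbatim inside some source text.
// A value near 1 suggests the model is mostly copying the source.
function copiedFraction(generated, sources) {
  if (generated.length === 0) return 0;
  const copied = generated.filter((text) =>
    sources.some((source) => source.includes(text))
  );
  return copied.length / generated.length;
}
```

Running this on batches generated at different n-gram orders and lengths would show how each setting trades copying against coherence. Coherence itself still has to be judged by reading the output.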