No One’s Making Blockbusters: An EDA of IMDb Data

Edwin Chalas
Dec 3, 2020

Asking the real questions

This EDA looks at data from IMDb to analyze trends in movie and TV genres and subgenres over time. Specifically, I’m using IMDb’s “title.basics” dataset, which I downloaded in October 2020. I have also imported a dataset of the highest-grossing film globally for each year, compiled by Aaron O’Neill on Statista (https://www.statista.com/statistics/1072778/highest-grossing-movie-annually-historical/).

Some core questions I’d like to answer in this EDA:

  • What have the most popular (“popular”, in the context of this EDA, meaning plentiful) film genres and subgenres been in the last 60 years? How have the most popular genres changed over the decades?
  • Do the highest grossing films globally give us an idea of genre trends? In other words, did films like “Titanic” lead to a resurgence in romance films?
  • Similarly, what have the most popular TV genres and subgenres been in the last 60 years? How have these changed over the decades?

To create a nice time window for the data, I’ve limited my analysis to only films and television from 1959 to 2019.
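The filtering step could look something like the sketch below. It is a Python stand-in, not the code behind the graphs. The column names and the `\N` missing-value marker come from IMDb's published layout for title.basics; everything else is an assumption made for illustration: which `titleType` values count as film or TV, the idea that the first listed genre is the "genre" and the rest are "subgenres", the function name `load_titles`, and the sample rows, which are made up.

```python
import csv
import io

MISSING = r"\N"  # IMDb's marker for a missing value

# Assumption: which titleType values count as "film" and "TV" here.
MOVIE_TYPES = {"movie"}
TV_TYPES = {"tvSeries", "tvMiniSeries"}

HEADER = ["tconst", "titleType", "primaryTitle", "originalTitle", "isAdult",
          "startYear", "endYear", "runtimeMinutes", "genres"]

# Made-up rows for illustration only; none of these are real IMDb records.
FAKE_ROWS = [
    ["tt9999901", "movie", "Example Drama", "Example Drama", "0", "1997", MISSING, "120", "Drama,Romance"],
    ["tt9999902", "tvSeries", "Example Sitcom", "Example Sitcom", "0", "2005", "2009", "22", "Comedy"],
    ["tt9999903", "movie", "Too Early", "Too Early", "0", "1950", MISSING, "90", "Western"],
    ["tt9999904", "short", "Example Short", "Example Short", "0", "2010", MISSING, "8", "Documentary"],
    ["tt9999905", "movie", "No Genre", "No Genre", "0", "2012", MISSING, "95", MISSING],
    ["tt9999906", "movie", "No Year", "No Year", "0", MISSING, MISSING, "95", "Action"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + FAKE_ROWS) + "\n"


def load_titles(handle, first_year=1959, last_year=2019):
    """Keep films and TV inside the year window; drop rows missing year or genre."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    kept = []
    for row in reader:
        if row["startYear"] == MISSING or row["genres"] == MISSING:
            continue
        year = int(row["startYear"])
        if not first_year <= year <= last_year:
            continue
        if row["titleType"] in MOVIE_TYPES:
            medium = "movie"
        elif row["titleType"] in TV_TYPES:
            medium = "tv"
        else:
            continue
        # Assumption: first listed genre = "genre", the rest = "subgenres".
        genre, *subgenres = row["genres"].split(",")
        kept.append({"medium": medium, "year": year,
                     "genre": genre, "subgenres": subgenres})
    return kept


titles = load_titles(io.StringIO(SAMPLE_TSV))
for title in titles:
    print(title)
```

For the real download, the same function should work on a handle opened with `gzip.open(path, "rt", encoding="utf-8", newline="")`, where `path` points at wherever title.basics.tsv.gz was saved.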

Movie graphs

Analyzing the movie graphs

A couple of things I notice in the genre data:

  • Documentaries overtake other types of films starting in the mid-2000s. There could be a ton of reasons why: the rise of the internet, more commercially available film equipment, the success of films like Bowling for Columbine…
  • Though drama films stop being the highest-grossing once the 2000s hit, films of that genre are now the second most plentiful.
  • By contrast, though action films have consistently been the highest-grossing since at least the 80s, they are not very plentiful. This may be because blockbusters (which are most often action films, and are usually the highest-grossing) have high production costs, making it difficult to produce many of them.
  • There is a dip in comedies, dramas and documentaries around 2012–2014, which I have no explanation for.
  • There is a bump in biographies around that same time, which I also can’t really explain.

One thing I note in the subgenre data: the extreme rise and fall of documentaries between the late 2000s and mid-2010s. I have no idea why that is.

What I notice in the heatmap: it reaffirms my observations in the genre data. Dramas have consistently made up a high proportion of films over time, but that proportion is slowly shrinking. Documentaries are becoming a larger proportion of films: in the 2000s and 2010s, you can see the shade of blue becoming lighter. Action and adventure movies, by contrast, are becoming a smaller proportion of films overall, and their shades of blue are becoming darker. Horror films and thrillers are becoming slightly more plentiful.
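The proportions behind a heatmap like this can be sketched as follows, assuming each cell is one genre's share of all titles in a given decade (the text suggests this but does not spell it out). This is a plain-Python illustration, not the original plotting code. The function name `genre_share_by_decade` and the sample records are made up.

```python
from collections import Counter, defaultdict


def genre_share_by_decade(records):
    """Map decade -> {genre: share of that decade's titles}."""
    counts = defaultdict(Counter)
    for year, genre in records:
        decade = year // 10 * 10  # e.g. 1968 -> 1960
        counts[decade][genre] += 1
    shares = {}
    for decade, genre_counts in counts.items():
        total = sum(genre_counts.values())
        shares[decade] = {g: n / total for g, n in genre_counts.items()}
    return shares


# Made-up (year, genre) pairs, for illustration only.
records = [
    (1965, "Drama"), (1968, "Drama"), (1962, "Western"), (1969, "Comedy"),
    (2011, "Documentary"), (2014, "Documentary"), (2016, "Drama"), (2018, "Comedy"),
]
shares = genre_share_by_decade(records)
for decade in sorted(shares):
    print(decade, shares[decade])
```

Dividing by the decade total is what makes decades comparable: far more titles were released in the 2010s than in the 1960s, so raw counts would make nearly every genre look like it is growing.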

Now, let’s look at the data for TV.

TV graphs

Analyzing the TV graphs

Things I notice in the genre data:

  • The giant gap between main-genre drama and comedy TV and literally everything else — wow!
  • Comedy TV shows have consistently been more popular than comedy films, especially now that they are at their peak. Also, I think it’s interesting that they began to overtake dramas in the 2010s. Why this is, I do not know.
  • The sharp decline and only recent uptick in romance TV — interesting.
  • The death of westerns after the 1960s — this can be seen in the heatmap.
  • The rise of reality TV in the 2000s (which makes sense, as shows like Survivor and The Bachelor got their start then), along with the rise of news shows in the 90s (again, makes sense, as networks like CNN and Fox News started the 24-hour news cycle) and the rise of talk shows from the 90s through the present (Oprah and the like hitting their cultural peak during that decade).
  • The peak of dramas in the 1970s and 80s — makes sense, as shows like Dallas were at their cultural peak.

Some notes (or, how I learned to give up problem solving and love the bugs)

  • I wanted to filter out the less popular genres from the trend graphs; I ran into errors with columns in the datasets not being vectors, and with “w” in fct_lump. So yeah sorry bout that :(
  • IDK why the bar graphs besides the first movie genre one aren’t actually organized by n — I’ve used the same code in all four :(
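For reference, the two things attempted above (collapsing the less common genres and ordering bars by count) can be sketched in plain Python. This is a stand-in for the idea only. It does not reproduce or debug the original R code, and the function names and sample data are made up. In R, forcats provides `fct_lump_n()` and `fct_infreq()` for the same two jobs.

```python
from collections import Counter


def lump_top_n(genres, n, other_label="Other"):
    """Keep the n most frequent genres; relabel everything else as other_label."""
    keep = {genre for genre, _ in Counter(genres).most_common(n)}
    return [genre if genre in keep else other_label for genre in genres]


def bars_by_count(genres):
    """Return (genre, count) pairs, largest count first, ties broken by name."""
    return sorted(Counter(genres).items(), key=lambda item: (-item[1], item[0]))


# Made-up genre labels, one per title, for illustration only.
genres = (["Drama"] * 5 + ["Comedy"] * 4 + ["Documentary"] * 3
          + ["Western"] + ["Musical"])

lumped = lump_top_n(genres, n=3)
bars = bars_by_count(lumped)
print(bars)
```

Sorting the counted pairs explicitly, instead of relying on the order the data happens to arrive in, is what keeps every bar chart ordered the same way.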

Answering the real questions

  • What have the most popular (“popular”, in the context of this EDA, meaning plentiful) film genres and subgenres been in the last 60 years? How have the most popular genres changed over the decades?

The most popular film genres are dramas, documentaries, and comedies. Action films have slowed in quantity while documentaries have grown, leading to the current top 3.

  • Do the highest grossing films globally give us an idea of genre trends? In other words, did films like “Titanic” lead to a resurgence in romance films?

Not really. The highest grossing films in the past few decades have been action movies — presumably, blockbusters. However, action is a distant fourth in terms of quantity of films produced. Comedies and documentaries have no dots on the genre plot — but they’re in the top three genres! However, to answer the more specific question: more romance films HAVE continued to be made after “Titanic” (the purple dot in the subgenres plot).

  • Similarly, what have the most popular TV genres and subgenres been in the last 60 years? How have these changed over the decades?

Comedies and dramas have from the jump been so much more plentiful than other TV, and that’s still true. Starting in the mid-2010s, however, talk shows have jumped up and joined as a distant third. That’s just genres, though. Subgenres have had a lot more activity and change: romance, drama and talk shows have at various points been the top subgenre, with the top spot changing frequently over the decades. No other subgenres come close!

Data Citations

IMDb. (2020). title.basics.tsv.gz. https://datasets.imdbws.com/title.basics.tsv.gz

O’Neill, A. (2019, November 20). Highest grossing movie worldwide, annually 1915–2020. https://www.statista.com/statistics/1072778/highest-grossing-movie-annually-historical/
