A data visualization of Bollywood
India is home to the world's biggest film industry, Bollywood. Based in Mumbai, this industry produces almost half of the movies made in India. It has some of the world's biggest movie stars, like Amitabh Bachchan and Shah Rukh Khan. With close to 200 movies made every year, it is the largest film industry in terms of the number of movies produced annually.
From the first movie, "Raja Harishchandra", produced in 1913, Bollywood has come a long way. It has gone from producing mostly mythological movies in its early days to producing superhero sci-fi movies like Ra.One and Krrish and crime thrillers like A Wednesday, amongst the various other genres that have now begun to be made. Progress in technology, a changing Indian culture and a growing young, educated population have led to numerous new types of movies being made, while other types have been discarded.
Below is a set of visualizations I made as a way to look into how Bollywood has evolved through the years. The data runs from 1920 up to 2015. The first graphic focuses on quantitative analysis, i.e. the number of movies made in a year and the highest-grossing movie of that year. The graphics after that form a set of qualitative analyses of how the genres of movies in Bollywood have changed since 1920. I have tried my best to mention wherever I thought the data may be misleading (mainly because of a lack of data) and have made an effort to show whatever patterns I found in the best way possible. That said, I do suck a lot of the time, so apologies for any places where I might have messed up. If you feel like this graphic misrepresents or underrepresents anything, or you have any form of feedback, you can totally rant about it to me. Details about me are at the bottom of this page.
Let's talk numbers
The following graphic shows the number of movies made each year since 1920 on a colour scale. I took the data from Wikipedia's lists of Bollywood movies made each year. This data may not include every movie released in a given year; it has only the ones listed on Wikipedia.
How to read this visualization:
Each square represents one year. The darker or more saturated the colour, the more movies were made during that year. The years are arranged so that they increase horizontally. Hover over a year to see the number of movies made, along with the poster and name of the highest-grossing movie of that year (this is available only for years since 1940 due to a lack of data).
How I made this visualization:
I scraped the data from Wikipedia's lists of Bollywood movies released each year since 1920 using Python's requests library. Then I cleaned the data using Excel and Python to get it into the format I needed. I then used D3.js to make the visualization. The posters were taken from themoviedb's API and Wikipedia.
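For anyone curious what the counting step might look like, here is a minimal sketch in Python. It is not my original script. It assumes the movies on a year page sit in tables with the class "wikitable", one movie per row, which may not hold for every year. The names WikitableRowCounter and count_movies are made up for this example, and the page download is left as a comment so the snippet runs without a network connection.

```python
from html.parser import HTMLParser


class WikitableRowCounter(HTMLParser):
    """Counts table rows that contain at least one <td> cell,
    but only inside tables whose class list includes 'wikitable'.
    Header rows (made only of <th> cells) are not counted."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0      # > 0 while inside a wikitable (or a table nested in one)
        self.row_counted = False  # True once the current row has been counted
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.table_depth or "wikitable" in classes:
                self.table_depth += 1
        elif self.table_depth and tag == "tr":
            self.row_counted = False
        elif self.table_depth and tag == "td" and not self.row_counted:
            self.rows += 1
            self.row_counted = True

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1


def count_movies(html: str) -> int:
    """Return the number of data rows found in wikitable tables (assumed one movie per row)."""
    parser = WikitableRowCounter()
    parser.feed(html)
    return parser.rows


# In a real run the HTML would come from something like requests.get(url).text,
# where url is the address of one year's list page (not shown here).
sample = """
<table class="wikitable sortable">
  <tr><th>Title</th><th>Director</th></tr>
  <tr><td>Movie A</td><td>Director X</td></tr>
  <tr><td>Movie B</td><td>Director Y</td></tr>
</table>
<table class="infobox"><tr><td>ignored</td></tr></table>
"""
print(count_movies(sample))
```

Counts produced this way would still need manual checking, since some pages split a year into several tables or list films outside tables altogether.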
Let's talk genres
Bollywood, like any other industry in the world, has been continuously changing. The genres of movies being made have changed quite a lot since 1920. Some new ones have popped up while others have vanished. The importance of these visualizations is not just that they show us trends in Bollywood; they also show us trends in Indian society. Due to its large audience, the changes in Bollywood's movies can be taken as strong indicators of the emotions, beliefs and culture of the nation as a whole. The types of movies people like are the ones they connect with, so that thought should be kept at the back of one's mind while reading these graphics.
One problem with the following graphics is the period 1965-67, for which the genre data was very sparse (data for only 5 movies was available), and hence it shows either a very low value or a very high value. As to where this period is represented, it is in the middle portion of the bar like so:
How to read this visualization:
The following graphics are to be read from left to right, with the leftmost rectangle being 1920 and the rightmost being 2015. Each thin rectangle represents one year, with the darkness of the colour indicating the percentage of movies of that genre out of the total across all genres. This total is not the number of movies made but the sum over all genres, since one movie may fall under multiple genres. Hover over a rectangle to see which year it belongs to.
During the 1920s and 30s, lots of Devotional movies were made. But as we will see below, after independence Romance and Drama films took up the major chunk of the movies being made in Bollywood, and Devotional and Religious movies no longer appealed to viewers.
The romance era of Bollywood continues to this day, but it certainly reached its peak during the 1990s and 2000s, with movies like "Dilwale Dulhania Le Jayenge" and "Aashiqui" bringing in the big bucks with (relatively) good storylines, leading more and more filmmakers to include Romance in their movies. Also notice the missing data from 1965-1967 I talked about above.
The rise of Romance movies also brought a rise in Comedy movies: many Romantic Comedies were being made during that time, so filmmakers also took to focussing on core Comedy movies.
Mythology and Historical movies were a huge part of Bollywood from the 1920s to the early 40s. The history of the British Raj and our religious texts were an important part of the everyday lives of Indians. This led to Bollywood focussing heavily on those subjects.
As we notice from the graphic, crime, action and thriller movies have always been a major share of the movies being made. Early movies focussed on Romance and action, with the hero fighting the villain, but serious crime thriller/action movies like Baazigar and Satya being huge hits in the 1990s helped the action/thriller genre grow even more in the exclusively crime category (of course, the unnecessary addition of a romance aspect happens quite often, but I'm ignoring the unnecessary ones for now).
We love Drama and it shows in our movies. The dramatic aspect of early Bollywood movies has always been famous for its extravagant emotions and over-the-top acting. That said, numerous great drama movies have come out of the industry, and the number of good movies relative to bad ones has only increased.
How I made this visualization:
To make these graphics, I went back to Wikipedia and took the genre data out of the lists using Python's requests library. A lot of genres were similar and have been combined. For example, Mythology, Religious and Costume (yes, that was a genre too!) have been combined as they were similar kinds of movies. After that, I calculated each genre's percentage out of the total across all genres. This total is greater than the number of movies made in a year, as one movie may fall under many genres. Hence I took percentages using the sum of all genres and not the total number of movies. Then I mapped the percentages onto a sequential colour scale, with a higher percentage being darker and vice versa. Then it took a bit of D3.js code to get to the final graphic.
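The percentage calculation described above can be sketched roughly like this in Python. This is an illustration, not my original code. The GENRE_MERGES table only covers the one example mentioned above, the function names are made up, counting a movie once per merged genre is an assumption, and the two end colours of the scale are arbitrary placeholders rather than the ones used in the graphics.

```python
from collections import Counter

# Hypothetical merge table: only the Mythology/Religious/Costume example
# comes from the text; the real list of combined genres was longer.
GENRE_MERGES = {"Religious": "Mythology", "Costume": "Mythology"}


def genre_shares(movies):
    """movies: an iterable of genre lists, one list per movie of a given year.
    Returns {genre: percentage}, where the denominator is the sum of all
    genre tags in that year, not the number of movies."""
    counts = Counter()
    for genres in movies:
        # Merge similar genres; the set ensures a movie counts once per merged genre (assumption).
        merged = {GENRE_MERGES.get(g, g) for g in genres}
        counts.update(merged)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genre: 100.0 * n / total for genre, n in counts.items()}


def shade(percentage, light=(247, 251, 255), dark=(8, 48, 107)):
    """Map a percentage (0-100) onto a sequential scale by linear interpolation
    between a light and a dark RGB colour. Higher percentage means darker."""
    t = min(max(percentage / 100.0, 0.0), 1.0)
    r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(light, dark))
    return f"#{r:02x}{g:02x}{b:02x}"


# Three made-up movies for one year: four genre tags in total after merging.
year = [["Romance", "Drama"], ["Drama"], ["Religious", "Costume"]]
shares = genre_shares(year)
print(shares)
print({genre: shade(p) for genre, p in shares.items()})
```

Because the denominator is the sum of genre tags, the shares for a year always add up to 100 even when movies carry several genres, which is what makes the years comparable along one bar.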
About Me
Hi! My name is Manas Sharma. I am an undergraduate student at the Department of Design at the Indian Institute of Technology, Guwahati. I am interested in data visualization, right from the initial phases of designing it to coding it. I also like sketching and playing/listening to music. Thanks for stopping by!
Find me on my Behance profile or email me at sharma.manas271196@gmail.com