Which aspects of our movies are most interesting to spectators?
How do they talk about these aspects?


Movies are considered powerful instruments for influencing public opinion: directors and production companies help shape how people think about major themes such as migration.
We decided to analyze how people express their opinions about movies in written form. To do that, we selected a textual corpus composed of the ten most appreciated IMDb reviews written about each of the first ten movies on our list.
The sentences of each comment were divided into two categories: technical and topic-related. After this first clustering, two separate analyses were carried out: one on the technical aspects and one on the topic-related aspects.


1. Focus on the first 10 IMDb comments written about each of the first 10 movies on our list.
2. Comment sorting criterion: “best” (most appreciated by users).
3. Download the 100 comments and paste them into a txt file.
4. Read the comments and subdivide the text corpus into two groups: sentences about technical aspects of the movie and sentences about topic-related aspects.
5. Count the number of characters in each of these two groups.
6. Create a dataset in Excel composed of three columns:
a. title
b. number of characters referring to technical aspects
c. number of characters referring to topic-related aspects
7. Create a bar graph using the Graph tool in Illustrator.
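
The counting in steps 5 and 6 was done by hand, but it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sentences, the movie titles, the group labels ("technical", "topic") and the column names are hypothetical, and spaces and punctuation are assumed to count as characters.

```python
import csv
import io

# Hypothetical hand-labelled sentences: (movie title, group, sentence text).
labelled_sentences = [
    ("Movie A", "technical", "The photography is stunning."),
    ("Movie A", "topic", "It shows how hard migration was."),
    ("Movie B", "technical", "Great cast."),
]

def count_characters(rows):
    """Sum the characters of each group ("technical" or "topic") per movie."""
    totals = {}
    for title, group, sentence in rows:
        movie = totals.setdefault(title, {"technical": 0, "topic": 0})
        # Assumption: every character counts, including spaces and punctuation.
        movie[group] += len(sentence)
    return totals

def to_csv(totals):
    """Return the three-column dataset (title, technical, topic) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "technical_chars", "topic_chars"])
    for title, counts in totals.items():
        writer.writerow([title, counts["technical"], counts["topic"]])
    return buffer.getvalue()

totals = count_characters(labelled_sentences)
print(to_csv(totals))
```

The resulting CSV text can be opened in Excel and used as the source for the bar graph.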

How to read it

The purple areas indicate the number of characters users wrote about the technical aspects of the movies, while the blue areas refer to the topic-related aspects.


1. Reading the previously separated parts of the comments that concern the technical aspects of the movies, in order to derive a list of categories (casting, photography, soundtrack and direction) for clustering these sentences.
2. Creation of an Excel file composed of 10 spreadsheets (one for each movie).
3. Each spreadsheet contains data about how many characters are dedicated to each of the four categories and whether the opinion expressed is positive, neutral or negative.
4. Creation of a series of glyphs in Illustrator: each of the four wedges refers to a category (casting, photography, soundtrack or direction).
5. Each wedge is subdivided into 30 sections: the smallest triangle corresponds to 100 characters, while the biggest one corresponds to 3100 characters.
6. Coloring the triangles of each section according to the opinion recorded in the Excel dataset (neutral, negative or positive: white, red or green), with 20% opacity.
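
The per-movie spreadsheet described in step 3 amounts to a small table of character counts by category and opinion. A minimal sketch of that tabulation follows; the example sentences and their labels are hypothetical, and the mapping from counts to the 30 wedge sections is deliberately left out because it was done manually in Illustrator.

```python
CATEGORIES = ("casting", "photography", "soundtrack", "direction")
OPINIONS = ("positive", "neutral", "negative")

# Hypothetical labelled sentences for one movie: (category, opinion, text).
sentences = [
    ("casting", "positive", "The leads are superb."),
    ("casting", "negative", "Weak supporting cast."),
    ("direction", "neutral", "Shot mostly indoors."),
]

def characters_by_category(rows):
    """Build a category x opinion table of character counts for one movie."""
    table = {c: {o: 0 for o in OPINIONS} for c in CATEGORIES}
    for category, opinion, text in rows:
        table[category][opinion] += len(text)
    return table

table = characters_by_category(sentences)
for category in CATEGORIES:
    print(category, table[category])
```

Each row of the printed table corresponds to one wedge of the glyph, and each opinion column to one of the three colors.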

How to read it

These glyphs show how, and how much, IMDb users talk about the technical aspects of a movie: the intensity of a certain area indicates that many characters have been written about that topic (casting, photography, soundtrack or direction), and vice versa.
Furthermore, the tint you perceive when looking at a certain area indicates how IMDb users spoke about it: in a more neutral, positive or negative way.

1. Reading the previously separated parts of the comments that concern the topic-related aspects of the movies, in order to derive a list of categories (plot, opinion and theme) for clustering these sentences.
2. Creation of a txt file containing all the topic-related sentences previously identified.
3. Opening this txt file in Yoshikoder.
4. Finding the most recurring words with the command report->count words.
5. Selection of the 15 most recurring words considered interesting for understanding how users speak about topic-related aspects.
6. Creation of one pattern (dictionary->add category->add pattern) for each selected word. Where necessary, the pattern was created by typing just the prefix of the selected word followed by an “*” (example: “immigr*”) in order to match more words in the txt file.
7. Creation of a concordance (concordance->make concordance) for each pattern.
8. Export of each concordance to Excel.
9. Creation of an Excel file containing every concordance.
10. Reading of each concordance in order to assign it to one of the three categories (plot, opinion or theme) and to record whether it speaks in a neutral, positive or negative way.
11. Creation of a scatterplot in Illustrator in which the diameter of every circle depends on how many concordances were found on that topic (plot, theme or opinion) for a certain word or prefix (example: “crim*”). The color of every circle depends on how users spoke about that topic (in a positive, neutral or negative way).
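
For readers without Yoshikoder, the word count, prefix pattern and concordance steps above can be approximated in plain Python. This is a rough analogue and not Yoshikoder's own implementation: the sample text, the tokenization rule and the context window size are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for the txt file of topic-related sentences.
text = (
    "The film shows immigrants arriving in New York. "
    "Immigration is the real theme, and crime follows the immigrant gangs."
)

def tokenize(raw):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", raw.lower())

def pattern_to_regex(pattern):
    """Turn a pattern such as "immigr*" into a regex for whole tokens."""
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))

def concordance(tokens, pattern, window=2):
    """List (left context, matched word, right context) for every match."""
    regex = pattern_to_regex(pattern)
    lines = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines

tokens = tokenize(text)
counts = Counter(tokens)            # analogue of "count words"
print(counts.most_common(5))
for line in concordance(tokens, "immigr*"):
    print(line)
```

The number of lines returned for a pattern is the quantity that drives the circle diameter in the scatterplot; the category and the opinion of each line still have to be assigned by reading it.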

How to read it

This visualization shows how, and how much, IMDb users speak about topic-related aspects. From left to right, the selected words or prefixes are arranged in descending order.
From top to bottom, we can see the three selected categories (plot, theme and opinion), arranged in descending order of “personal contribution”. “Plot” is the category referring to what has been said about the storyline. This category is characterized by a low level of “personal contribution”.
The “opinion” category includes the concordances in which the user gave an opinion about how a certain movie narrates one or more themes. It is characterized by a medium level of “personal contribution”.
The third category is called “theme” and is characterized by a high level of “personal contribution”: this group contains all the concordances in which users go beyond critiquing the movie in order to talk about issues and important themes, giving (explicitly or not) their personal points of view.


Looking at the first visualization, we can see that IMDb users are more interested in speaking about topic-related aspects than technical aspects, although IMDb is recognized as a website for experts in cinematography.
As for how people speak about technical aspects, we can see that most of the attention is given to the casting. This indicates how closely users observe the actors and the power that a well-cast movie has in conveying messages. For example, Gangs of New York is the movie whose casting users commented on the most: its cast includes stars like Leonardo DiCaprio and Cameron Diaz. Furthermore, the colors indicate that comments about technical aspects are quite controversial: green is dominant, but it is often mixed with white or red.
On the other hand, looking at the third visualization, comments about topic-related aspects seem to be less controversial. The more consistent dominance of green shows that users are almost always in agreement.
Furthermore, the third visualization indicates the power of cinema as a means of communication and a catalyst of opinion, because the “theme” category (the one composed of sentences containing users' personal points of view) is bigger than the other two.
It is interesting to point out the “9/11 case”: none of the movies selected for this analysis is about 9/11, but the number “11” appears in the comments and, when it does, it is always preceded by “9/”. Furthermore, it always falls into the “theme” category, the one containing the sentences in which users state (explicitly or not) their personal points of view.