Wednesday, December 30, 2015

Lego Pricing: For which partnerships do consumers pay the most?

As for most parents out there, a Lego set was part of the Christmas wishlist, and I found myself in front of an impressive display of options. As I was looking through the boxes, I noticed something while comparing these two boxes:

Extra information: the two sets are identically priced at $39.99....
So it's not completely easy to spot on the boxes, but despite the same price, the Ninjago box contains almost twice as many pieces (575) as the Frozen one (292). Ninjago is a line of sets produced and owned by Lego, while Frozen is the result of a partnership with Disney. Quickly scanning the other boxes, I seemed to see a trend there: similarly priced sets appeared to have fewer pieces when the theme came from an external partnership rather than from Lego itself.

Having some free time on my hands during the Christmas break, I extracted as much data as I could from the LEGO site, pulling for each set the theme, the number of pieces and the price. I was able to identify close to 700 sets, which provides a reasonable sample size for exploring some trends. Here are all the data points with price on the x-axis and number of pieces on the y-axis; some jitter was added, although it was not particularly necessary (prices tend to take discrete levels, but the number of pieces does not).

A few observations:
  • the data is densely concentrated around the origin, and the outliers stretch the scale, making it hard to determine what exactly is going on there
  • there appears to be quite some variability in number of pieces for a given price point, which confirms my initial impression from the Lego store. Looking at the $200 vertical line, we see that there are boxes at that price with fewer than 1000 pieces, and others with over 2500!
  • overall, the relationship seems pretty linear, along the lines of pieces = 10 * price: every $1 gets you about 10 pieces. I was expecting more of a convex shape, where each incremental piece costs a little less than the previous one (similarly to Starbucks, where the larger the drink, the better the size-to-price ratio). I guess this can somewhat make sense: with food/drinks, two one-size units are equivalent to a two-size unit (if a gallon costs too much I'll just buy two half-gallons), but two 300-piece Lego sets are not equivalent to a 600-piece Lego set, and so I guess Lego can afford to maintain the linear relationship.
And if you're wondering about the two data points in the upper right corner:
  • at 3808 pieces and $399, we have the tough-to-find Star Wars Death Star
  • at 4634 pieces and $349, we have the Ghostbusters Firestation (to be released in 2016)

Let's focus a little more around the origin where most of the data resides (92% of sets are priced less than $100):

Along the x-axis there appears to be a category of sets (green dots) consisting of just a few pieces but priced incredibly high. These belong to the Mindstorms category: very sophisticated Lego pieces, sold separately at high price points, that let you build robots with touch / light sensors. In the rest of this post, we will exclude the Mindstorms category, as well as the Power Functions category for the same reason. The Dimensions category was also excluded because its pieces, while not as sophisticated as those of Mindstorms and Power Functions, are quite elaborate given their interaction with the PlayStation console (the average pieces-to-price ratio is about 3).

There appears to be another category with its own specific piece/price relationship (red dots). While overall it seemed that every $1 was equivalent to about 10 pieces, this category seems to have a steep $1 for 1.5 pieces. This is actually the Duplos category for younger children, and the pieces are much larger than regular Legos. That being said, I'm wondering if Lego isn't taking advantage of all the parents eager to give their toddlers a head start in the Lego environment... Duplos are also thrown out for the rest of the post.

Back to our original question, how do the different themes compare to each other, and is there a price difference between internal and external brands?
The following boxplot provides some insight into the pieces-to-price ratio within each category. I've sorted the categories by decreasing median (a higher median is synonymous with a 'good deal': many pieces for every dollar). I've also color-coded them based on whether the theme was internal (red) or external (blue) to Lego.

Glancing at the graph, the two main take-aways are that:

  • there is strong variability within each category (in Star Wars for instance, the Troop Carrier set has 565 pieces for $40, while Battle on Takodana has fewer pieces (409) for a 50% higher price)
  • there does nonetheless seem to be a trend that internal themes have a better pieces-to-price ratio
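The ordering used in the boxplot can be sketched in a few lines. This is only an illustration, not the code behind the graph: it uses the four sets quoted in this post, the theme labels are mine, and the $60 price for Battle on Takodana is derived from the "50% higher" remark rather than taken from the data pull.

```python
from statistics import median

# (theme, pieces, price in USD) for the sets quoted in the post.
# Assumption: Battle on Takodana costs $60, i.e. 50% more than $40.
sets = [
    ("Ninjago", 575, 39.99),
    ("Frozen", 292, 39.99),
    ("Star Wars", 565, 40.00),   # Troop Carrier
    ("Star Wars", 409, 60.00),   # Battle on Takodana
]


def ratios_by_theme(rows):
    """Group pieces-per-dollar ratios by theme."""
    grouped = {}
    for theme, pieces, price in rows:
        grouped.setdefault(theme, []).append(pieces / price)
    return grouped


def themes_by_median(rows):
    """Return (theme, median ratio) pairs, best deal first."""
    grouped = ratios_by_theme(rows)
    medians = [(theme, median(ratios)) for theme, ratios in grouped.items()]
    return sorted(medians, key=lambda pair: pair[1], reverse=True)


for theme, med in themes_by_median(sets):
    print(f"{theme:10s} {med:5.1f} pieces per dollar")
```

With only four sets the medians mean very little, of course; the point is just the mechanics of grouping by theme and sorting by median ratio.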
We can try to explore the difference between the two types of themes via a linear regression, with a different slope and different intercept for each type:

The conclusion of the regression analysis is that the difference in slope between the two lines is not statistically significant (9.67 pieces/$ for external brands, 10.15 pieces/$ for internal brands), but there is a significant difference in intercept (50 fewer pieces for an external brand at the same price).
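For readers who want to try this kind of comparison themselves, here is a minimal sketch. A model with a separate slope and intercept per brand type gives the same point estimates as fitting one least-squares line per group, so that is what the sketch does. The data below is made up (exact lines with the same slope and a 50-piece gap) purely to show the mechanics, and the sketch does not compute significance tests.

```python
def fit_line(prices, pieces):
    """Ordinary least squares for pieces = intercept + slope * price."""
    n = len(prices)
    mean_x = sum(prices) / n
    mean_y = sum(pieces) / n
    sxx = sum((x - mean_x) ** 2 for x in prices)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, pieces))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, NOT the scraped sets: both groups follow an exact
# line with slope 10, and the external group sits 50 pieces lower.
prices = [10, 20, 40, 60, 100]
internal_pieces = [10 * p + 20 for p in prices]
external_pieces = [10 * p - 30 for p in prices]

slope_int, icpt_int = fit_line(prices, internal_pieces)
slope_ext, icpt_ext = fit_line(prices, external_pieces)

print(f"internal: {slope_int:.2f} pieces/$, intercept {icpt_int:.1f}")
print(f"external: {slope_ext:.2f} pieces/$, intercept {icpt_ext:.1f}")
print(f"intercept gap: {icpt_int - icpt_ext:.1f} pieces")
```

Testing whether the two slopes or intercepts differ significantly requires standard errors, which a statistics package would normally provide.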

So in summary, don't feel that Lego is completely overcharging you for Disney Princesses or Star Wars figurines, although there is a small pricing difference. If you do want the biggest bang for your buck, take a look at the Creator theme, and in particular here's the overall pieces-to-price winner (which I ended up getting my kid!):

Happy building!

Tuesday, December 22, 2015

2015 Summer Blockbuster review

Summertime rhymes with school break, fireworks and BBQ, but Hollywood's big blockbusters are just as inseparable from it. In early 2015, and even as early as late 2014, we had teasers and trailers for the upcoming wave of big-budget movies, many of which were sequels in blockbuster sagas. Terminator 5, Jurassic Park 4 anyone?

Starting early summer I pulled daily stats for 20 of the most anticipated summer movies of 2015, and before we enter the new year, let's see how they did.

Rating Evolution

In these two earlier posts I looked at the evolution of IMDB scores after movies' releases, as well as after Game of Thrones episodes aired. In both cases we observed a trend of decreasing ratings as time went by, although this phenomenon was much more sudden for TV episodes (a few days) than for movies (multiple weeks / months).

Here's the trend for our 20 blockbusters, all aligned according to release date, and titles ordered according to final rating:

Again, we observe the same declining trend, although the asymptote seems to be reached much sooner than for the average movie (earlier analysis).

Straight Outta Compton clearly emerges as the best-rated movie of the summer, although it did not benefit from as much early marketing as most of its competitors. It also distinguishes itself from the other movies in another way. The movies dropped an average of 0.3 rating points between release date and latest reading (not as dramatic as the 0.6 drop observed across a wider range of movies in the previous analysis already mentioned, as if summer blockbusters tend to decrease less and stabilize faster), but Straight Outta Compton actually improved its rating by 0.1 (this is not entirely obvious from the graph, but the movie had a rating of 8.0 on its release date, jumped to 8.4 the next day, and slowly decreased to 8.1). Only two other movies saw their ratings increase: Trainwreck from 6.2 to 6.5 and Pixels from 4.8 to 5.7. The latter increase, while quite spectacular, still falls way short of making the movie a must-see, despite the insane amounts spent on marketing. As you might have noticed from my posts, my second hobby is basketball, and I remember that this summer not a day would go by without seeing an ad for Pixels, on TV or on websites, in which NBA stars battled monsters from 1970 arcade games.

Which brings us to the next question: did budget have any effect on how well the movies did, either from a sales or rating perspective? Of course we are very far from establishing a causal model here so we will have to satisfy ourselves with simple correlations across five metrics of interest: IMDB rating, number of IMDB voters, movie budget, gross US sales and Metascore (aggregated score from well-established critics).

I would have expected the highest correlation to be between IMDB rating and Metascore (based on another analysis I did comparing the different rating methodologies). However, it came in second place (0.73), and I honestly had not anticipated the top correlation (0.87), between budget and number of IMDB voters. Of course we can't read too much into this correlation, which could be completely spurious, but it might be worth confirming later with a larger sample. If I had to give a rough interpretation though, I would say that a movie's marketing spend is probably highly positively correlated with the movie's budget. So the higher the budget, the higher the marketing spend and the stronger the 'presence of mind' this will have on users, who will be more likely to remember to rate the movie. Remember my example of all the Pixels ads? I didn't see the movie, but if I had, those ads might have eventually prompted me to rate the movie independently of how good it was, especially if those ads appeared online or even on IMDB itself.
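For anyone wanting to redo this with a larger sample, pairwise correlations across the five metrics can be computed along these lines. This is a minimal sketch using the Pearson coefficient; the four movies and all their numbers below are invented placeholders, not the values I collected.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_table(metrics):
    """Correlation for every unordered pair of metric names."""
    names = list(metrics)
    return {
        (a, b): pearson(metrics[a], metrics[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical values for four made-up movies (one column per metric).
metrics = {
    "imdb_rating": [8.1, 6.5, 5.7, 7.2],
    "imdb_voters": [90_000, 60_000, 70_000, 300_000],
    "budget_musd": [28, 35, 88, 150],
    "us_gross_musd": [161, 110, 79, 650],
    "metascore": [72, 75, 27, 59],
}

table = correlation_table(metrics)
for pair, r in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pair[0]:>14s} vs {pair[1]:<14s} r = {r:+.2f}")
```

Five metrics give ten pairs, so with only 20 movies it is not surprising for at least one pair to look impressive by chance.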

But while we wait for that follow-up analysis, we can all start looking at the trailers for the most anticipated movies of next year, sequels and reboots leading the way once again: X-Men, Star Trek, Captain America, Independence Day...

Saturday, December 5, 2015

Is this movie any good? Figuring out which movie rating to trust

In October, FiveThirtyEight published a post cautioning us when looking at movie reviews "Be Suspicious Of Online Movie Ratings, Especially Fandango’s".

To summarize the post as concisely as possible, Walt Hickey describes how Fandango inflates movie scores by rounding up, providing the example of Ted 2 which, despite having an actual score of 4.1 in the page source, displays 4 and a half stars on the actual page. The trend holds across all movies: no movie had a lower score displayed, and in almost 50% of cases the displayed score was higher than expected.
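To make the effect concrete, here is a tiny sketch of what "rounding up" to half stars looks like. The rule (always round up to the next half star) is my assumption, chosen because it reproduces the Ted 2 example; it is not Fandango's actual code.

```python
import math


def displayed_stars(actual_score):
    """Round a score UP to the next half star (assumed display rule)."""
    return math.ceil(actual_score * 2) / 2


def nearest_half_star(actual_score):
    """Conventional rounding to the nearest half star, for comparison."""
    return math.floor(actual_score * 2 + 0.5) / 2


print(displayed_stars(4.1))    # 4.5, as in the Ted 2 example
print(nearest_half_star(4.1))  # 4.0, what one would normally expect
```

Under this rule a displayed score can never be lower than the actual one, which matches the pattern described in the article.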

So if Fandango ratings aren't reliable, it could be worth turning to another major source for movie ratings: the Internet Movie Database (IMDB).

Pulling all IMDB data, I identified just over 11K movies that had both an IMDB rating (provided by logged-in users) and a Metascore (provided by Metacritic, which aggregates reviews from top critics and publications). The corresponding scatterplot indicates a strong correlation between the two:

In these instances, it is always amusing to deep-dive into some of the most extreme outliers.
To allow for fair comparisons across both scales (0-100 for Metascore, 0-10 for IMDB), we mapped both scales to the full extent of a 0-100 scale, by subtracting the minimum value, dividing by the observed range, and multiplying by 100.
So if all IMDB ratings are between 1 and 9, a movie of score 7 will be mapped to 75 ((7 - 1) / (9 - 1) * 100).
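In code, the rescaling and the resulting delta look roughly like this. The function names are mine, and the Metascore bounds in the usage example are hypothetical; only the 1-to-9 IMDB example comes from the text above.

```python
def normalize(score, observed_min, observed_max):
    """Min-max rescale a score to the 0-100 range."""
    return (score - observed_min) / (observed_max - observed_min) * 100


def delta(imdb, imdb_bounds, metascore, metascore_bounds):
    """Normalized IMDB rating minus normalized Metascore."""
    return normalize(imdb, *imdb_bounds) - normalize(metascore, *metascore_bounds)


# Example from the text: ratings observed between 1 and 9, score of 7.
print(normalize(7, 1, 9))  # 75.0

# Hypothetical movie: IMDB 7 (observed range 1-9), Metascore 50
# (assumed observed range 0-100).
print(delta(7, (1, 9), 50, (0, 100)))  # 25.0
```

A positive delta means IMDB users were more generous than the critics, and a negative delta means the opposite.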

Here are the movies with highest discrepancies in favor of Metascore:
| Movie Title | IMDB Rating | Metascore | Genre | IMDB (norm) | Metascore (norm) | Delta |
|---|---|---|---|---|---|---|
| Justin Bieber: Never Say Never | 1.6 | 52 | Documentary,Music | 7.2 | 51.5 | -44.3 |
| Hannah Montana, Miley Cyrus: Best of Both Worlds Concert | 2.3 | 59 | Documentary,Music | 15.7 | 58.6 | -42.9 |
| Justifiable Homicide | 2.9 | 62 | Documentary | 22.9 | 61.6 | -38.7 |
| G.I. Jesus | 2.5 | 57 | Drama,Fantasy | 18.1 | 56.6 | -38.5 |
| How She Move | 3.2 | 63 | Drama | 26.5 | 62.6 | -36.1 |
| Quattro Noza | 3.0 | 60 | Action,Drama | 24.1 | 59.6 | -35.5 |
| Jonas Brothers: The 3D Concert Experience | 2.1 | 45 | Documentary,Music | 13.3 | 44.4 | -31.2 |
| Justin Bieber's Believe | 1.6 | 39 | Documentary,Music | 7.2 | 38.4 | -31.2 |
| Sol LeWitt | 5.3 | 83 | Biography,Documentary | 51.8 | 82.8 | -31.0 |
| La folie Almayer | 6.2 | 92 | Drama | 62.7 | 91.9 | -29.3 |


And here are the movies with highest discrepancies in favor of IMDB's rating:
| Movie Title | IMDB Rating | Metascore | Genre | IMDB (norm) | Metascore (norm) | Delta |
|---|---|---|---|---|---|---|
| To Age or Not to Age | 7.5 | 15 | Documentary | 78.3 | 14.1 | 64.2 |
| Among Ravens | 6.8 | 8 | Comedy,Drama | 69.9 | 7.1 | 62.8 |
| Speciesism: The Movie | 8.2 | 27 | Documentary | 86.7 | 26.3 | 60.5 |
| Miriam | 7.2 | 18 | Drama | 74.7 | 17.2 | 57.5 |
| To Save a Life | 7.2 | 19 | Drama | 74.7 | 18.2 | 56.5 |
| Followers | 6.9 | 16 | Drama | 71.1 | 15.2 | 55.9 |
| Walter: Lessons from the World's Oldest People | 6.4 | 11 | Documentary,Biography | 65.1 | 10.1 | 55.0 |
| Red Hook Black | 6.5 | 13 | Drama | 66.3 | 12.1 | 54.1 |
| The Culture High | 8.5 | 37 | Documentary,News | 90.4 | 36.4 | 54.0 |
| Burzynski | 7.4 | 24 | Documentary | 77.1 | 23.2 | 53.9 |


Quickly glancing through the tables, we see that documentaries are often the cause of the biggest discrepancies, one way or the other. IMDB users are extremely harsh with music-related documentaries: Justin Bieber has two "movies" in the top 10 discrepancies. Further deep-diving into those two movies surfaces some interesting insights. For each of them, approximately 95% of voters gave either a 0 or a 10, and the remaining 5% a score between 1 and 9. Talk about "you either love him or hate him!". Breaking votes down by age and gender provides some additional background for the rating: independently of age category, no male group gave Justin an average higher than 1.8, whereas no female group gave less than 2.5. As expected, females under 18 gave the highest average scores (5.3 and 7.8 respectively for the two movies), but with 4 to 5 times more male than female voters, the abysmal overall scores were unavoidable.

OK, I probably spent WAY more time than I would have liked discussing Justin Bieber movies...

Let's look at how these discrepancies vary against different dimensions. The first one that was heavily suggested from the previous extremes was naturally genre:

We do see that Documentaries have the widest overall range, but all genres including documentaries actually have very similar distributions.

Another thought was to split movies by country. While IMDB voters are international, Metascore is very US-centric. So for foreign movies, voters from that country might be more predominant in voting and greater discrepancies observed when comparing to US critics. However, there did not seem to be a strong correlation between country and rating discrepancy.

We do observe a little more fluctuation in the distributions when splitting by country rather than genre, but all distributions still remain very similar. The US has more extremes, but this could also be a result of the fact that the corresponding sample size is much larger.

As a final attempt, I used the movie's release date as the last dimension. Do voters and critics reach a better consensus over time, and do we see a reduced discrepancy for older movies?

The first thing that jumps out is the strong increase over time in the variance of the delta score (IMDB - Metascore). Similarly to the United States in the previous graph, this is most likely a result of sample size. While IMDB voters can decide to vote for very old movies, Metacritic doesn't have access to reviews from top critics in 1894 to provide Carmencita with a Metascore (although I'm not sure it would be as generous as the IMDB score of 5.8 for this one-minute documentary of a movie whose synopsis is: 'Performing on what looks like a small wooden stage, wearing a dress with a hoop skirt and white high-heeled pumps, Carmencita does a dance with kicks and twirls, a smile always on her face.').

But the more subtle trend is the evolution of the median delta value. Over time, it seems that the delta between IMDB score and Metascore has slowly increased, from a Metascore advantage to an IMDB advantage. From a critic's perspective, it would appear as if users underrated old movies and overrated more recent ones. However, the increase stabilized as we entered the 21st century.

I couldn't end this post without a mention of Rotten Tomatoes, also highly popular with movie fans. While Metacritic takes a more nuanced approach to how it scores a movie based on the reviews, Rotten Tomatoes only rates each review as 0 (negative review) or 1 (positive review) and takes the average. In his blog post, Phil Roth explores the relationship between Metascores and Rotten Tomatoes and, as expected, finds a very strong relationship between the two:

So to quickly recap, there is clearly a strong correlation between the two ratings, but with enough variance that both provide some information. For some people, their tastes might better align with top critics whereas others might have similar satisfaction levels as their peers. Personally, I've rarely been disappointed by a movie with an IMDB score greater than 7.0, and I guess that I'll continue to use that rule of thumb.

And definitely won't check Fandango.