I came across two lists named "hottest actors" and "hottest actresses". I thought it would be fun to find out where these people are from and what their age profile looks like.
I first scraped their birthplaces from the internet and appended them to the existing CSV file.
Then I plotted a histogram of birth year for the actors:
The median is 1977 and the mean is also 1977 (after rounding). The distribution has a heavy lower tail (earlier birth years) but a light upper tail. This seems reasonable given people's preferences.
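The post does not show its code, so here is a minimal Python sketch of how the median, the mean and a rough measure of tail asymmetry could be computed. The function name `summarize` and the sample years are made up for illustration; they are not the actual list data. A negative skewness corresponds to a heavier lower tail.

```python
from statistics import mean, median

def summarize(years):
    """Return (median, mean, skewness) for a list of birth years."""
    avg = mean(years)
    med = median(years)
    n = len(years)
    # Population variance and third standardized moment (skewness).
    var = sum((y - avg) ** 2 for y in years) / n
    skew = sum((y - avg) ** 3 for y in years) / n / var ** 1.5
    return med, avg, skew

# Hypothetical birth years, invented for illustration only.
sample_years = [1950, 1962, 1970, 1975, 1977, 1978, 1980, 1982, 1984]
med, avg, skew = summarize(sample_years)
print(f"median={med}, mean={avg:.2f}, skewness={skew:.2f}")
```

With a few early birth years pulling the mean down, the skewness comes out negative, which is the "heavy lower tail" pattern described above.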
Then I subset the people born in the USA and plotted their birth states on a map:
I didn't bother binning the result because there aren't many levels. Clearly, three states have more "hottest actors" than the rest: CA, TX and NY.
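The subsetting and counting step can be sketched roughly as follows. This is an assumption about the data layout, not the original code: the field names `country` and `state` and the records themselves are hypothetical.

```python
from collections import Counter

def count_us_states(records):
    """Count birth states for records whose country is 'USA'."""
    return Counter(r["state"] for r in records if r["country"] == "USA")

# Hypothetical records, invented for illustration only.
records = [
    {"name": "A", "country": "USA", "state": "CA"},
    {"name": "B", "country": "USA", "state": "TX"},
    {"name": "C", "country": "UK", "state": None},
    {"name": "D", "country": "USA", "state": "CA"},
    {"name": "E", "country": "USA", "state": "NY"},
]
counts = count_us_states(records)
print(counts.most_common(3))
```

The resulting per-state counts are what would be joined to a state map for plotting.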
Next, the actresses:
The median is 1980 and the mean is 1980.59. The shape of the distribution, however, is unclear. The Q-Q plot shows some granularity, presumably because birth years are whole numbers, but I don't think this matters.
Finally, the actresses' map:
I binned the data this time. Not surprisingly, CA, TX and NY still have the largest numbers of "hottest actresses". Since this list is larger than the actors' one, IL, NJ and FL also stand out (I guess due to cities like Chicago, Miami and Orlando).
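Binning per-state counts before mapping them could look something like this. The bin edges, the helper `bin_label` and the counts are all hypothetical; the post does not say which breaks were used.

```python
from bisect import bisect_right

def bin_label(count, edges):
    """Map a non-negative count to a label such as '1-4'.

    `edges` holds ascending lower bounds of the bins, starting at 0.
    """
    i = bisect_right(edges, count) - 1
    lo = edges[i]
    if i + 1 == len(edges):
        return f"{lo}+"          # open-ended top bin
    hi = edges[i + 1] - 1
    return str(lo) if lo == hi else f"{lo}-{hi}"

# Hypothetical bin edges and counts, invented for illustration only.
edges = [0, 1, 5, 10]
state_counts = {"CA": 12, "TX": 6, "IL": 3, "WY": 0}
binned = {state: bin_label(n, edges) for state, n in state_counts.items()}
print(binned)
```

Mapping a handful of bin labels to colours is usually easier to read on a map than one colour per distinct count.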
You can find the original lists on IMDb; I am not going to post them here. It is not difficult to extract birthplace data from the website (the same old regular-expression tricks).
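As a rough idea of the regular-expression approach, here is a small Python sketch. The HTML snippet and the `class="birthplace"` markup are made up; the real page structure is not given in the post and will differ, so the pattern would need adapting.

```python
import re

# Hypothetical markup: an assumption for illustration, not IMDb's real HTML.
BIRTHPLACE_RE = re.compile(r'<span class="birthplace">([^<]+)</span>')

def extract_birthplace(html):
    """Return the birthplace as a list of parts, or None if not found."""
    match = BIRTHPLACE_RE.search(html)
    if match is None:
        return None
    # Split "City, State, Country" into trimmed parts.
    return [part.strip() for part in match.group(1).split(",")]

sample_html = '<div><span class="birthplace">Austin, Texas, USA</span></div>'
parts = extract_birthplace(sample_html)
print(parts)
```

The last part can then be used as the country and the second-to-last as the state when subsetting to the USA.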
You can also find a reference on how to draw spatial plots with ggplot here.




 
 
1 comment:
You know, CA, TX and NY are the states with the largest populations in the USA, so it is no wonder there are more actors and actresses from these three states. If you look at the number of actors/actresses per capita, the results may change.