There are a number of ways in which the data at this site can be applied to help reveal patterns in the way the music business changed during the 20th century. Externally we have seen analyses of the length of popular songs and of their beats per minute (at //www.docstoc.com/docs/19643220/Trends-in-BPM-of-Popular-Music-1900-2009). We have also done some calculations ourselves, for example estimating historic album sales. This page contains some results that we have found interesting but that don't have an obvious place within the rest of the site. The topics discussed here are:
As with all the complex calculations described on the site, you can decide to try a different approach. If your analysis shows something interesting, tell us about it.
The fact that we have a listing of the 9,200 songs that achieved the greatest success in every year from 1900 to 2014 means that we can see how the choice of words in song titles changes over time. This reflects the names of songs that became hits, not necessarily the names of songs written.
By far the most common word in song titles is, as one could have guessed, "love". There were just two decades where another word was in more titles: the 1900s (where the words "old" and "sweet" were more common) and the 1920s (where "blues" was the word of the decade). Since the 1970s the word "don't" has been the second most common. The third slot has held a different word every decade: "night", "heart", "baby" and "girl" so far.
Looking in more detail we can see that the popularity of "love" was growing for most of the century. By its peak (say 1973 to 1996) it was so dominant that it could halve in size (by 1998) and still be the number one word.
Between 1945 and 2010 the average length of song titles went from 4.5 words to 2.5 words. The trend towards ever shorter titles is driven by the share of songs with single-word titles going from 7% to 23% over the same period.
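The word counts and title lengths described above could be computed along the following lines. This is only a minimal sketch: the sample rows, the function names and the tokenising rule (lower-case words, apostrophes kept, no common words filtered out) are all assumptions made for illustration, not the method actually used on the site.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample rows of (year, title); the site's real listing is not reproduced here.
SAMPLE = [
    (1925, "Sweet Georgia Brown"),
    (1926, "Bye Bye Blackbird"),
    (1975, "Love Will Keep Us Together"),
    (1978, "Don't Stop"),
    (1979, "I Will Survive"),
    (2008, "Viva la Vida"),
    (2009, "Poker Face"),
    (2010, "Firework"),
]


def title_words(title):
    """Lower-case a title and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", title.lower())


def words_by_decade(rows):
    """For each decade, count how many titles contain each word."""
    counts = defaultdict(Counter)
    for year, title in rows:
        decade = year // 10 * 10
        # Use a set so a word repeated within one title is counted once.
        counts[decade].update(set(title_words(title)))
    return counts


def title_length_stats(rows):
    """Return (mean words per title, share of single-word titles)."""
    lengths = [len(title_words(title)) for _, title in rows]
    mean_length = sum(lengths) / len(lengths)
    single_share = sum(1 for n in lengths if n == 1) / len(lengths)
    return mean_length, single_share
```

Calling words_by_decade(rows)[1970].most_common(3) would then list the three words that appear in the most titles for the 1970s, and title_length_stats applied to the rows for a single year gives the two figures quoted above for that year.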
One interesting question is whether the chart data here can help to illuminate any long term trends in chart success. For example, are songs getting longer?
Obviously this data can only address the topics that it measures, that is, the popularity of songs, artists and titles. The distribution of chart entries is clearly tied closely to the years being considered, so, as with the determination of the most charted songs, we have to compensate for that. When considering sales we have to find a way to estimate historic markets.
Once the scores have been adjusted to give each year an equal chance of producing the top songs, it is interesting to see which years they actually come from. The plot above takes the top 6000 songs and shows the years that they came from: the top 500 in the far row, 501-1000 in the next and so on down to the positions 6001-6500 in the nearest row. The years 1900-1920 have been removed because the charts are too sparse to be valid.
Clearly the years 1940-1945 were not good years for general musical success: a tiny number of songs got all the attention and no one else got anywhere. We guess people were focused on some other priorities. The absolute peak seems to be 1965-1973, but in fact the whole of the period 1957-1989 is fairly high. The start of the era clearly relates to the success of some key artists in the 1950s, in particular Elvis Presley; we suspect that most people would agree that 1956 is the real start of modern music. But what about the end at 1989? Here are three different possible reasons why the peak drops off after 1989:
Given the way the calculation is made the second choice seems unlikely. Which of the other two you pick probably says more about your own biases than about the charts.
As can be seen elsewhere, the volume of source data increases dramatically as the 20th century progresses. The plot above also shows how the "average" score changes. We have ignored the top 5 songs since the presence of a "White Christmas" or "Over the Rainbow" could radically skew the results from a particular year. The blue line shows the average of the scores for the songs in positions 5-10. In fact the ratio between the scores and the number of entries gives a good indication of how "universal" musical tastes are: in years where all markets are buying the same few artists the scores will be high, and when each market has its own tastes they will be low.
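One possible reading of that ratio, for a single year, is sketched below. The function name, its parameters and the exact definition (mean score of the songs ranked 5 to 10, divided by the number of chart entries for the year) are assumptions for illustration; the text does not spell out the precise formula used.

```python
def universality_ratio(scores, n_entries, first=5, last=10):
    """Mean score of the songs ranked first..last (1-based, inclusive)
    in one year, divided by that year's number of chart entries.

    scores    -- the adjusted scores of the year's songs, in any order
    n_entries -- the number of chart entries recorded for the year
    """
    if n_entries <= 0:
        raise ValueError("n_entries must be positive")
    ranked = sorted(scores, reverse=True)
    window = ranked[first - 1:last]
    if not window:
        raise ValueError("not enough songs to fill the requested positions")
    return (sum(window) / len(window)) / n_entries
```

For example, with ten songs scoring 100, 90, and so on down to 10, and 70 chart entries in the year, the songs in positions 5-10 average 35 and the ratio is 0.5.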
If we plot this ratio through the 20th century there are some interesting correlations with the key events and trends:
There are a number of overall statistics that the charts reveal. For example, the chart above shows the number of artists with a given count of hits. About half of all the artists listed have just one song that became a hit. The curve then drops off, with only a tiny proportion having more than 10 hits. There are many different curves that could fit this profile, so which one fits best? The way to answer that question is to change the vertical scale from a linear one to a log one.
Using this scale a geometric progression would align the points in a straight line. Clearly the points fit some other smooth curve. We tried using an inverse function (1/N), that is, the Zipf numbers. However, while the shape is close, the relative scaling is wrong. If we alternatively assume that the number of artists with N hits is proportional to 1/N squared, the curve fits almost exactly.
To us this is a surprising result, and we are not sure what underlying truth it is revealing. Clearly it demonstrates that the probability of an artist's next song being a hit depends on how many hits the artist already has (otherwise the line would be straight).
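The comparison between the candidate curves can be illustrated with a small sketch. Note that it uses a different check from the plot described above: instead of a log vertical scale only, it takes logs of both axes, where a curve proportional to 1/N has slope -1 and one proportional to 1/N squared has slope -2. It also computes the ratio between the number of artists with N+1 hits and the number with N hits, a rough proxy for the chance of a further hit. The three sets of counts are synthetic, and all the names are hypothetical.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(N).

    counts maps N (number of hits) to the number of artists with N hits.
    Every count is assumed to be positive.
    """
    xs = [math.log(n) for n in counts]
    ys = [math.log(counts[n]) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def successive_ratios(counts):
    """Ratio counts[N + 1] / counts[N] for each N that has a successor."""
    return {n: counts[n + 1] / counts[n] for n in sorted(counts) if n + 1 in counts}


# Synthetic counts for illustration only, not taken from the site's data.
ZIPF = {n: 5000 / n for n in range(1, 21)}
INVERSE_SQUARE = {n: 5000 / n ** 2 for n in range(1, 21)}
GEOMETRIC = {n: 5000 * 0.5 ** (n - 1) for n in range(1, 21)}
```

In the geometric case every ratio equals the same constant (0.5 here), which is what a fixed chance of a further hit would produce. In the inverse-square case the ratio is (N/(N+1)) squared, which starts at 0.25 and climbs towards 1 as N grows, in line with the observation that artists with more hits are more likely to have another.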
So what happens if we look at the number of charts that each song is listed in? The dramatically different number of charts for each decade means that we need to split the data by decade. When we do, we can see that for decades where the number of available charts is small, like the 1930s, the data points fall along a straight line. So where we have few charts they seem to be sampling separate 'pools' of potential hits. The probability of success in one chart is not a good predictor of success in the others.
However, when the number of potential charts is significantly bigger than the number being checked for, the points fall along the same '1/N squared' line. In other words, as one would expect, success in one chart increases the chances of success in the others.
This seems to be a general pattern for each of the decades (within the constraints of the data available). Could this just be a reflection of the number of chart entries we have? To answer that we can look at the number of chart entries in each chart by decade...
We can see that in the plot above. It shows the number of charts with 2, 4, 8 and so on entries. When we show the sizes using this logarithmic scale there is no obvious distinctive profile. That is to say, the sizes of the charts seem to be fairly evenly scattered. If the success of a song in a particular chart was entirely down to some kind of measurable intrinsic quality of the song, that is, if each chart was measuring the same thing, we would expect the number of charts a song was in to reflect the number of charts that it was a candidate for. In other words, the number of charts for each song should drop geometrically. The fact that it drops by 1/N squared instead seems to demonstrate that success in any one chart is significantly influenced by previous success in other charts.
We are sure that this distinctive curve is telling us how much influence each of the charts has, but our combined knowledge of statistics is not sufficient to work out how much. If you have any insight we'd be happy to hear from you.
The Billboard song charts that come from Bullfrog are the most comprehensive and cover the longest period of all the charts we use. One feature of these charts that we don't take account of when scoring a song is the number of weeks each track spent in the charts. If we group the songs by decade we can draw a set of curves that illustrates these values.
This plot illustrates a number of features; let's cover the three most obvious ones. First of all, it is clear that before 1958 (when the "Hot 100" chart started) the most common length of time in the chart was just a single week. In this early phase the shorter chart length and the more informal ways that the information was gathered probably explain the "exponential decay" shape.
The next obvious feature is the clear spike at 20 weeks that appears in the 1990s. This comes from a decision that Billboard took in 1991 to exclude most songs after they had spent 20 weeks in the charts (these songs are moved to an alternate chart called "Hot Singles Recurrents").
The final clear pattern is the way that, since the 1960s, songs have progressively spent longer and longer in the chart, and consequently the number of songs in any one year has declined.
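The grouping by decade behind these curves can be sketched as follows. The sample rows and function names are hypothetical, chosen only to show the shape of the calculation: each row is assumed to hold the year a song charted and the number of weeks it spent in the chart.

```python
from collections import Counter, defaultdict

# Hypothetical rows of (year, weeks in chart); not real Billboard data.
SAMPLE = [
    (1955, 1), (1956, 1), (1957, 3),
    (1995, 20), (1996, 20), (1997, 12),
]


def weeks_histogram_by_decade(rows):
    """For each decade, count how many songs spent each number of weeks in the chart."""
    histogram = defaultdict(Counter)
    for year, weeks in rows:
        histogram[year // 10 * 10][weeks] += 1
    return histogram


def most_common_weeks(histogram):
    """Return, for each decade, the most frequent number of weeks in the chart."""
    return {decade: counter.most_common(1)[0][0]
            for decade, counter in histogram.items()}
```

Plotting each decade's counter as one curve, with weeks along the horizontal axis, would give the kind of plot discussed above. A peak at one week before 1958 and a spike at 20 weeks in the 1990s would then show up as the most common value for those decades.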