Each of the 208 source charts is held separately. Every chart always has entries for "artist" and "title"; usually they also have entries for "position" and "date", and they may also contain all sorts of extra information such as "duration", "written by", "web page" or "film".
The 404,562 items in these charts are consolidated to provide a complete set of attributes for each of the 191,362 items (135,652 songs and 55,710 albums). The most difficult aspect of this task is matching names: they are often misspelt in the source charts, punctuation is usually inconsistent and the list of "featured" artists is always in a different order. Programming a system to recognise that "Uncle Albert" by "Paul McCartney" and "Admiral Halsey" by "Wings" are actually the same song is not trivial.
In general the approach that has been taken is to consolidate entries if they appear similar, since making too many false connections is usually better than missing genuine ones. For this reason there are quite a few places where entries have been changed to bring things together. For example, all of Prince's songs are listed under the name Prince rather than splitting them into Prince & the New Power Generation, Prince (symbol), Love Symbol, The Artist Formerly Known As... and so on.
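As an illustration only, one simple starting point for matching artist credits is to reduce each name to a normalised key. The sketch below is not the site's actual matcher; the function name `artist_key` and the list of separators are hypothetical. It lowercases the text, strips punctuation and sorts the credited artists so that their order no longer matters. It does nothing for misspellings or for cases like "Uncle Albert" and "Admiral Halsey", which would need curated aliases or fuzzy matching.

```python
import re

# Hypothetical separators that introduce or join credited artists.
_SPLIT = re.compile(
    r"\s+(?:feat\.?|featuring|ft\.?|with|and)\s+|\s*[&,]\s*",
    re.IGNORECASE,
)

def artist_key(name: str) -> str:
    """Return an order-insensitive, punctuation-free key for an artist credit."""
    cleaned = []
    for part in _SPLIT.split(name):
        # Drop punctuation, collapse whitespace and lowercase each credited name.
        text = re.sub(r"[^\w\s]", "", part).lower()
        text = " ".join(text.split())
        if text:
            cleaned.append(text)
    # Sorting makes the key independent of the order of featured artists.
    return "|".join(sorted(cleaned))
```

With this key, "Santana feat. Rob Thomas & Wyclef" and "Santana featuring Wyclef, Rob Thomas" compare equal. The key deliberately forgets which artist was the lead, which is in keeping with preferring too many connections over too few.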
Assigning a Score
When we originally started gathering chart information we didn't want to allocate an arbitrary score to each song. Our goal was to provide a list of the achievements of each record without imposing any kind of artificial order. We quickly discovered, however, that a rough indication of the importance of each item was an essential guide to presentation.
So how can each entry be compared? The most obvious option is to work out a scoring mechanism and list songs from the highest score to the lowest. That is the obvious way to do it, but is it the best way? An alternative way to order two songs might be to measure how many charts rate one higher than the other and list the songs that win the most comparisons. There might be stats gurus who can work out which of two scoring algorithms delivers the most reliable results, but we don't know any. So we created a computer model of different ways to generate random charts and measured how well different approaches did. So far the best overall approach we have found is to assign a score for each chart and add these up to create an overall measure of success.
Given that we are assigning a "score" for each item, how should this be calculated? We considered this question in three parts:
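The comparison-based alternative mentioned above can be sketched as follows. This is a minimal illustration, not the model that was actually used: the function name `pairwise_wins`, the input layout and the decision to ignore charts where either song is missing are all assumptions.

```python
from collections import Counter
from itertools import combinations

def pairwise_wins(charts):
    """Count how many head-to-head comparisons each song wins.

    charts: list of dicts mapping song -> position (1 is best).
    A song beats another if more charts place it higher. Charts that
    do not contain both songs are ignored for that pair (an assumption).
    """
    songs = sorted({song for chart in charts for song in chart})
    wins = Counter({song: 0 for song in songs})
    for a, b in combinations(songs, 2):
        a_higher = sum(1 for c in charts if a in c and b in c and c[a] < c[b])
        b_higher = sum(1 for c in charts if a in c and b in c and c[b] < c[a])
        if a_higher > b_higher:
            wins[a] += 1
        elif b_higher > a_higher:
            wins[b] += 1
    return wins
```

Songs would then be listed by descending win count. Note that the cost grows with the square of the number of songs, which is one practical reason a summed score is attractive for a data set of this size.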
Score for each appearance
One obvious way to allocate a score would be to just count the number of chart appearances, but we felt that an entry at the number one slot is more notable than one lower down the chart. An alternative that we have seen used is to give 1 point for a 99th position, 2 for 98th and so on up to 98 points for a 2nd and 99 for a number one. Again that doesn't feel right: surely a number one record is significantly more notable than a number 2, and what about charts with 200 entries?
So an interesting question is: what is the basis of the score? What are we trying to estimate? Well, if the score reflects a rough estimate of the "notability" of the song (a combination of sales, airplay and mindspace) then we should adopt an approach that models, for example, sales figures.
There are two different curves that have been suggested as being good descriptions of sales. One is an "exponential decay", which suggests the Nth best selling item sells Y**N times as many as the top one, where Y is a number between 0 and 1 and ** is the "power of" operator. The other estimate is Zipf's law, which says that the 2nd best seller sells half as many as the top one, the 3rd one third and so on. This suggests two different ways to score a chart entry:
In fact, for reasonable selections of the parameters, both approaches deliver roughly similar results, but the values chosen for the parameters X and Y do emphasise different records.
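The basic shape of the two curves can be sketched in Python as below. This is only an illustration of the curves as described in the text; it does not reproduce the parameterised formulas used on the site. The function names are hypothetical, and normalising the exponential curve so that position 1 scores exactly 1.0 is an assumption.

```python
def exponential_score(position: int, y: float) -> float:
    """Exponential decay: each step down the chart multiplies the score by y.

    y is expected to lie between 0 and 1. The exponent is position - 1
    so that a number one scores 1.0 (an assumed normalisation).
    """
    return y ** (position - 1)

def zipf_score(position: int) -> float:
    """Zipf's law: the Nth entry scores 1/N of the number one."""
    return 1.0 / position

# Compare how quickly each curve falls away down the chart.
for position in (1, 2, 5, 10, 100):
    print(position, exponential_score(position, 0.9), zipf_score(position))
```

The Zipf curve drops sharply over the first few places and then flattens, while the exponential curve keeps shrinking by the same ratio at every step, so the two reward long runs of low placings quite differently.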
Here is a "phase diagram" (based on the 1.1 version of the data) which shows which song comes out top as different values of X and Y are selected. The black circles indicate six combinations of parameters that could each be considered reasonable. Here is a comparison of the top 10s that result for these values.
As you can see, the results are roughly similar; however, their differences show just how arbitrary any scoring mechanism is. So if we select the simpler algorithm, the one based on Zipf's Law, and middling parameters, we have:
This means that 3 number one hits are equivalent to 4 at number two, 5 at number five and roughly equivalent to 6 at number 100. That feels fair, is fast to calculate and gives results that are as good as any other scoring approach.
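The formula itself is not reproduced in this text. One function that matches the equivalences just listed is a Zipf term plus a constant, score = 1 + 1/position. That form is inferred from the numbers above and is an assumption, not necessarily the exact formula or parameter value used for the site. The check below simply confirms the arithmetic.

```python
def assumed_score(position: int) -> float:
    """Assumed per-appearance score: a Zipf term plus a constant floor.

    Inferred from the stated equivalences; not taken from the site.
    """
    return 1.0 + 1.0 / position

# Each pair is (number of entries, chart position) from the text.
for count, position in [(3, 1), (4, 2), (5, 5), (6, 100)]:
    total = count * assumed_score(position)
    print(count, "entries at position", position, "total", round(total, 2))
```

The first three totals agree with each other and the fourth is about one percent higher, which fits the "roughly equivalent" wording for position 100.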
Weighting each chart
When we originally calculated the total scores from these charts we felt that the simplest way to combine charts from different locations and eras was to just add the scores together. After all, "The Wisdom of Crowds" shows that having a larger number of contributing charts provides a more reliable result (provided that every effort is made to remove systematic bias). We had tried weighting charts based on "notability", but that just leads to debates about which sources to trust and about market share. In addition it overemphasises regions with few entries (for example Japan in the 1990s), so before version 2.0 of the data all charts were given an equal weight of 1.
So an equal weighting was used here for a number of years; however, some users pointed out a number of odd features. For example, the influence of European markets in the 1990s was overwhelming the US entries from the same era. At the same time that each individual European country was getting its own chart (or at least making the charts available), the USA was focusing all chart activity on the single Nielsen/Billboard charts, so magazines like Cashbox and radio stations stopped publishing their own.
So the scores need to be scaled somehow, but scaled in a rather special way. If the scores are scaled to end up distributing "points" in some way then periods that have very few entries will assign all their points to those few songs, so the chart would be dominated by songs that were a success in places with few charts. So we calculate a factor that takes into account the size of the music market for a region and the number of existing charts (details below).
There are all sorts of ways that we could combine the scores from different charts. The simplest approach is just to add them up. This is what we do.
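Adding up weighted scores can be sketched as follows. The tuple layout, the function name `total_scores` and the default weight of 1.0 for charts without an explicit weight are assumptions made for illustration; the per-appearance scoring function is passed in rather than fixed.

```python
from collections import defaultdict

def total_scores(entries, chart_weight, appearance_score):
    """Sum the weighted per-appearance scores for each (artist, title) item.

    entries: iterable of (chart, artist, title, position) tuples.
    chart_weight: dict mapping chart -> weight; missing charts count as 1.0.
    appearance_score: function mapping a chart position to a score.
    """
    totals = defaultdict(float)
    for chart, artist, title, position in entries:
        weight = chart_weight.get(chart, 1.0)
        totals[(artist, title)] += weight * appearance_score(position)
    return dict(totals)

entries = [
    ("us", "Act A", "Song X", 1),
    ("uk", "Act A", "Song X", 2),
    ("uk", "Act B", "Song Y", 1),
]
print(total_scores(entries, {"uk": 0.5}, lambda position: 1.0 / position))
```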
The preceding approach works well for charts that have a position attribute. But what do we do for those that don't? What about:
Artists, Years, Decades and hits in Europe
In order to combine scores to rate, for example, artists, we take the obvious approach of summing the items they produced. This is the algorithm used for all the "normal" web pages.
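Summing item scores per artist might look like the sketch below. The input layout (a dict keyed by artist and title) and the function name `artist_totals` are assumptions.

```python
from collections import defaultdict

def artist_totals(item_scores):
    """Sum item scores per artist and return (artist, total) pairs, best first.

    item_scores: dict mapping (artist, title) -> score.
    """
    totals = defaultdict(float)
    for (artist, _title), score in item_scores.items():
        totals[artist] += score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
```

The same grouping works for years or decades by swapping the artist for the relevant key.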
If we are doing a special calculation (that is for one of the FAQ pages) we often adjust the scores to take into account the large number of recent charts. So, for example, when working out the most successful song or greatest song act, we employ an adjustment to normalise the scores. This is normally calculated by averaging the score of entries in the fifth to tenth positions and using the result to rescale the scores. Some experimentation has shown that this produces reasonable results.
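A minimal sketch of this kind of normalisation is shown below. It assumes the adjustment is applied one year at a time and that the rescaling is a simple division by the reference value; neither detail is stated in the text, and the function name `normalise_year` is hypothetical.

```python
def normalise_year(scores):
    """Rescale one year's scores so the mean of ranks 5 to 10 becomes 1.0.

    scores: dict mapping song -> raw score for a single year.
    Years with fewer than five entries, or a zero reference value,
    are returned unchanged (an assumption).
    """
    ranked = sorted(scores.values(), reverse=True)
    band = ranked[4:10]  # fifth to tenth highest, inclusive
    if not band:
        return dict(scores)
    reference = sum(band) / len(band)
    if reference == 0:
        return dict(scores)
    return {song: value / reference for song, value in scores.items()}
```

Using a band just below the very top means one runaway hit does not set the scale for its whole year, while the reference still tracks how many charts contributed to that period.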
Working out which year to assign an entry to is also surprisingly hard. The year of each song is deduced directly from the chart entries, rather than relying on some kind of unreliable external source. The year is extracted from the date in all the song's chart entries and the song's year is set to the median of these values. This usually generates a reasonable estimate of the year.
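The median-year rule can be sketched with the standard library. Using `median_low`, so that an even number of entries still yields a whole year, is an assumption about tie-breaking; the function name `song_year` is hypothetical.

```python
from datetime import date
from statistics import median_low

def song_year(entry_dates):
    """Return the median year across all of a song's chart entry dates."""
    years = [entry_date.year for entry_date in entry_dates]
    return median_low(years)

# A song that charted across a New Year boundary.
dates = [date(1969, 12, 20), date(1970, 1, 10), date(1970, 2, 1)]
print(song_year(dates))
```

The median is robust against a stray re-entry or reissue decades later, which would drag a mean well away from the original release period.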
Putting it all together
Once the individual song scores have been calculated they are processed to generate the various web pages and the links between them. These are all static pages to reduce both the load on the underpowered web server and the security risks.
As the diagram shows the process also generates some summary statistics and other test data. This is used both to spot when new data has introduced issues and to simplify the task of identifying entries that need to be reviewed.
The version 1 data provided a reasonable listing of success for any one year; however, a number of users pointed out that there were some strange features. For example, during the 1990s the European charts were more influential than the US ones, despite those markets generating a smaller revenue. We decided to review the chart scaling factors: could we just choose a factor to make all periods deliver the same number of "points"? The scaling has to be more complex than that. The first issue is that the reason for having so few chart entries for the period from 1900 to 1920 is that there is less modern interest in the music of that era than in later years. Judging by the small size of the "music business" at the time, this lack of interest reflects contemporary views. Scaling these types of years by a single factor makes it too likely that a single entry will, purely by chance, have a large enough lead over its competitors to end up with a really high score.
So let's say the scaling factor has a value of 1, up to the point where a year has more than a certain number of entries, and then the factor decreases so that a year with a really large number of entries approaches some limiting score. Tweaking the score limit and the linear limit allows us to test different approaches. Setting the score limit to twice the linear limit seems to deliver results that have the appropriate profiles.
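The text does not give the exact curve, so the sketch below uses one assumed shape that has the stated properties: the factor is 1 up to the linear limit, and beyond it the scaled total rises ever more slowly towards the score limit. The function name, the parameter names and the particular hyperbolic form are all assumptions.

```python
def scaling_factor(raw_total: float, linear_limit: float, score_limit: float) -> float:
    """Return the factor by which a period's scores are multiplied.

    Up to linear_limit the factor is 1.0. Beyond it, the scaled total
    (raw_total * factor) follows an assumed hyperbolic curve that joins
    the linear part smoothly and approaches score_limit from below.
    """
    if score_limit <= linear_limit:
        raise ValueError("score_limit must exceed linear_limit")
    if raw_total <= linear_limit:
        return 1.0
    gap = score_limit - linear_limit
    scaled_total = score_limit - gap * gap / (raw_total - linear_limit + gap)
    return scaled_total / raw_total

# Score limit set to twice the linear limit, as suggested in the text.
for raw in (50, 100, 200, 1000):
    print(raw, scaling_factor(raw, linear_limit=100, score_limit=200))
```

With a linear limit of 100 and a score limit of 200, a raw total of 200 is scaled down to 150, and no total however large can exceed 200 after scaling.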
So for any periods where there are few entries the scores have to be scaled by a factor of 1, otherwise songs that are a big success in a small market would dominate the results. But if a region has a large number of entries then these should be scaled by a factor less than 1, to ensure the influence of that region is in line with the size of the music market. So the weighting factor has to depend on the number of entries in a region for a given period. The scaling factor cannot be set by the highest scoring song in a region, because that would just make every year's result flat, while there should be a chance for a big hit to have a bigger influence. Instead, the score is summed for the 20th to 100th highest scoring songs of each year in each region and this sum defines the scaling.
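The measure described here, the summed score of the 20th to 100th highest scoring songs for one region and year, can be sketched as below. The function name `band_total` and the choice to return a smaller sum when fewer than 100 songs exist are assumptions.

```python
def band_total(scores, first=20, last=100):
    """Sum the scores ranked from `first` to `last` (inclusive, 1-based).

    scores: iterable of song scores for one region and one year.
    If there are fewer than `last` songs the band is simply shorter,
    and with fewer than `first` songs the result is 0 (an assumption).
    """
    ranked = sorted(scores, reverse=True)
    return sum(ranked[first - 1:last])
```

Because the top 19 songs are excluded, a single enormous hit does not change its own year's scaling, so it keeps its chance to stand out.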
The charts we have are heavily biased towards the USA for the early decades and towards Europe for the later periods. One option would be to consolidate the whole world together, but that causes the 1990s and 2000s to reflect Europe's tastes rather than the world's. Another option is to consolidate entries by country and then scale the results by the size of the music market, but that doesn't work because there are too many locations that just have no data available.
So we have to pick an approach that avoids both being too broad and being too narrow. The approach we selected was to split the world into four regions: the USA (about 35%), other English speaking countries (about 20%), the rest of Europe (about 25%) and the rest of the world (about 20%). By selecting these regions and using the scaling approach we can balance regions where evidence is scarce against those that have far more.
This approach does give rise to some unexpected results. When there is hardly any evidence, as for example the "rest of the world" during the 1970s, the values can be highly scaled, so a single number 1 entry in a Japanese chart causes a song to be listed in the top 100 for that year. We have adjusted the factors to ensure that this, to us clearly undesirable, result does not occur. Some entries in 1940 year charts are perhaps overly influenced by the Australian charts, but it's not so clear to us that this is wrong and we've not found a simple way to overcome that manifestation.
The final process is a fairly convoluted approach that has been tuned to deliver results that seem reasonable in a wide range of circumstances. We have, of course, deliberately avoided allowing any post-calculation "tweaks" that promote or penalise particular songs, acts, genres or periods. The first release using this calculation was published as version 2.0 in October 2011. Within days of publication user feedback had pointed out some minor issues and this caused us to add some of the refinements outlined above before delivering version 2.1 in November 2011. Since November 2011 we have continued to look for further unwanted results. This has caused us to create version 2.2 (which reduced the impact of the uncertain years before 1930) and, in Jan 2014, version 2.3, which readjusted the song year weights to reflect new charts added in the last year.
As more chart data has been added the factors (and algorithms) have been tweaked to remove unwanted effects. The results from the Jan 2014 adjustment (creating version 2.3 of the data) are shown above. The overall end positions are broadly compatible with the base 2.0 results above.
As with all the calculations described on the site, you can decide to try a different approach; the available CSV File gives you the data you need. If your algorithm illustrates something interesting we would like to hear about it.