One question we often receive is why we have incorporated a number of charts that are
not "notable". That is, charts generated by individuals or groups that
are not as objective as, for example, the RIAA.
Of course the first thing to note is that the official music industry groups probably have
the most incentive of anyone to adjust the figures, and, at least up to the 1970s,
there is evidence that the "official" charts were quite heavily manipulated.

Another point is that, as the reader, it is up to you to decide which chart entries
you want to take note of. Each entry on the site has a complete list of its
song chart entries or
album chart entries. You can decide which ones are
important to you and which to ignore. When checking on individual achievements, for
example the peak position that a particular song reached in a given chart, all
the data you need is presented.

However, many people are more interested in an overall "score" than in an
individual chart entry; for example, they want to know which female solo artist had the most
success worldwide during the 1990s.

The fact of the matter is that a completely accurate and objective measure of
overall worldwide musical success is not available from anyone for most of the
period this site covers. Even as recently as the 1980s, sales figures
are particularly prone to exaggeration. Anyone who claims to know, for example, the real number
of copies of "Thriller" that were sold is either delusional or, more likely, lying
to you (and that comment is intended to include our own estimates of sales).

James Surowiecki pointed out in his book "The Wisdom of Crowds"
that it is possible to combine a collection of rough estimates to produce
a collective view that is more accurate than any of the individual, biased contributions.
We have searched for lists that indicate relative success, and consolidated them
to provide a higher quality, more objective overall result.
For this to work we have to ensure that the various obvious
types of systematic bias are reduced as
much as possible.

Of course, in a perfect world we would only use well-audited charts from
the appropriate authorities for each region and period, but these are simply not available.
So we do the best we can, incorporating as wide a range of inputs as possible and
combining them using a transparent and
validated approach. We also test the results to attempt to detect any systematic bias.
Finally, we list all the inputs for each entry so the reader
can choose to ignore any of them.

We believe that the final results represent the best current estimate of the historical
success of the songs, albums and artists listed. They are not perfect, but they
are much better than most and, more importantly, easily accessible.

As with all the complex calculations described on the site you can decide to try a
different approach. If you can
think of a way to improve our results
we would be interested to hear about it.

We claim that having a large collection of questionable data provides
a better estimate of reality than relying on a small sample of "notable"
charts. This view was explained by James Surowiecki in his book
"The Wisdom of Crowds". A lot of questionable claims have been made for
this process; this section is intended to explain how we exploit the
effect, and to explore its limitations.

The first person to express the idea was Francis Galton, who noticed that the entries
in a village "guess the weight of the bullock" competition varied
wildly, but that the average of all the guesses was more accurate than any individual guess.
The process works particularly well in situations where all participants
are estimating a numerical value (or a ranking, or anything else that can be
expressed along a scale). It does not apply where discrete values are being
sought; for example, when writing an encyclopaedia entry the consensus is not
always more accurate than the best contributor.
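
To make the mechanism concrete, here is a minimal sketch (with made-up numbers, not Galton's actual data): when each guess is the true value plus independent noise, the average of many guesses lands far closer to the truth than a typical individual does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up numbers for illustration: a "true" bullock weight and noisy guesses.
true_weight = 1200.0
guesses = rng.normal(true_weight, 100.0, size=800)  # 800 independent guesses

crowd_error = abs(guesses.mean() - true_weight)
typical_individual_error = np.median(np.abs(guesses - true_weight))
print(f"crowd average off by {crowd_error:.1f}")            # a few pounds
print(f"typical individual off by {typical_individual_error:.1f}")  # ~67 pounds
```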

So in our case we are looking at a collection of music charts that rank
different entries. This is clearly a scale rather than a set of discrete values;
even certifications like Gold and Platinum can be converted into numeric
values (the number of claimed sales). How accurately should we expect a
collection of, say, 10 or 20 non-notable charts to perform against 2 or 3 notable
ones? The only way to test this is to set up a randomised
trial.
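
As an aside, the certification conversion is straightforward. This sketch uses the US RIAA thresholds (other certifying bodies use different cut-offs, and the function name is our own):

```python
# US RIAA award thresholds; other markets use different cut-offs.
CERTIFICATION_SALES = {
    "Gold": 500_000,
    "Platinum": 1_000_000,
    "Diamond": 10_000_000,
}

def claimed_sales(certification: str, multiplier: int = 1) -> int:
    """Turn a certification such as '3x Platinum' into a claimed sales figure."""
    return CERTIFICATION_SALES[certification] * multiplier

print(claimed_sales("Platinum", 3))  # 3x Platinum -> 3,000,000 claimed sales
```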

Assume we have two songs, A and B, and that A is better than B by some
amount. Each chart can be thought of as a process that takes the real
quality score, adds a random component and then ranks the songs
according to this total. Let's start by normalising the random component of the score:
for testing, these values are drawn from a Normal distribution with a mean
of 0.0 and a standard deviation of 1.0.
Suppose we have N charts, each of which assigns an ordering to the two songs;
how often will the "wrong" song win in the majority of charts?
In other words, what is the probability that more charts, by chance, claim that
song B has the higher ranking? This obviously depends on how much better song A
is and on how many charts we use.
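
This trial is easy to reproduce. The sketch below (a minimal version in NumPy; the function and variable names are ours) scores both songs in each chart as true quality plus Normal(0, 1) noise, and counts how often the majority of charts puts song B on top:

```python
import numpy as np

rng = np.random.default_rng(42)

def majority_error_rate(quality_diff, n_charts, n_trials=100_000):
    """How often does a majority of charts rank the worse song (B) higher?

    Each chart scores each song as its true quality plus Normal(0, 1) noise;
    song A's true quality exceeds song B's by `quality_diff`.
    """
    noise_a = rng.normal(0.0, 1.0, size=(n_trials, n_charts))
    noise_b = rng.normal(0.0, 1.0, size=(n_trials, n_charts))
    # A chart gets the ordering "wrong" when B's noisy score beats A's.
    wrong = noise_b > quality_diff + noise_a
    # The trial fails when more than half of the charts are wrong.
    return (wrong.sum(axis=1) > n_charts / 2).mean()

for n_charts in (1, 3, 7, 15):
    print(n_charts, majority_error_rate(1.0, n_charts))
```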

The plot above shows the probability of this type of error when the scenario
is run a few hundred thousand times. Obviously, if the two songs' quality difference
is tiny in comparison to the chart variation, the probability of an incorrect estimate is 50%.
For the best quality charts, let's assume that the standard deviation of the chart
noise is the same size as the song quality difference; that gives a score
difference of 1.0, and a probability of a single chart getting the ordering of a
pair of songs "wrong" of 24%. A low quality chart will have larger
errors; let's say the error is twice the size, which makes the score difference
0.5, and having 7 such charts gives an error probability of 22%. This ratio
seems to hold for all values of the quality difference: whether a chart
introduces random variation of 2 times or 0.5 times the quality score, a set of
7 charts with twice its level of randomness will beat its performance.
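
The 24% and 22% figures can also be checked in closed form: the difference between the two songs' independent Normal(0, 1) noise terms is Normal(0, √2), and a majority vote over n charts follows a binomial tail. A short check (our own sketch, using the same noise model as above):

```python
from math import comb, erf

def single_chart_error(d):
    """P(one chart ranks the pair wrong) = 1 - Phi(d / sqrt(2)),
    since the two songs' noise terms differ by Normal(0, sqrt(2))."""
    return 0.5 * (1.0 - erf(d / 2.0))  # erf form of 1 - Phi(d / sqrt(2))

def majority_error(d, n):
    """P(a majority of n independent charts gets the ordering wrong)."""
    p = single_chart_error(d)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

print(f"{single_chart_error(1.0):.2f}")  # ~0.24: one high quality chart
print(f"{majority_error(0.5, 7):.2f}")   # ~0.22: seven charts, twice the noise
```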

In other words, having 7 charts with double the error level is better than
having a single high quality chart. Is the random variability of "notable"
charts really half that of the "non-notable" ones? Our experience is that
most "non-notable" charts are not much more variable than
the official ones, and of course where a chart is clearly biased (like the
VHS ones) we ignore it anyway. Naturally we want to use charts of the highest
quality we can get, but for most regions and periods they just aren't
available. Dismissing charts because they don't come from an
"industry approved organisation" is unjustifiable; a collection of
poorer quality charts, properly handled, can
deliver more accurate results than a single high quality one.