How is the site generated?

Each of the 87 source charts is held separately. Every chart must have entries for "artist" and "title", usually has entries for "position" and "date", and may also contain extra information such as "duration", "written by", "web page" or "film".
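As a rough illustration of such a record, here is a minimal Python sketch. The class name `ChartEntry`, the `extra` dictionary and the date format are assumptions made for the example, not the site's actual storage format, and the sample values are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChartEntry:
    # Present in every source chart.
    artist: str
    title: str
    # Usually present, so they are optional here.
    position: Optional[int] = None
    date: Optional[str] = None  # assumed text form such as "1968-09-14"
    # Any extra information a chart supplies, for example
    # "duration", "written by", "web page" or "film".
    extra: dict[str, str] = field(default_factory=dict)


# Illustrative values only.
entry = ChartEntry(
    artist="The Beatles",
    title="Hey Jude",
    position=1,
    date="1968-09-14",
    extra={"written by": "Lennon/McCartney"},
)
minimal = ChartEntry(artist="Abba", title="Dancing Queen")
```

Keeping the optional attributes in one dictionary means a new kind of extra information does not need a change to the record layout.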

The 270,032 entries in these charts are consolidated to provide a complete set of attributes for each of the 143,418 items (112,554 songs and 30,864 albums). The most difficult aspect of this task is matching names: they are often misspelt in the source charts, punctuation is usually inconsistent and the list of "featured" artists is always in a different order. Programming a system to recognise that "Uncle Albert" by "Paul McCartney" and "Admiral Halsey" by "Wings" are actually the same song is not trivial.
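The punctuation and "featured" artist problems can be partly tackled by reducing each name to a comparison key. The following Python sketch shows one possible way to do that. The function names and the rules are assumptions for illustration, not the matching code used for the site, and the artist names in the comments are made up.

```python
import re
import unicodedata

# Assumed spellings of the "featured" marker and of list separators.
FEAT = re.compile(r"\s+(?:featuring|feat\.?|ft\.?)\s+", re.IGNORECASE)
SEP = re.compile(r"\s*(?:,|&|\band\b)\s*", re.IGNORECASE)


def normalise(text: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def artist_key(artist: str) -> str:
    """Keep the main artist first and sort any featured artists."""
    main, *featured = FEAT.split(artist, maxsplit=1)
    guests = []
    for chunk in featured:
        guests.extend(normalise(g) for g in SEP.split(chunk) if g.strip())
    return "|".join([normalise(main)] + sorted(guests))


def song_key(artist: str, title: str) -> tuple[str, str]:
    """Entries with the same key are treated as the same song."""
    return artist_key(artist), normalise(title)


# Hypothetical names: both spellings give the same key.
same = (song_key("Main Act feat. Guest One & Guest Two", "Some Song!")
        == song_key("Main Act featuring Guest Two, Guest One", "Some song"))
```

A key like this does nothing for misspellings or for cases such as "Uncle Albert" and "Admiral Halsey", which would need fuzzy matching or a hand-maintained list of aliases.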

In general the approach taken is to consolidate entries if they appear similar: having too many false connections is usually better than missing genuine ones. For this reason there are quite a few places where strict accuracy has been sacrificed to bring things together; for example, all of Prince's songs are listed under the name Prince.

The next step is to generate a consolidated score from the chart entry information. There are a variety of possible ways this could have been done; in this case it was decided that the simplest approach was to generate a score from each entry and sum them. The individual entry scores clearly should depend on the position within the source chart, with a number one song getting more "points" than the second placed one, and so on.

One reasonable way to generate a score is to use a power law. The score is set to XXX + YYY^position and then each chart is weighted by a factor that takes into account which chart was the source. The weighting can emphasise charts that are solidly based on sales, attempt to match music revenue in each country or follow any number of other reasonable strategies. Using a set of apparently reasonable variations of the weighting and of the XXX and YYY values generates the following top 10 lists.

   | XXX=0.5 MaxY=0.9 | XXX=1.0 MaxY=0.2 | XXX=0.5 MaxY=0.01 | XXX=0.0 MaxY=0.8
#  | Subjective charts 20% | Equal weights | Normalised Totals | Subjective charts 20%
1  | Roy Orbison - Oh, Pretty Woman | Bee Gees - Stayin' Alive | Daniel Boone - Beautiful Sunday | Celine Dion - My Heart Will Go On
2  | Abba - Dancing Queen | John Lennon - Imagine | Elton John - Candle in the Wind '97 | The Beatles - Hey Jude
3  | Pink Floyd - Another Brick in the Wall (part 2) | Roy Orbison - Oh, Pretty Woman | Mariah Carey - All I Want For Christmas is You | USA For Africa - We Are the World
4  | The Beatles - Hey Jude | Abba - Dancing Queen | Simon & Garfunkel - The Sounds of Silence | Frank Sinatra - Strangers in the Night
5  | The Beatles - Let it Be | The Beatles - Hey Jude | Whitney Houston - I Will Always Love You | The Monkees - I'm a Believer
6  | Michael Jackson - Billie Jean | The Rolling Stones - (I Can't Get No Satisfaction) | Celine Dion - To Love You More | Bryan Adams - (Everything I Do I Do it For You)
7  | Procol Harum - A Whiter Shade of Pale | The Beatles - Let it Be | Irene Cara - Flashdance... What a Feeling | Whitney Houston - I Will Always Love You
8  | Bryan Adams - (Everything I Do I Do it For You) | Michael Jackson - Billie Jean | The Beatles - Let it Be | Stevie Wonder - I Just Called to Say I Love You
9  | Celine Dion - My Heart Will Go On | Queen - Bohemian Rhapsody | Jerry Wallace - The Lovers of the World | Nancy Sinatra - These Boots Are Made For Walking
10 | The Beatles - Help! | Nirvana - Smells Like Teen Spirit | The Nolans - I'm in the Mood For Dancing | Spice Girls - Wannabe

As the wide range of results shows, the parameter values have a big influence on the resulting chart. With this approach, suitable parameters have to be picked to generate a "medium" chart.
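To see why the parameters matter so much, the Python sketch below scores two hypothetical songs under different settings. It assumes the entry score is read as XXX plus YYY raised to the power of the position. The function names, the two songs and the particular YYY values are all assumptions chosen for illustration.

```python
def entry_score(position: int, xxx: float, yyy: float) -> float:
    # Assumed reading of the formula: XXX + YYY ** position.
    return xxx + yyy ** position


def total_score(positions, xxx: float, yyy: float, weight: float = 1.0) -> float:
    # Sum the weighted entry scores; one chart weight is used for brevity.
    return sum(weight * entry_score(p, xxx, yyy) for p in positions)


# Hypothetical songs: a single number one in one chart, and
# five modest placings spread over five charts.
one_hit = [1]
many_entries = [20, 25, 30, 35, 40]

for xxx, yyy in [(0.0, 0.8), (0.5, 0.9), (1.0, 0.2)]:
    a = total_score(one_hit, xxx, yyy)
    b = total_score(many_entries, xxx, yyy)
    leader = "one_hit" if a > b else "many_entries"
    print(f"XXX={xxx} YYY={yyy}: one_hit={a:.3f} many_entries={b:.3f} -> {leader}")
```

With XXX set to 0.0 only the placing counts, so the single number one leads. Once XXX is 0.5 or more, simply appearing in many charts outweighs it.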

  XXX=0.5 MaxY=0.01
# Subjective charts 20%
1 The Beatles - Hey Jude
2 The Rolling Stones - (I Can't Get No Satisfaction)
3 John Lennon - Imagine
4 Bee Gees - Stayin' Alive
5 Queen - Bohemian Rhapsody
6 Abba - Dancing Queen
7 Michael Jackson - Billie Jean
8 Nirvana - Smells Like Teen Spirit
9 The Beatles - Let it Be
10 Roy Orbison - Oh, Pretty Woman

Some reviewers of music charts have claimed that Zipf's distribution is a better fit to music charts than a power law; that is, the second placed song has half the sales of the first, the third has a third, and so on. This suggests a different, simpler scoring algorithm: if each entry's score is 1 + 1/position then each song gets credit for having an entry and the number one song gets the most points. The simplest chart weighting is to give equal values to every chart. When these simple parameters are used the result is towards the middle of the range demonstrated above. This is the scoring algorithm that has been used here.
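The 1 + 1/position rule with equal chart weights can be sketched in a few lines of Python. The function names and the sample entries are hypothetical, and the sketch assumes every entry has a position.

```python
from collections import defaultdict


def zipf_score(position: int) -> float:
    # One point for having an entry at all, plus 1/position for the placing.
    return 1.0 + 1.0 / position


def consolidate(entries):
    """Sum the entry scores per song; every chart has equal weight."""
    totals = defaultdict(float)
    for song, position in entries:
        totals[song] += zipf_score(position)
    # Highest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (song, position) pairs, each taken from a different chart.
entries = [
    ("Song A", 1), ("Song A", 4),
    ("Song B", 2), ("Song B", 2), ("Song B", 10),
    ("Song C", 1),
]
ranking = consolidate(entries)
```

Here "Song B" comes out ahead of both songs that reached number one, because three entries earn three base points before the placings are counted.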

Song Years

Working out which year to assign an entry to is also surprisingly hard. The year of each song is deduced directly from the chart entries, rather than relying on some kind of unreliable external source. The year is extracted from the date in all the song's chart entries and the song's year is set to the median of these values. This usually generates a reasonable estimate of the year.
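A minimal Python sketch of the median year calculation follows. It assumes each date is text beginning with a four-digit year, and it uses the lower of the two middle values when there is an even number of entries. Both choices, and the function name, are assumptions rather than details taken from the site.

```python
from statistics import median_low


def song_year(dates):
    """Return the median year of a song's chart entry dates."""
    # Assumes each date starts with a four-digit year, e.g. "1968-09-14".
    years = [int(d[:4]) for d in dates]
    # median_low keeps the result a whole year for even-sized lists.
    return median_low(years)


# Hypothetical dates: one late re-entry does not move the result.
year = song_year(["1965-03-01", "1965-04-12", "1965-06-30", "1991-02-02"])
```

Using the median rather than the earliest or the average year means a single misdated entry or a later re-release has little effect on the result.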

Once the individual song scores have been calculated they are processed to generate the various web pages and the links between them. These are all static pages to reduce both the load on the underpowered web server and the security risks.
